Race between KeepFileRestoredFromArchive() and restartpoint

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Race between KeepFileRestoredFromArchive() and restartpoint
Дата
Msg-id 20210202151416.GB3304930@rfd.leadboat.com
обсуждение исходный текст
Ответы Re: Race between KeepFileRestoredFromArchive() and restartpoint  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
A race with KeepFileRestoredFromArchive() can cause a restartpoint to fail, as
seen once on the buildfarm[1].  The attached patch adds a test case; it
applies atop the "stop events" patch[2].  We have two systems for adding
long-term pg_wal directory entries.  KeepFileRestoredFromArchive() adds them
during archive recovery, while InstallXLogFileSegment() does so at all times.
Unfortunately, InstallXLogFileSegment() happens throughout archive recovery,
via the checkpointer recycling segments and calling PreallocXlogFiles().
Multiple processes can run InstallXLogFileSegment(), which uses
ControlFileLock to represent the authority to modify the directory listing of
pg_wal.  KeepFileRestoredFromArchive() just assumes it controls pg_wal.

Recycling and preallocation are wasteful during archive recovery, because
KeepFileRestoredFromArchive() unlinks every entry in its path.  I propose to
fix the race by adding an XLogCtl flag indicating which regime currently owns
the right to add long-term pg_wal directory entries.  In the archive recovery
regime, the checkpointer will not preallocate and will unlink old segments
instead of recycling them (like wal_recycle=off).  XLogFileInit() will fail.

Notable alternatives:

- Release ControlFileLock at the end of XLogFileInit(), not at the end of
  InstallXLogFileSegment().  Add ControlFileLock acquisition to
  KeepFileRestoredFromArchive().  This provides adequate mutual exclusion, but
  XLogFileInit() could return a file descriptor for an unlinked file.  That's
  fine for PreallocXlogFiles(), but it feels wrong.

- During restartpoints, never preallocate or recycle segments.  (Just delete
  obsolete WAL.)  By denying those benefits, this presumably makes streaming
  recovery less efficient.

- Make KeepFileRestoredFromArchive() call XLogFileInit() to open a segment,
  then copy bytes.  This is simple, but it multiplies I/O.  That might be
  negligible on account of caching, or it might not be.  A variant, incurring
  extra fsyncs, would be to use durable_rename() to replace the segment we get
  from XLogFileInit().

- Make KeepFileRestoredFromArchive() rename without first unlinking.  This
  avoids checkpoint failure, but a race could trigger noise from the LOG
  message in InstallXLogFileSegment -> durable_rename_excl.

Does anyone prefer some alternative?  It's probably not worth back-patching
anything for a restartpoint failure this rare, because most restartpoint
outcomes are not user-visible.

Thanks,
nm

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-10-05%2023%3A02%3A17
[2] https://postgr.es/m/CAPpHfdtSEOHX8dSk9Qp%2BZ%2B%2Bi4BGQoffKip6JDWngEA%2Bg7Z-XmQ%40mail.gmail.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: Add primary keys to system catalogs
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Add primary keys to system catalogs