Re: pg_rewind WAL segments deletion pitfall

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: pg_rewind WAL segments deletion pitfall
Дата
Msg-id ZORIVnxs+lXoEAQG@paquier.xyz
обсуждение исходный текст
Ответ на Re: pg_rewind WAL segments deletion pitfall  (torikoshia <torikoshia@oss.nttdata.com>)
Ответы Re: pg_rewind WAL segments deletion pitfall
Re: pg_rewind WAL segments deletion pitfall
Список pgsql-hackers
On Fri, Aug 18, 2023 at 03:40:57PM +0900, torikoshia wrote:
> Thanks for the patch, I've marked this as ready-for-committer.
>
> BTW, this issue can be considered a bug, right?
> I think it would be appropriate to provide backpatch.

Hmm, I agree that there is a good argument in back-patching as we have
the WAL files between the redo LSN and the divergence LSN, but
pg_rewind is not smart enough to keep them around.  If the archives of
the primary were not able to catch up, the old primary is as good as
kaput, and restore_command won't help here.

I don't like much this patch.  While it takes correctly advantage of
the backward record read logic from SimpleXLogPageRead() able to
handle correctly timeline jumps, it creates a hidden dependency in the
code between the hash table from filemap.c and the page callback.
Wouldn't it be simpler to build a list of the segment names using the
information from WALOpenSegment and build this list in
findLastCheckpoint()?  Also, I am wondering if we should be smarter
with any potential conflict handling between the source and the
target, rather than just enforcing a FILE_ACTION_NONE for all these
files.  In short, could it be better to copy the WAL file from the
source if it exists there?

+       /*
+        * Some entries (WAL segments) already have an action assigned
+        * (see SimpleXLogPageRead()).
+        */
+       if (entry->action == FILE_ACTION_UNDECIDED)
+           entry->action = decide_file_action(entry);

This change makes me a bit uneasy, per se my previous comment with the
additional code dependencies.

I think that this scenario deserves a test case.  If one wants to
emulate a delay in WAL archiving, it is possible to set
archive_command to a command that we know will fail, for instance.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: [PoC] pg_upgrade: allow to upgrade publisher node
Следующее
От: David Rowley
Дата:
Сообщение: Re: PG 16 draft release notes ready