Re: [BUG] recovery of prepared transactions during promotion can fail

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: [BUG] recovery of prepared transactions during promotion can fail
Дата
Msg-id 20230619.142514.373396981484508204.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на [BUG] recovery of prepared transactions during promotion can fail  (Julian Markwort <julian.markwort@cybertec.at>)
Список pgsql-hackers
Thanks for the report, reproducer and the patches.

At Fri, 16 Jun 2023 16:27:40 +0200, Julian Markwort <julian.markwort@cybertec.at> wrote in 
> - prepare a transaction
> - crash postgresql
> - create standby.signal file
> - start postgresql, wait for recovery to finish
> - promote
..
> The promotion will fail with a FATAL error, stating that "requested WAL segment .* has already been removed".
> The FATAL error causes the startup process to exit, so postmaster shuts down again.
> 
> Here's an exemplary log output, maybe this helps people to find this issue when they search for it online:

> LOG:  redo done at 0/15D89B8
> LOG:  last completed transaction was at log time 2023-06-16 13:09:53.71118+02
> LOG:  selected new timeline ID: 2
> LOG:  archive recovery complete
> FATAL:  requested WAL segment pg_wal/000000010000000000000001 has already been removed
> LOG:  startup process (PID 1650358) exited with exit code 1

Reproduced here.

> The cause of this failure is an oversight (rather obvious in hindsight):
> The renaming of the WAL file (that was last written to before the crash happened) to .partial is done *before*
PostgreSQL
> might have to read this very file to recover prepared transactions from it.
> The relevant function calls here are durable_rename() and RecoverPreparedTransactions() in xlog.c .
> This behaviour has - apparently unintentionally - been fixed in PG 15 and upwards (see commit 811051c ), as part of
a
> general restructure and reorganization of this portion of xlog.c (see commit 6df1543 ).

I think so, the reordering might have done for some other reasons, though.

> Furthermore, it seems this behaviour does not appear in PG 12 and older, due to another possible bug:

<snip>...

> In PG 13 and newer, the XLogReaderState is reset in XLogBeginRead()
> before reading WAL in XlogReadTwoPhaseData() in twophase.c .

I arraived at the same conclusion.

> In the older releases (PG <= 12), this reset is not done, so the requested LSN containing the prepared transaction
can
> (by happy coincidence) be read from in-memory buffers, and PostgreSQL consequently manages to come up just fine (as
the
> WAL has already been read into buffers prior to the .partial rename).
> If the older releases also where to properly reset the XLogReaderState, they would also fail to find the LSN on disk,
and
> hence PostgreSQL would crash again.

From the perspective of loading WAL for prepared transactions, the
current code in those versions seems fine. Although I suspect Windows
may not like to rename currently-open segments, it's likely acceptable
as the current test set operates without issue.. (I didn't tested this.)

> I've attached patches for PG 14 and PG 13 that mimic the change in PG15 (commit 811051c ) and reorder the crucial
events,
> placing the recovery of prepared transactions *before* renaming the file.

It appears to move the correct part of the code to the proper
location, modifying the steps to align with later versions.

> I've also attached recovery test scripts for PG >= 12 and PG <= 11 that can be used to verify that promote after
recovery
> with prepared transactions works.

It effectively detects the bug, though it can't be directly used in
the tree as-is. I'm unsure whether we need this in the tree, though.

> My humble opinion is that this fix should be backpatched to PG 14 and PG 13.

I agree with you.

> It's debatable whether the fix needs to be brought back to 12 and older also, as those do not exhibit this issue, but
the
> order of renaming is still wrong.
> I'm not sure if there could be cases where the in-memory buffers of the walreader are too small to cover a whole WAL
> file.
> There could also be other issues from operations that require reading WAL that happen after the .partial rename, I
> haven't checked in depth what else happens in the affected codepath.
> Please let me know if you think this should also be fixed in PG 12 and earlier, so I can produce the patches for
those
> versions as well.

There's no immediate need to change the versions. However, I would
prefer to backpatch them to the older versions for the following
reasons.

1. Applying this eases future backpatching in this area, if any.

2. I have reservations about renaming possibly-open WAL segments.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [BUG] recovery of prepared transactions during promotion can fail
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Allow pg_archivecleanup to remove backup history files