Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. |
Date | |
Msg-id | 2171536a-8227-4e53-ac47-33b69093b61d@iki.fi |
In reply to | BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. |
List | pgsql-bugs |
On 08/08/2024 14:58, Georgy Shelkovy wrote:
> this is good log of previous run for comparison

Thanks! That confirms my theory that the minRecoveryPoint in the "bad" case is at the LSN where the histories diverged, while in the "good" case, it's later. So pg_rewind is not wrong when it says that no rewind is required. It's still confusing though.

Here's a visualization of the scenarios:

Legend:

TLI 1: this is the WAL produced on the master
TLI 2: WAL produced on standby1
*: point of divergence
^: minRecoveryPoint on standby2

Good:

            ----------- TLI 2
           /
----------*------------ TLI 1
               ^

Bad:

            ----------- TLI 2
           /
----------*------------ TLI 1
          ^

There's a third possibility, which actually produces an assertion failure. I was able to reproduce this case by adding some sleeps in the script and in walreceiver startup code:

            ----------- TLI 2
           /
----------*------------ TLI 1
     ^

pg_rewind: Source timeline history:
pg_rewind: 1: 0/0 - 0/1138F00
pg_rewind: 2: 0/1138F00 - 0/0
pg_rewind: Target timeline history:
pg_rewind: 1: 0/0 - 0/0
pg_rewind: servers diverged at WAL location 0/1138F00 on timeline 1
pg_rewind: ../src/bin/pg_rewind/pg_rewind.c:443: int main(int, char **): Assertion `target_wal_endrec == divergerec' failed.

Except for the assertion failure, I think that's essentially the same as the "Bad" case. On a non-assertion build, pg_rewind would report "no rewind required" which seems correct.

So it seems true that rewind is not required in those cases. However, if the WAL is already written on the standby's disk, just not replayed yet, then when you restart the server, it will replay the WAL from timeline 1. That does seem surprising. Perhaps pg_rewind should just update the minRecoveryPoint and minRecoveryTLI in the control file in that case, to force WAL recovery to follow the timeline switch to TLI 2.

I will try to write a TAP test for the "Bad" and the assertion failure case, fix the assertion failure, and test how updating the minRecoveryPoint would behave.
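
The three scenarios described above can be modeled in a few lines. This is a minimal sketch in Python, not pg_rewind's actual C code: parse_lsn and rewind_decision are hypothetical names, the comparison rule is a simplification inferred from the assertion text and the behaviour described in this message, and every LSN other than 0/1138F00 is made up for illustration.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in PostgreSQL's 'hi/lo' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def rewind_decision(target_wal_end: str, divergence: str,
                    assert_enabled: bool = False) -> str:
    """Simplified model of the check discussed in this thread (an assumption).

    target_wal_end stands in for the end of WAL on the target, which for a
    standby is its minRecoveryPoint; divergence is the LSN where the
    timelines forked.
    """
    end = parse_lsn(target_wal_end)
    div = parse_lsn(divergence)
    if end > div:
        # "Good" case: the target has WAL past the fork, so it must be rewound.
        return "rewind required"
    if assert_enabled and end != div:
        # Third case on an assertion-enabled build: the target is behind the fork.
        raise AssertionError("target_wal_endrec == divergerec")
    # "Bad" case, or the third case on a non-assertion build.
    return "no rewind required"


divergence = "0/1138F00"  # taken from the log output in this message
good = rewind_decision("0/1139000", divergence)   # hypothetical LSN past the fork
bad = rewind_decision("0/1138F00", divergence)    # exactly at the fork
third = rewind_decision("0/1138000", divergence)  # hypothetical LSN before the fork
```

With assert_enabled=True, the third call raises instead of returning, which mirrors the difference between assertion and non-assertion builds mentioned above.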
-- Heikki Linnakangas Neon (https://neon.tech)