Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.

From	Heikki Linnakangas
Subject	Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
Date	8 August 2024 13:07:19
Msg-id	2171536a-8227-4e53-ac47-33b69093b61d@iki.fi
In reply to	BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. (PG Bug reporting form <noreply@postgresql.org>)
Replies	Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
List	pgsql-bugs


On 08/08/2024 14:58, Georgy Shelkovy wrote:
> this is good log of previous run for comparison
Thanks! That confirms my theory that the minRecoveryPoint in the "bad" 
case is at the LSN where the histories diverged, while in the "good" 
case, it's later. So pg_rewind is not wrong when it says that no rewind 
is required. It's still confusing though. Here's a visualization of the 
scenarios:

Legend:

TLI 1: WAL produced on the master
TLI 2: WAL produced on standby1
*: point of divergence
^: minRecoveryPoint on standby2

Good:

             -----------  TLI 2
            /
----------*------------  TLI 1
                    ^

Bad:

             -----------  TLI 2
            /
----------*------------ TLI 1
           ^

There's a third possibility, which actually produces an assertion 
failure. I was able to reproduce this case by adding some sleeps in the 
script and in walreceiver startup code:


             -----------  TLI 2
            /
----------*------------ TLI 1
     ^

pg_rewind: Source timeline history:
pg_rewind: 1: 0/0 - 0/1138F00
pg_rewind: 2: 0/1138F00 - 0/0
pg_rewind: Target timeline history:
pg_rewind: 1: 0/0 - 0/0
pg_rewind: servers diverged at WAL location 0/1138F00 on timeline 1
pg_rewind: ../src/bin/pg_rewind/pg_rewind.c:443: int main(int, char **): Assertion `target_wal_endrec == divergerec' failed.
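The "servers diverged" line follows from the two timeline histories printed just before it. Below is a minimal Python sketch of that step, assuming the rule is: walk both histories while the timeline IDs and begin LSNs match, then take the earlier end LSN of the last shared timeline, treating 0/0 as "not ended yet". It models the idea only and is not pg_rewind's C code; every name in it (parse_lsn, format_lsn, HistoryEntry, find_divergence) is hypothetical.

```python
from dataclasses import dataclass

INVALID_LSN = 0  # "0/0" in the log: no bound (timeline start, or a still-current end)


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's 'X/Y' LSN text form (two hex halves) to a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Inverse of parse_lsn, in the same uppercase hex style as the log."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


@dataclass
class HistoryEntry:
    tli: int
    begin: int
    end: int  # INVALID_LSN means this timeline has not ended (it is current)


def min_end(a: int, b: int) -> int:
    """Minimum of two end LSNs, treating INVALID_LSN as 'no end yet' (infinity)."""
    if a == INVALID_LSN:
        return b
    if b == INVALID_LSN:
        return a
    return min(a, b)


def find_divergence(source, target):
    """Return (divergence LSN, common TLI) for two timeline histories."""
    shared = 0
    for s, t in zip(source, target):
        if s.tli != t.tli or s.begin != t.begin:
            break
        shared += 1
    if shared == 0:
        raise ValueError("no common ancestor timeline")
    s, t = source[shared - 1], target[shared - 1]
    return min_end(s.end, t.end), s.tli


# Histories as printed by pg_rewind in the log above.
source = [HistoryEntry(1, parse_lsn("0/0"), parse_lsn("0/1138F00")),
          HistoryEntry(2, parse_lsn("0/1138F00"), parse_lsn("0/0"))]
target = [HistoryEntry(1, parse_lsn("0/0"), parse_lsn("0/0"))]

lsn, tli = find_divergence(source, target)
print(f"servers diverged at WAL location {format_lsn(lsn)} on timeline {tli}")
# prints: servers diverged at WAL location 0/1138F00 on timeline 1
```

Because the target is still on TLI 1 (end 0/0), the only finite bound is the source's switch to TLI 2, which is why the divergence point is reported on timeline 1.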

Except for the assertion failure, I think that's essentially the same as 
the "Bad" case. On a non-assertion build, pg_rewind would report "no 
rewind required" which seems correct.
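The three diagrams reduce to comparing where the target's WAL ends with the divergence point. Here is a minimal Python sketch of that comparison. It assumes, rather than quotes from the source, that a shut-down standby's WAL is taken to end at the later of its minRecoveryPoint and the end of its last checkpoint record. The function names are hypothetical, and every LSN except 0/1138F00 (taken from the log above) is made up for illustration.

```python
def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' LSN string (two hex halves) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def target_wal_end(min_recovery_point: int, checkpoint_end: int) -> int:
    # Assumption: the target's WAL ends at whichever is later, its
    # minRecoveryPoint or the end of its last checkpoint record.
    return max(min_recovery_point, checkpoint_end)


def classify(target_end: int, divergence: int) -> str:
    """Compare where the target's WAL ends with where the histories forked."""
    if target_end > divergence:
        # "Good" case: the target has WAL past the fork, so it must be rewound.
        return "rewind needed"
    if target_end == divergence:
        # "Bad" case: the target stops exactly at the fork; nothing to undo.
        return "no rewind required"
    # Third case: the target stops before the fork. Per the thread, an
    # assertion-enabled build trips `target_wal_endrec == divergerec` here.
    return "target ends before divergence"


divergence = parse_lsn("0/1138F00")      # from the pg_rewind log in this thread
checkpoint_end = parse_lsn("0/1100000")  # hypothetical checkpoint end
for label, mrp in [("good", "0/1139000"), ("bad", "0/1138F00"), ("third", "0/1138000")]:
    end = target_wal_end(parse_lsn(mrp), checkpoint_end)
    print(label, classify(end, divergence))
```

On a non-assertion build the third branch ends with the same "no rewind required" outcome as the "Bad" case, which matches the observation that the two cases are essentially the same.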


So it seems true that rewind is not required in those cases. However, if 
the WAL is already written on the standby's disk, just not replayed yet, 
then when you restart the server, it will replay the WAL from timeline 
1. That does seem surprising. Perhaps pg_rewind should just update the 
minRecoveryPoint and minRecoveryTLI in the control file in that case, to
force WAL recovery to follow the timeline switch to TLI 2.

I will try to write a TAP test for the "Bad" and the assertion failure 
case, fix the assertion failure, and test how updating the 
minRecoveryPoint would behave.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
