Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.

From Heikki Linnakangas
Subject Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
Msg-id e08235e3-b216-4bf3-8a9a-8ea819ae105e@iki.fi
In reply to BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
On 08/08/2024 10:57, Georgy Shelkovy wrote:
> Unfortunately, the playback is not very stable, but sometimes it shoots. 
> I added some commands to show last WAL rows

Thanks. I still haven't been able to reproduce it, but here's a theory:

When determining whether the target needs rewinding, pg_rewind looks at 
the target's last checkpoint record, or if it's a standby, its 
minRecoveryPoint. It's possible that standby2's minRecoveryPoint is 
indeed before the point of divergence. That means it has replayed the 
340 insert records, but all the changes are still only sitting in the 
shared buffer cache. When you shut it down, those 340 inserts are gone 
on standby2. When you restart it, they will be applied again from the WAL.

In that case, pg_rewind's conclusion that no rewind is needed is 
correct. standby2 is strictly behind standby1, and could catch up 
directly to it. However, when you restart standby2, it will first replay 
the WAL it had streamed from master.
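
To make the theory concrete, here is a minimal Python sketch of the comparison described above. It is a simplification and not pg_rewind's actual code: the function names and all LSN values are hypothetical, chosen only to show how a minRecoveryPoint that lags behind the replayed WAL can lead to a "no rewind needed" answer.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in the usual 'hi/lo' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def rewind_needed(target_end: str, divergence: str) -> bool:
    # Simplified rule: the target only needs rewinding if its recorded end
    # of WAL (last checkpoint, or minRecoveryPoint on a standby) lies past
    # the point where the two timelines diverged.
    return parse_lsn(target_end) > parse_lsn(divergence)


# Hypothetical values, not taken from the bug report.
divergence_point = "0/3000060"
min_recovery_point = "0/3000000"  # stale: changes were only in shared buffers
replayed_up_to = "0/3010000"      # how far replay had actually progressed

print(rewind_needed(min_recovery_point, divergence_point))  # False
print(rewind_needed(replayed_up_to, divergence_point))      # True
```

With the stale minRecoveryPoint the check reports that nothing needs to be done, even though the WAL sitting in standby2's pg_wal extends past the divergence point.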

Can you show the full output of pg_controldata on all the servers, 
please? In your latest snippet, you showed just the checkpoint 
locations, but if you just remove the "grep checkpoint | grep location" 
filters, it would print the whole thing. I'm particularly interested in 
the minRecoveryPoint on standby2, in the cases when it works and when it 
doesn't.
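
If comparing the full output by eye gets tedious, a small Python helper along these lines could pull out the relevant fields. This is only a sketch: it assumes pg_controldata is on PATH and prints English labels (for example with LC_MESSAGES=C), and the sample text below is made up for illustration, not real output from the servers in this report.

```python
import subprocess


def control_fields(output: str) -> dict:
    """Split pg_controldata-style 'Label:   value' lines into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_control_data(datadir: str) -> dict:
    # Runs the real tool; raises CalledProcessError if it fails.
    result = subprocess.run(
        ["pg_controldata", datadir],
        capture_output=True, text=True, check=True,
    )
    return control_fields(result.stdout)


# Made-up sample with hypothetical values, shaped like pg_controldata output.
sample = """\
Database cluster state:               shut down in recovery
Latest checkpoint location:           0/3000028
Minimum recovery ending location:     0/3000060
Min recovery ending loc's timeline:   1
"""

fields = control_fields(sample)
print(fields["Minimum recovery ending location"])  # 0/3000060
```

Calling `read_control_data()` on the data directory of each server, in both the working and the failing runs, would give the values to compare.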

I'm not sure what the right behavior would be if that's the issue. 
Perhaps pg_rewind should truncate the WAL in standby2/pg_wal/ in that 
case, so that when you start it up again, it would not replay the local 
WAL but would connect to standby1 directly. Also, perhaps a fast 
shutdown of a standby server should update minRecoveryPoint before exiting.

-- 
Heikki Linnakangas
Neon (https://neon.tech)



