Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. |
Date | |
Msg-id | e08235e3-b216-4bf3-8a9a-8ea819ae105e@iki.fi |
In reply to | BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. (PG Bug reporting form <noreply@postgresql.org>) |
List | pgsql-bugs |
On 08/08/2024 10:57, Georgy Shelkovy wrote:
> Unfortunately, the playback is not very stable, but sometimes it shoots.
> I added some commands to show last WAL rows

Thanks. I still haven't been able to reproduce it, but here's a theory:

When determining whether the target needs rewinding, pg_rewind looks at the target's last checkpoint record or, if it's a standby, its minRecoveryPoint. It's possible that standby2's minRecoveryPoint is indeed before the point of divergence. That means it has replayed the 340 insert records, but all the changes are still only sitting in the shared buffer cache. When you shut it down, those 340 inserts are gone on standby2. When you restart it, they will be applied again from the WAL.

In that case, pg_rewind's conclusion that no rewind is needed is correct: standby2 is strictly behind standby1 and could catch up directly to it. However, when you restart standby2, it will first replay the WAL it had streamed from master, which extends past the point of divergence.

Can you show the full output of pg_controldata on all the servers, please? In your latest snippet, you showed just the checkpoint locations, but if you just remove the "grep checkpoint | grep location" filters, it will print the whole thing. I'm particularly interested in the minRecoveryPoint on standby2, both in the cases where it works and where it doesn't.

I'm not sure what the right behavior would be if that's the issue. Perhaps pg_rewind should truncate the WAL in standby2/pg_wal/ in that case, so that when you start it up again, it would not replay the local WAL but would connect to standby1 directly. Also, perhaps a fast shutdown of a standby server should update minRecoveryPoint before exiting.

--
Heikki Linnakangas
Neon (https://neon.tech)
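The theory in this message can be illustrated with a simplified Python model of the decision it describes. This is a sketch under stated assumptions, not pg_rewind's actual code (pg_rewind is written in C and also handles timeline history and reads the checkpoint record from WAL). It assumes the tool takes the later of the target's checkpoint end and its minRecoveryPoint as the end of the target's WAL, and requests a rewind only if that position lies past the divergence point. The helper names (`parse_lsn`, `target_wal_end`, `rewind_needed`) and all LSN values are hypothetical.

```python
# Hypothetical model of the control-file-based "is a rewind needed?" check.
# Names and values are illustrative only; this is not pg_rewind's API.


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an integer.

    The textual form is two hex numbers: the high and low 32 bits.
    """
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def target_wal_end(checkpoint_end: str, min_recovery_point: str) -> int:
    """End of the target's WAL as recorded in its control file (assumed).

    For a standby, minRecoveryPoint may be ahead of the last checkpoint,
    so take whichever position is later.
    """
    return max(parse_lsn(checkpoint_end), parse_lsn(min_recovery_point))


def rewind_needed(checkpoint_end: str, min_recovery_point: str,
                  divergence_point: str) -> bool:
    """Rewind only if the target's recorded WAL end is past the divergence."""
    end = target_wal_end(checkpoint_end, min_recovery_point)
    return end > parse_lsn(divergence_point)


if __name__ == "__main__":
    # Hypothetical standby2: the inserts were replayed but only sat in shared
    # buffers, so minRecoveryPoint stayed before the divergence point.
    divergence = "0/3000200"
    print(rewind_needed("0/3000060", "0/3000100", divergence))  # False

    # Yet standby2's local pg_wal may still hold WAL streamed from the old
    # primary beyond the divergence point, which it would replay on restart.
    local_wal_end = "0/3010000"
    print(parse_lsn(local_wal_end) > parse_lsn(divergence))  # True
```

In this model the control-file check correctly says standby2 is behind the divergence point, while the second comparison shows why restarting it could still replay past that point, which is the gap the message proposes closing by truncating the local WAL or by updating minRecoveryPoint on shutdown.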