Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done.
От | Heikki Linnakangas |
---|---|
Тема | Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. |
Дата | |
Msg-id | fe7af87d-b9bf-481a-902c-089c49aa911d@iki.fi обсуждение исходный текст |
Ответ на | Re: BUG #18575: Sometimes pg_rewind mistakenly assumes that nothing needs to be done. (Heikki Linnakangas <hlinnaka@iki.fi>) |
Список | pgsql-bugs |
On 08/08/2024 16:07, Heikki Linnakangas wrote: > So it seems true that rewind is not required in those cases. However, if > the WAL is already written on the standby's disk, just not replayed yet, > then when you restart the server, it will replay the WAL from timeline > 1. That does seem surprising. Perhaps pg_rewind should just update the > minRecoveryPoint and minRecoveryTLI in control file in that case, to > force WAL recovery to follow the timeline switch to TLI 2. > > I will try to write a TAP test for the "Bad" and the assertion failure > case, fix the assertion failure, and test how updating the > minRecoveryPoint would behave. I've divided this into three separate issues: 1. I think the assertion in pg_rewind is simply bogus, and we should remove it. Attached 0002-Remove-bogus-assertion-in-pg_rewind.patch does that, and adds a test case to cover it. 2. Independently of pg_rewind: When you start PostgreSQL, it will first try to recover all the WAL it has locally in pg_wal. That goes wrong if you have set a recovery target TLI. For example, imagine this situation: - Recovery target TLI is 2, set explicitly in postgresql.conf - The switchpoint from TLI 1 to 2 happened at WAL position 0/1510198 (the switchpoint is found in 00000002.history) - There is a WAL file 000000010000000000000001 under pg_wal, which contains valid WAL up to 0/1590000 When you start the server, it will first recover all the WAL from 000000010000000000000001, up to 0/1590000. Then it will connect to the primary to fetch mor WAL, but it will fail to make any progress because it already blew past the switch point. It's obviously wrong to replay the WAL from timeline 1 beyond the 1->2 switchpoint, when the recovery target is TLI 2. The attached 0003-Don-t-read-past-current-TLI-during-archive-recovery.patch fixes that. However, the logic to find the right WAL segment file and read the WAL is extremely complicated, and I don't feel comfortable that I got all the cases right. Review would be highly appreciated. The patch includes a test case to demonstrate the case, with no pg_rewind. It does include one "manual" step to copy a timeline history file into pg_wal, marked with XXX, however. So I'm not sure how possible this scenario is in production setups . 3. When pg_rewind has nothing to do, the target server is left unmodified, in a state such that when you restart it, it will replay all the WAL it has locally in pg_wal first, before connecting to the primary. Even though the target is a direct ancestor of the source and hence it *can* follow the WAL to the source's position without rewinding, it doesn't mean that it *will* actually do so. The attached changes it so that it updates the control file in that case, setting minRecoveryPoint and minRecoveryPointTLI to point to the source's current WAL position. That way, when you start it up, it will follow the timeline history to reach that point. (This requires fixing issue 2, because otherwise it still won't follow the history correctly to reach the minRecoveryPointTLI) To make the test pass, it actually would be sufficient to copy the timeline history file into pg_wal (which the patch also does). But updating the minRecoveryPoint seems good to ensure that it follows the right timeline. Otherwise it relies on the logic at startup to find the latest timeline, and that the latest timeline is the one you tried to rewind to. I think it would go wrong if there was another higher-numbered history file present in pg_wal for some reason. All in all, I don't feel very confident about all this. The assertion seems straightforward, so barring objections I'll commit and backpatch that. The timeline-following at startup (issue 2) seems pretty clearly wrong, but I'm not sure it's worth the risk to backpatch. Similarly issue 3 might not be worth the risk to backpatch, especially if we don't also backpatch 2. I would love to hear comments on those. Georgy, if you have the possibility to test this patches with your repro script, that would be highly appreciated. -- Heikki Linnakangas Neon (https://neon.tech)
Вложения
В списке pgsql-bugs по дате отправления: