Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
From | Alexander Korotkov |
---|---|
Subject | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Date | |
Msg-id | CAPpHfdvk5RxdKZuFDFgDet6ZAzVW0ojxP-pjjqZPFZUW2N5gEA@mail.gmail.com |
In reply to | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Thu, Jun 19, 2025 at 1:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 18, 2025 at 10:17 PM Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
> >
> > On Wed, Jun 18, 2025 at 6:50 PM Vitaly Davydov <v.davydov@postgrespro.ru> wrote:
> > > > I think, it is a good idea. Once we do not use the generated data, it is ok
> > > > just to generate WAL segments using the proposed function. I've tested this
> > > > function. The tests worked as expected with and without the fix. The attached
> > > > patch does the change.
> > >
> > > Sorry, forgot to attach the patch. It is created on the current master branch.
> > > It may conflict with your corrections. I hope, it could be useful.
> >
> > Thank you. I've integrated this into a patch to improve these tests.
> >
> > Regarding assertion failure, I've found that assert in
> > PhysicalConfirmReceivedLocation() conflicts with restart_lsn
> > previously set by ReplicationSlotReserveWal(). As I can see,
> > ReplicationSlotReserveWal() just picks fresh XLogCtl->RedoRecPtr lsn.
> > So, it doesn't seems there is a guarantee that restart_lsn never goes
> > backward. The commit in ReplicationSlotReserveWal() even states there
> > is a "chance that we have to retry".
> >
>
> I don't see how this theory can lead to a restart_lsn of a slot going
> backwards. The retry mentioned there is just a retry to reserve the
> slot's position again if the required WAL is already removed. Such a
> retry can only get the position later than the previous restart_lsn.

Yes, if a retry is needed, then the new position must certainly be later.
What I mean is that ReplicationSlotReserveWal() can reserve a position
later than what the standby is going to read (and correspondingly report
with PhysicalConfirmReceivedLocation()).

> > Thus, I propose to remove the
> > assertion introduced by ca307d5cec90.
> >
>
> If what I said above is correct, then the following part of the commit
> message will be incorrect:
> "As stated in the ReplicationSlotReserveWal() comment, this is not
> always true. Additionally, this issue has been spotted by some
> buildfarm
> members."

I agree, this comment needs to be made clearer. Meanwhile, I've pushed
the patch for the TAP tests, which I think didn't get any objections.

------
Regards,
Alexander Korotkov
Supabase