Re: A failure in t/038_save_logical_slots_shutdown.pl

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: A failure in t/038_save_logical_slots_shutdown.pl
Дата
Msg-id CAA4eK1+F5P9CEVNquD-fFd0-sxEkOPBGigM5iEcs8d8Yv+2HUg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: A failure in t/038_save_logical_slots_shutdown.pl  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: A failure in t/038_save_logical_slots_shutdown.pl  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
RE: A failure in t/038_save_logical_slots_shutdown.pl  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Список pgsql-hackers
On Fri, Jan 12, 2024 at 3:36 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Jan 12, 2024 at 9:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 11, 2024 at 10:03 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Thu, Jan 11, 2024 at 4:35 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > On further analysis, it was found that in the failing test,
> > > > CHECKPOINT_SHUTDOWN was started in a new page, so there was the WAL
> > > > page header present just before the CHECKPOINT_SHUTDOWN which was
> > > > causing the failure. We could alternatively reproduce the issue by
> > > > switching the WAL file before restarting the server like in the
> > > > attached test change patch.
> > > > There are a couple of ways to fix this issue a) one by switching the
> > > > WAL before the insertion of records so that the CHECKPOINT_SHUTDOWN
> > > > does not get inserted in a new page as in the attached test_fix.patch
> > > > b) by using pg_walinspect to check that the next WAL record is
> > > > CHECKPOINT_SHUTDOWN. I have to try this approach.
> > > >
> > > > Thanks to Bharath and Kuroda-san for offline discussions and helping
> > > > in getting to the root cause.
> > >
> > > IIUC, the problem the commit e0b2eed tries to solve is to ensure there
> > > are no left-over decodable WAL records between confirmed_flush LSN and
> > > a shutdown checkpoint, which is what it is expected from the
> > > t/038_save_logical_slots_shutdown.pl. How about we have a PG function
> > > returning true if there are any decodable WAL records between the
> > > given start_lsn and end_lsn?
> > >
> >
> > But, we already test this in 003_logical_slot during a successful
> > upgrade. Having an explicit test to do the same thing has some merits
> > but not sure if it is worth it.
>
> If the code added by commit e0b2eed is covered by the new upgrade
> test, why not remove 038_save_logical_slots_shutdown.pl altogether?
>

This is a more strict check because it is possible that even if the
latest confirmed_flush location is not persisted there is no
meaningful decodable WAL between whatever the last confirmed_flush
location saved on disk and the shutdown_checkpoint record.
Kuroda-San/Vignesh, do you have any suggestion on this one?

> > The current test tries to ensure that
> > during shutdown after we shutdown walsender and ensures that it sends
> > all the wal records and receipts an ack for the same, there is no
> > other WAL except shutdown_checkpoint. Vignesh's suggestion (a) makes
> > the test robust enough that it shouldn't show spurious failures like
> > the current one reported by you, so shall we try to proceed with that?
>
> Do you mean something like [1]? It ensures the test passes unless any
> writes are added (in future) before the publisher restarts in the test
> which can again make the tests flaky. How do we ensure no one adds
> anything in before the publisher restart
> 038_save_logical_slots_shutdown.pl? A note before the restart perhaps?
>

I am fine with adding the note.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: speed up a logical replica setup
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Invalidate the subscription worker in cases where a user loses their superuser status