Re: A failure in t/038_save_logical_slots_shutdown.pl

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: A failure in t/038_save_logical_slots_shutdown.pl
Дата
Msg-id CALj2ACVpJMQ11dQHvocAMpLUr4kFh=cKF7HjqnMLcgvAR1sqkw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: A failure in t/038_save_logical_slots_shutdown.pl  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: A failure in t/038_save_logical_slots_shutdown.pl  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Fri, Jan 12, 2024 at 9:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jan 11, 2024 at 10:03 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Jan 11, 2024 at 4:35 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On further analysis, it was found that in the failing test,
> > > CHECKPOINT_SHUTDOWN was started in a new page, so there was the WAL
> > > page header present just before the CHECKPOINT_SHUTDOWN which was
> > > causing the failure. We could alternatively reproduce the issue by
> > > switching the WAL file before restarting the server like in the
> > > attached test change patch.
> > > There are a couple of ways to fix this issue a) one by switching the
> > > WAL before the insertion of records so that the CHECKPOINT_SHUTDOWN
> > > does not get inserted in a new page as in the attached test_fix.patch
> > > b) by using pg_walinspect to check that the next WAL record is
> > > CHECKPOINT_SHUTDOWN. I have to try this approach.
> > >
> > > Thanks to Bharath and Kuroda-san for offline discussions and helping
> > > in getting to the root cause.
> >
> > IIUC, the problem the commit e0b2eed tries to solve is to ensure there
> > are no left-over decodable WAL records between confirmed_flush LSN and
> > a shutdown checkpoint, which is what it is expected from the
> > t/038_save_logical_slots_shutdown.pl. How about we have a PG function
> > returning true if there are any decodable WAL records between the
> > given start_lsn and end_lsn?
> >
>
> But, we already test this in 003_logical_slot during a successful
> upgrade. Having an explicit test to do the same thing has some merits
> but not sure if it is worth it.

If the code added by commit e0b2eed is covered by the new upgrade
test, why not remove 038_save_logical_slots_shutdown.pl altogether?

> The current test tries to ensure that
> during shutdown after we shutdown walsender and ensures that it sends
> all the wal records and receipts an ack for the same, there is no
> other WAL except shutdown_checkpoint. Vignesh's suggestion (a) makes
> the test robust enough that it shouldn't show spurious failures like
> the current one reported by you, so shall we try to proceed with that?

Do you mean something like [1]? It ensures the test passes unless any
writes are added (in future) before the publisher restarts in the test
which can again make the tests flaky. How do we ensure no one adds
anything in before the publisher restart
038_save_logical_slots_shutdown.pl? A note before the restart perhaps?
I might be okay with a simple solution like [1] with a note before the
restart instead of other complicated ones.

[1]
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
index 5a4f5dc1d4..493fdbce2f 100644
--- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -60,6 +60,14 @@ $node_subscriber->start;
 $node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
 $node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");

+# On some machines, it was detected that the shutdown checkpoint WAL record
+# that gets generated as part of the publisher restart below falls exactly in
+# the new page in the WAL file. Due to this, the latest checkpoint location and
+# confirmed flush check in compare_confirmed_flush() was failing. Hence, we
+# advance WAL by 1 segment before generating some data so that the shutdown
+# checkpoint doesn't fall exactly in the new WAL file page.
+$node_publisher->advance_wal(1);
+
 # Insert some data
 $node_publisher->safe_psql('postgres',
        "INSERT INTO test_tbl VALUES (generate_series(1, 5));");

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: plpgsql memory leaks
Следующее
От: Christoph Berg
Дата:
Сообщение: plperl and perl 5.38