Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size

From Kyotaro Horiguchi
Subject Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
Date
Msg-id 20210719.111318.2042379313472032754.horikyota.ntt@gmail.com
In response to Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-bugs
At Sat, 17 Jul 2021 10:28:09 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in 
> On 2021-Jul-16, Alvaro Herrera wrote:
> 
> > The buildfarm has remained green so far, but clearly this is something
> > we need to fix.  Maybe it's as simple as adding the loop we use below,
> > starting at line 219.
> 
> There are a few failures of this on buildfarm now,
..
> I am running the test in a loop with the attached; if it doesn't fail in
> a few more rounds I'll push it.
> 
> There are two instances of a different failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-17%2013%3A39%3A43
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-07-16%2021%3A14%3A14
> 
> #   Failed test 'check that segments have been removed'
> #   at t/019_replslot_limit.pl line 213.
> #          got: '000000010000000000000021'
> #     expected: '000000010000000000000022'
> # Looks like you failed 1 test of 19.
> [23:55:14] t/019_replslot_limit.pl .............. 
> Dubious, test returned 1 (wstat 256, 0x100)
> 
> I'm afraid about this not being something we can fix with some
> additional wait points ...

Sorry for the mistake.  It seems to me that the cause of the above is
that segment removal happens *after* invalidation. Since (at least
currently) the "slot is invalidated" warning is issued only just
before WAL removal, we should expect the old segments to be gone once
the checkpoint-ending log appears, because that log is written after
WAL removal.  If they are not, that indicates a bug.

What do you think about the attached?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From c52d7931e95cc24804f9aac4c9bf3a388c7e461f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 19 Jul 2021 10:58:01 +0900
Subject: [PATCH v1] Remove possible instability of new replication slot test
 code

The previous fix for the same problem left another possible timing
instability between the actual segment removal and the invalidation
log. Make the test stable by waiting for the checkpoint-ending log,
which is issued after the segment removal.
---
 src/test/recovery/t/019_replslot_limit.pl | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 026da02ff1..a5d8140807 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -11,7 +11,7 @@ use TestLib;
 use PostgresNode;
 
 use File::Path qw(rmtree);
-use Test::More tests => $TestLib::windows_os ? 15 : 19;
+use Test::More tests => $TestLib::windows_os ? 16 : 20;
 use Time::HiRes qw(usleep);
 
 $ENV{PGDATABASE} = 'postgres';
@@ -201,6 +201,19 @@ $result = $node_primary->safe_psql(
 is($result, "rep1|f|t|lost|",
     'check that the slot became inactive and the state "lost" persists');
 
+# Make sure the current checkpoint ended
+my $checkpoint_ended = 0;
+for (my $i = 0; $i < 10000; $i++)
+{
+    if (find_in_log($node_primary, "checkpoint complete: ", $logstart))
+    {
+        $checkpoint_ended = 1;
+        last;
+    }
+    usleep(100_000);
+}
+ok($checkpoint_ended, 'make sure checkpoint ended');
+
 # The invalidated slot shouldn't keep the old-segment horizon back;
 # see bug #17103: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
 # Test for this by creating a new slot and comparing its restart LSN
-- 
2.27.0

