Re: [HACKERS] Restricting maximum keep segments by repslots

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: [HACKERS] Restricting maximum keep segments by repslots
Дата
Msg-id 20200428.171815.1687900483771598932.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Restricting maximum keep segments by repslots  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: [HACKERS] Restricting maximum keep segments by repslots  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
At Mon, 27 Apr 2020 19:40:07 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> On 2020-Apr-08, Kyotaro Horiguchi wrote:
> 
> > I understand how it happens.
> > 
> > The latch triggered by checkpoint request by CHECKPOINT command has
> > been absorbed by ConditionVariableSleep() in
> > InvalidateObsoleteReplicationSlots.  The attached allows checkpointer
> > use MyLatch for other than checkpoint request while a checkpoint is
> > running.
> 
> Hmm, that explanation makes sense, but I couldn't reproduce it with the
> steps you provided.  Perhaps I'm missing something.

Sorry for the incomplete reproducer. A checkpoint needs to be running
simultaneously for the manual checkpoint to hang up.  The following is
the complete sequence.

1. Build a primary database cluster with the following setup, then start it.
   max_slot_wal_keep_size=0
   max_wal_size=32MB
   min_wal_size=32MB

2. Build a replica from the primary creating a slot, then start it.

   $ pg_basebackup -R -C -S s1 -D...
   
3. Try the following commands. Try several times if it succeeds.
  =# create table tt(); drop table tt; select pg_switch_wal();checkpoint;

It is evidently stochastic, but it works quite reliably for me.

> Anyway I think this patch should fix it also -- instead of adding a new
> flag, we just rely on the existing flags (since do_checkpoint must have
> been set correctly from the flags earlier in that block.)

Since the added (!do_checkpoint) check is reached with
do_checkpoint=false at server start and at archive_timeout intervals,
the patch makes checkpointer run a busy-loop at that timings, and that
loop lasts until a checkpoint is actually executed.

What we need to do here is not forgetting the fact that the latch has
been set even if the latch itself gets reset before reaching to
WaitLatch.

> I think it'd be worth to verify this bugfix in a new test.  Would you
> have time to produce that?  I could try in a couple of days ...

The attached patch on 019_replslot_limit.pl does the commands above
automatically. It sometimes succeed but fails in most cases, at least
for me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 32dce54522..c8ec4bb363 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -8,7 +8,7 @@ use TestLib;
 use PostgresNode;
 
 use File::Path qw(rmtree);
-use Test::More tests => 13;
+use Test::More tests => 14;
 use Time::HiRes qw(usleep);
 
 $ENV{PGDATABASE} = 'postgres';
@@ -181,6 +181,36 @@ ok($failed, 'check that replication has been broken');
 
 $node_standby->stop;
 
+my $node_master2 = get_new_node('master2');
+$node_master2->init(allows_streaming => 1);
+$node_master2->append_conf('postgresql.conf', qq(
+min_wal_size = 32MB
+max_wal_size = 32MB
+log_checkpoints = yes
+));
+$node_master2->start;
+$node_master2->safe_psql('postgres', "SELECT pg_create_physical_replication_slot('rep1')");
+$backup_name = 'my_backup2';
+$node_master2->backup($backup_name);
+
+$node_master2->stop;
+$node_master2->append_conf('postgresql.conf', qq(
+max_slot_wal_keep_size = 0
+));
+$node_master2->start;
+
+$node_standby = get_new_node('standby_2');
+$node_standby->init_from_backup($node_master2, $backup_name, has_streaming => 1);
+$node_standby->append_conf('postgresql.conf', "primary_slot_name = 'rep1'");
+$node_standby->start;
+my @result =
+  split('\n', $node_master2->safe_psql('postgres', "
+                                       CREATE TABLE tt(); DROP TABLE tt;
+                                       SELECT pg_switch_wal();
+                                       CHECKPOINT;
+                                       SELECT 'finished';", timeout=>'5'));
+is($result[1], 'finished', 'check if checkpoint command is not blocked');
+
 #####################################
 # Advance WAL of $node by $n segments
 sub advance_wal

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Fixes for two separate bugs in nbtree VACUUM's page deletion
Следующее
От: Masahiro Ikeda
Дата:
Сообщение: Re: Why are wait events not reported even though it reads/writes atimeline history file?