Re: pgsql: Improve runtime and output of tests for replication slots checkp
From | Tom Lane |
---|---|
Subject | Re: pgsql: Improve runtime and output of tests for replication slots checkp |
Date | |
Msg-id | 2542023.1750456985@sss.pgh.pa.us |
In response to | Re: pgsql: Improve runtime and output of tests for replication slots checkp (Melanie Plageman <melanieplageman@gmail.com>) |
List | pgsql-committers |

Melanie Plageman <melanieplageman@gmail.com> writes:
> Quite a few animals have started failing since this commit (for example
> [1]) . I haven't looked into why, but I suspect something is wrong.

It looks to me like it's being triggered by this questionable
bit in 046_checkpoint_logical_slot.pl:

    # Continue the checkpoint.
    $node->safe_psql('postgres',
        q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});

    # Abruptly stop the server (1 second should be enough for the checkpoint
    # to finish; it would be better).
    $node->stop('immediate');

That second comment is pretty unintelligible, but I think it's
expecting that we'd give the checkpoint 1 second to complete,
which the code is *not* doing. On my own machine it looks like
the checkpoint does manage to complete within about 1ms,
just barely before the shutdown arrives:

    2025-06-20 17:52:25.599 EDT [2538690] 046_checkpoint_logical_slot.pl LOG: statement: select pg_replication_slot_advance('slot_physical',pg_current_wal_lsn())
    2025-06-20 17:52:25.602 EDT [2538692] 046_checkpoint_logical_slot.pl LOG: statement: select injection_points_wakeup('checkpoint-before-old-wal-removal')
    2025-06-20 17:52:25.603 EDT [2538557] LOG: checkpoint complete: wrote 1 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 s, sync=0.001 s, total=1.074 s; sync files=0, longest=0.000 s, average=0.000 s; distance=327688 kB, estimate=327688 kB; lsn=0/290020C0, redo lsn=0/29002068
    2025-06-20 17:52:25.604 EDT [2538553] LOG: received immediate shutdown request

But in the buildfarm failures I don't see any 'checkpoint complete'
before the shutdown.

If this is an accurate diagnosis then it indicates both a test bug
(it should delay here, or else the comment needs fixed to explain
what we're actually testing) and a backend bug, because an immediate
stop a/k/a crash before completing the checkpoint should not lead
to failure to function after the next restart.

regards, tom lane
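
The log comparison described above can be automated when checking many buildfarm logs. The following is only a minimal sketch, not part of the test suite: the function name and the choice of marker strings are assumptions, with the markers copied from the log excerpt in this message. It reports whether 'checkpoint complete' shows up between the injection_points_wakeup statement and the immediate shutdown request.

```python
# Marker strings are assumptions taken from the log excerpt in the message.
WAKEUP = "injection_points_wakeup('checkpoint-before-old-wal-removal')"
CHECKPOINT_DONE = "checkpoint complete"
SHUTDOWN_REQUEST = "received immediate shutdown request"


def checkpoint_finished_before_shutdown(log_lines):
    """Classify a server log with respect to the suspected race.

    Returns True if 'checkpoint complete' is logged after the wakeup
    statement and before the immediate shutdown request, False if the
    shutdown request comes first, and None if the wakeup statement or
    both later markers are missing from the log.
    """
    woken = False
    for line in log_lines:
        if not woken:
            # Ignore everything until the checkpoint is told to continue.
            woken = WAKEUP in line
            continue
        if CHECKPOINT_DONE in line:
            return True
        if SHUTDOWN_REQUEST in line:
            return False
    return None
```

A result of False would match the pattern seen in the failing buildfarm runs, where the shutdown arrives before the checkpoint is logged as complete. It does not by itself show which of the two suspected bugs is responsible.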