On Mon, Feb 26, 2024 at 02:01:45PM +0000, Bertrand Drouvot wrote:
> Though [1] mentioned up-thread is not pushed yet, I'm sharing the POC patch now
> (see the attached).

I have looked at what you have here.

First, in a build where 818fefd8fd is included, this makes the test
script a lot slower. Most of the logic is quick, but we're spending
10s or so checking that catalog_xmin has advanced. Would it be
possible to make that faster?

A second issue is the failure mode when 818fefd8fd is reverted. The
test gets stuck waiting for the standby to catch up until a timeout
kicks in and fails the test, while all the previous tests pass.
Would it be possible to make that more responsive? I assume that in
the failure mode we would get an incorrect conflict_reason for
injection_inactiveslot, which should be enough for the check to
detect the failure.

+ my $terminated = 0;
+ for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+ {
+     if ($node_standby->log_contains(
+         'terminating process .* to release replication slot \"injection_activeslot\"', $logstart))
+     {
+         $terminated = 1;
+         last;
+     }
+     usleep(100_000);
+ }
+ ok($terminated, 'terminating process holding the active slot is logged with injection point');

The LOG entry already exists once we are sure that the startup process
is waiting in the injection point, so this loop could be replaced with
something like:
+ $node_standby->wait_for_event('startup', 'TerminateProcessHoldingSlot');
+ ok($node_standby->log_contains('terminating process .* .. '), 'termin .. ');

Nit: the name of the injection point should be
terminate-process-holding-slot rather than
TerminateProcessHoldingSlot, to be consistent with the other ones.
--
Michael