Discussion: Test 031_recovery_conflict.pl is not immune to autovacuum


Test 031_recovery_conflict.pl is not immune to autovacuum

From:
Alexander Lakhin
Date:
Hello hackers,

Among the many recoveryCheck (more concretely, 027_stream_regress) failures
that occurred on a number of buildfarm animals after switching to meson, most
of which can be explained by timeouts, I saw a different failure on adder:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-03-18%2023%3A43%3A00
[23:48:52.521](9.831s) ok 13 - startup deadlock: cursor holding conflicting pin, also waiting for lock, established
[23:55:13.749](381.228s) # poll_query_until timed out executing this query:
#
# SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
#
# expecting this output:
# waiting
# last actual query output:
#
# with stderr:
[23:55:13.763](0.013s) not ok 14 - startup deadlock: lock acquisition is waiting

and I suspect that it might be caused by autovacuum.

I've managed to reproduce it locally (running 10 tests in parallel on a
2-core VM with disk bandwidth limited to 80 MB/sec, I get failures on
iterations 10, 1, 3) and observed the following (with wal_debug = on):
031_recovery_conflict_standby.log:
2024-03-20 04:12:06.519 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: DECLARE test_recovery_conflict_cursor CURSOR
FOR SELECT a FROM test_recovery_conflict_table1;
2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: FETCH FORWARD FROM
test_recovery_conflict_cursor;
2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: SELECT * FROM
test_recovery_conflict_table2;
...
2024-03-20 04:12:07.073 UTC|||65fa620d.111ec8|LOG:  REDO @ 0/3438360; LSN 0/3438460: prev 0/3438338; xid 0; len 9;
blkref #0: rel 1663/16385/16392, blk 0 - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0,
isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|LOG:  recovery still waiting after 11.241 ms: recovery conflict on buffer pin
2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|CONTEXT:  WAL redo at 0/3438360 for Heap2/PRUNE: snapshotConflictHorizon:
0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101]; blkref #0: rel 1663/16385/16392, blk 0
2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|ERROR: canceling statement due to conflict with recovery at character 15
2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|DETAIL: User transaction caused buffer deadlock with recovery.
...
2024-03-20 04:12:08.093 UTC|vagrant|postgres|65fa6216.111f1a|LOG: statement: SELECT 'waiting' FROM pg_locks WHERE 
locktype = 'relation' AND NOT granted;

031_recovery_conflict_primary.log:
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG:  Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes,
cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG:  Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes,
cost_limit=200, cost_delay=2 active=yes failsafe=no)
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|LOG:  INSERT @ 0/3438460:  - Heap2/PRUNE: snapshotConflictHorizon: 0,
nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|CONTEXT:  while scanning block 0 of relation
"public.test_recovery_conflict_table1"
...
2024-03-20 04:12:05.981 UTC|||65fa6215.111f02|LOG:  automatic vacuum of table
"test_db.public.test_recovery_conflict_table1": index scans: 0

The corresponding fragment of 031_recovery_conflict.pl:
$res = $psql_standby->query_until(
     qr/^1$/m, qq[
     BEGIN;
     -- hold pin
     DECLARE $cursor1 CURSOR FOR SELECT a FROM $table1;
     FETCH FORWARD FROM $cursor1;
     -- wait for lock held by prepared transaction
     SELECT * FROM $table2;
     ]);
ok(1,
     "$sect: cursor holding conflicting pin, also waiting for lock, established"
);

# just to make sure we're waiting for lock already
ok( $node_standby->poll_query_until(
         'postgres', qq[
SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
], 'waiting'),
     "$sect: lock acquisition is waiting");

# VACUUM FREEZE will prune away rows, causing a buffer pin conflict, while
# standby psql is waiting on lock
$node_primary->safe_psql($test_db, qq[VACUUM FREEZE $table1;]);

So if autovacuum happens to process "$table1" before SELECT ... FROM
pg_locks, a buffer pin conflict occurs before the manual VACUUM FREEZE
and poll_query_until() fails.
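To make the failure mode concrete: poll_query_until() simply reruns the query until its output matches the expected string, or gives up after a timeout. A rough shell sketch of that pattern (the function name and retry counts are illustrative, not the actual Perl implementation):

```shell
# Sketch of the poll_query_until pattern: rerun a command until its
# stdout matches $expected, retrying up to $attempts times.
poll_until() {
    expected=$1 attempts=$2; shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Compare the command's output with the expected string.
        [ "$("$@")" = "$expected" ] && return 0
        sleep 0.1
        i=$((i + 1))
    done
    return 1   # timed out without ever seeing a match
}
```

In the failing run the standby backend had already been cancelled by the autovacuum-induced conflict before polling even began, so no number of retries could ever observe the ungranted lock.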

With autovacuum = off in TEMP_CONFIG, 50 iterations passed for me in
the same environment.
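For anyone repeating the experiment: TEMP_CONFIG points the TAP test infrastructure at an extra configuration file whose contents are applied to each test node. A hypothetical setup (the file path is illustrative):

```shell
# Write the override to a scratch file (path is arbitrary) and export
# TEMP_CONFIG so the test framework applies it to every test node.
echo 'autovacuum = off' > /tmp/no_autovacuum.conf
export TEMP_CONFIG=/tmp/no_autovacuum.conf
# then, e.g.: make -C src/test/recovery check PROVE_TESTS=t/031_recovery_conflict.pl
```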

Best regards,
Alexander



Re: Test 031_recovery_conflict.pl is not immune to autovacuum

From:
Melanie Plageman
Date:
On Wed, Mar 20, 2024 at 9:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hello hackers,
>
> Among the many recoveryCheck (more concretely, 027_stream_regress) failures
> that occurred on a number of buildfarm animals after switching to meson, most
> of which can be explained by timeouts, I saw a different failure on adder:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-03-18%2023%3A43%3A00
> [23:48:52.521](9.831s) ok 13 - startup deadlock: cursor holding conflicting pin, also waiting for lock, established
> [23:55:13.749](381.228s) # poll_query_until timed out executing this query:
> #
> # SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
> #
> # expecting this output:
> # waiting
> # last actual query output:
> #
> # with stderr:
> [23:55:13.763](0.013s) not ok 14 - startup deadlock: lock acquisition is waiting
>
> and I suspect that it might be caused by autovacuum.
>
> I've managed to reproduce it locally (running 10 tests in parallel on a
> 2-core VM with disk bandwidth limited to 80 MB/sec, I get failures on
> iterations 10, 1, 3) and observed the following (with wal_debug = on):
> 031_recovery_conflict_standby.log:
> 2024-03-20 04:12:06.519 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: DECLARE test_recovery_conflict_cursor CURSOR
> FOR SELECT a FROM test_recovery_conflict_table1;
> 2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: FETCH FORWARD FROM
> test_recovery_conflict_cursor;
> 2024-03-20 04:12:06.520 UTC|vagrant|test_db|65fa6214.111ede|LOG: statement: SELECT * FROM
> test_recovery_conflict_table2;
> ...
> 2024-03-20 04:12:07.073 UTC|||65fa620d.111ec8|LOG:  REDO @ 0/3438360; LSN 0/3438460: prev 0/3438338; xid 0; len 9;
> blkref #0: rel 1663/16385/16392, blk 0 - Heap2/PRUNE: snapshotConflictHorizon: 0, nredirected: 0, ndead: 0,
> isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
> 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
> 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
> 2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|LOG:  recovery still waiting after 11.241 ms: recovery conflict on buffer pin
> 2024-03-20 04:12:07.084 UTC|||65fa620d.111ec8|CONTEXT:  WAL redo at 0/3438360 for Heap2/PRUNE: snapshotConflictHorizon:
> 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9,
> 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
> 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
> 100, 101]; blkref #0: rel 1663/16385/16392, blk 0
> 2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|ERROR: canceling statement due to conflict with recovery at character 15
> 2024-03-20 04:12:07.095 UTC|vagrant|test_db|65fa6214.111ede|DETAIL: User transaction caused buffer deadlock with recovery.
> ...
> 2024-03-20 04:12:08.093 UTC|vagrant|postgres|65fa6216.111f1a|LOG: statement: SELECT 'waiting' FROM pg_locks WHERE
> locktype = 'relation' AND NOT granted;
>
> 031_recovery_conflict_primary.log:
> 2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG:  Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes,
> cost_limit=200, cost_delay=2 active=yes failsafe=no)
> 2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|DEBUG:  Autovacuum VacuumUpdateCosts(db=16385, rel=16392, dobalance=yes,
> cost_limit=200, cost_delay=2 active=yes failsafe=no)
> 2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|LOG:  INSERT @ 0/3438460:  - Heap2/PRUNE: snapshotConflictHorizon: 0,
> nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9, 10,
> 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
> 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
> 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]
> 2024-03-20 04:12:05.980 UTC|||65fa6215.111f02|CONTEXT:  while scanning block 0 of relation
> "public.test_recovery_conflict_table1"
> ...
> 2024-03-20 04:12:05.981 UTC|||65fa6215.111f02|LOG:  automatic vacuum of table
> "test_db.public.test_recovery_conflict_table1": index scans: 0
>
> The corresponding fragment of 031_recovery_conflict.pl:
> $res = $psql_standby->query_until(
>      qr/^1$/m, qq[
>      BEGIN;
>      -- hold pin
>      DECLARE $cursor1 CURSOR FOR SELECT a FROM $table1;
>      FETCH FORWARD FROM $cursor1;
>      -- wait for lock held by prepared transaction
>      SELECT * FROM $table2;
>      ]);
> ok(1,
>      "$sect: cursor holding conflicting pin, also waiting for lock, established"
> );
>
> # just to make sure we're waiting for lock already
> ok( $node_standby->poll_query_until(
>          'postgres', qq[
> SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
> ], 'waiting'),
>      "$sect: lock acquisition is waiting");
>
> # VACUUM FREEZE will prune away rows, causing a buffer pin conflict, while
> # standby psql is waiting on lock
> $node_primary->safe_psql($test_db, qq[VACUUM FREEZE $table1;]);
>
> So if autovacuum happens to process "$table1" before SELECT ... FROM
> pg_locks, a buffer pin conflict occurs before the manual VACUUM FREEZE
> and poll_query_until() fails.
>
> With autovacuum = off in TEMP_CONFIG, 50 iterations passed for me in
> the same environment.

Hmmm. Thanks for finding this and taking the time to reproduce it. I
don't know why I didn't think of this.

Seems like we could just add autovacuum_enabled=false to the table like this:

diff --git a/src/test/recovery/t/031_recovery_conflict.pl
b/src/test/recovery/t/031_recovery_conflict.pl
index d87efa823fd..65bc858c02d 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
 my $table2 = "test_recovery_conflict_table2";
 $node_primary->safe_psql(
    $test_db, qq[
-CREATE TABLE ${table1}(a int, b int);
+CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
 INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
 CREATE TABLE ${table2}(a int, b int);
 ]);

- Melanie



Re: Test 031_recovery_conflict.pl is not immune to autovacuum

From:
Alexander Lakhin
Date:
Hello Melanie,

20.03.2024 16:15, Melanie Plageman wrote:
> Seems like we could just add autovacuum_enabled=false to the table like this:
> diff --git a/src/test/recovery/t/031_recovery_conflict.pl
> b/src/test/recovery/t/031_recovery_conflict.pl
> index d87efa823fd..65bc858c02d 100644
> --- a/src/test/recovery/t/031_recovery_conflict.pl
> +++ b/src/test/recovery/t/031_recovery_conflict.pl
> @@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
>   my $table2 = "test_recovery_conflict_table2";
>   $node_primary->safe_psql(
>      $test_db, qq[
> -CREATE TABLE ${table1}(a int, b int);
> +CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
>   INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
>   CREATE TABLE ${table2}(a int, b int);
>   ]);

Thanks for paying attention to it!

With this modification applied, I got another failure (on iteration 2):
[13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting
Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary
done
timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 318.
# Postmaster PID for node "primary" is 1523036
### Stopping node "primary" using mode immediate

031_recovery_conflict_standby.log really doesn't contain the expected
message. I can share log files from successful and failed test runs, if
they can be helpful, or I'll investigate this case today/tomorrow.

Best regards,
Alexander



Re: Test 031_recovery_conflict.pl is not immune to autovacuum

From:
Melanie Plageman
Date:
On Wed, Mar 20, 2024 at 10:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Hello Melanie,
>
> 20.03.2024 16:15, Melanie Plageman wrote:
> > Seems like we could just add autovacuum_enabled=false to the table like this:
> > diff --git a/src/test/recovery/t/031_recovery_conflict.pl
> > b/src/test/recovery/t/031_recovery_conflict.pl
> > index d87efa823fd..65bc858c02d 100644
> > --- a/src/test/recovery/t/031_recovery_conflict.pl
> > +++ b/src/test/recovery/t/031_recovery_conflict.pl
> > @@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
> >   my $table2 = "test_recovery_conflict_table2";
> >   $node_primary->safe_psql(
> >      $test_db, qq[
> > -CREATE TABLE ${table1}(a int, b int);
> > +CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
> >   INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
> >   CREATE TABLE ${table2}(a int, b int);
> >   ]);
>
> Thanks for paying attention to it!
>
> With this modification applied, I got another failure (on iteration 2):

Thanks for trying it out!

> [13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting
> Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary
> done
> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 318.
> # Postmaster PID for node "primary" is 1523036
> ### Stopping node "primary" using mode immediate
>
> 031_recovery_conflict_standby.log really doesn't contain the expected
> message. I can share log files from successful and failed test runs, if
> they can be helpful, or I'll investigate this case today/tomorrow.

Hmm. The log file from the failed test run with
(autovacuum_enabled=false) would be helpful. I can't tell without the
log if it hit a different type of conflict. Unfortunately it was very
difficult to trigger the specific type of recovery conflict we were
trying to test and not hit another of the recovery conflict types
first. It'll take me some time to swap this back in my head, though.

- Melanie



Re: Test 031_recovery_conflict.pl is not immune to autovacuum

From:
Alexander Lakhin
Date:
20.03.2024 22:24, Melanie Plageman wrote:
>> [13:27:39.034](2.317s) ok 14 - startup deadlock: lock acquisition is waiting
>> Waiting for replication conn standby's replay_lsn to pass 0/343E6D0 on primary
>> done
>> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 318.
>> # Postmaster PID for node "primary" is 1523036
>> ### Stopping node "primary" using mode immediate
>>
>> 031_recovery_conflict_standby.log really doesn't contain the expected
>> message. I can share log files from successful and failed test runs, if
>> they can be helpful, or I'll investigate this case today/tomorrow.
> Hmm. The log file from the failed test run with
> (autovacuum_enabled=false) would be helpful. I can't tell without the
> log if it hit a different type of conflict. Unfortunately it was very
> difficult to trigger the specific type of recovery conflict we were
> trying to test and not hit another of the recovery conflict types
> first. It'll take me some time to swap this back in my head, though.

Please look at the attached logs. For a successful run, I see in 031_recovery_conflict_standby.log:
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|CONTEXT:  WAL redo at 0/347F9B8 for Heap2/PRUNE: snapshotConflictHorizon:
0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101]; blkref #0: rel 1663/16385/16396, blk 0
2024-03-20 13:28:08.084 UTC|vagrant|test_db|65fae467.174503|ERROR: canceling statement due to conflict with recovery at character 15
2024-03-20 13:28:08.084 UTC|vagrant|test_db|65fae467.174503|DETAIL: User transaction caused buffer deadlock with recovery.
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|LOG:  recovery finished waiting after 10.432 ms: recovery conflict on buffer pin
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|CONTEXT:  WAL redo at 0/347F9B8 for Heap2/PRUNE: snapshotConflictHorizon:
0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused: [2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101]; blkref #0: rel 1663/16385/16396, blk 0
2024-03-20 13:28:08.084 UTC|||65fae466.1744d7|LOG:  REDO @ 0/347FAB8; LSN 0/347FB00: prev 0/347F9B8; xid 0; len 7;
blkref #0: rel 1663/16385/16396, blk 0 - Heap2/FREEZE_PAGE: snapshotConflictHorizon: 762, nplans: 1, isCatalogRel: F,
plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1, offsets: [1] }]

And the corresponding fragment from 031_recovery_conflict_primary.log:
2024-03-20 13:28:07.846 UTC|vagrant|test_db|65fae467.174549|LOG: xlog flush request 0/347DCB0; write 0/0; flush 0/0
2024-03-20 13:28:07.846 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  writing block 16 of relation base/16385/1249
2024-03-20 13:28:07.847 UTC|vagrant|test_db|65fae467.174549|LOG: statement: VACUUM FREEZE
test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/347FAB8:  - Heap2/PRUNE:
snapshotConflictHorizon: 0, nredirected: 0, ndead: 0, isCatalogRel: F, nunused: 100, redirected: [], dead: [], unused:
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99, 100, 101]
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning block 0 of relation 
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/347FB00:  - Heap2/FREEZE_PAGE:
snapshotConflictHorizon: 762, nplans: 1, isCatalogRel: F, plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1, offsets: [1] }]
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning block 0 of relation 
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3481B50:  - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning block 0 of relation 
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3483BA0:  - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning relation 
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485BF0:  - XLOG/FPI_FOR_HINT:
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|CONTEXT:  while scanning relation 
"public.test_recovery_conflict_table1"
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485CB0:  - Heap/INPLACE: off: 4
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|LOG: INSERT @ 0/3485D10:  - Standby/INVALIDATIONS: ; inval msgs: catcache 55 catcache 54 relcache 16396
2024-03-20 13:28:07.848 UTC|vagrant|test_db|65fae467.174549|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;

Whilst 031_recovery_conflict_primary.log for the failed run contains:
2024-03-20 13:27:39.042 UTC|vagrant|test_db|65fae44b.17419a|LOG: statement: VACUUM FREEZE
test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E718:  - Heap2/FREEZE_PAGE:
snapshotConflictHorizon: 751, nplans: 1, isCatalogRel: F, plans: [{ xmax: 0, infomask: 2817, infomask2: 2, ntuples: 1, offsets: [1] }]
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|CONTEXT:  while scanning block 0 of relation 
"public.test_recovery_conflict_table1"
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E7D8:  - Heap/INPLACE: off: 4
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|LOG: INSERT @ 0/343E838:  - Standby/INVALIDATIONS: ; inval msgs: catcache 55 catcache 54 relcache 16392
2024-03-20 13:27:39.043 UTC|vagrant|test_db|65fae44b.17419a|STATEMENT:  VACUUM FREEZE test_recovery_conflict_table1;

(there is no Heap2/PRUNE record)

I've modified the test as follows:
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -59,7 +59,7 @@ my $table1 = "test_recovery_conflict_table1";
  my $table2 = "test_recovery_conflict_table2";
  $node_primary->safe_psql(
         $test_db, qq[
-CREATE TABLE ${table1}(a int, b int);
+CREATE TABLE ${table1}(a int, b int) WITH (autovacuum_enabled = false);
  INSERT INTO $table1 SELECT i % 3, 0 FROM generate_series(1,20) i;
  CREATE TABLE ${table2}(a int, b int);
  ]);


Best regards,
Alexander
Attachments