Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

Поиск
Список
Период
Сортировка
От Alexander Lakhin
Тема Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Дата
Msg-id 657815a2-5a89-fcc1-1c9d-d77a6986bc26@gmail.com
обсуждение исходный текст
Ответ на Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Ответы Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Список pgsql-hackers
Hello Bertrand and Michael,

23.01.2024 11:07, Bertrand Drouvot wrote:
> On Tue, Jan 23, 2024 at 02:50:06PM +0900, Michael Paquier wrote:
>
>> Anyway, that's not the end of it.  What should we do for snapshot
>> snapshot records coming from the bgwriter?
> What about?
>
> 3) depending on how stabilized this test (and others that suffer from "random"
> xl_running_xacts) is, then think about the bgwriter.

A recent buildfarm failure [1] reminds me of that remaining question.
Here we have a slow machine (a successful run, for example [2], shows
541.13s duration of the test) and the following information logged:

[13:55:13.725](34.411s) ok 25 - inactiveslot slot invalidation is logged with vacuum on pg_class
[13:55:13.727](0.002s) not ok 26 - activeslot slot invalidation is logged with vacuum on pg_class
[13:55:13.728](0.001s) #   Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
#   at C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 229.
[14:27:42.995](1949.267s) # poll_query_until timed out executing this query:
# select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[14:27:42.999](0.004s) not ok 27 - confl_active_logicalslot updated
[14:27:43.000](0.001s) #   Failed test 'confl_active_logicalslot updated'
#   at C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235.
Timed out waiting confl_active_logicalslot to be updated at 
C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235.

---
035_standby_logical_decoding_standby.log:
2024-06-06 13:55:07.715 UTC [9172:7] LOG:  invalidating obsolete replication slot "row_removal_inactiveslot"
2024-06-06 13:55:07.715 UTC [9172:8] DETAIL:  The slot conflicted with xid horizon 754.
2024-06-06 13:55:07.715 UTC [9172:9] CONTEXT:  WAL redo at 0/4020A80 for Heap2/PRUNE_ON_ACCESS:
snapshotConflictHorizon:
 
754, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 1, nunused: 0, dead: [48]; blkref #0: rel 1663/16384/2610, blk
0
2024-06-06 13:55:14.372 UTC [7532:1] [unknown] LOG:  connection received: host=127.0.0.1 port=55328
2024-06-06 13:55:14.381 UTC [7532:2] [unknown] LOG:  connection authenticated: identity="EC2AMAZ-P7KGG90\\pgrunner" 
method=sspi 

(C:/prog/bf/root/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_hba.conf:2)
2024-06-06 13:55:14.381 UTC [7532:3] [unknown] LOG:  connection authorized: user=pgrunner database=postgres 
application_name=035_standby_logical_decoding.pl
2024-06-06 13:55:14.443 UTC [7532:4] 035_standby_logical_decoding.pl LOG:  statement: select (confl_active_logicalslot
=
 
1) from pg_stat_database_conflicts where datname = 'testdb'
2024-06-06 13:55:14.452 UTC [7532:5] 035_standby_logical_decoding.pl LOG:  disconnection: session time: 0:00:00.090 
user=pgrunner database=postgres host=127.0.0.1 port=55328
# (there is no `invalidating obsolete replication slot "row_removal_activeslot"` message)
...
2024-06-06 14:27:42.675 UTC [4032:4] 035_standby_logical_decoding.pl LOG:  statement: select (confl_active_logicalslot
=
 
1) from pg_stat_database_conflicts where datname = 'testdb'
2024-06-06 14:27:42.681 UTC [4032:5] 035_standby_logical_decoding.pl LOG:  disconnection: session time: 0:00:00.080 
user=pgrunner database=postgres host=127.0.0.1 port=58713
2024-06-06 14:27:43.095 UTC [7892:2] FATAL:  could not receive data from WAL stream: server closed the connection 
unexpectedly
         This probably means the server terminated abnormally
         before or while processing the request.

It's hard to determine from this info, why row_removal_activeslot was not
invalidated, but running this test on a slowed down Windows VM, I (still)
get the same looking failures caused by RUNNING_XACTS appeared just before
`invalidating obsolete replication slot "row_removal_inactiveslot"`.
So I would consider this failure as yet another result of bgwriter activity
and add it to the list of known failures as such...

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-06-06%2012%3A36%3A11
[2] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=drongo&dt=2024-06-05%2017%3A03%3A13&stg=misc-check

Best regards,
Alexander



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: Assert in heapgettup_pagemode() fails due to underlying buffer change
Следующее
От: jian he
Дата:
Сообщение: Re: altering a column's collation leaves an invalid foreign key