Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
От | Bertrand Drouvot |
---|---|
Тема | Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed |
Дата | |
Msg-id | ZZ7GpII4bAYN+jT5@ip-10-97-1-34.eu-west-3.compute.internal обсуждение исходный текст |
Ответ на | Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed (Alexander Lakhin <exclusion@gmail.com>) |
Ответы |
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
|
Список | pgsql-hackers |
Hi, On Wed, Jan 10, 2024 at 05:00:01PM +0300, Alexander Lakhin wrote: > 10.01.2024 12:46, Bertrand Drouvot wrote: > > > Would it be possible to also send the standby logs? > > Yes, please look at the attached logs. This time I've build postgres with > -DWAL_DEBUG and ran tests with TEMP_CONFIG as below: > wal_keep_size=1GB > wal_debug = on > log_autovacuum_min_duration = 0 > log_statement = 'all' > log_min_messages = INFO > > The archive attached contains logs from four runs: > recovery-1-ok -- an example of successful run for reference > recovery-7-pruning and recovery-19-pruning -- failures with a failed > subtest 'activeslot slot invalidation is logged with on-access pruning' > recovery-15-vacuum_pg_class -- a failure with a failed > subtest 'activeslot slot invalidation is logged with vacuum on pg_class' Thanks a lot for the testing! > is an absent message 'obsolete replication slot "row_removal_activeslot"' Looking at recovery-15-vacuum_pg_class/i035_standby_logical_decoding_standby.log here: Yeah, the missing message has to come from InvalidatePossiblyObsoleteSlot(). In case of an active slot we first call ReportSlotInvalidation with the second parameter set to true (to emit the "terminating" message), then SIGTERM the active process and then (later) we should call the other ReportSlotInvalidation() call with the second parameter set to false (to emit the message that we don't see here). So it means InvalidatePossiblyObsoleteSlot() did not trigger the second ReportSlotInvalidation() call. The thing it that it looks like we exited the loop in InvalidatePossiblyObsoleteSlot() because there is more messages from the startup process (789618) after the: " 2024-01-10 11:00:29.109 UTC [789618] LOG: terminating process 790377 to release replication slot "row_removal_activeslot" " one. Do you think you could try to add more debug messages in InvalidatePossiblyObsoleteSlot() to understand why the second call to ReportSlotInvalidation() is not done and IIUC why/how we exited the loop? FWIW, I did try to reproduce by launching pg_recvlogical and then kill -SIGSTOP it. Then producing a conflict, I'm able to get the first message and not the second one (which is expected). But the startup process does not exit the loop, which is expected here. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления: