Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

Поиск
Список
Период
Сортировка
От Bertrand Drouvot
Тема Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Дата
Msg-id ZZ7GpII4bAYN+jT5@ip-10-97-1-34.eu-west-3.compute.internal
обсуждение исходный текст
Ответ на Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed  (Alexander Lakhin <exclusion@gmail.com>)
Ответы Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed  (Alexander Lakhin <exclusion@gmail.com>)
Список pgsql-hackers
Hi,

On Wed, Jan 10, 2024 at 05:00:01PM +0300, Alexander Lakhin wrote:
> 10.01.2024 12:46, Bertrand Drouvot wrote:
> 
> > Would it be possible to also send the standby logs?
> 
> Yes, please look at the attached logs. This time I've build postgres with
> -DWAL_DEBUG and ran tests with TEMP_CONFIG as below:
> wal_keep_size=1GB
> wal_debug = on
> log_autovacuum_min_duration = 0
> log_statement = 'all'
> log_min_messages = INFO
> 
> The archive attached contains logs from four runs:
> recovery-1-ok -- an example of successful run for reference
> recovery-7-pruning and recovery-19-pruning -- failures with a failed
>  subtest 'activeslot slot invalidation is logged with on-access pruning'
> recovery-15-vacuum_pg_class -- a failure with a failed
>  subtest 'activeslot slot invalidation is logged with vacuum on pg_class'

Thanks a lot for the testing!

> is an absent message 'obsolete replication slot "row_removal_activeslot"'

Looking at recovery-15-vacuum_pg_class/i035_standby_logical_decoding_standby.log here:

Yeah, the missing message has to come from InvalidatePossiblyObsoleteSlot().

In case of an active slot we first call ReportSlotInvalidation with the second
parameter set to true (to emit the "terminating" message), then SIGTERM the active
process and then (later) we should call the other ReportSlotInvalidation()
call with the second parameter set to false (to emit the message that we don't
see here).

So it means InvalidatePossiblyObsoleteSlot() did not trigger the second ReportSlotInvalidation()
call. 

The thing it that it looks like we exited the loop in InvalidatePossiblyObsoleteSlot()
because there is more messages from the startup process (789618) after the:


"
2024-01-10 11:00:29.109 UTC [789618] LOG:  terminating process 790377 to release replication slot
"row_removal_activeslot"
"

one.

Do you think you could try to add more debug messages in InvalidatePossiblyObsoleteSlot()
to understand why the second call to ReportSlotInvalidation() is not done and IIUC
why/how we exited the loop?

FWIW, I did try to reproduce by launching pg_recvlogical and then kill -SIGSTOP
it. Then producing a conflict, I'm able to get the first message and not the second
one (which is expected). But the startup process does not exit the loop, which is
expected here.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Relation bulk write facility
Следующее
От: Bernd Helmle
Дата:
Сообщение: Re: [PATCH] Add sortsupport for range types and btree_gist