Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

Поиск
Список
Период
Сортировка
От Bertrand Drouvot
Тема Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Дата
Msg-id ZaDnIMTbT+0yR5LZ@ip-10-97-1-34.eu-west-3.compute.internal
обсуждение исходный текст
Ответ на Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed  (Alexander Lakhin <exclusion@gmail.com>)
Список pgsql-hackers
Hi,

On Fri, Jan 12, 2024 at 07:01:55AM +0900, Michael Paquier wrote:
> On Thu, Jan 11, 2024 at 11:00:01PM +0300, Alexander Lakhin wrote:
> > Bertrand, I've relaunched tests in the same slowed down VM with both
> > patches applied (but with no other modifications) and got a failure
> > with pg_class, similar to what we had seen before:
> > 9       #   Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
> > 9       #   at t/035_standby_logical_decoding.pl line 230.
> > 
> > Please look at the logs attached (I see there Standby/RUNNING_XACTS near
> > 'invalidating obsolete replication slot "row_removal_inactiveslot"').

Thanks! 

For this one, the "good" news is that it looks like that we don’t see the
"terminating" message not followed by an "obsolete" message (so the engine
behaves correctly) anymore.

There is simply nothing related to the row_removal_activeslot at all (the catalog_xmin
advanced and there is no conflict).

And I agree that this is due to the Standby/RUNNING_XACTS that is "advancing" the
catalog_xmin of the active slot.

> Standby/RUNNING_XACTS is exactly why 039_end_of_wal.pl uses wal_level
> = minimal, because these lead to unpredictible records inserted,
> impacting the reliability of the tests.  We cannot do that here,
> obviously.  That may be a long shot, but could it be possible to tweak
> the test with a retry logic, retrying things if such a standby
> snapshot is found because we know that the invalidation is not going
> to work anyway?

I think it all depends what the xl_running_xacts does contain (means does it
"advance" or not the catalog_xmin in our case).

In our case it does advance it (should it occurs) due to the "select txid_current()"
that is done in wait_until_vacuum_can_remove() in 035_standby_logical_decoding.pl.

I suggest to make use of txid_current_snapshot() instead (that does not produce
a Transaction/COMMIT wal record, as opposed to txid_current()).

I think that it could be "enough" for our case here, and it's what v5 attached is
now doing.

Let's give v5 a try? (please apply v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch
too).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bertrand Drouvot
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [PoC] Improve dead tuple storage for lazy vacuum