Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

Поиск
Список
Период
Сортировка
От Bertrand Drouvot
Тема Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Дата
Msg-id ZaFCoJ1FxBQ9jQkX@ip-10-97-1-34.eu-west-3.compute.internal
обсуждение исходный текст
Ответ на Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed  (Alexander Lakhin <exclusion@gmail.com>)
Ответы Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
Hi,

On Fri, Jan 12, 2024 at 02:00:01PM +0300, Alexander Lakhin wrote:
> Hi,
> 
> 12.01.2024 10:15, Bertrand Drouvot wrote:
> > 
> > For this one, the "good" news is that it looks like that we don’t see the
> > "terminating" message not followed by an "obsolete" message (so the engine
> > behaves correctly) anymore.
> > 
> > There is simply nothing related to the row_removal_activeslot at all (the catalog_xmin
> > advanced and there is no conflict).
> 
> Yes, judging from all the failures that we see now, it looks like the
> 0001-Fix-race-condition...patch works as expected.

Yeah, thanks for confirming, I'll create a dedicated hackers thread for that one.

> > Let's give v5 a try? (please apply v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch
> > too).
> 
> Unfortunately, I've got the failure again (please see logs attached).
> (_primary.log can confirm that I have used exactly v5 — I see no
> txid_current() calls there...)

Okay ;-( Thanks for the testing. Then I can think of:

1) Michael's proposal up-thread (means tweak the test with a retry logic, retrying
things if such a standby snapshot is found).

2) Don't report a test error for active slots in case its catalog_xmin advanced.

I'd vote for 2) as:

- this is a corner case and the vast majority of the animals don't report any
issues (means the active slot conflict detection is already well covered).

- even on the same animal it should be pretty rare to not have an active slot 
conflict detection not covered at all (and the "failing" one would be probably
moving over time).

- It may be possible that 1) ends up failing (as we'd need to put a limit on the
retry logic anyhow).

What do you think?

And BTW, looking closely at wait_until_vacuum_can_remove(), I'm not sure it's
fully correct, so I'll give it another look.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Aleksander Alekseev
Дата:
Сообщение: Re: Improve the connection failure error messages
Следующее
От: Michael Banck
Дата:
Сообщение: Re: plpgsql memory leaks