Re: Race conditions in 019_replslot_limit.pl

From Andres Freund
Subject Re: Race conditions in 019_replslot_limit.pl
Msg-id 20220530190155.47wr3x2prdwyciah@alap3.anarazel.de
In reply to Re: Race conditions in 019_replslot_limit.pl  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

Hi,

On 2022-03-27 22:37:34 -0700, Andres Freund wrote:
> On 2022-03-27 17:36:14 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I still feel like there's something off here. But that's probably not enough
> > > to keep causing failures. I'm inclined to leave the debugging in for a bit
> > > longer, but not fail the test anymore?
> > 
> > WFM.
> 
> I've done so now.

I did look over the test results a couple times since then and once more
today. There were a few cases with pretty significant numbers of iterations:

The highest is
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2022-04-07%2022%3A14%3A03
showing:
# multiple walsenders active in iteration 19

It's somewhat interesting that the worst case was just around the feature
freeze, where the load on my buildfarm animal boxes was higher than normal.


In comparison to earlier approaches, with the current in-tree approach, we
don't do anything when hitting the "problem", other than wait. Which does give
us additional information - afaics there's nothing at all indicating that some
other backend existed allowing the replication slot drop to finish.

It just looks like for reasons I still do not understand, removing a directory
and 2 files or so takes multiple seconds (at least ~36 new connections, 18
pg_usleep(100_100)), while there are no other indications of problems.

I also still don't have a theory why this suddenly started to happen.


Unless somebody has another idea, I'm planning to remove all the debugging
code added, but keep the retry based approach in 019_replslot_limit.pl, so we
don't again get all the spurious failures.
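
For readers unfamiliar with the retry-based approach mentioned above, here is a
minimal sketch of the general idea in Python. It is not the code from
019_replslot_limit.pl. The function name, its parameters and the injected
count_walsenders callback are all hypothetical, chosen for illustration. The
sketch polls until at most one walsender is reported, logs each extra
iteration, and waits between attempts.

```python
import time
from typing import Callable


def wait_for_single_walsender(
    count_walsenders: Callable[[], int],
    max_iterations: int = 100,
    delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until at most one walsender is reported.

    count_walsenders is a hypothetical callback; in a real test it might
    run a query against pg_stat_activity. Returns the number of retries
    that were needed before the condition held.
    """
    for iteration in range(max_iterations):
        if count_walsenders() <= 1:
            return iteration
        # Log and wait instead of failing, so slow process shutdown does
        # not turn into a spurious test failure.
        print(f"# multiple walsenders active in iteration {iteration}")
        sleep(delay_s)
    raise TimeoutError(
        f"still multiple walsenders after {max_iterations} iterations"
    )
```

Taking the sleep function as a parameter keeps the loop testable without real
delays. Whether running out of iterations should raise, as it does here, or
only be logged is a policy choice this sketch does not try to settle.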

Greetings,

Andres Freund
