Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Thomas Munro
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Msg-id: CA+hUKGKf9nBEhMOcWUkKNkQp6miCcnM8TYNa-+aq9wpzGXCLJA@mail.gmail.com
In reply to: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
List: pgsql-hackers
On Fri, Jan 27, 2023 at 9:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> > I received an alert that dikkop (my rpi4 buildfarm animal running freebsd 14)
> > hadn't reported any results for a couple of days, and it seems it got into
> > an infinite loop in REL_11_STABLE when building the hash table in a parallel
> > hashjoin, or something like that.
>
> > It seems to be progressing now, probably because I attached gdb to the
> > workers to get backtraces, which sends signals etc.
>
> That reminds me of cases that I saw several times on my now-deceased
> animal florican:
>
> https://www.postgresql.org/message-id/flat/2245838.1645902425%40sss.pgh.pa.us
>
> There's clearly something rotten somewhere in there, but whether
> it's our bug or FreeBSD's isn't clear.

And if it's ours, it's possibly in the latch code and not anything higher
(I mean, not in condition variables, barriers, or parallel hash join),
because I saw a similar hang in the shm_mq code, which uses the latch
API directly.  Note that 13 switched to kqueue but still used the
self-pipe, and 14 switched to a signal event; no such hang has been
reported in those releases or later, which makes the poll() code path
a key suspect.
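For readers unfamiliar with the "self-pipe" mentioned above, here is a minimal sketch of the general technique under stated assumptions: a signal handler sets a flag and writes a byte into a non-blocking pipe, and the waiter sleeps in poll() on the pipe's read end, so a wakeup that arrives just before the sleep is never lost. The names `latch_init`, `latch_wait` and the use of SIGUSR1 are hypothetical illustrations; PostgreSQL's real implementation (src/backend/storage/ipc/latch.c) differs in many details.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical minimal latch built on the self-pipe trick (not PostgreSQL's code). */
static int selfpipe[2] = {-1, -1};
static volatile sig_atomic_t latch_is_set = 0;

/* Signal handler: mark the latch set and write one byte so a sleeping poll() wakes up. */
static void latch_signal_handler(int signo)
{
    int save_errno = errno;
    (void) signo;
    latch_is_set = 1;
    /* write() is async-signal-safe; EAGAIN on a full pipe is harmless,
     * because a wakeup byte is already pending. */
    ssize_t rc = write(selfpipe[1], "x", 1);
    (void) rc;
    errno = save_errno;
}

/* Create the non-blocking pipe and install the (assumed) SIGUSR1 handler. */
int latch_init(void)
{
    struct sigaction sa;

    if (pipe(selfpipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(selfpipe[i], F_GETFL);
        if (flags < 0 || fcntl(selfpipe[i], F_SETFL, flags | O_NONBLOCK) != 0)
            return -1;
    }
    sa.sa_handler = latch_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Read and discard every pending wakeup byte. */
static void drain_selfpipe(void)
{
    char buf[64];
    while (read(selfpipe[0], buf, sizeof buf) > 0)
        ;
}

/* Wait until the latch is set or timeout_ms expires; returns true if it was set.
 * For simplicity the full timeout restarts after EINTR or a spurious wakeup. */
bool latch_wait(int timeout_ms)
{
    struct pollfd pfd = { .fd = selfpipe[0], .events = POLLIN };

    assert(selfpipe[0] >= 0);   /* latch_init() must have succeeded */
    for (;;) {
        if (latch_is_set) {
            latch_is_set = 0;
            drain_selfpipe();
            return true;
        }
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: recheck the flag */
        if (rc <= 0)
            return false;       /* timeout or error */
        drain_selfpipe();       /* data arrived: loop to recheck the flag */
    }
}
```

Another process would set the latch with `kill(pid, SIGUSR1)`. The pipe write is what makes the design race-free: if the signal lands between the flag check and the call to poll(), the pending byte makes poll() return immediately instead of sleeping forever. A bug in exactly this kind of handshake would produce the sort of hang described in this thread.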
