Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Alexander Lakhin
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Msg-id: 60bb34ad-a696-c43d-3f7c-1696796e86ce@gmail.com
In reply to: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Hello Thomas,

31.08.2023 14:15, Thomas Munro wrote:

> We have a signal that is pending and not blocked, so I don't
> immediately know why poll() hasn't returned control.

When I worked at Postgres Pro, we observed a similar lockup under rather
specific conditions (we used an Elbrus CPU and the Elbrus-specific compiler,
lcc, which is based on EDG).
I managed to reproduce that lockup, and Anton Voloshin investigated it.
The issue was caused by a compiler optimization in WaitEventSetWait():
     waiting = true;
...
     while (returned_events == 0)
     {
...
         if (set->latch && set->latch->is_set)
         {
...
             break;
         }

In that case, the compiler decided that it could place the read of
"set->latch->is_set" before the write "waiting = true".
(Placing "pg_compiler_barrier();" right after "waiting = true;" fixed the
issue for us.)
I can't provide more details for now, but maybe you could look at the machine
code generated on the target platform to confirm or rule out my guess.
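
To illustrate the kind of fix described above, here is a minimal standalone
sketch, not the actual PostgreSQL code. The names DemoLatch, DemoWaitSet,
demo_compiler_barrier() and demo_latch_already_set() are hypothetical, and the
barrier is written with GCC/Clang inline-asm syntax, which is an assumption
about the toolchain. It only shows where a compiler barrier would sit relative
to the store and the load.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_compiler_barrier(). An empty asm statement
 * with a "memory" clobber (GCC/Clang syntax, assumed here) tells the
 * compiler not to move memory accesses across this point.
 */
#define demo_compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Simplified, hypothetical versions of the latch and wait-set structures. */
typedef struct DemoLatch
{
    sig_atomic_t is_set;
} DemoLatch;

typedef struct DemoWaitSet
{
    DemoLatch *latch;
} DemoWaitSet;

/* Flag a signal handler would check to decide whether to wake us up. */
static volatile sig_atomic_t waiting = 0;

/*
 * Announce that we are about to sleep, then check whether the latch is
 * already set. Returns true if the caller can skip sleeping.
 */
static bool
demo_latch_already_set(DemoWaitSet *set)
{
    bool        was_set;

    assert(set != NULL);

    waiting = 1;                /* the store that must come first */
    demo_compiler_barrier();    /* keep the is_set load after the store */

    was_set = (set->latch != NULL && set->latch->is_set);

    if (was_set)
        waiting = 0;            /* no sleep needed, stop advertising it */
    return was_set;
}
```

Without the barrier, the read of is_set goes through a non-volatile object,
so a compiler is allowed to schedule it ahead of the volatile store to
"waiting". Note that this only constrains the compiler; it does not by itself
prevent reordering by the CPU.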

Best regards,
Alexander


