Re: Issue with the PRNG used by Postgres

From: Andres Freund
Subject: Re: Issue with the PRNG used by Postgres
Msg-id: 20240410190821.yhquanxyhpqtkett@awork3.anarazel.de
In response to: Re: Issue with the PRNG used by Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Issue with the PRNG used by Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Hi,

On 2024-04-10 14:02:20 -0400, Tom Lane wrote:
> On third thought ... while I still think this is a misuse of
> perform_spin_delay and we should change it, I'm not sure it'll do
> anything to address Parag's problem, because IIUC he's seeing actual
> "stuck spinlock" reports.  That implies that the inner loop of
> LWLockWaitListLock slept NUM_DELAYS times without ever seeing
> LW_FLAG_LOCKED clear.  What I'm suggesting would change the triggering
> condition to "NUM_DELAYS sleeps without acquiring the lock", which is
> strictly more likely to happen, so it's not going to help him.  It's
> certainly still well out in we-shouldn't-get-there territory, though.

I think it could exacerbate the issue. Parag reported ~7k connections on a
128-core machine. The buffer replacement logic in versions < 16 tries to lock the old
and new lock partitions at once. That can lead to quite bad "chains" of
dependent lwlocks, occasionally putting all the pressure on a single lwlock.
With 7k waiters on a single spinlock, higher frequency of wakeups will make it
much more likely that the process holding the spinlock will be put to sleep.

This is greatly exacerbated by the issue fixed in a4adc31f690: once the
waitqueue is long, the spinlock will be held for an extended amount of time.
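
To make the two triggering conditions from the quoted text concrete, here is a minimal sketch. It is not PostgreSQL's actual code: the Observation type, both function names, and the num_delays parameter are hypothetical, and the model ignores the spinning that happens between sleeps. Each array element stands for what a waiter sees after one sleep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what a waiter observes after each sleep. */
typedef enum
{
	STILL_LOCKED,			/* flag still set after the sleep */
	CLEAR_BUT_LOST_RACE,	/* flag seen clear, but someone else re-took it */
	ACQUIRED				/* flag seen clear and we got the lock */
} Observation;

/*
 * Condition as described for the existing behaviour: the delay count
 * starts over whenever the flag is seen clear, so "stuck" means
 * num_delays consecutive sleeps without ever seeing the flag clear.
 */
bool
stuck_per_inner_loop(const Observation *obs, size_t n, int num_delays)
{
	int			delays = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (obs[i] == ACQUIRED)
			return false;
		if (obs[i] == CLEAR_BUT_LOST_RACE)
			delays = 0;		/* inner loop exits, fresh delay state */
		else if (++delays >= num_delays)
			return true;
	}
	return false;
}

/*
 * Condition as described for the suggested change: the delay count
 * keeps running until the lock is actually acquired.
 */
bool
stuck_per_acquisition(const Observation *obs, size_t n, int num_delays)
{
	int			delays = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (obs[i] == ACQUIRED)
			return false;
		if (obs[i] == STILL_LOCKED && ++delays >= num_delays)
			return true;
	}
	return false;
}
```

With a threshold of 3, the sequence "locked, locked, clear-but-lost, locked, locked" is reported as stuck only by the second function, which is why the second condition is strictly more likely to fire when many waiters compete for one lock.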

Greetings,

Andres Freund


