Re: Issue with the PRNG used by Postgres

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Issue with the PRNG used by Postgres
Дата
Msg-id 20240412183317.m7ick24374mcj6do@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Issue with the PRNG used by Postgres  (Andres Freund <andres@anarazel.de>)
Ответы Re: Issue with the PRNG used by Postgres  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

Given that I found several ways to spuriously trigger the stuck spinlock
logic, I think we need to backpatch a fix.


On 2024-04-11 21:41:39 -0700, Andres Freund wrote:
> Tracking the total amount of time spent sleeping doesn't require any
> additional syscalls, due to the nanosleep()...

I did quickly implement this, and it works nicely on unix. Unfortunately I
couldn't find something similar for windows. Which would mean we'd need to
record the time before every sleep.

Another issue with that approach is that we'd need to add a field to
SpinDelayStatus, which'd be an ABI break. Not sure there's any external code
using it, but if there were, it'd be a fairly nasty, rarely hit, path.

Afaict there's no padding that could be reused on 32 bit platforms [1].


I wonder if, for easy backpatching, the easiest solution is to just reset
errno before calling pg_usleep(), and only increment status->delays if
errno != EINTR. Given our pg_usleep() implementations, that'd preven the stuck
spinlock logic from triggering too fast.


If the ABI issue weren't there, we could record the current time the first
time we sleep during a spinlock acquisition (there's an existing branch for
that) and then only trigger the stuck spinlock detection when enough time has
passed.  That'd be fine from an efficiency POV.


Even if we decide to do so, I don't think it'd be a good idea to remove the
stuck logic alltogether in the backbranches. That might take out services that
have been running ok-ish for a long time.

Greetings,

Andres Freund


[1] But we are wasting a bunch of space on 64bit platforms due to switching
between pointers and 32bit integers, should probably fix that.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: using extended statistics to improve join estimates
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: allow changing autovacuum_max_workers without restarting