Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)

From: Thomas Munro
Subject: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)
Msg-id: CA+hUKGJrr4edcaygFd=ciH06=6W0GRbSpO0koE0J=w413-SLFg@mail.gmail.com
In reply to: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)  (Antonin Houska <ah@cybertec.at>)
Responses: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)  (Antonin Houska <ah@cybertec.at>)
List: pgsql-hackers
On Wed, Aug 16, 2023 at 11:18 PM Antonin Houska <ah@cybertec.at> wrote:
> I'm trying to understand this patch (commit 5ffb7c7750) because I use a
> condition variable in an extension. One particular problem occurred to me,
> please consider:
>
> ConditionVariableSleep() gets interrupted, so AbortTransaction() calls
> ConditionVariableCancelSleep(), but the signal was sent in between. Shouldn't
> at least AbortTransaction() and AbortSubTransaction() check the return value
> of ConditionVariableCancelSleep(), and re-send the signal if needed?

I wondered about that in the context of our only in-tree user of
ConditionVariableSignal(), in parallel btree index creation, but since
it's using the parallel executor infrastructure, any error would be
propagated everywhere so all waits would be aborted.  There is another
place where the infrastructure cancels for you and would now eat the
result: if you prepare to sleep on one CV, and then prepare to sleep
on another, we'll just cancel the first one.  I think that's
semantically OK: we can't really wait for two CVs at once, and if you
try you'll miss signals anyway, but you must already have code to cope
with that by re-checking your exit conditions.
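
To make that concrete, the pattern I have in mind is the usual
re-check loop.  This is just a sketch with made-up names (shared->cv,
work_available()), not code from any real extension:

    ConditionVariable *cv = &shared->cv;    /* a CV in shared memory */

    ConditionVariablePrepareToSleep(cv);
    while (!work_available(shared))         /* re-check the real exit condition */
        ConditionVariableSleep(cv, WAIT_EVENT_EXTENSION);
    ConditionVariableCancelSleep();

A signal that arrives between the prepare and the sleep, or one that
was consumed by a sleep that later got implicitly canceled, just means
the loop wakes up, re-checks, and either falls through or goes back to
sleep.  And if I understand your abort-path idea correctly, it would
amount to something like this, assuming the aborting code still knows
which CV it was sleeping on:

    /* if our canceled sleep already consumed a signal, pass it on */
    if (ConditionVariableCancelSleep())
        ConditionVariableSignal(cv);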

> Note that I'm just thinking about such a problem; I did not try to reproduce it.

Hmm.  I looked for users of ConditionVariableSignal() in the usual web
tools and didn't find anything, so I guess your extension is not
released yet or not open source.  I'm curious: what does it actually
do if there is an error in a CV-wakeup-consuming backend?  I guess it
might be some kind of work-queue processing system... it seems
inevitable that if backends are failing with errors, and you don't
respond by retrying/respawning, you'll lose or significantly delay
jobs/events/something anyway (imagine only slightly different timing:
you consume the signal and start working on a job and then ereport,
which amounts to the same thing in the end now that your transaction
is rolled back?), and when you retry you'll see whatever condition was
waited for anyway.  But that's just me imagining what some
hypothetical strawman system might look like...  what does it really
do?

(FWIW when I worked on a couple of different work queue-like systems
and tried to use ConditionVariableSignal() I eventually concluded that
it was the wrong tool for the job because its wakeup order is
undefined.  It's actually FIFO, but I wanted LIFO so that workers have
a chance to become idle and reduce the pool size, but I started to
think that once you want that level of control you really want to
build a bespoke wait list system, so I never got around to proposing
that we consider changing that.)

Our condition variables are weird.  They're not associated with a
lock, so we made start-of-wait non-atomic: prepare first, then return
control and let the caller check its condition, then sleep.  Typical
user space condition variable APIs force you to acquire some kind of
lock that protects the condition first, then check the condition, then
atomically release-associated-lock-and-start-sleeping, so there is no
data race but also no time where control is returned to the caller but
the thread is on the wait list consuming signals.  That choice has
some pros (you can choose whatever type of lock you want to protect
your condition, or none at all if you can get away with memory
barriers and magic) and cons.  However, as I think Andres was getting
at, having a non-atomic start-of-wait doesn't seem to require us to
have a non-atomic end-of-wait and associated extra contention.  So
maybe we should figure out how to fix that in 17.
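
For comparison, the shape a typical user-space API forces on you looks
something like this (plain POSIX threads, nothing PostgreSQL-specific,
just to illustrate the atomic release-lock-and-sleep step):

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static int ready = 0;       /* the condition, protected by lock */

    static void
    wait_for_ready(void)
    {
        pthread_mutex_lock(&lock);
        while (!ready)
            pthread_cond_wait(&cond, &lock); /* unlocks and sleeps atomically */
        pthread_mutex_unlock(&lock);
    }

There is no window in which the caller has control while already
sitting on the wait list, but you're also stuck protecting your
condition with that one mutex.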


