Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)

From: Thomas Munro
Subject: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)
Msg-id: CA+hUKGJrr4edcaygFd=ciH06=6W0GRbSpO0koE0J=w413-SLFg@mail.gmail.com
In reply to: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables) (Antonin Houska <ah@cybertec.at>)
Responses: Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables) (Antonin Houska <ah@cybertec.at>)
List: pgsql-hackers
On Wed, Aug 16, 2023 at 11:18 PM Antonin Houska <ah@cybertec.at> wrote:
> I try to understand this patch (commit 5ffb7c7750) because I use a condition
> variable in an extension. One particular problem occurred to me; please
> consider:
>
> ConditionVariableSleep() gets interrupted, so AbortTransaction() calls
> ConditionVariableCancelSleep(), but the signal was sent in between. Shouldn't
> at least AbortTransaction() and AbortSubTransaction() check the return value
> of ConditionVariableCancelSleep(), and re-send the signal if needed?

I wondered about that in the context of our only in-tree user of
ConditionVariableSignal(), in parallel btree index creation, but since it's
using the parallel executor infrastructure, any error would be propagated
everywhere, so all waits would be aborted.

There is another place where the infrastructure cancels for you and would now
eat the result: if you prepare to sleep on one CV and then prepare to sleep on
another, we'll just cancel the first one. I think that's semantically OK: we
can't really wait for two CVs at once, and if you try you'll miss signals
anyway, but you must already have code to cope with that by re-checking your
exit conditions.

> Note that I'm just thinking about such a problem, did not try to reproduce it.

Hmm. I looked for users of ConditionVariableSignal() in the usual web tools
and didn't find anything, so I guess your extension is not released yet or not
open source. I'm curious: what does it actually do if there is an error in a
CV-wakeup-consuming backend? I guess it might be some kind of work-queue
processing system...
it seems inevitable that if backends are failing with
errors, and you don't respond by retrying/respawning, you'll lose or
significantly delay jobs/events/something anyway (imagine only slightly
different timing: you consume the signal and start working on a job and then
ereport, which amounts to the same thing in the end now that your transaction
is rolled back?), and when you retry you'll see whatever condition was waited
for anyway. But that's just me imagining what some hypothetical strawman
system might look like... what does it really do?

(FWIW when I worked on a couple of different work-queue-like systems and tried
to use ConditionVariableSignal() I eventually concluded that it was the wrong
tool for the job because its wakeup order is undefined. It's actually FIFO,
but I wanted LIFO so that workers have a chance to become idle and reduce the
pool size. But I started to think that once you want that level of control you
really want to build a bespoke wait-list system, so I never got around to
proposing that we consider changing that.)

Our condition variables are weird. They're not associated with a lock, so we
made start-of-wait non-atomic: prepare first, then return control and let the
caller check its condition, then sleep. Typical user-space condition variable
APIs force you to acquire some kind of lock that protects the condition first,
then check the condition, then atomically
release-the-associated-lock-and-start-sleeping, so there is no data race, but
also no time when control is returned to the caller while the thread is on the
wait list consuming signals. That choice has some pros (you can choose
whatever type of lock you want to protect your condition, or none at all if
you can get away with memory barriers and magic) and cons. However, as I think
Andres was getting at, having a non-atomic start-of-wait doesn't seem to
require us to have a non-atomic end-of-wait and the associated extra
contention. So maybe we should figure out how to fix that in 17.
In the pgsql-hackers list, by date:
Previous
From: Andres Freund
Subject: Re: Performance degradation on concurrent COPY into a single relation in PG16.
Next
From: Joe Conway
Subject: Re: Would it be possible to backpatch Close support in libpq (28b5726) to PG16?