Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

Поиск
Список
Период
Сортировка
От Florian Pflug
Тема Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Дата
Msg-id EE0DD308-3A11-47E7-99FA-1F41C29DE448@phlo.org
обсуждение исходный текст
Ответ на Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers
On Feb14, 2014, at 13:36 , Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-02-14 13:28:47 +0100, Florian Pflug wrote:
>>> I don't think that can actually happen because the head of the wait list
>>> isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
>>> for a while...
>>
>> Hm, true, but does that protect us under all circumstances? If another
>> backend grabs the lock before B gets a chance to do so, B will become the
>> wait queue's head, and anyone who enqueues behind B will do so by updating
>> B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
>> by the original lock holder.
>>
>> The specific sequence I have in mind is:
>>
>> A: Take lock
>> B: Enqueue
>> A: Release lock, set B's lwWaiting to false
>> D: Acquire lock
>> B: Enqueue after spurious wakeup
>>   (lock->head := B)
>> C: Enqueue behind B
>>   (B->lwWaitLink := C, lock->tail := C)
>> A: Set B's wWaitLink back to NULL, thereby corrupting the queue
>>   (B->lwWaitLink := NULL)
>> D: Release lock, dequeue and wakeup B
>>   (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
>> B: Take and release lock, queue appears empty!
>>   C blocks indefinitely.
>
> That's assuming either reordering by the compiler or an out-of-order
> store architecture, right?

Yes, it requires that a backend exits out of the PGSemaphoreLock loop in
LWLockAcquire before lwWaitLink has been reset to NULL by the previous lock
holder's LWLockRelease. Only if that happens there is a risk of the PGPROC
being on a wait queue by the time lwWaitLink is finally reset to NULL.

>>>> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
>>>> We take the backends we awake off the queue by updating the queue's head and
>>>> tail, so the contents of lwWaitLink should only matter once the backend is
>>>> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
>>>> back to NULL anway.
>>>
>>> I don't like that, this stuff is hard to debug already, having stale
>>> pointers around doesn't help. I think we should just add the barrier in
>>> the releases with barrier.h and locally use a volatile var in the
>>> branches before that.
>>
>> I like the idea of updating lwWaiting and lwWaitLink while still holding the
>> spinlock better. The costs are probably negligible, and it'd make the code much
>> easier to reason about.
>
> I agree we should do that, but imo not in the backbranches. Anything
> more than than the minimal fix in that code should be avoided in the
> stable branches, this stuff is friggin performance sensitive, and the
> spinlock already is a *major* contention point in many workloads.

No argument there. But unless I missed something, there actually is a bug
there, and using volatile won't fix it. A barrier would, but what about the
back branches that don't have barrier.h? AFAICS the only other choices we have on
these branches are to either ignore the bug - it's probably *extremely* unlikely
- or to use a spinlock acquire/release cycle to create a barrier. The former
leaves me with a bit of an uneasy feeling, and the latter quite certainly has
a larger performance impact than moving the PGPROC updates within the section
protected by the spinlock and using an array to remember the backends to wakeup.

best regards,
Florian Pflug




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: issue with gininsert under very high load
Следующее
От: Greg Stark
Дата:
Сообщение: Re: walsender doesn't send keepalives when writes are pending