Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Дата
Msg-id 27512.1312821572@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Aug 7, 2011 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Any protocol of that sort has *obviously* got a race condition between
>> changes of the latch state and changes of the separate flag state, if run
>> in a weak-memory-ordering machine.

> On the flip side, I'm not sure that anyone ever really expected that a
> latch could be safely used this way.  Normally, I'd expect the flag to
> be some piece of state protected by an LWLock, and I think that ought
> to be OK provided that the lwlock is released before setting or
> checking the latch.  Maybe we should try to document the contract here
> a little better; I think it's just that there must be a full memory
> barrier (such as LWLockRelease) in both processes involved in the
> interaction.

Heikki argued the same thing in
http://archives.postgresql.org/message-id/4CEA5A0F.1030602@enterprisedb.com
but I don't think anyone has actually done the legwork to verify that
current uses are safe.  So [ ... warms up grep ... ]

Currrent callers of WaitLatch(OrSocket):

XLogPageRead waits on XLogCtl->recoveryWakeupLatch
latch is set by WakeupRecovery, which is called by process'own signal handlers (OK) and XLogWalRcvFlush.  The latter is
OKbecausethere's a spinlock protecting the transmitted data.
 

pgarch_MainLoop waits on mainloop_latch
non-issue because latch is process-local

WalSndLoop waits on MyWalSnd->latch
latch is set by signal handlers and WalSndWakeup.  The latter isOK because it's called right after XLogFlush (which
certainlylockswhatever it touches) or a spinlock release.
 

SyncRepWaitForLSN waits on MyProc->waitLatch
latch is set by signal handlers and SyncRepWakeQueue.  Thelatter appears unsafe at first glance, because it sets
thesharedvariable (thisproc->syncRepState) and immediatelysets the latch.  However, the receiver does this curious
dance:
    syncRepState = MyProc->syncRepState;    if (syncRepState == SYNC_REP_WAITING)    {
LWLockAcquire(SyncRepLock,LW_SHARED);        syncRepState = MyProc->syncRepState;        LWLockRelease(SyncRepLock);
}
which I believe makes it safe, though rather underdocumented;if a race does occur, the receiver will acquire the lock
andrecheck,and the lock acquisition will block until the caller ofSyncRepWakeQueue has released SyncRepLock, and that
releasewillflush any pending writes to shared memory.
 

Conclusions:

(1) 9.1 does not have a problem of this sort.

(2) Breathing hard on any of this code could break it.

(3) I'm not going to hold still for not inserting a memory barrier
into SetLatch, once we have the code.  Unsubstantiated performance
worries are not a sufficient argument not to --- as a wise man once
said, you can make the code arbitrarily fast if it doesn't have to
give the right answer.

(4) In the meantime, we'd better document that it's caller's
responsibility to ensure that the flag variable is adequately
protected by locking.  I think SyncRepWaitForLSN could use a warning
comment, too.  I will go do that.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [RFC] Common object property boards
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs