Latches and barriers
From | Andres Freund |
---|---|
Subject | Latches and barriers |
Date | |
Msg-id | 20150112154026.GB2092@awork2.anarazel.de |
Responses | Re: Latches and barriers (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi,

latch.h has the following comment:

     * Presently, when using a shared latch for interprocess signalling, the
     * flag variable(s) set by senders and inspected by the wait loop must
     * be protected by spinlocks or LWLocks, else it is possible to miss events
     * on machines with weak memory ordering (such as PPC).  This restriction
     * will be lifted in future by inserting suitable memory barriers into
     * SetLatch and ResetLatch.

and unix_latch.c has:

    SetLatch(volatile Latch *latch)
    {
        pid_t       owner_pid;

        /*
         * XXX there really ought to be a memory barrier operation right here, to
         * ensure that any flag variables we might have changed get flushed to
         * main memory before we check/set is_set.  Without that, we have to
         * require that callers provide their own synchronization for machines
         * with weak memory ordering (see latch.h).
         */

        /* Quick exit if already set */
        if (latch->is_set)
            return;
    ...
    void
    ResetLatch(volatile Latch *latch)
    {
        /* Only the owner should reset the latch */
        Assert(latch->owner_pid == MyProcPid);

        latch->is_set = false;

        /*
         * XXX there really ought to be a memory barrier operation right here, to
         * ensure that the write to is_set gets flushed to main memory before we
         * examine any flag variables.  Otherwise a concurrent SetLatch might
         * falsely conclude that it needn't signal us, even though we have missed
         * seeing some flag updates that SetLatch was supposed to inform us of.
         * For the moment, callers must supply their own synchronization of flag
         * variables (see latch.h).
         */
    }

Triggered by another thread, I converted proc.c and lwlock.c to use
latches for blocking. That worked fine on my laptop, but failed
miserably, often within less than a second, on my 2 socket x86
workstation. After a fair amount of headscratching I figured out that
it's indeed those missing barriers. Adding them made it work.

Thinking about it, it's not too surprising. PGPROC's lwWaiting and
procLatch aren't at the same address (more specifically, they're on
different cachelines). x86 allows a load to be reordered with an
earlier store to a different address. That's what's happening here.

While it might not be required for existing latch uses (I'm *not* sure
that's true), I still think that we should fix those XXX by actually
using barriers now that we have them. I don't think we want every
call site to worry about using barriers.

Agreed?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
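
For illustration, the barrier placement that the two XXX comments ask for can be sketched in portable C11. This is only a sketch under stated assumptions, not the PostgreSQL code: `SketchLatch`, `sketch_set_latch`, `sketch_reset_latch`, `sketch_sender_post` and `sketch_waiter_check` are hypothetical names, `atomic_thread_fence(memory_order_seq_cst)` stands in for whatever full-barrier primitive the tree provides, and a counter replaces the real wakeup signal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a latch; NOT PostgreSQL's Latch struct. */
typedef struct SketchLatch
{
    atomic_bool is_set;
    atomic_int  wakeups;    /* counts simulated wakeups instead of signalling */
} SketchLatch;

static void
sketch_set_latch(SketchLatch *latch)
{
    assert(latch != NULL);

    /*
     * Full barrier: flag stores made by the caller before this call are
     * ordered before the load of is_set below.
     */
    atomic_thread_fence(memory_order_seq_cst);

    /* Quick exit if already set */
    if (atomic_load_explicit(&latch->is_set, memory_order_relaxed))
        return;

    atomic_store_explicit(&latch->is_set, true, memory_order_relaxed);
    atomic_fetch_add_explicit(&latch->wakeups, 1, memory_order_relaxed);
}

static void
sketch_reset_latch(SketchLatch *latch)
{
    assert(latch != NULL);

    atomic_store_explicit(&latch->is_set, false, memory_order_relaxed);

    /*
     * Full barrier: the store to is_set is ordered before any flag loads
     * the caller performs after this call.
     */
    atomic_thread_fence(memory_order_seq_cst);
}

/* Sender side: publish the flag first, then set the latch. */
static void
sketch_sender_post(SketchLatch *latch, atomic_bool *work_flag)
{
    atomic_store_explicit(work_flag, true, memory_order_relaxed);
    sketch_set_latch(latch);
}

/* Waiter side: reset the latch first, then inspect (and consume) the flag. */
static bool
sketch_waiter_check(SketchLatch *latch, atomic_bool *work_flag)
{
    sketch_reset_latch(latch);
    return atomic_exchange_explicit(work_flag, false, memory_order_relaxed);
}
```

With both fences in place, either the sender observes is_set as false and wakes the waiter, or the waiter observes the flag after its reset; the case where both miss each other's store, which is what store/load reordering otherwise permits, is ruled out. Callers then need no locking of their own around the flag.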