Re: spinlocks on HP-UX
| From | Robert Haas |
|---|---|
| Subject | Re: spinlocks on HP-UX |
| Date | |
| Msg-id | CA+TgmobhJtJsLaDQkjMb_5=EBi=NrP4aWL4X7CTjv-rr1r5mfA@mail.gmail.com |
| In reply to | Re: spinlocks on HP-UX (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: spinlocks on HP-UX (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
On Mon, Aug 29, 2011 at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This discussion seems to miss the fact that there are two levels of
>> reordering that can happen. First, the compiler can move things
>> around. Second, the CPU can move things around.
>
> Right, I think that's exactly the problem with the previous wording of
> that comment; it doesn't address the two logical levels involved.
> I've rewritten it, see what you think.
>
> * Another caution for users of these macros is that it is the caller's
> * responsibility to ensure that the compiler doesn't re-order accesses
> * to shared memory to precede the actual lock acquisition, or follow the
> * lock release. Typically we handle this by using volatile-qualified
> * pointers to refer to both the spinlock itself and the shared data
> * structure being accessed within the spinlocked critical section.
> * That fixes it because compilers are not allowed to re-order accesses
> * to volatile objects relative to other such accesses.
> *
> * On platforms with weak memory ordering, the TAS(), TAS_SPIN(), and
> * S_UNLOCK() macros must further include hardware-level memory fence
> * instructions to prevent similar re-ordering at the hardware level.
> * TAS() and TAS_SPIN() must guarantee that loads and stores issued after
> * the macro are not executed until the lock has been obtained. Conversely,
> * S_UNLOCK() must guarantee that loads and stores issued before the macro
> * have been executed before the lock is released.

That's definitely an improvement.

I'm actually not convinced that we're entirely consistent here about what we require the semantics of acquiring and releasing a spinlock to be. For example, on x86 and x86_64, we acquire the lock using xchgb, which acts as a full memory barrier. But when we release the lock, we just zero out the memory address, which is NOT a full memory barrier.
Stores can't cross it, but non-dependent loads of different locations can back up over it. That's pretty close to a full barrier, but it isn't, quite.

Now, I don't see why that should really cause any problem, at least for common cases like LWLockAcquire(). If the CPU prefetches the data protected by the lwlock after we know we've got the lock but before we've actually released the spinlock and returned from LWLockAcquire(), that should be fine, even good (for performance).

The real problem with being squiffy here is that it's not clear how weak we can make the fence instruction on weakly ordered architectures that support multiple types. Right now we're pretty conservative, but I think that may be costing us. I might be wrong; more research is needed here; but I think that we should at least start to get our heads around what semantics we actually need.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company