Re: Inefficient barriers on solaris with sun cc
From | Andres Freund |
---|---|
Subject | Re: Inefficient barriers on solaris with sun cc |
Date | |
Msg-id | 20141002143457.GI7158@awork2.anarazel.de |
In reply to | Re: Inefficient barriers on solaris with sun cc (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Inefficient barriers on solaris with sun cc |
List | pgsql-hackers |
On 2014-09-26 10:28:21 -0400, Robert Haas wrote:
> On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa <os@ohmu.fi> wrote:
> >> So you think a read barrier is the same thing as an acquire barrier
> >> and a write barrier is the same as a release barrier? That would be
> >> surprising. It's certainly not true in general.
> >
> > The above doc describes the difference: read barrier requires loads before
> > the barrier to be completed before loads after the barrier - an acquire
> > barrier is the same, but it also requires loads to be complete before stores
> > after the barrier.
> >
> > Similarly write barrier requires stores before the barrier to be completed
> > before stores after the barrier - a release barrier is the same, but it also
> > requires loads before the barrier to be completed before stores after the
> > barrier.
> >
> > So acquire is read + loads-before-stores and release is write +
> > loads-before-stores.
>
> Hmm. My impression was that an acquire barrier means that loads and
> stores can migrate forward across the barrier but not backward; and
> that a release barrier means that loads and stores can migrate
> backward across the barrier but not forward.

It's actually more complex than that :(

Simple things first:

Oracle's definition seems pretty ironclad:
http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
__machine_acq_barrier is a clear superset of __machine_r_barrier
and
__machine_rel_barrier is a clear superset of __machine_w_barrier

And that's what we're essentially discussing, no? That said, there seems
to be no reason to avoid using __machine_r/w_barrier().

But for the reason why I defined pg_read_barrier/write_barrier to
__atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):

The C11/C++11 definition it's made for is hellishly hard to
understand. There are very subtle differences between acquire/release
operations and acquire/release fences. 29.8.2/7.17.4 seem to be the
relevant parts of the standards.
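To make the fence case concrete, here is a minimal C11 sketch, assuming a compiler that provides <stdatomic.h>. The names publish, try_consume, payload and ready are hypothetical and are not PostgreSQL code. It shows a release fence and an acquire fence pairing up through a relaxed atomic store and load on the same object, which is my reading of how the fence rules in 7.17.4 are meant to be used.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not PostgreSQL code. */
static int payload;        /* plain, non-atomic data being handed over */
static atomic_int ready;   /* atomic object the two fences pair through */

/* Writer side: plain store, release fence, then a relaxed atomic store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: relaxed atomic load, acquire fence, then the plain load. */
bool try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return false;      /* nothing published yet */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;        /* ordered after the writer's store to payload */
    return true;
}
```

Without the atomic store and load on ready, the two fences would have nothing to synchronize through, so the sketch keeps them even though they are relaxed.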
I think the C11/C++11 definition essentially guarantees the mapping we're
talking about, but it's not entirely clear.

The way acquire/release fences are defined is that they form a
'synchronizes-with' relationship with each other. Which would, I think,
be sufficient given that without a release-like operation on the other
thread a read/write barrier isn't worth much. But there's a rub in that
it requires an atomic operation involved somewhere to give that guarantee.

I *did* check that the emitted code on relevant architectures is sane,
but that doesn't guarantee anything for the future.

Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL, which is
definitely guaranteeing what we need, even if superfluously heavy on some
platforms. It still is significantly more efficient than
__sync_synchronize(), which is what was used before. I.e. it generates no
code on x86 (MFENCE otherwise), and only a lwsync on PPC (hwsync
otherwise, although I don't know why) and similar on ia64.

As a reference, relevant standard sections are:
C11: 5.1.2.4 5); 7.17.4
C++11: 29.3; 1.10

Not that we can rely on those, but I think it's a good thing to orient
on.

> I'm actually not really sure what this means unless the barrier also
> does something in and of itself.
> For example, consider this:
>
> some stuff
> CAS(&lock, 0, 1) // i am an acquire barrier
> more stuff
> lock = 0 // i am a release barrier
> even more stuff
>
> If the CAS() and lock = 0 instructions were FULL barriers, then we'd
> be saying that the stuff that happens in the critical section needs to
> be exactly "more stuff". But if they are acquire and release
> barriers, respectively, then the CPU is allowed to move "some stuff"
> or "even more stuff" into the critical section; but what it can't do
> is move "more stuff" out.
> Now if you just have a naked acquire barrier that is not doing
> anything itself, I don't really know what the semantics of that should
> be.
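For comparison, the quoted lock example can be written with C11 acquire/release operations rather than fences. This is only a sketch under the assumption that <stdatomic.h> is available; lock_acquire, lock_release, locked_increment and shared_counter are hypothetical names, and this is not how PostgreSQL's spinlocks are implemented.

```c
#include <stdatomic.h>

/* Hypothetical names for illustration only; not PostgreSQL's spinlocks. */
static atomic_int lock;      /* 0 = free, 1 = held */
static int shared_counter;   /* the "more stuff" protected by the lock */

void lock_acquire(void)
{
    int expected = 0;

    /* CAS with acquire ordering: later accesses cannot move above it. */
    while (!atomic_compare_exchange_weak_explicit(&lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;        /* a failed CAS overwrites expected; reset it */
}

void lock_release(void)
{
    /* Store with release ordering: earlier accesses cannot move below it. */
    atomic_store_explicit(&lock, 0, memory_order_release);
}

int locked_increment(void)
{
    lock_acquire();
    int result = ++shared_counter;   /* must stay inside the critical section */
    lock_release();
    return result;
}
```

Here the ordering is attached to the CAS and to the store themselves, so "some stuff" before the CAS and "even more stuff" after the store may still drift into the critical section, while the increment cannot leave it.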
Which is why these acquire/release fences, in contrast to
acquire/release operations, have more guarantees... You put your finger
right onto the spot.

> Say I want to appear to only change things while flag is 1, so I
> write this code:
>
> flag = 1
> acquire barrier
> things++
> release barrier
> flag = 0
>
> With the definition you (and Oracle) propose

As written above, I don't think that applies to Oracle's definition?

> this won't work, because
> there's nothing to keep the modification of things from being
> reordered before flag = 1. What good is that? Apparently, I don't
> have any idea!

I hope it's a bit clearer now?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services