Re: Inefficient barriers on solaris with sun cc

From Andres Freund
Subject Re: Inefficient barriers on solaris with sun cc
Date
Msg-id 20141002143457.GI7158@awork2.anarazel.de
In response to Re: Inefficient barriers on solaris with sun cc  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Inefficient barriers on solaris with sun cc
List pgsql-hackers
On 2014-09-26 10:28:21 -0400, Robert Haas wrote:
> On Fri, Sep 26, 2014 at 8:55 AM, Oskari Saarenmaa <os@ohmu.fi> wrote:
> >> So you think a read barrier is the same thing as an acquire barrier
> >> and a write barrier is the same as a release barrier?  That would be
> >> surprising.  It's certainly not true in general.
> >
> > The above doc describes the difference: read barrier requires loads before
> > the barrier to be completed before loads after the barrier - an acquire
> > barrier is the same, but it also requires loads to be complete before stores
> > after the barrier.
> >
> > Similarly write barrier requires stores before the barrier to be completed
> > before stores after the barrier - a release barrier is the same, but it also
> > requires loads before the barrier to be completed before stores after the
> > barrier.
> >
> > So acquire is read + loads-before-stores and release is write +
> > loads-before-stores.
> 
> Hmm.  My impression was that an acquire barrier means that loads and
> stores can migrate forward across the barrier but not backward; and
> that a release barrier means that loads and stores can migrate
> backward across the barrier but not forward.

It's actually more complex than that :(

Simple things first:

Oracle's definition seems pretty ironclad:
http://docs.oracle.com/cd/E18659_01/html/821-1383/gjzmf.html
__machine_acq_barrier is a clear superset of __machine_r_barrier and
__machine_rel_barrier is a clear superset of __machine_w_barrier.

And that's what we're essentially discussing, no? That said, there seems
to be no reason to avoid using __machine_r/w_barrier().


But for the reason why I defined pg_read_barrier/write_barrier to
__atomic_thread_fence(__ATOMIC_ACQUIRE/RELEASE):

The C11/C++11 definition it's made for is hellishly hard to
understand. There are very subtle differences between acquire/release
operations and acquire/release fences. 29.8.2/7.17.4 seem to be the relevant
parts of the standards. I think it essentially guarantees the mapping
we're talking about, but it's not entirely clear.
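
To make the mapping under discussion concrete, here is a minimal sketch, not the actual PostgreSQL header. The sketch_* macro names are hypothetical. The __machine_* intrinsics are the ones named in the Oracle document linked above (assumed here to be declared in <mbarrier.h>), and the fallback branch assumes a GCC-compatible compiler that provides __atomic_thread_fence.

```c
/* Hypothetical macro names; a sketch of the mapping, not PostgreSQL's code. */
#if defined(__SUNPRO_C)
#include <mbarrier.h>
/* Sun cc: the plain read/write barriers; the acq/rel variants are supersets. */
#define sketch_read_barrier()   __machine_r_barrier()
#define sketch_write_barrier()  __machine_w_barrier()
#else
/* GCC-style builtins: acquire/release fences standing in for read/write. */
#define sketch_read_barrier()   __atomic_thread_fence(__ATOMIC_ACQUIRE)
#define sketch_write_barrier()  __atomic_thread_fence(__ATOMIC_RELEASE)
#endif
```

Whether the acquire/release fences are always at least as strong as a read/write barrier is exactly the open question in this thread, so the second branch should be read as the mapping being examined, not as a settled answer.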

The way acquire/release fences are defined is that they form a
'synchronizes-with' relationship with each other. Which would, I think,
be sufficient given that without a release-like operation on the other
thread a read/write barrier isn't worth much. But there's a rub in that
it requires an atomic operation to be involved somewhere to give that
guarantee.
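
A minimal C11 sketch of the fence-to-fence pattern described above, assuming <stdatomic.h> is available. All names (payload, ready, writer_publish, reader_try_consume) are made up for illustration. The point is that the two fences only synchronize through the atomic store and load of ready; the payload itself stays a plain variable.

```c
#include <stdatomic.h>

static int payload;        /* plain, non-atomic data */
static atomic_int ready;   /* the atomic object the fences synchronize through */

/* Writer: store the payload, then publish it. */
static void writer_publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);               /* release fence */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);  /* atomic store after the fence */
}

/* Reader: returns 1 and fills *out once the payload has been published. */
static int reader_try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)  /* atomic load before the fence */
        return 0;
    atomic_thread_fence(memory_order_acquire);               /* acquire fence */
    *out = payload;
    return 1;
}
```

If ready were an ordinary int accessed without atomic operations, the standard's synchronizes-with rule for fences would not apply, which is the rub mentioned above.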

I *did* check that the emitted code on relevant architectures is sane,
but that doesn't guarantee anything for the future.

Therefore I'm proposing to replace it with __ATOMIC_ACQ_REL which is
definitely guaranteeing what we need, even if superfluously heavy on some
platforms. It is still significantly more efficient than
__sync_synchronize() which is what was used before. I.e. it generates no
code on x86 (MFENCE otherwise), and only an lwsync on PPC (hwsync
otherwise, although I don't know why) and similar on ia64.
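
As a rough sketch of that proposal, assuming a GCC-compatible compiler: the proposed_* and previous_* macro names are hypothetical, and the code-generation notes in the comments only repeat what is stated above; they are not re-verified here.

```c
/* Hypothetical names; a sketch of the proposal, not PostgreSQL's code. */

/* Proposed: one acquire+release fence for both read and write barriers.
 * Per the message above: no code on x86, lwsync on PPC. */
#define proposed_read_barrier()   __atomic_thread_fence(__ATOMIC_ACQ_REL)
#define proposed_write_barrier()  __atomic_thread_fence(__ATOMIC_ACQ_REL)

/* Previously used: a full barrier.
 * Per the message above: MFENCE on x86, hwsync on PPC. */
#define previous_barrier()        __sync_synchronize()
```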

As a reference, relevant standard sections are:
C11: 5.1.2.4 5); 7.17.4
C++11: 29.3; 1.10
Not that we can rely on those, but I think it's a good thing to orient on.

> I'm actually not really sure what this means unless the barrier also
> does something in and of itself.

> For example, consider this:
> 
> some stuff
> CAS(&lock, 0, 1) // i am an acquire barrier
> more stuff
> lock = 0 // i am a release barrier
> even more stuff
> 
> If the CAS() and lock = 0 instructions were FULL barriers, then we'd
> be saying that the stuff that happens in the critical section needs to
> be exactly "more stuff".  But if they are acquire and release
> barriers, respectively, then the CPU is allowed to move "some stuff"
> or "even more stuff" into the critical section; but what it can't do
> is move "more stuff" out.

> Now if you just have a naked acquire barrier that is not doing
> anything itself, I don't really know what the semantics of that should
> be.

Which is why these acquire/release fences, in contrast to
acquire/release operations, have more guarantees... You put your finger
right onto the spot.
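
To illustrate the operation-versus-fence distinction, here is a hedged C11 sketch of the quoted lock example written both ways. The sketch_* names are hypothetical and this is not PostgreSQL's spinlock code. In the first pair the ordering is attached to the CAS and the store themselves; in the second pair the atomic operations are relaxed and standalone fences carry the ordering.

```c
#include <stdatomic.h>

typedef struct { atomic_int locked; } sketch_lock;

/* Acquire *operation*: the ordering is attached to the CAS itself. */
static int sketch_try_lock(sketch_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Release *operation*: the ordering is attached to the store itself. */
static void sketch_unlock(sketch_lock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Same lock with relaxed operations plus standalone fences. */
static int sketch_try_lock_fence(sketch_lock *l)
{
    int expected = 0;
    if (!atomic_compare_exchange_strong_explicit(
            &l->locked, &expected, 1,
            memory_order_relaxed, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);  /* acquire fence after the CAS */
    return 1;
}

static void sketch_unlock_fence(sketch_lock *l)
{
    atomic_thread_fence(memory_order_release);  /* release fence before the store */
    atomic_store_explicit(&l->locked, 0, memory_order_relaxed);
}
```

In both variants "more stuff" cannot leave the critical section. The fence form is not tied to one particular atomic operation, which is where the extra guarantees of fences come from.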

> Say I want to appear to only change things while flag is 1, so I
> write this code:
> 
> flag = 1
> acquire barrier
> things++
> release barrier
> flag = 0
> 
> With the definition you (and Oracle) propose

As written above, I don't think that applies to Oracle's definition?

> this won't work, because
> there's nothing to keep the modification of things from being
> reordered before flag = 1.  What good is that?  Apparently, I don't
> have any idea!

I hope it's a bit clearer now?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


