Re: xlc atomics

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: xlc atomics
Дата
Msg-id 20160424015407.GD2084318@tornado.leadboat.com
обсуждение исходный текст
Ответ на Re: xlc atomics  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: xlc atomics  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Mon, Feb 15, 2016 at 02:02:41PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > That commit (0d32d2e) permitted things to compile and usually pass tests, but
> > I missed the synchronization bug.  Since 2015-10-01, the buildfarm has seen
> > sixteen duplicate-catalog-OID failures.
>
> I'd been wondering about those ...
>
> > These suggested OidGenLock wasn't doing its job.  I've seen similar symptoms
> > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
> > 5765-J08)" for ppc64le.  The problem is generic-xlc.h
> > pg_atomic_compare_exchange_u32_impl() issuing __isync() before
> > __compare_and_swap().  __isync() shall follow __compare_and_swap(); see our
> > own s_lock.h, its references, and other projects' usage:
>
> Nice catch!
>
> > This patch's test case would have failed about half the time under today's
> > generic-xlc.h.  Fast machines run it in about 1s.  A test that detects the bug
> > 99% of the time would run far longer, hence this compromise.
>
> Sounds like a reasonable compromise to me, although I wonder about the
> value of it if we stick it into pgbench's TAP tests.  How many of the
> slower buildfarm members are running the TAP tests?  Certainly mine are
> not.

I missed a second synchronization bug in generic-xlc.h, but the pgbench test
suite caught it promptly after commit 008608b.  Buildfarm members mandrill and
hornet failed[1] or deadlocked within two runs.  The bug is that the
pg_atomic_compare_exchange_*() specifications grant "full barrier semantics",
but generic-xlc.h provided only the semantics of an acquire barrier.  Commit
008608b exposed that older problem; its LWLockWaitListUnlock() relies on
pg_atomic_compare_exchange_u32() (via pg_atomic_fetch_and_u32()) for release
barrier semantics.

The fix is to issue "sync" before each compare-and-swap, in addition to the
"isync" after it.  This is consistent with the corresponding GCC atomics.  The
pgbench test suite survived dozens of runs so patched.

[1] http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-04-11%2003%3A14%3A13

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: pg_dump dump catalog ACLs
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.