On Mon, Feb 15, 2016 at 02:02:41PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > That commit (0d32d2e) permitted things to compile and usually pass tests, but
> > I missed the synchronization bug. Since 2015-10-01, the buildfarm has seen
> > sixteen duplicate-catalog-OID failures.
>
> I'd been wondering about those ...
>
> > These suggested OidGenLock wasn't doing its job. I've seen similar symptoms
> > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
> > 5765-J08)" for ppc64le. The problem is generic-xlc.h
> > pg_atomic_compare_exchange_u32_impl() issuing __isync() before
> > __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our
> > own s_lock.h, its references, and other projects' usage:
>
> Nice catch!
>
> > This patch's test case would have failed about half the time under today's
> > generic-xlc.h. Fast machines run it in about 1s. A test that detects the bug
> > 99% of the time would run far longer, hence this compromise.
>
> Sounds like a reasonable compromise to me, although I wonder about the
> value of it if we stick it into pgbench's TAP tests. How many of the
> slower buildfarm members are running the TAP tests? Certainly mine are
> not.
I missed a second synchronization bug in generic-xlc.h, but the pgbench test
suite caught it promptly after commit 008608b. Buildfarm members mandrill and
hornet failed[1] or deadlocked within two runs. The bug is that the
pg_atomic_compare_exchange_*() specifications grant "full barrier semantics",
but generic-xlc.h provided only the semantics of an acquire barrier. Commit
008608b exposed that older problem; its LWLockWaitListUnlock() relies on
pg_atomic_compare_exchange_u32() (via pg_atomic_fetch_and_u32()) for release
barrier semantics.
The fix is to issue "sync" before each compare-and-swap, in addition to the
"isync" after it. This is consistent with the corresponding GCC atomics. The
pgbench test suite survived dozens of runs so patched.
[1] http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-04-11%2003%3A14%3A13