Re: xlc atomics

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: xlc atomics
Дата
Msg-id 20160215171129.GA347322@tornado.leadboat.com
обсуждение исходный текст
Ответ на Re: xlc atomics  (Noah Misch <noah@leadboat.com>)
Ответы Re: xlc atomics  (Andres Freund <andres@anarazel.de>)
Re: xlc atomics  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Sat, Jul 04, 2015 at 08:07:49PM -0400, Noah Misch wrote:
> On Sun, Jul 05, 2015 at 12:54:43AM +0200, Andres Freund wrote:
> > On 2015-07-04 18:40:41 -0400, Noah Misch wrote:
> > > (1) "IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)".  Getting it working
> > > required the attached patch.
>
> > Will you apply? Having the ability to test change seems to put you in a
> > much better spot then me.
>
> I will.

That commit (0d32d2e) permitted things to compile and usually pass tests, but
I missed the synchronization bug.  Since 2015-10-01, the buildfarm has seen
sixteen duplicate-catalog-OID failures.  The compiler has always been xlC, and
the branch has been HEAD or 9.5.  Examples:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-15%2004%3A12%3A49
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-08%2001%3A20%3A16
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2015-12-11%2005%3A25%3A54

These suggested OidGenLock wasn't doing its job.  I've seen similar symptoms
around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
5765-J08)" for ppc64le.  The problem is generic-xlc.h
pg_atomic_compare_exchange_u32_impl() issuing __isync() before
__compare_and_swap().  __isync() shall follow __compare_and_swap(); see our
own s_lock.h, its references, and other projects' usage:


https://github.com/boostorg/smart_ptr/blob/boost-1.60.0/include/boost/smart_ptr/detail/sp_counted_base_vacpp_ppc.hpp#L55

http://redmine.gromacs.org/projects/gromacs/repository/entry/src/external/thread_mpi/include/thread_mpi/atomic/xlc_ppc.h?rev=v5.1.2#L112
https://www-01.ibm.com/support/knowledgecenter/SSGH2K_13.1.2/com.ibm.xlc131.aix.doc/language_ref/asm_example.html

This patch's test case would have failed about half the time under today's
generic-xlc.h.  Fast machines run it in about 1s.  A test that detects the bug
99% of the time would run far longer, hence this compromise.


Archaeology enthusiasts may enjoy the bug's changing presentation over time.
LWLockAcquire() exposed it after commit ab5194e redesigned lwlock.c in terms
of atomics.  Assert builds were unaffected for commits in [ab5194e, bbfd7ed);
atomics.h asserts fattened code enough to mask the bug.  However, removing the
AssertPointerAlignment() from pg_atomic_fetch_or_u32() would have sufficed to
surface it in assert builds.  All builds demonstrated the bug once bbfd7ed
introduced xlC pg_attribute_noreturn(), which elicits a simpler instruction
sequence around the ExceptionalCondition() call embedded in each Assert.

nm

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [WIP] ALTER ... OWNER TO ... CASCADE
Следующее
От: Fujii Masao
Дата:
Сообщение: Re: Existence check for suitable index in advance when concurrently refreshing.