Обсуждение: postgres has no spinlock support on riscv rv64imafdc

Поиск
Список
Период
Сортировка

postgres has no spinlock support on riscv rv64imafdc

От
Robert Henry
Дата:
Running ona sifive hyifive unleashed riscv board with uname -a of:
   Linux freedom-u540 5.2.9 #1 SMP Tue Oct 15 19:54:42 UTC 2019 riscv64 riscv64 riscv64 GNU/Linux
where each processor is:
   processor : 0
hart : 1
isa : rv64imafdc
mmu : sv39
uarch : sifive,u54-mc

Building Postgres from scratch on the riscv board, no fancy cross compilation, I get:
   
PostgreSQL does not have native spinlock support on this platform.  To continue the compilation, rerun configure using --disable-spinlocks.  However, performance will be poor.  Please report this to pgsql-bugs@lists.postgresql.org.

Re: postgres has no spinlock support on riscv rv64imafdc

От
Tom Lane
Дата:
Robert Henry <rrh.henry@gmail.com> writes:
> Building Postgres from scratch on the riscv board, no fancy cross
> compilation, I get:

> PostgreSQL does not have native spinlock support on this platform.  To
> continue the compilation, rerun configure using --disable-spinlocks.
> However, performance will be poor.  Please report this to
> pgsql-bugs@lists.postgresql.org.

The last we heard about RISC-V was this thread:

https://www.postgresql.org/message-id/flat/20161119230312.GA8656%40redhat.com

If things seem to work well now with --disable-spinlocks, maybe you
could test the patch suggested in that thread.  TBH, though, my preference
would be for some assembly code rather than relying on GCC builtins
as Richard's patch did.

            regards, tom lane



Re: postgres has no spinlock support on riscv rv64imafdc

От
Andres Freund
Дата:
Hi,

On 2019-10-18 09:00:23 +0200, Tom Lane wrote:
> TBH, though, my preference would be for some assembly code rather than
> relying on GCC builtins as Richard's patch did.

-1. I think there's good arguments for using inline assembly on
platforms where we've done so historically, and where we have to support
old compilers without good support for intrinsics/builtins. But I see
very little reason for adding more such cases for newer platforms -
doing so correctly and efficiently is substantial work and fragile.

Greetings,

Andres Freund



Re: postgres has no spinlock support on riscv rv64imafdc

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2019-10-18 09:00:23 +0200, Tom Lane wrote:
>> TBH, though, my preference would be for some assembly code rather than
>> relying on GCC builtins as Richard's patch did.

> -1. I think there's good arguments for using inline assembly on
> platforms where we've done so historically, and where we have to support
> old compilers without good support for intrinsics/builtins. But I see
> very little reason for adding more such cases for newer platforms -
> doing so correctly and efficiently is substantial work and fragile.

The reason I'm skeptical of that line of argument is that gcc's track
record for support of these intrinsics on non-mainstream architectures
is just sucky.  Now maybe, somebody was careful and it all works great
on RISC-V.  But IMO, the burden of proof is to show that the intrinsics
work, not to show that they don't.

I recall Noah's recent argument in a related context that with an
asm implementation, anybody with a copy of the architecture manual
can review/verify the code; and such a verification doesn't depend
on which compiler version you're using.  If we depend on gcc intrinsics,
we've basically got zero confidence about anything except from testing.

Yeah, you could make a similar argument about any sort of compiler
bug ... but gcc has *earned* my lack of trust in this particular area.

            regards, tom lane



Re: postgres has no spinlock support on riscv rv64imafdc

От
Andres Freund
Дата:
Hi,

On 2019-10-20 00:23:07 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-10-18 09:00:23 +0200, Tom Lane wrote:
> >> TBH, though, my preference would be for some assembly code rather than
> >> relying on GCC builtins as Richard's patch did.
> 
> > -1. I think there's good arguments for using inline assembly on
> > platforms where we've done so historically, and where we have to support
> > old compilers without good support for intrinsics/builtins. But I see
> > very little reason for adding more such cases for newer platforms -
> > doing so correctly and efficiently is substantial work and fragile.
> 
> The reason I'm skeptical of that line of argument is that gcc's track
> record for support of these intrinsics on non-mainstream architectures
> is just sucky.  Now maybe, somebody was careful and it all works great
> on RISC-V.  But IMO, the burden of proof is to show that the intrinsics
> work, not to show that they don't.

I agree there've been problems. But I don't think one can make a lot of
conclusions from the intrinsics quality for dying platforms when judging
new platforms. Most if not all new platforms with a gcc port that PG
would be run on are going to be multi-core platforms - if intrinsics are
broken, there'll be more problems. Furthermore, we're fully exposed to
the intrinsics quality due to atomics anyway - a lot more even, because
there's a shrinking amount of contended spinlocks, and a lot more
contended lwlocks etc (which use atomics).  And lastly, our spinlock
implementations have been far from perfect too.

Greetings,

Andres Freund



Re: postgres has no spinlock support on riscv rv64imafdc

От
Merlin Moncure
Дата:
On Saturday, October 19, 2019, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
> On 2019-10-18 09:00:23 +0200, Tom Lane wrote:
>> TBH, though, my preference would be for some assembly code rather than
>> relying on GCC builtins as Richard's patch did.

> -1. I think there's good arguments for using inline assembly on
> platforms where we've done so historically, and where we have to support
> old compilers without good support for intrinsics/builtins. But I see
> very little reason for adding more such cases for newer platforms -
> doing so correctly and efficiently is substantial work and fragile.

The reason I'm skeptical of that line of argument is that gcc's track
record for support of these intrinsics on non-mainstream architectures
is just sucky.  Now maybe, somebody was careful and it all works great
on RISC-V.  But IMO, the burden of proof is to show that the intrinsics
work, not to show that they don't.

I recall Noah's recent argument in a related context that with an
asm implementation, anybody with a copy of the architecture manual
can review/verify the code; and such a verification doesn't depend
on which compiler version you're using.  If we depend on gcc intrinsics,
we've basically got zero confidence about anything except from testing.

Do our regression tests exercise the quality of the intrinsic implementation? 

merlin

Re: postgres has no spinlock support on riscv rv64imafdc

От
Tom Lane
Дата:
Merlin Moncure <mmoncure@gmail.com> writes:
> On Saturday, October 19, 2019, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason I'm skeptical of that line of argument is that gcc's track
>> record for support of these intrinsics on non-mainstream architectures
>> is just sucky.  Now maybe, somebody was careful and it all works great
>> on RISC-V.  But IMO, the burden of proof is to show that the intrinsics
>> work, not to show that they don't.

> Do our regression tests exercise the quality of the intrinsic
> implementation?

Only indirectly, to the extent that a broken intrinsic might result
in misbehavior of concurrent operations.  If you dig through the git
history for s_lock.h and the atomics files, you can find plenty of
evidence that it can take awhile for us to notice bugs in this area.

The git history is pretty illuminating, actually (he says having
just searched through it).  There is plenty of evidence there that
both the manually-written-asm and the compiler-intrinsics approaches
can have nasty, hard-to-pin-down bugs.  The thing that makes me
prefer the former is that once we do identify a bug, we can fix it.

            regards, tom lane



Re: postgres has no spinlock support on riscv rv64imafdc

От
Thomas Munro
Дата:
On Fri, Dec 13, 2019 at 10:59 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-10-20 00:23:07 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2019-10-18 09:00:23 +0200, Tom Lane wrote:
> > >> TBH, though, my preference would be for some assembly code rather than
> > >> relying on GCC builtins as Richard's patch did.
> >
> > > -1. I think there's good arguments for using inline assembly on
> > > platforms where we've done so historically, and where we have to support
> > > old compilers without good support for intrinsics/builtins. But I see
> > > very little reason for adding more such cases for newer platforms -
> > > doing so correctly and efficiently is substantial work and fragile.
> >
> > The reason I'm skeptical of that line of argument is that gcc's track
> > record for support of these intrinsics on non-mainstream architectures
> > is just sucky.  Now maybe, somebody was careful and it all works great
> > on RISC-V.  But IMO, the burden of proof is to show that the intrinsics
> > work, not to show that they don't.
>
> I agree there've been problems. But I don't think one can make a lot of
> conclusions from the intrinsics quality for dying platforms when judging
> new platforms. Most if not all new platforms with a gcc port that PG
> would be run on are going to be multi-core platforms - if intrinsics are
> broken, there'll be more problems. Furthermore, we're fully exposed to
> the intrinsics quality due to atomics anyway - a lot more even, because
> there's a shrinking amount of contended spinlocks, and a lot more
> contended lwlocks etc (which use atomics).  And lastly, our spinlock
> implementations have been far from perfect too.

As a data point: the obvious change builds and passes basic testing
with flying colours, on FreeBSD + clang 11 running under qemu.  RISC-V
has been "non-experimental" since clang 9.  FWIW I've signed up to the
preorder list to try to get my hands on some BeagleV hardware to test
this stuff for real ASAP...

-#if defined(__arm__) || defined(__arm) || defined(__aarch64__) ||
defined(__aarch64)
+#if defined(__arm__) || defined(__arm) || defined(__aarch64__) ||
defined(__aarch64) || defined(__riscv)



Re: postgres has no spinlock support on riscv rv64imafdc

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> As a data point: the obvious change builds and passes basic testing
> with flying colours, on FreeBSD + clang 11 running under qemu.  RISC-V
> has been "non-experimental" since clang 9.  FWIW I've signed up to the
> preorder list to try to get my hands on some BeagleV hardware to test
> this stuff for real ASAP...

Yeah, I'm on that list too ... crickets so far.

            regards, tom lane



Re: postgres has no spinlock support on riscv rv64imafdc

От
Thomas Munro
Дата:
On Fri, Apr 16, 2021 at 2:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > As a data point: the obvious change builds and passes basic testing
> > with flying colours, on FreeBSD + clang 11 running under qemu.  RISC-V
> > has been "non-experimental" since clang 9.  FWIW I've signed up to the
> > preorder list to try to get my hands on some BeagleV hardware to test
> > this stuff for real ASAP...
>
> Yeah, I'm on that list too ... crickets so far.

I passed this change on to the brave souls who are bringing up the
whole universe of open source software on this new architecture, in
the FreeBSD ports tree.  PostgreSQL's failure to compile was blocking
hundreds of dependent packages so it seemed good to at least try
something at this early stage (other databases are still blockers;
apparently you can trust database hackers to use weird arch-specific
stuff).  It'll be interesting to see if any reports of brokenness come
in from that, but for now I just know that a whole mountain of new
stuff now builds there.

One thing I noticed while worrying about whether it might have
insufficiently strong barriers is that Clang and GCC are generating
different code:

__sync_lock_test_and_set():
 Clang 11: amoswap.w.aqrl  a0,a1,(a0)
 GCC 9:    amoswap.w.aq    a4,a4,(a5)

__sync_lock_release():
 Clang 11: fence   rw,w
           li      a1,0
           sw      a1,0(a0)
 GCC 9:    fence   iorw,ow
           amoswap.w       zero,zero,(a5)

S_LOCK() looks a bit like figure A7 of the manual[1], with at least
acquire semantics (but also release on Clang), and S_UNLOCK() looks a
bit like figure A8, with an explicit release.  I don't immediately
know why acquire and release semantics wouldn't be enough here.
Anyone?  I do see that ARM, Itanium and PowerPC include a full
barrier, though in different places: on ARM,
__sync_lock_test_and_set() on ARM generates ldxr, stxr, dmb (= full
memory barrier) rather than using ldaxr (= acquire) and stlxr (=
release), and this seems to be since [2], which is quite interesting
and might be a clue; on Itanium and PowerPC, it's explicit in
S_UNLOCK().

Some of the questions I have are: can anyone with real hardware access
reproduce the failure reported back in 2016[3], or was that entirely
due to the pre-alpha toolchain?  If it can be reproduced, is it with
GCC only and not Clang, and does adding an initial
__sync_synchronize() help?

[1] https://riscv.org/wp-content/uploads/2019/06/riscv-spec.pdf
[2] https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=a96297a28065d27559d17ebe1e0eda308a05e965
[3] https://www.postgresql.org/message-id/20161120184942.GP11243%40redhat.com