Обсуждение: BUG #2401: spinlocks not available on amd64
The following bug has been logged online: Bug reference: 2401 Logged by: Theo Schlossnagle Email address: jesus@omniti.com PostgreSQL version: 8.1.3 Operating system: Solaris 10 Description: spinlocks not available on amd64 Details: Compiling 8.1.3 on solaris 10 x86 with Sun Studio 11 for amd64 target architecture leads us to an error resulting from no available "tas" assembly. The tas.s file doesn't look like valid assembly for the shipped Sun assembler.
Theo Schlossnagle wrote: > > The following bug has been logged online: > > Bug reference: 2401 > Logged by: Theo Schlossnagle > Email address: jesus@omniti.com > PostgreSQL version: 8.1.3 > Operating system: Solaris 10 > Description: spinlocks not available on amd64 > Details: > > Compiling 8.1.3 on solaris 10 x86 with Sun Studio 11 for amd64 target > architecture leads us to an error resulting from no available "tas" > assembly. > > The tas.s file doesn't look like valid assembly for the shipped Sun > assembler. Yes. We will have a fix for it in 8.2, but it was too risky for 8.1.X. You can try a snapshot tarball from our FTP server and see if that works. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Theo Schlossnagle wrote: > >> Compiling 8.1.3 on solaris 10 x86 with Sun Studio 11 for amd64 target > >> architecture leads us to an error resulting from no available "tas" > >> assembly. > >> > >> The tas.s file doesn't look like valid assembly for the shipped Sun > >> assembler. > > > > Yes. We will have a fix for it in 8.2, but it was too risky for > > 8.1.X. > > You can try a snapshot tarball from our FTP server and see if that > > works. > > I reviewed the code there. I think it can be better. The issue is > that s_lock chooses to implement the lock in an unsigned char which > isn't optimal on sparc or x86. An over arching issue is that the tas > instruction is a less functional cas function, so it seems that the > tas should be cas and the spinlocks should be implemented that way. > By using cas, you can can actually know the locker by cas'ing the > lock to the process id instead of 1 -- we use that trick to see who > is holding the spinlock between threads (we obviously use thread ids > in that case). Great. > So... I changed the s_lock.h to merge the sparc and x86 sections thusly: > > ------------- > #if defined(__sun) && (defined(__i386) || defined(__amd64) || defined > (__sparc) | > | defined(__sparc__)) > /* > * Solaris/386 (we only get here for non-gcc case) > */ > #define HAS_TEST_AND_SET > typedef unsigned int slock_t; > > extern volatile slock_t > pg_atomic_cas(volatile slock_t *lock, slock_t with, slock_t cmp); > > #define TAS(a) (pg_atomic_cas((a), 1, 0) != 0) > > #endif > ------------- > > And then replaced solaris_sparc.s and solaris_i386.s (BTW there is no > reason to have these files seprately as you can #ifdef them correctly > in the assembly -- I didn't do that as I didn't want to disturb the > make system). OK, this is a great help. If you think it should be just one file we can do that, but since the are separate instructions sets, separate files I think still makes sense. No one at Sun is helping us (after repeated requests), so we do need someone to help clear up this mess and improve it. Let me merge in what you have done and I will let you know when you can test another snapshot. --------------------------------------------------------------------------- > solaris_sparc.s > ------------- > / > ======================================================================== > ===== > / tas.s -- compare and swap for solaris_sparc > / > ======================================================================== > ===== > > #if defined(__sparcv9) || defined(__sparc) > > .section ".text" > .align 8 > .skip 24 > .align 4 > > .global pg_atomic_cas > pg_atomic_cas: > cas [%o0],%o2,%o1 > mov %o1,%o0 > retl > nop > .type pg_atomic_cas,2 > .size pg_atomic_cas,(.-pg_atomic_cas) > #endif > ------------- > > solaris_i386.s > ------------- > / > ======================================================================== > ===== > / tas.s -- compare and swap for solaris_i386 > / > ======================================================================== > ===== > > .file "tas.s" > > #if defined(__amd64) > .code64 > #endif > > .globl pg_atomic_cas > .type pg_atomic_cas, @function > > .section .text, "ax" > .align 16 > pg_atomic_cas: > #if defined(__amd64) > movl %edx,%eax > lock > cmpxchgl %esi,(%rdi) > #else > movl 4(%esp), %edx > movl 8(%esp), %ecx > movl 12(%esp), %eax > lock > cmpxchgl %ecx, (%edx) > #endif > ret > .size pg_atomic_cas, . - pg_atomic_cas > ------------- > > > // Theo Schlossnagle > // CTO -- http://www.omniti.com/~jesus/ > // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/ > // Ecelerity: Run with it. > > -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > OK, this is a great help. If you think it should be just one file we > can do that, but since the are separate instructions sets, separate > files I think still makes sense. There is no reason for the i386 or AMD64 code to be different from what's already tested on Linux --- the hardware's the same and the OS surely doesn't make a difference at this level. Does Solaris' assembler really support C-style "#if defined()" constructs in .s files? That seems moderately unlikely. regards, tom lane
Tom Lane wrote: >Bruce Momjian <pgman@candle.pha.pa.us> writes: > > >>OK, this is a great help. If you think it should be just one file we >>can do that, but since the are separate instructions sets, separate >>files I think still makes sense. >> >> > >There is no reason for the i386 or AMD64 code to be different from what's >already tested on Linux --- the hardware's the same and the OS surely >doesn't make a difference at this level. > > On linux you use gcc, which allows for inline assembly. So, the code is already very different. Solaris cc doesn't support inline assemly unless you use .il files (which is a management and build nightmare). GCC's sparc backend is pretty awful, so it makes sense to embrace Sun Studio 11 (it's free after all) to make Postgres run reasonably well on sparc. While we're at it, it makes good sense to unify the methodology of builds on Solaris (regardless of the architecture). "as" on Solaris is shipped as a part of the core system -- so it is available without Sun Studio and will interoperate with other gcc compiled objects for a painless linkage. Yes there is a reason to use different code on Opteron than on i386. It requires less insutruction to do cas operations on opteron than on 80386. It is a tiny amount of code and well test on Solaris. It's in solaris_XXX.s files, so clearly it is different already. i386 and amd64 have different instruction sets and different registers and thus performing 32bit ops inside a 64bit program on opteron requires less register "setup" -- you can save 2 or 3 instructions. Regardless, the code I provided has far fewer instructions for the spinlocks than the code that was there. >Does Solaris' assembler really support C-style "#if defined()" >constructs in .s files? That seems moderately unlikely. > > Yes. That's the code I used to compile my stuff. Solaris's assembler certainly allows CPP (it's the -P flag). It allows easily building dual architecture builds from the same file with simply different flags to compile instead of different build object sources. With the provided files you can compile sparcv8plus (32bit) sparcv9 (64bit) i386 (32bit) and amd64 (64bit). (intel) as -K PIC -P tas.s (amd64) as -K PIC -P -xarch=amd64 tas.s (sparcv8plus) as -K PIC -P -xarch=sparcv8plus tas.s (sparcv9) as -K PIC -P -xarch=sparcv9 tas.s Separating the files out is as simple as making four different files (three in this case as sparcv8plus and sparcv9 use the same asm for 32bit cas). I would still put them in seperate files as you may want to add 64bit atomics at some point in the future and then the sparcv8plus and sparcv9 stuff will differ. Since the code is so small in the first place, it makes sense to me to put them in one file -- but that is clearly just a personal preference. My changes are just offered back. I made them because I wanted to get Pg to compile on my box, but while I was at it, I believe I reduced the complexity of code and offered (with pg_atomic_cas) an opportunity to ascertain _who_ holds the lock when you have spinlock contention. (as you could potentially cas the lock to the pg procpid instead of 1 at no additional cost). Best regards, Theo -- // Theo Schlossnagle // Principal Engineer -- http://www.omniti.com/~jesus/ // Ecelerity: Run with it. -- http://www.omniti.com/
On Apr 19, 2006, at 11:17 PM, Bruce Momjian wrote: > Theo Schlossnagle wrote: >> >> The following bug has been logged online: >> >> Bug reference: 2401 >> Logged by: Theo Schlossnagle >> Email address: jesus@omniti.com >> PostgreSQL version: 8.1.3 >> Operating system: Solaris 10 >> Description: spinlocks not available on amd64 >> Details: >> >> Compiling 8.1.3 on solaris 10 x86 with Sun Studio 11 for amd64 target >> architecture leads us to an error resulting from no available "tas" >> assembly. >> >> The tas.s file doesn't look like valid assembly for the shipped Sun >> assembler. > > Yes. We will have a fix for it in 8.2, but it was too risky for > 8.1.X. > You can try a snapshot tarball from our FTP server and see if that > works. I reviewed the code there. I think it can be better. The issue is that s_lock chooses to implement the lock in an unsigned char which isn't optimal on sparc or x86. An over arching issue is that the tas instruction is a less functional cas function, so it seems that the tas should be cas and the spinlocks should be implemented that way. By using cas, you can can actually know the locker by cas'ing the lock to the process id instead of 1 -- we use that trick to see who is holding the spinlock between threads (we obviously use thread ids in that case). So... I changed the s_lock.h to merge the sparc and x86 sections thusly: ------------- #if defined(__sun) && (defined(__i386) || defined(__amd64) || defined (__sparc) | | defined(__sparc__)) /* * Solaris/386 (we only get here for non-gcc case) */ #define HAS_TEST_AND_SET typedef unsigned int slock_t; extern volatile slock_t pg_atomic_cas(volatile slock_t *lock, slock_t with, slock_t cmp); #define TAS(a) (pg_atomic_cas((a), 1, 0) != 0) #endif ------------- And then replaced solaris_sparc.s and solaris_i386.s (BTW there is no reason to have these files seprately as you can #ifdef them correctly in the assembly -- I didn't do that as I didn't want to disturb the make system). solaris_sparc.s ------------- / ======================================================================== ===== / tas.s -- compare and swap for solaris_sparc / ======================================================================== ===== #if defined(__sparcv9) || defined(__sparc) .section ".text" .align 8 .skip 24 .align 4 .global pg_atomic_cas pg_atomic_cas: cas [%o0],%o2,%o1 mov %o1,%o0 retl nop .type pg_atomic_cas,2 .size pg_atomic_cas,(.-pg_atomic_cas) #endif ------------- solaris_i386.s ------------- / ======================================================================== ===== / tas.s -- compare and swap for solaris_i386 / ======================================================================== ===== .file "tas.s" #if defined(__amd64) .code64 #endif .globl pg_atomic_cas .type pg_atomic_cas, @function .section .text, "ax" .align 16 pg_atomic_cas: #if defined(__amd64) movl %edx,%eax lock cmpxchgl %esi,(%rdi) #else movl 4(%esp), %edx movl 8(%esp), %ecx movl 12(%esp), %eax lock cmpxchgl %ecx, (%edx) #endif ret .size pg_atomic_cas, . - pg_atomic_cas ------------- // Theo Schlossnagle // CTO -- http://www.omniti.com/~jesus/ // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/ // Ecelerity: Run with it.
Robert, Is there someone on the Solaris build team who should be seeing this thread? Josh Berkus PostgreSQL @ Sun San Francisco
I'm copying Jim Gates who has started looking into this issue. He's out this week. -Robert Joshua Berkus wrote: >Robert, > >Is there someone on the Solaris build team who should be seeing this thread? > >Josh Berkus >PostgreSQL @ Sun >San Francisco > > >---------------------------(end of broadcast)--------------------------- >TIP 2: Don't 'kill -9' the postmaster > >
The amd64 spintlock instructions use no AMD-specific features. It's base intel 64bit instruction set. We ship a product with similar such spin locks and have never had an issue across a large variety of chipsets (Intel, AMD, and virtualized). In short, if you can actually run 64bit code, the CAS stuff should just work. On Jun 24, 2009, at 8:27 AM, Gregory Stark wrote: > > Theo Schlossnagle wrote: > >> Tom Lane wrote: >> >>> There is no reason for the i386 or AMD64 code to be different from >>> what's >>> already tested on Linux --- the hardware's the same and the OS >>> surely >>> doesn't make a difference at this level. >> >> On linux you use gcc, which allows for inline assembly. So, the >> code is >> already very different. > > How does this interact with binary builds such as rpms? If someone > installs an > amd64 binary on an x86 machine or vice versa does this assembly do > the right > thing at all? Does it perform slowly? > > Ideally we would compile both and pick the right one at run-time but > that > might have annoying overhead if there's a branch before every > pg_atomic_cas > call. > > Perhaps a minimal thing to do would be to detect a mismatch on > startup and log > a message about it. > > -- > Gregory Stark > http://mit.edu/~gsstark/resume.pdf -- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle p: +1.443.325.1357 x201 f: +1.410.872.4911
Theo Schlossnagle wrote: > Tom Lane wrote: > >> There is no reason for the i386 or AMD64 code to be different from what's >> already tested on Linux --- the hardware's the same and the OS surely >> doesn't make a difference at this level. > > On linux you use gcc, which allows for inline assembly. So, the code is > already very different. How does this interact with binary builds such as rpms? If someone installs an amd64 binary on an x86 machine or vice versa does this assembly do the right thing at all? Does it perform slowly? Ideally we would compile both and pick the right one at run-time but that might have annoying overhead if there's a branch before every pg_atomic_cas call. Perhaps a minimal thing to do would be to detect a mismatch on startup and log a message about it. -- Gregory Stark http://mit.edu/~gsstark/resume.pdf