Thread: Question about LWLockAcquire's use of semaphores instead of spinlocks

Question about LWLockAcquire's use of semaphores instead of spinlocks

From: "Robert E. Bruccoleri"
Date:
On SGI multiprocessor machines, I suspect that a spinlock
implementation of LWLockAcquire would give better performance than
using IPC semaphores.  Is there any specific reason that a spinlock
could not be used in this context?

+-----------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D. | email: bruc@acm.org                |
| P.O. Box 314                | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534        |                                    |
+-----------------------------+------------------------------------+


Re: Question about LWLockAcquire's use of semaphores instead of spinlocks

From: Tom Lane
Date:
"Robert E. Bruccoleri" <bruc@stone.congenomics.com> writes:
> On SGI multiprocessor machines, I suspect that a spinlock
> implementation of LWLockAcquire would give better performance than
> using IPC semaphores.  Is there any specific reason that a spinlock
> could not be used in this context?

Are you confusing LWLockAcquire with TAS spinlocks?

If you're saying that we don't have an implementation of TAS for
SGI hardware, then feel free to contribute one.  If you are wanting to
replace LWLocks with spinlocks, then you are sadly mistaken, IMHO.
        regards, tom lane


Re: Question about LWLockAcquire's use of semaphores instead of spinlocks

From: "Robert E. Bruccoleri"
Date:
Tom Lane writes:
> 
> 
> "Robert E. Bruccoleri" <bruc@stone.congenomics.com> writes:
> > On SGI multiprocessor machines, I suspect that a spinlock
> > implementation of LWLockAcquire would give better performance than
> > using IPC semaphores.  Is there any specific reason that a spinlock
> > could not be used in this context?
> 
> Are you confusing LWLockAcquire with TAS spinlocks?

No.

> If you're saying that we don't have an implementation of TAS for
> SGI hardware, then feel free to contribute one.  If you are wanting to
> replace LWLocks with spinlocks, then you are sadly mistaken, IMHO.

This touches on my question. Why am I mistaken? I don't understand.

BTW, about 5 years ago, I rewrote the TAS spinlocks for the
SGI platform to make it work correctly. The current implementation
is fine.

+-----------------------------+------------------------------------+ 
| Robert E. Bruccoleri, Ph.D. | email: bruc@acm.org                |
| P.O. Box 314                | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534        |                                    |
+-----------------------------+------------------------------------+


Re: Question about LWLockAcquire's use of semaphores instead of spinlocks

From: "Luis Alberto Amigo Navarro"
Date:
Hi Bob:
We have been working with an sproc version of postgres, and it has improved
performance on a NUMA3 Origin 3000, because with sprocs IRIX places memory
round-robin by default, rather than by first touch as it did under fork. We
have been wondering about replacing IPC shmem with a shared arena to improve
performance further on IRIX. I don't know if people here in postgres are
interested in platform-specific ports, but it could help you improve your
performance.
Regards
----- Original Message -----
From: "Robert E. Bruccoleri" <bruc@stone.congenomics.com>
To: "Tom Lane" <tgl@sss.pgh.pa.us>
Cc: <bruc@acm.org>; <pgsql-hackers@postgresql.org>
Sent: Sunday, July 28, 2002 5:45 AM
Subject: Re: [HACKERS] Question about LWLockAcquire's use of semaphores
instead of spinlocks


> Tom Lane writes:
> >
> >
> > "Robert E. Bruccoleri" <bruc@stone.congenomics.com> writes:
> > > On SGI multiprocessor machines, I suspect that a spinlock
> > > implementation of LWLockAcquire would give better performance than
> > > using IPC semaphores.  Is there any specific reason that a spinlock
> > > could not be used in this context?
> >
> > Are you confusing LWLockAcquire with TAS spinlocks?
>
> No.
>
> > If you're saying that we don't have an implementation of TAS for
> > SGI hardware, then feel free to contribute one.  If you are wanting to
> > replace LWLocks with spinlocks, then you are sadly mistaken, IMHO.
>
> This touches on my question. Why am I mistaken? I don't understand.
>
> BTW, about 5 years ago, I rewrote the TAS spinlocks for the
> SGI platform to make it work correctly. The current implementation
> is fine.
>
> +-----------------------------+------------------------------------+
> | Robert E. Bruccoleri, Ph.D. | email: bruc@acm.org                |
> | P.O. Box 314                | URL:   http://www.congen.com/~bruc |
> | Pennington, NJ 08534        |                                    |
> +-----------------------------+------------------------------------+




Re: Question about LWLockAcquire's use of semaphores instead of spinlocks

From: Tom Lane
Date:
"Robert E. Bruccoleri" <bruc@stone.congenomics.com> writes:
> Tom Lane writes:
>> If you're saying that we don't have an implementation of TAS for
>> SGI hardware, then feel free to contribute one.  If you are wanting to
>> replace LWLocks with spinlocks, then you are sadly mistaken, IMHO.

> This touches on my question. Why am I mistaken? I don't understand.

Because we just got done replacing spinlocks with LWLocks ;-).  I don't
believe that reverting that change will improve matters.  It will
certainly hurt on SMP machines, and I believe that it would at best
be a breakeven proposition on uniprocessors.  See the discussions last
fall that led up to development of the LWLock mechanism.

The problem with TAS spinlocks is that they are suitable only for
implementing locks that will be held for *very short* periods, ie,
actual contention is rare.  Over time we had allowed that mechanism to
be abused for locking fairly large and complex shared-memory data
structures (eg, the lock manager, the buffer manager).  The next step
up, a lock-manager lock, is very expensive and certainly can't be used
by the lock manager itself anyway.  LWLocks are an intermediate
mechanism that is marginally more expensive than a spinlock but behaves
much more gracefully in the presence of contention.  LWLocks also allow
us to distinguish shared and exclusive lock modes, thus further reducing
contention in some cases.

BTW, now that I reread the title of your message, I wonder if you
haven't just misunderstood what's happening in lwlock.c.  There is no
IPC semaphore call in the fast (no-contention) path of control.  A
semaphore call occurs only when we are forced to wait, ie, yield the
processor.  Substituting a spinlock for that cannot improve matters;
it would essentially result in wasting the remainder of our timeslice
in a busy-loop, rather than yielding the CPU at once to some other
process that can get some useful work done.
        regards, tom lane
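
The fast path and the wait path described above can be sketched in C. This is a minimal illustration under stated assumptions, not the code in lwlock.c: the type `sketch_lock` and its functions are hypothetical names, only exclusive mode is modeled, C11 `atomic_flag` stands in for the platform TAS, and a POSIX unnamed semaphore stands in for the IPC semaphore. What it tries to show is that the semaphore is touched only when the lock is already held.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical exclusive-only lock; NOT PostgreSQL's LWLock structure. */
typedef struct {
    atomic_flag mutex;    /* TAS spinlock, held only for a few instructions */
    bool        held;     /* is the lock currently taken? */
    int         nwaiting; /* number of sleepers that still need a wakeup */
    sem_t       wakeup;   /* waiters sleep here instead of busy-looping */
} sketch_lock;

static void spin_acquire(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;                 /* short spin: protects only the fields above */
}

static void spin_release(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

/* pshared = 0: usable between threads of one process (an assumption made
 * for this sketch; separate processes would need shared memory). */
int sketch_lock_init(sketch_lock *l)
{
    atomic_flag_clear(&l->mutex);
    l->held = false;
    l->nwaiting = 0;
    return sem_init(&l->wakeup, 0, 0);
}

/* Returns how many times the caller had to sleep (0 on the fast path). */
int sketch_lock_acquire(sketch_lock *l)
{
    int sleeps = 0;

    for (;;) {
        spin_acquire(&l->mutex);
        if (!l->held) {
            l->held = true;          /* fast path: no semaphore call */
            spin_release(&l->mutex);
            return sleeps;
        }
        l->nwaiting++;               /* contended: register as a waiter */
        spin_release(&l->mutex);

        /* Yield the CPU until a releaser posts; retry if interrupted. */
        while (sem_wait(&l->wakeup) != 0 && errno == EINTR)
            ;
        sleeps++;
    }
}

void sketch_lock_release(sketch_lock *l)
{
    bool wake = false;

    spin_acquire(&l->mutex);
    l->held = false;
    if (l->nwaiting > 0) {
        l->nwaiting--;               /* exactly one post per registered waiter */
        wake = true;
    }
    spin_release(&l->mutex);

    if (wake)
        sem_post(&l->wakeup);        /* woken waiter retries the fast path */
}
```

In an uncontended acquire/release pair, neither `sem_wait` nor `sem_post` is called; replacing the semaphore with a spin loop would change only the contended branch, where the waiter would burn its timeslice instead of sleeping.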


Re: Question about LWLockAcquire's use of semaphores instead of spinlocks

From: "Robert E. Bruccoleri"
Date:
Dear Tom,
    Thank you for the explanation. I did not understand what was
going on in lwlock.c.
    My systems are all SGI Origins having between 8 and 32
processors, and I've been running PostgreSQL on them for about 5
years.  These machines do provide a number of good mechanisms for high
performance shared memory parallelism that I don't think are found
elsewhere.  I wish that I had the time to understand and tune
PostgreSQL to run really well on them.
    I have a question for you and other developers with regard to
my SGI needs. If I made a functional Origin 2000 system available to
you with hardware support, would the group be willing to tailor the
SGI port for better performance?
                Sincerely,
                Bob
> 
> 
> "Robert E. Bruccoleri" <bruc@stone.congenomics.com> writes:
> > Tom Lane writes:
> >> If you're saying that we don't have an implementation of TAS for
> >> SGI hardware, then feel free to contribute one.  If you are wanting to
> >> replace LWLocks with spinlocks, then you are sadly mistaken, IMHO.
> 
> > This touches on my question. Why am I mistaken? I don't understand.
> 
> Because we just got done replacing spinlocks with LWLocks ;-).  I don't
> believe that reverting that change will improve matters.  It will
> certainly hurt on SMP machines, and I believe that it would at best
> be a breakeven proposition on uniprocessors.  See the discussions last
> fall that led up to development of the LWLock mechanism.
> 
> The problem with TAS spinlocks is that they are suitable only for
> implementing locks that will be held for *very short* periods, ie,
> actual contention is rare.  Over time we had allowed that mechanism to
> be abused for locking fairly large and complex shared-memory data
> structures (eg, the lock manager, the buffer manager).  The next step
> up, a lock-manager lock, is very expensive and certainly can't be used
> by the lock manager itself anyway.  LWLocks are an intermediate
> mechanism that is marginally more expensive than a spinlock but behaves
> much more gracefully in the presence of contention.  LWLocks also allow
> us to distinguish shared and exclusive lock modes, thus further reducing
> contention in some cases.
> 
> BTW, now that I reread the title of your message, I wonder if you
> haven't just misunderstood what's happening in lwlock.c.  There is no
> IPC semaphore call in the fast (no-contention) path of control.  A
> semaphore call occurs only when we are forced to wait, ie, yield the
> processor.  Substituting a spinlock for that cannot improve matters;
> it would essentially result in wasting the remainder of our timeslice
> in a busy-loop, rather than yielding the CPU at once to some other
> process that can get some useful work done.
> 
>             regards, tom lane
> 

+-----------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D. | email: bruc@acm.org                |
| P.O. Box 314                | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534        |                                    |
+-----------------------------+------------------------------------+


Re: Question about LWLockAcquire's use of semaphores instead of spinlocks

From: "Robert E. Bruccoleri"
Date:
Dear Luis,
    I would be very interested. Replacing the IPC shared memory
with an arena makes a lot of sense. --Bob

> 
> Hi Bob:
> We have been working with an sproc version of postgres, and it has improved
> performance on a NUMA3 Origin 3000, because with sprocs IRIX places memory
> round-robin by default, rather than by first touch as it did under fork. We
> have been wondering about replacing IPC shmem with a shared arena to improve
> performance further on IRIX. I don't know if people here in postgres are
> interested in platform-specific ports, but it could help you improve your
> performance.
> Regards

+-----------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D. | email: bruc@acm.org                |
| P.O. Box 314                | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534        |                                    |
+-----------------------------+------------------------------------+


Re: Question about LWLockAcquire's use of semaphores instead of spinlocks

From: "Luis Alberto Amigo Navarro"
Date:
----- Original Message -----
From: "Robert E. Bruccoleri" <bruc@stone.congenomics.com>
To: "Luis Alberto Amigo Navarro" <lamigo@atc.unican.es>
Cc: <bruc@acm.org>; <tgl@sss.pgh.pa.us>; <pgsql-hackers@postgresql.org>
Sent: Monday, July 29, 2002 2:48 AM
Subject: Re: [HACKERS] Question about LWLockAcquire's use of semaphores
instead of spinlocks


> Dear Luis,
> I would be very interested. Replacing the IPC shared memory
> with an arena makes a lot of sense. --Bob
>
On old PowerChallenge machines postgres works really well, but on the new
NUMA architectures it works badly. As we have learned, forked backends
don't let IRIX manage memory as one would like. Keeping the first-touch
placement algorithm means that almost all useful data is placed on the
first node where the process runs. Trying to use more than one node under
this scheme results in false sharing: the secondary cache hit ratio drops
below 85%, because latency to a second node is about 6 times higher than
within the first node, and it is even worse if you have more than 4 nodes.
The net effect is that you are working almost exclusively with one node
(4 CPUs on an Origin 3000).
With a round-robin placement algorithm, memory pages are distributed across
the nodes in turn, so every node has the same chance of working with some
pages locally and some pages remotely. The more nodes you have, the more
you gain from round-robin.
You can enable round-robin by recompiling postgres after first setting the
environment variable
_DSM_ROUND_ROBIN=TRUE
It works fine with fork(), and using sprocs is not necessary.
Replacing IPC shared memory with a shared arena could improve performance,
because the arena is the native shared segment on IRIX. It's something we're
willing to do, but for now it is only a project.
Hope it helps
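
The difference between the two placement policies can be illustrated with a toy model in C. This is only a sketch under simplifying assumptions: the names `placement`, `home_node`, and `local_fraction` are hypothetical, every page of the shared segment is assumed to be first touched by a single node, and caches and IRIX's real placement logic are ignored. It computes what fraction of the pages is local to a given node under each policy.

```c
#include <stddef.h>

/* Toy placement policies; not an IRIX API. */
typedef enum { FIRST_TOUCH, ROUND_ROBIN } placement;

/* Node that holds page number `page` in this model.
 * first_toucher: the node whose process touched every page first. */
static int home_node(placement p, size_t page, int nnodes, int first_toucher)
{
    if (p == FIRST_TOUCH)
        return first_toucher;            /* everything lands on one node */
    return (int) (page % (size_t) nnodes); /* pages dealt out in turn */
}

/* Fraction of the npages pages that are local to `node`. */
double local_fraction(placement p, size_t npages, int nnodes,
                      int first_toucher, int node)
{
    size_t local = 0;

    for (size_t i = 0; i < npages; i++)
        if (home_node(p, i, nnodes, first_toucher) == node)
            local++;

    return (double) local / (double) npages;
}
```

With 4 nodes and node 0 as the first toucher, first-touch placement gives node 0 a local fraction of 1.0 and every other node 0.0, while round-robin gives each node 0.25.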





Re: Question about LWLockAcquire's use of semaphores instead

From: Bruce Momjian
Date:
Robert E. Bruccoleri wrote:
> Dear Tom,
>     Thank you for the explanation. I did not understand what was
> going on in lwlock.c.

Yes, as Tom said, using the pre-7.2 code on SMP machines, if one backend
had a spinlock, the other backend would TAS loop trying to get the lock
until its timeslice ended or the other backend released the lock.  Now,
we TAS, then sleep on a semaphore and get woken up when the first
backend releases the lock.  We worked hard on that logic, I can tell you,
and it was a huge discussion topic in the fall of 2001.

>     My systems are all SGI Origins having between 8 and 32
> processors, and I've been running PostgreSQL on them for about 5
> years.  These machines do provide a number of good mechanisms for high
> performance shared memory parallelism that I don't think are found
> elsewhere.  I wish that I had the time to understand and tune
> PostgreSQL to run really well on them.
>     I have a question for you and other developers with regard to
> my SGI needs. If I made a functional Origin 2000 system available to
> you with hardware support, would the group be willing to tailor the
> SGI port for better performance?

We would have to understand how the SGI code is better than our existing
code on SMP machines.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Question about LWLockAcquire's use of semaphores instead

From: "Luis Alberto Amigo Navarro"
Date:
> We would have to understand how the SGI code is better than our existing
> code on SMP machines.

There is a big problem with postgres on SGI NUMA architectures. On UMA
systems postgres works fine, but NUMA Origins need native shared memory
management. Postgres scales well on old Challenges but very poorly on
NUMA architectures, giving good speed-up only within a single node. With
more than one node, throughput drops greatly. Implementing round-robin
memory placement makes it a bit better, and changing from forks to native
sprocs (medium-weight processes) makes it better still, but not good enough.
If you want postgres to run well on these machines, I think (it's not tested
yet) it would be necessary to implement native shared arenas instead of IPC
shared memory, in order to let IRIX do proper load balancing.

I'd like to take advantage of this message to say that there are a couple of
things we should add to FAQ-IRIX about using 32-bit or 64-bit objects: it is
a known issue that 32-bit objects on IRIX cannot use more than 1.2 GB of
shared memory, because the system is unable to find a single segment of
that size.

Regards




Re: Question about LWLockAcquire's use of semaphores instead

From: Tom Lane
Date:
"Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> writes:
> If you want postgres to run well on these machines, I think (it's not tested
> yet) it would be necessary to implement native shared arenas instead of IPC
> shared memory, in order to let IRIX do proper load balancing.

In CVS tip, the direct dependencies on SysV shared-memory calls have
been split into a separate file, src/backend/port/sysv_shmem.c.  It
would not be difficult to make a crude port to some other shared-memory
API, if you want to do some performance testing.

A not-so-crude port would perhaps be more difficult.  One thing that we
depend on is being able to detect whether old backends are still running
in a particular database cluster (this could happen if the postmaster
itself crashes, leaving orphaned backends behind).  Currently this is
done by recording the shmem key in the postmaster's lockfile, and then
during restart looking to see if any process is still attached to that
shmem segment.  So we are relying on the fact that SysV shmem segments
(a) are not anonymous, and (b) accept a syscall to find out whether any
other processes are attached to them.  If the shared-memory API you want
to use doesn't support similar capabilities, then there's a problem.
You might be able to think of a different way to provide the same kind
of interlock, though.

> I'd like to take advantage of this message to say that there are a couple of
> things we should add to FAQ-IRIX about using 32-bit or 64-bit objects,

Send a patch ;-)
        regards, tom lane
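
The attachment check that the interlock above depends on can be sketched with the System V calls `shmget` and `shmctl(IPC_STAT)`, which report the attach count in `shm_nattch`. This is a minimal sketch, not the code in sysv_shmem.c: the function names `shm_attach_count` and `segment_in_use` are hypothetical, and a real interlock would also have to deal with permissions, key reuse by unrelated programs, and reading the key from the lockfile.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Number of processes attached to segment `shmid`,
 * or -1 if the segment does not exist or cannot be queried. */
long shm_attach_count(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return (long) ds.shm_nattch;
}

/* Given a key recorded earlier (for example in a lockfile), report whether
 * any process is still attached to that segment.
 * Returns 1 if in use, 0 if no segment exists or nobody is attached. */
int segment_in_use(key_t key)
{
    /* Size 0 and flags 0: look up an existing segment, never create one. */
    int shmid = shmget(key, 0, 0);

    if (shmid < 0)
        return 0;
    return shm_attach_count(shmid) > 0;
}
```

A replacement shared-memory API would need some equivalent of both steps: a way to find the old segment again by name, and a way to ask whether anyone is still attached.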


Re: Question about LWLockAcquire's use of semaphores instead

From: "Luis Alberto Amigo Navarro"
Date:
----- Original Message -----
From: "Bruce Momjian" <pgman@candle.pha.pa.us>
To: <bruc@acm.org>
Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; <pgsql-hackers@postgresql.org>
>
> We would have to understand how the SGI code is better than our existing
> code on SMP machines.
>

I've been searching for data from SGI's Origin presentation to illustrate
what I am saying. This graph covers only memory bandwidth, but keep in mind
that as the distance between nodes increases, memory access latency also
increases: