Thread: Spinning versus sleeping in s_lock


Spinning versus sleeping in s_lock

From: mlw
Date:
Let's think about this:

What is the advantage of spinning?

On a uniprocessor box, there is "never" any advantage because you must release
the CPU in order to allow the process which owns the lock to run long enough to
release it.

On an SMP box, this is a bit more complicated. If you have two CPUs, then
maybe one process can spin, but obviously more than one spinner is wasting
CPU; one of the spinners must release its time slice in order for another
process to release the resource.

Is there a global area where a single count of all the processes spinning can
be kept? That way, when a process fails to acquire a lock, and there are
already (num_cpus - 1) processes spinning, it can call select() right away.

Does this sound like an interesting approach?
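
For concreteness, here is a minimal sketch of what I have in mind. All
names are illustrative, and C11 <stdatomic.h> stands in for the portable
atomic counter, which may itself be hard to come by:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <sys/select.h>

    extern bool tas(volatile int *lock);    /* hypothetical: true = acquired */

    static atomic_int spinning_count;       /* would live in shared memory */
    static int num_cpus = 2;                /* detected at startup */

    static void
    acquire_lock(volatile int *lock)
    {
        while (!tas(lock))
        {
            bool acquired = false;

            /* spin only if fewer than (num_cpus - 1) backends already spin */
            if (atomic_fetch_add(&spinning_count, 1) < num_cpus - 1)
            {
                for (int i = 0; i < 100 && !acquired; i++)
                    acquired = tas(lock);
            }
            atomic_fetch_sub(&spinning_count, 1);

            if (acquired)
                return;

            /* too many spinners: give up the time slice right away */
            struct timeval delay = {0, 10000};      /* 10 msec */
            select(0, NULL, NULL, NULL, &delay);
        }
    }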


Re: Spinning versus sleeping in s_lock

From: Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> What is the advantage of spinning?

> On a uniprocessor box, there is "never" any advantage because you must
> release the CPU in order to allow the process which owns the lock to
> run long enough to release it.

Actually, that analysis is faulty.

Suppose that we didn't use a select(), yield(), or anything, but just
spun until we got the lock.  On a uniprocessor this would imply spinning
until the end of our current timeslice, whereupon the CPU would be given
to someone else.  When control finally comes back to our process, more
than likely the holder of the spinlock has been allowed to run and has
finished his use of the spinlock, and we will be able to get it.  Thus,
in this scenario we waste the rest of our timeslice, but we are able
to proceed as soon as the scheduler next wishes to give us control.

If "the rest of our timeslice" is not long, this might actually be
better than an immediate delay via select() or usleep(), because those
won't allow us to run for X number of milliseconds.  It's entirely
possible that the CPU cycles we "save" by not spinning will end up being
spent in the kernel's idle loop, unless the overall load is so high that
the CPU never runs out of things to do.  I doubt this approach would be
a winner on average, but to claim that it never wins is wrong.

Now if we have available a sched_yield() type of primitive, that lets
us give up the processor without committing to any minimum delay before
we can be rescheduled, then I think it's usually true that on a
uniprocessor we might as well do sched_yield after just a single failure
of TAS().  (However, on some architectures such as Alpha even that much
is wrong; TAS() can "fail" for reasons that do not mean the spinlock is
locked.  So the most portable thing is to try TAS at least a few times,
even on a uniprocessor.)
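
In code, that compromise might look about like this (assuming the
s_lock.h convention that TAS() returns zero on acquisition; the retry
count is arbitrary):

    #include <sched.h>

    extern int TAS(volatile int *lock);     /* 0 = acquired, per s_lock.h */

    #define SPINS_BEFORE_YIELD 4            /* arbitrary; absorbs spurious
                                             * TAS() failures, e.g. on Alpha */

    static void
    uniprocessor_lock(volatile int *lock)
    {
        for (;;)
        {
            for (int i = 0; i < SPINS_BEFORE_YIELD; i++)
                if (TAS(lock) == 0)
                    return;

            /* yield without committing to any minimum sleep */
            sched_yield();
        }
    }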

> On an SMP box, this is a bit more complicated. If you have two CPUs, then
> maybe one process can spin, but obviously more than one spinner is wasting
> CPU; one of the spinners must release its time slice in order for another
> process to release the resource.

I don't believe that either, not completely.  A process that is spinning
but hasn't got a CPU is not wasting cycles; it's essentially done an
implicit sched_yield, and as we just saw that is the most efficient way
of doing things.  What you are really trying to say is that it's
inefficient for *all* the currently executing processes to be spinning
on a lock that none of them hold.  This is true but there's no direct
way for us to determine (in userland code) that that condition holds.
If you try to maintain a userland counter of the number of actively
spinning processes, you will fail because a process might lose the CPU
while it has the counter incremented.  (There are also a whole bunch
of portability problems associated with trying to build an atomically
incrementable/decrementable counter, bearing in mind that the counter
can't itself be protected by a spinlock.)  Nor can you directly
determine whether the current holder of the spinlock is actively
running or not.  The most effective indirect test of that condition
is really to spin for awhile and see if the lock gets released.

A final comment --- given that in 7.2 we only use spinlocks to protect
very short segments of code, I believe it's fairly improbable for more
than two processes to be contending for a spinlock anyway.  So it's
probably sufficient to distinguish whether we have one or more than
one CPU, and statically select one of two spinning strategies on that
basis.  Trying to dynamically adapt for more CPUs/contending processes
will reap only minimal returns.
        regards, tom lane


Re: Spinning versus sleeping in s_lock

From: mlw
Date:
Tom Lane wrote:
> 
> mlw <markw@mohawksoft.com> writes:
> > What is the advantage of spinning?
> 
> > On a uniprocessor box, there is "never" any advantage because you must
> > release the CPU in order to allow the process which owns the lock to
> > run long enough to release it.
> 
> Actually, that analysis is faulty.
> 
> Suppose that we didn't use a select(), yield(), or anything, but just
> spun until we got the lock.  On a uniprocessor this would imply spinning
> until the end of our current timeslice, whereupon the CPU would be given
> to someone else.  When control finally comes back to our process, more
> than likely the holder of the spinlock has been allowed to run and has
> finished his use of the spinlock, and we will be able to get it.  Thus,
> in this scenario we waste the rest of our timeslice, but we are able
> to proceed as soon as the scheduler next wishes to give us control.

On average, we would have to assume wasting half of a time slice. Yes, if we
are near the end of our time slice, spinning may be a winner, but the
expected waste is still half the slice.

This depends on the OS and scheduler and the duration of the time slice.
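
(Roughly: if the failed acquisition happens at a point t uniformly
distributed over a time slice of length T, the expected wasted remainder
is E[T - t] = T/2. Idealized, of course; real schedulers are not that
uniform.)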

> 
> If "the rest of our timeslice" is not long, this might actually be
> better than an immediate delay via select() or usleep(), because those
> won't allow us to run for X number of milliseconds.  It's entirely
> possible that the CPU cycles we "save" by not spinning will end up being
> spent in the kernel's idle loop, unless the overall load is so high that
> the CPU never runs out of things to do.  I doubt this approach would be
> a winner on average, but to claim that it never wins is wrong.

Oh, I may have overstated my position, but on average, it would absolutely be
a loser.

> 
> Now if we have available a sched_yield() type of primitive, that lets
> us give up the processor without committing to any minimum delay before
> we can be rescheduled, then I think it's usually true that on a
> uniprocessor we might as well do sched_yield after just a single failure
> of TAS().  (However, on some architectures such as Alpha even that much
> is wrong; TAS() can "fail" for reasons that do not mean the spinlock is
> locked.  So the most portable thing is to try TAS at least a few times,
> even on a uniprocessor.)

Two things:
(1) Why doesn't PostgreSQL use sched_yield() instead of select() in s_lock()?
There must be a way to wrap that stuff in a macro in a port/os.h file; a
sketch of one possibility follows below.

(2) TAS fails when it should work on the Alpha? Is it known why?
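
On (1), perhaps something like this (HAVE_YIELD and S_DELAY are made-up
names here, not actual configure symbols):

    /* hypothetical port/os.h fragment */
    #ifdef HAVE_YIELD
    #include <sched.h>
    #define S_DELAY()  sched_yield()
    #else
    #include <sys/select.h>
    #define S_DELAY() \
        do { \
            struct timeval tv = {0, 10000};   /* 10 msec minimum delay */ \
            select(0, NULL, NULL, NULL, &tv); \
        } while (0)
    #endif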

> 
> > On an SMP box, this is a bit more complicated. If you have two CPUs, then
> > maybe one process can spin, but obviously more than one spinner is wasting
> > CPU; one of the spinners must release its time slice in order for another
> > process to release the resource.
> 
> I don't believe that either, not completely.  A process that is spinning
> but hasn't got a CPU is not wasting cycles; it's essentially done an
> implicit sched_yield, and as we just saw that is the most efficient way
> of doing things.  What you are really trying to say is that it's
> inefficient for *all* the currently executing processes to be spinning
> on a lock that none of them hold.  This is true but there's no direct
> way for us to determine (in userland code) that that condition holds.

SMP has a lot of issues that are not completely obvious. I really don't like
spinlocks in userland. In an active server, wasting CPU cycles is just wrong.

> If you try to maintain a userland counter of the number of actively
> spinning processes, you will fail because a process might lose the CPU
> while it has the counter incremented.  (There are also a whole bunch
> of portability problems associated with trying to build an atomically
> incrementable/decrementable counter, bearing in mind that the counter
> can't itself be protected by a spinlock.)  Nor can you directly
> determine whether the current holder of the spinlock is actively
> running or not.  The most effective indirect test of that condition
> is really to spin for awhile and see if the lock gets released.

Any userland strategy to mitigate wasted CPU cycles has problems. The very
nature of the problem is OS-related. Should PostgreSQL have a more OS-specific
implementation in a port section of the code?

> 
> A final comment --- given that in 7.2 we only use spinlocks to protect
> very short segments of code, I believe it's fairly improbable for more
> than two processes to be contending for a spinlock anyway.  So it's
> probably sufficient to distinguish whether we have one or more than
> one CPU, and statically select one of two spinning strategies on that
> basis.  Trying to dynamically adapt for more CPUs/contending processes
> will reap only minimal returns.

That is really one of those funny things about SMP stuff: you are probably
right, but sometimes little things really surprise you.


Re: Spinning versus sleeping in s_lock

From: Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> (1) Why doesn't PostgreSQL use sched_yield() instead of select() in s_lock()?

Portability concerns.  It is something that I think we ought to look at
in a future release.  However, based on my latest comparisons I'm no
longer willing to think of it as a "must fix" for 7.2.

> (2) TAS fails when it should work on the Alpha? Is it known why?

Because the Alpha guys designed it to fail anytime it couldn't succeed
immediately, or something like that.  Consult the archives around the end
of Dec 2000.  One thing I recall is that the first TAS attempt after a
process gains the CPU is almost guaranteed to fail, because the
necessary page lookup table entries haven't been faulted in.  The first
version of WAL support in beta 7.1 assumed it could TAS once and yield
the CPU on failure, and guess what: it spun forever on Alphas.

> SMP has a lot of issues that are not completely obvious. I really don't like
> spinlocks in userland. In an active server, wasting CPU cycles is just wrong.

If it takes more cycles to yield the CPU (and dispatch another process,
and eventually redispatch your own process) than it does to spin until
the other guy lets go of the lock, then "wasting" CPU cycles by spinning
is not wrong.  In our present usage of spinlocks this condition is
almost guaranteed to be true, for a multiprocessor system.  There are
no OSes that can do two process dispatches in ~20 instructions.
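
For scale: assuming a pair of process dispatches costs a few
microseconds, that is thousands of cycles, against the tens of cycles a
short spin costs.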

The real issue here is to devise a solution that costs less than 20
instructions on average in the multiprocessor case, while not wasting
cycles in the uniprocessor case.  And is portable.  That's a tough
challenge.  I think we will want to look at developing code to determine
whether there's more than one CPU, for sure; one algorithm to do both
cases optimally seems impossible.
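
For instance, a sketch of the startup-time check (sysconf with
_SC_NPROCESSORS_ONLN is common but not universal, hence the fallback,
and the spin counts are invented):

    #include <unistd.h>

    static int spins_per_delay;     /* consulted by the lock loop */

    static void
    choose_spin_strategy(void)
    {
    #ifdef _SC_NPROCESSORS_ONLN
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    #else
        long ncpus = 1;             /* assume uniprocessor if unknowable */
    #endif

        /* SMP: spin awhile, the holder is likely running on another CPU;
         * uniprocessor: yield almost immediately */
        spins_per_delay = (ncpus > 1) ? 100 : 2;
    }
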
        regards, tom lane


Re: Spinning versus sleeping in s_lock

From: Bruce Momjian
Date:
> A final comment --- given that in 7.2 we only use spinlocks to protect
> very short segments of code, I believe it's fairly improbable for more
> than two processes to be contending for a spinlock anyway.  So it's
> probably sufficient to distinguish whether we have one or more than
> one CPU, and statically select one of two spinning strategies on that
> basis.  Trying to dynamically adapt for more CPUs/contending processes
> will reap only minimal returns.

Added to TODO:
* Add code to detect an SMP machine and handle spinlocks accordingly

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026