Обсуждение: pgbench internal contention

Поиск
Список
Период
Сортировка

pgbench internal contention

От
Robert Haas
Дата:
On machines with lots of CPU cores, pgbench can start eating up a lot
of system time.  Investigation reveals that the problem is with
random(), which glibc implements like this:

long int
__random ()
{
  int32_t retval;
  __libc_lock_lock (lock);
  (void) __random_r (&unsafe_state, &retval);
  __libc_lock_unlock (lock);
  return retval;
}
weak_alias (__random, random)

Rather obviously, if you're running enough pgbench threads, you're
going to have a pretty ugly point of contention there.  On the 32-core
machine provided by Nate Boley, with my usual 5-minute SELECT-only
test, lazy-vxid and sinval-fastmessages applied, and scale factor 100,
"time" shows that pgbench uses almost as much system time as user
time:

$ time pgbench -n -S -T 300 -c 64 -j 64
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 300 s
number of transactions actually processed: 55319555
tps = 184396.016257 (including connections establishing)
tps = 184410.926840 (excluding connections establishing)

real    5m0.019s
user    21m10.100s
sys    17m45.480s

I patched it to use random_r() - the patch is attached - and here are
the (rather gratifying) results of that test:

$ time ./pgbench -n -S -T 300 -c 64 -j 64
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 300 s
number of transactions actually processed: 71851589
tps = 239503.585813 (including connections establishing)
tps = 239521.816698 (excluding connections establishing)

real    5m0.016s
user    20m40.880s
sys    9m25.930s

Since a client-limited benchmark isn't very interesting, I think this
change makes sense.  Thoughts?  Objections?  Coding style
improvements?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: pgbench internal contention

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On machines with lots of CPU cores, pgbench can start eating up a lot
> of system time.  Investigation reveals that the problem is with
> random(),

Interesting.

> I patched it to use random_r() - the patch is attached - and here are
> the (rather gratifying) results of that test:
> Since a client-limited benchmark isn't very interesting, I think this
> change makes sense.  Thoughts?  Objections?

Portability, or rather lack of it.  What about using erand48, which we
already have a dependency on (and substitute code for)?
        regards, tom lane


Re: pgbench internal contention

От
Robert Haas
Дата:
On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On machines with lots of CPU cores, pgbench can start eating up a lot
>> of system time.  Investigation reveals that the problem is with
>> random(),
>
> Interesting.
>
>> I patched it to use random_r() - the patch is attached - and here are
>> the (rather gratifying) results of that test:
>> Since a client-limited benchmark isn't very interesting, I think this
>> change makes sense.  Thoughts?  Objections?
>
> Portability, or rather lack of it.  What about using erand48, which we
> already have a dependency on (and substitute code for)?

Neither our implementation nor glibc's appears to be thread-safe, and
erand48() is deprecated according to my Linux man page:

NOTES      These functions are declared obsolete by  SVID  3,  which  states  that      rand(3) should be used instead.

glibc provides erand48_r(), and I suppose we could kludge up something
similar out of what's already in src/port?

This is also not exactly the world's most sophisticated algorithm, but
perhaps for pgbench that doesn't matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgbench internal contention

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Portability, or rather lack of it. �What about using erand48, which we
>> already have a dependency on (and substitute code for)?

> Neither our implementation nor glibc's appears to be thread-safe,

I think you're confusing srand48 with erand48.  The latter relies on a
caller-supplied state value, so if it's not thread-safe the caller has
only itself to blame.
        regards, tom lane


Re: pgbench internal contention

От
Robert Haas
Дата:
On Sat, Jul 30, 2011 at 2:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jul 29, 2011 at 5:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Portability, or rather lack of it.  What about using erand48, which we
>>> already have a dependency on (and substitute code for)?
>
>> Neither our implementation nor glibc's appears to be thread-safe,
>
> I think you're confusing srand48 with erand48.  The latter relies on a
> caller-supplied state value, so if it's not thread-safe the caller has
> only itself to blame.

I may be confused, but I'm not mixing it up with srand48.  On my
Fedora 12 VM, the man page says for erand48_r says, in relevant part:

SYNOPSIS      #include <stdlib.h>
      int erand48_r(unsigned short xsubi[3],                    struct drand48_data *buffer, double *result);

DESCRIPTION      These functions are the reentrant analogs of the functions described in      drand48(3).   Instead  of
modifying the global random generator state,      they use the supplied data buffer. 

And the glibc implementation of erand48 is this:

double
erand48 (xsubi)    unsigned short int xsubi[3];
{ double result;
 (void) __erand48_r (xsubi, &__libc_drand48_data, &result);
 return result;
}

On second look, I think our version is probably safe in this regard
since it appears modify only the state passed in and not anything
else.

*pokes a little more*

If I'm reading the code right, it only modifies __libc_drand48_data on
first call, so as long as we called erand48() at least once before
spawning the child threads, it would probably work.  That seems pretty
fragile in the face of the fact that they explicitly state that
they're modifying the global random generator state and that you
should use erand48_r() if you want reentrant behavior.  So I think if
we're going to go the erand48() route we probably ought to force
pgbench to always use our version rather than any OS-supplied version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgbench internal contention

От
Andy Colson
Дата:
On 07/29/2011 04:00 PM, Robert Haas wrote:
> On machines with lots of CPU cores, pgbench can start eating up a lot
> of system time.  Investigation reveals that the problem is with
> random(), which glibc implements like this:
>
> long int
> __random ()
> {
>    int32_t retval;
>    __libc_lock_lock (lock);
>    (void) __random_r (&unsafe_state,&retval);
>    __libc_lock_unlock (lock);
>    return retval;
> }
> weak_alias (__random, random)
>
> Rather obviously, if you're running enough pgbench threads, you're
> going to have a pretty ugly point of contention there.  On the 32-core
> machine provided by Nate Boley, with my usual 5-minute SELECT-only
> test, lazy-vxid and sinval-fastmessages applied, and scale factor 100,
> "time" shows that pgbench uses almost as much system time as user
> time:
>
> $ time pgbench -n -S -T 300 -c 64 -j 64
> transaction type: SELECT only
> scaling factor: 100
> query mode: simple
> number of clients: 64
> number of threads: 64
> duration: 300 s
> number of transactions actually processed: 55319555
> tps = 184396.016257 (including connections establishing)
> tps = 184410.926840 (excluding connections establishing)
>
> real    5m0.019s
> user    21m10.100s
> sys    17m45.480s
>
> I patched it to use random_r() - the patch is attached - and here are
> the (rather gratifying) results of that test:
>
> $ time ./pgbench -n -S -T 300 -c 64 -j 64
> transaction type: SELECT only
> scaling factor: 100
> query mode: simple
> number of clients: 64
> number of threads: 64
> duration: 300 s
> number of transactions actually processed: 71851589
> tps = 239503.585813 (including connections establishing)
> tps = 239521.816698 (excluding connections establishing)
>
> real    5m0.016s
> user    20m40.880s
> sys    9m25.930s
>
> Since a client-limited benchmark isn't very interesting, I think this
> change makes sense.  Thoughts?  Objections?  Coding style
> improvements?
>
>
>
>
>
How much randomness do we really need for test data.  What if it where changed to more of a random starting point and
thenautoinc'd after that.  Or if there were two func's, a rand() and a next().  If your test really needs randomness
userand(), otherwise use next(), it would be way faster, and you dont really care what the number is anyway.
 

-Andy


Re: pgbench internal contention

От
Robert Haas
Дата:
On Jul 30, 2011, at 9:40 AM, Andy Colson <andy@squeakycode.net> wrote:
> On 07/29/2011 04:00 PM, Robert Haas wrote:
>> On machines with lots of CPU cores, pgbench can start eating up a lot
>> of system time.  Investigation reveals that the problem is with
>> random(), which glibc implements like this:
>>
>> long int
>> __random ()
>> {
>>   int32_t retval;
>>   __libc_lock_lock (lock);
>>   (void) __random_r (&unsafe_state,&retval);
>>   __libc_lock_unlock (lock);
>>   return retval;
>> }
>> weak_alias (__random, random)
>>
>> Rather obviously, if you're running enough pgbench threads, you're
>> going to have a pretty ugly point of contention there.  On the 32-core
>> machine provided by Nate Boley, with my usual 5-minute SELECT-only
>> test, lazy-vxid and sinval-fastmessages applied, and scale factor 100,
>> "time" shows that pgbench uses almost as much system time as user
>> time:
>>
>> $ time pgbench -n -S -T 300 -c 64 -j 64
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 300 s
>> number of transactions actually processed: 55319555
>> tps = 184396.016257 (including connections establishing)
>> tps = 184410.926840 (excluding connections establishing)
>>
>> real    5m0.019s
>> user    21m10.100s
>> sys    17m45.480s
>>
>> I patched it to use random_r() - the patch is attached - and here are
>> the (rather gratifying) results of that test:
>>
>> $ time ./pgbench -n -S -T 300 -c 64 -j 64
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 300 s
>> number of transactions actually processed: 71851589
>> tps = 239503.585813 (including connections establishing)
>> tps = 239521.816698 (excluding connections establishing)
>>
>> real    5m0.016s
>> user    20m40.880s
>> sys    9m25.930s
>>
>> Since a client-limited benchmark isn't very interesting, I think this
>> change makes sense.  Thoughts?  Objections?  Coding style
>> improvements?
>>
>>
>>
>>
>>
> How much randomness do we really need for test data.  What if it where changed to more of a random starting point and
thenautoinc'd after that.  Or if there were two func's, a rand() and a next().  If your test really needs randomness
userand(), otherwise use next(), it would be way faster, and you dont really care what the number is anyway. 

Well, I think you need at least pseudo-randomness for pgbench - reading the table in sequential order is not going to
performthe same as doing random fetches against it. 

...Robert

Re: pgbench internal contention

От
Robert Haas
Дата:
On Sat, Jul 30, 2011 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> If I'm reading the code right, it only modifies __libc_drand48_data on
> first call, so as long as we called erand48() at least once before
> spawning the child threads, it would probably work.  That seems pretty
> fragile in the face of the fact that they explicitly state that
> they're modifying the global random generator state and that you
> should use erand48_r() if you want reentrant behavior.  So I think if
> we're going to go the erand48() route we probably ought to force
> pgbench to always use our version rather than any OS-supplied version.

Attached is a try at that approach.  Performance benefits are similar
to before.  Same test case as in my OP on this thread, alternating
runs without and with this patch:

tps = 199133.418419 (including connections establishing)
real    5m0.017s
user    23m42.170s
sys    18m46.270s

tps = 226202.289151 (including connections establishing)
real    5m0.018s
user    22m7.520s
sys    9m54.570s

tps = 191144.247489 (including connections establishing)
real    5m0.025s
user    23m35.200s
sys    17m19.070s

tps = 226353.187955 (including connections establishing)
real    5m0.017s
user    21m42.300s
sys    10m9.820s

tps = 189602.248908 (including connections establishing)
real    5m0.044s
user    23m24.060s
sys    17m1.050s

tps = 224521.459164 (including connections establishing)
real    5m0.017s
user    22m9.620s
sys    10m22.590s

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: pgbench internal contention

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> If I'm reading the code right, it only modifies __libc_drand48_data on
> first call, so as long as we called erand48() at least once before
> spawning the child threads, it would probably work.  That seems pretty
> fragile in the face of the fact that they explicitly state that
> they're modifying the global random generator state and that you
> should use erand48_r() if you want reentrant behavior.  So I think if
> we're going to go the erand48() route we probably ought to force
> pgbench to always use our version rather than any OS-supplied version.

Or add erand48_r() to our version and use that.
        regards, tom lane


Re: pgbench internal contention

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jul 30, 2011 at 9:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> If I'm reading the code right, it only modifies __libc_drand48_data on
>> first call, so as long as we called erand48() at least once before
>> spawning the child threads, it would probably work. �That seems pretty
>> fragile in the face of the fact that they explicitly state that
>> they're modifying the global random generator state and that you
>> should use erand48_r() if you want reentrant behavior. �So I think if
>> we're going to go the erand48() route we probably ought to force
>> pgbench to always use our version rather than any OS-supplied version.

> Attached is a try at that approach.

I don't find this to be a particularly safe idea:

>  #ifndef HAVE_ERAND48
> -/* we assume all of these are present or missing together */
> -extern double erand48(unsigned short xseed[3]);
> -extern long lrand48(void);
> -extern void srand48(long seed);
> +#define erand48(x) pg_erand48(x)
> +#define lrand48() pg_lrand48()
> +#define srand48(x) pg_srand48(x)
>  #endif

See our recent trials with python headers for an example of why this
might cause problems on some platforms.  For that matter, I believe
<stdlib.h> would be within its rights to be defining these names as
macros already.

If you want erand48_r, best to provide that API, not kluge up some
other functions.
        regards, tom lane


Re: pgbench internal contention

От
Greg Smith
Дата:
On 07/30/2011 09:08 AM, Robert Haas wrote:
> If I'm reading the code right, it only modifies __libc_drand48_data on
> first call, so as long as we called erand48() at least once before
> spawning the child threads, it would probably work.  That seems pretty
> fragile in the face of the fact that they explicitly state that
> they're modifying the global random generator state and that you
> should use erand48_r() if you want reentrant behavior.  So I think if
> we're going to go the erand48() route we probably ought to force
> pgbench to always use our version rather than any OS-supplied version.
>    

By the way:  the landmines in this whole area are what sunk the attempt 
to switch pgbench over to using 64 bits for the accounts table back in 
February.  See the last few messages of 
http://postgresql.1045698.n5.nabble.com/Re-PERFORM-pgbench-to-the-MAXINT-td3337374.html 
to revisit.  I think you've retraced all of that now, but double 
checking your plan against things like the AIX specific weirdness I 
pointed out there may be useful.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: pgbench internal contention

От
Robert Haas
Дата:
On Mon, Aug 1, 2011 at 11:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Attached is a try at that approach.
>
> I don't find this to be a particularly safe idea:
>
>>  #ifndef HAVE_ERAND48
>> -/* we assume all of these are present or missing together */
>> -extern double erand48(unsigned short xseed[3]);
>> -extern long lrand48(void);
>> -extern void srand48(long seed);
>> +#define erand48(x) pg_erand48(x)
>> +#define lrand48() pg_lrand48()
>> +#define srand48(x) pg_srand48(x)
>>  #endif
>
> See our recent trials with python headers for an example of why this
> might cause problems on some platforms.  For that matter, I believe
> <stdlib.h> would be within its rights to be defining these names as
> macros already.

If HAVE_ERAND48 isn't defined and erand48() is nevertheless defined as
a macro, it's a bug in the autoconf test for erand48().  The whole
point is to do that only when there isn't any erand48().  But we could
dodge the issue at any rate by making erand48(), lrand48(), and
srand48() functions that call pg_erand48(), pg_lrand48(), and
pg_srand48() rather than macros.  And I'd rather do that than this,
honestly:

> If you want erand48_r, best to provide that API, not kluge up some
> other functions.

...because erand48() is a GNU extension with a stupid API.  I don't
see much value in supporting that, on both counts.  We're going to end
up with the built-in erand48_r() on precisely those systems that use
glibc, and our own everywhere else.  For the 25 SLOCs it's going cost
us, I'd rather use the same code everywhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgbench internal contention

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
>> If you want erand48_r, best to provide that API, not kluge up some
>> other functions.

> ...because erand48() is a GNU extension with a stupid API.

I assume you mean erand48_r, there, because erand48 is pretty standard.

> I don't
> see much value in supporting that, on both counts.  We're going to end
> up with the built-in erand48_r() on precisely those systems that use
> glibc, and our own everywhere else.  For the 25 SLOCs it's going cost
> us, I'd rather use the same code everywhere.

Maybe.  But if that's the approach we want to use, let's just call it
pg_erand48 in the code, and dispense with the alias macros as well as
all vestiges of configure support.

BTW, as far as the original plan of using random_r is concerned, how
did you manage to not run into this?
http://sourceware.org/bugzilla/show_bug.cgi?id=3662
I just wasted half an hour on that stupidity in an unrelated context...
        regards, tom lane


Re: pgbench internal contention

От
Robert Haas
Дата:
On Tue, Aug 2, 2011 at 8:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>> If you want erand48_r, best to provide that API, not kluge up some
>>> other functions.
>
>> ...because erand48() is a GNU extension with a stupid API.
>
> I assume you mean erand48_r, there, because erand48 is pretty standard.

Yes.

>> I don't
>> see much value in supporting that, on both counts.  We're going to end
>> up with the built-in erand48_r() on precisely those systems that use
>> glibc, and our own everywhere else.  For the 25 SLOCs it's going cost
>> us, I'd rather use the same code everywhere.
>
> Maybe.  But if that's the approach we want to use, let's just call it
> pg_erand48 in the code, and dispense with the alias macros as well as
> all vestiges of configure support.

Works for me.  Just to confirm, that means we'd also change GEQO to
use pg_erand48().

> BTW, as far as the original plan of using random_r is concerned, how
> did you manage to not run into this?
> http://sourceware.org/bugzilla/show_bug.cgi?id=3662
> I just wasted half an hour on that stupidity in an unrelated context...

Good grief.  It's hard to imagine a more user-hostile attitude than
the one taken there.  I did run into that precise issue, but managed
to stumble on what is apparently the officially sanctioned method of
working around it - viz, zeroing the state array.  However, that bug
report is a pretty compelling argument for the position that expecting
any of the GNU blah_r() functions to behave halfway sanely or be
properly documented is a pipe dream.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgbench internal contention

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Aug 2, 2011 at 8:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe. �But if that's the approach we want to use, let's just call it
>> pg_erand48 in the code, and dispense with the alias macros as well as
>> all vestiges of configure support.

> Works for me.  Just to confirm, that means we'd also change GEQO to
> use pg_erand48().

Right, that's what I was thinking.
        regards, tom lane


Re: pgbench internal contention

От
Bruce Momjian
Дата:
Robert Haas wrote:
> Works for me.  Just to confirm, that means we'd also change GEQO to
> use pg_erand48().
> 
> > BTW, as far as the original plan of using random_r is concerned, how
> > did you manage to not run into this?
> > http://sourceware.org/bugzilla/show_bug.cgi?id=3662
> > I just wasted half an hour on that stupidity in an unrelated context...
> 
> Good grief.  It's hard to imagine a more user-hostile attitude than
> the one taken there.  I did run into that precise issue, but managed

Yes, it is scary to think people are relying on software written by
people which such a poor attitude toward quality.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +