[HACKERS] atomics/arch-x86.h is stupider than atomics/generic-gcc.h?

From Tom Lane
Subject [HACKERS] atomics/arch-x86.h is stupider than atomics/generic-gcc.h?
Date
Msg-id 17694.1504665058@sss.pgh.pa.us
List pgsql-hackers
I spent some time trying to devise a suitable performance microbenchmark
for the atomic ops, in pursuit of whether the proposal at
https://commitfest.postgresql.org/14/1154/
is worth doing.  I came up with the attached very simple-minded test
case, which you run with something like

    create function my_test_atomic_ops(bigint) returns int
    strict volatile language c as '/path/to/atomic-perf-test.so';

    \timing

    select my_test_atomic_ops(1000000000);

The performance of a single process running this is interesting, but
only mildly so: what we want to know about is what happens when you
run two or more calls concurrently.

On my primary server, dual quad-core Xeon E5-2609 @ 2.4GHz, RHEL6
(so gcc version 4.4.7 20120313 (Red Hat 4.4.7-18)), in a disable-cassert
build, I see that a single process running the 1G-iterations case
repeatably takes about 9600ms.  Two competing processes take roughly
1 minute to do twice as much work.  (The two processes tend to finish
at significantly different times, indicating that this box's method
for resolving bus conflicts isn't 100% fair.  I'm taking the average
of the two runtimes as a representative number.)
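For readers who want to try a similar contention measurement outside a PostgreSQL backend, here is a minimal sketch using POSIX threads rather than separate backend processes. It is not the attached test: `shared_counter`, `worker_arg`, `worker`, `run_contended`, and `MAX_THREADS` are all hypothetical names, and it uses GCC's `__atomic_fetch_add` builtin in place of pg_atomic_fetch_add_u32. As in the description above, it reports the mean of the per-thread runtimes.

```c
#define _POSIX_C_SOURCE 200809L

#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define MAX_THREADS 16          /* arbitrary cap for this sketch */

/* Hypothetical stand-in for the counter kept in PostgreSQL shared memory */
static uint32_t shared_counter;

typedef struct
{
    uint64_t    iters;          /* fetch-adds this thread performs */
    uint32_t    junk;           /* sum of returned values, so they are used */
    double      elapsed_ms;     /* wall-clock time spent in the loop */
} worker_arg;

static double
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

static void *
worker(void *p)
{
    worker_arg *arg = p;
    uint32_t    junk = 0;
    uint64_t    i;
    double      start = now_ms();

    /* Same shape as the test loop: the returned value is consumed */
    for (i = 0; i < arg->iters; i++)
        junk += __atomic_fetch_add(&shared_counter, 1, __ATOMIC_SEQ_CST);

    arg->elapsed_ms = now_ms() - start;
    arg->junk = junk;
    return NULL;
}

/*
 * Run nthreads workers concurrently, each doing iters fetch-adds.
 * Returns the mean per-thread loop time in ms, or -1.0 on error.
 */
static double
run_contended(int nthreads, uint64_t iters)
{
    pthread_t   tids[MAX_THREADS];
    worker_arg  args[MAX_THREADS];
    double      total = 0.0;
    int         started = 0;
    int         i;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;

    __atomic_store_n(&shared_counter, 0, __ATOMIC_SEQ_CST);

    for (i = 0; i < nthreads; i++)
    {
        args[i].iters = iters;
        args[i].junk = 0;
        args[i].elapsed_ms = 0.0;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }

    /* Join whatever was started, even if a later pthread_create failed */
    for (i = 0; i < started; i++)
    {
        pthread_join(tids[i], NULL);
        total += args[i].elapsed_ms;
    }

    return (started == nthreads) ? total / nthreads : -1.0;
}
```

Comparing `run_contended(1, n)` against `run_contended(2, n)` for a large `n` gives a rough analogue of the one-process versus two-process numbers; threads in one address space may not behave identically to backends sharing a shared-memory segment, so treat any such numbers as indicative only.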

This is with no source-code changes, meaning that what I'm testing is
arch-x86.h's version of pg_atomic_fetch_add_u32, which compiles to
basically

    lock
    xaddl    %eax,(%rdx)

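A hedged sketch of how a hand-written fetch-add yields that instruction pair: the function below is a standalone illustration in the general style of such inline asm, not a copy of arch-x86.h, and `asm_fetch_add_u32` is a hypothetical name. The relevant property is that the compiler treats the asm body as opaque, so it emits `lock xaddl` whether or not the caller uses the return value.

```c
#include <stdint.h>

/*
 * Atomically add add_ to *ptr and return the previous value.
 * On x86 this uses hand-written "lock xaddl"; elsewhere it falls back to
 * the GCC/Clang builtin so the sketch still compiles (an assumption of
 * this example, not something the original header does in this way).
 */
static inline uint32_t
asm_fetch_add_u32(volatile uint32_t *ptr, int32_t add_)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t    res;

    __asm__ __volatile__(
        "lock\n\t"
        "xaddl %0,%1"
        : "=r"(res), "+m"(*ptr)     /* res receives the old value */
        : "0"((uint32_t) add_)      /* addend starts in the same register */
        : "memory", "cc");
    return res;
#else
    return __atomic_fetch_add(ptr, (uint32_t) add_, __ATOMIC_SEQ_CST);
#endif
}
```

Because the instruction is spelled out in the asm string, the optimizer has no opportunity to substitute a cheaper instruction when the result goes unused.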
I then diked out that version, so that the build fell back to
generic-gcc.h's version of the function.  With the test program
as attached, the inner loop is basically the same, and so is the
runtime.  But what I was testing before that was a version that
ignored the result of pg_atomic_fetch_add_u32,

    while (count-- > 0)
    {
        (void) pg_atomic_fetch_add_u32(myptr, 1);
    }

and what I was quite surprised to see was a single-thread time of
9600ms and a two-thread time of ~40s.  The reason was not too far
to seek: gcc is smart enough to notice that it doesn't need the
result of pg_atomic_fetch_add_u32, and so it compiles that to just

    lock addl    $1, (%rax)

which is evidently significantly more efficient than the xaddl under
contention load.
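The difference can be inspected in isolation with two tiny functions built on GCC's `__sync_fetch_and_add` builtin, which is available in gcc 4.4 (the `__atomic_*` builtins only arrived in gcc 4.7). This is a minimal sketch with hypothetical function names; compiling it with something like `gcc -O2 -S` on x86-64 should let you check whether your compiler emits `lock xadd` for the first function and a plain `lock add` for the second, as described above for gcc 4.4.7.

```c
#include <stdint.h>

/* Result is returned to the caller, so the old value must be fetched */
uint32_t
fetch_add_keep(uint32_t *ptr, uint32_t add)
{
    return __sync_fetch_and_add(ptr, add);
}

/* Result is discarded; the compiler is free to pick a cheaper instruction */
void
fetch_add_discard(uint32_t *ptr, uint32_t add)
{
    (void) __sync_fetch_and_add(ptr, add);
}
```

Both functions have the same effect on `*ptr`; only the need to hand back the prior value differs, and that is the information a compiler builtin exposes to the optimizer.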

Or in words of one syllable: at least for pg_atomic_fetch_add_u32,
we are working hard in atomics/arch-x86.h to get worse code than
gcc would give us natively.  (And, in case you didn't notice, this
is far from the latest and shiniest gcc.)

This case is not to be dismissed as insignificant either, since of the
three non-test occurrences of pg_atomic_fetch_add_u32 in our tree, two
ignore the result.

So I think we'd be well advised to cast a doubtful eye at the asm
constructs we've got here, and figure out which ones are really
meaningfully smarter than gcc's primitives.

            regards, tom lane

#include "postgres.h"

#include "fmgr.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"


PG_MODULE_MAGIC;

static pg_atomic_uint32 *globptr = NULL;

int32 globjunk = 0;

PG_FUNCTION_INFO_V1(my_test_atomic_ops);

Datum
my_test_atomic_ops(PG_FUNCTION_ARGS)
{
    int64        count = PG_GETARG_INT64(0);
    int32        result;
    pg_atomic_uint32 *myptr;
    int32        junk = 0;

    if (globptr == NULL)
    {
        /* First time through in this process; find shared memory */
        bool        found;

        LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);

        globptr = ShmemInitStruct("my_test_atomic_ops",
                                sizeof(*globptr),
                                &found);

        if (!found)
        {
            /* First time through anywhere */
            pg_atomic_init_u32(globptr, 0);
        }

        LWLockRelease(AddinShmemInitLock);
    }

    myptr = globptr;

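    /*
     * Accumulate the returned values so that the compiler cannot discard
     * the result of the fetch-add.
     */
    while (count-- > 0)
    {
        junk += pg_atomic_fetch_add_u32(myptr, 1);
    }

    /* Store into a global so the accumulation itself is not optimized away */
    globjunk += junk;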

    result = pg_atomic_read_u32(myptr);

    PG_RETURN_INT32(result);
}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
