Discussion: Align large shared memory allocations
Attached is a patch that aligns large shared memory allocations beyond
MAXIMUM_ALIGNOF. The reason for this is that Intel's CPUs have a fast
path for bulk memory copies that only works with aligned addresses. It's
possible that other CPUs have similar restrictions.
With 7.3.4, it achieves a 5% performance gain with pgbench. It has no
effect with 7.3.3, because there the buffers happen to be aligned by
chance. I haven't properly tested 7.4cvs yet.
One problem is the "32" - it's arbitrary; it probably belongs in an
arch-dependent header file. But where?
--
Manfred
diff -u pgsql.orig/src/backend/storage/ipc/shmem.c pgsql/src/backend/storage/ipc/shmem.c
--- pgsql.orig/src/backend/storage/ipc/shmem.c 2003-09-20 20:17:08.000000000 +0200
+++ pgsql/src/backend/storage/ipc/shmem.c 2003-09-20 20:34:21.000000000 +0200
@@ -131,6 +131,7 @@
void *
ShmemAlloc(Size size)
{
+ uint32 newStart;
uint32 newFree;
void *newSpace;
@@ -146,10 +147,21 @@
SpinLockAcquire(ShmemLock);
- newFree = shmemseghdr->freeoffset + size;
+ newStart = shmemseghdr->freeoffset;
+ if (size >= BLCKSZ)
+ {
+ /* Align BLCKSZ sized buffers even further:
+ * - the costs are small
+ * - some cpus (most notably Intel Pentium III)
+ * prefer well-aligned addresses for memory copies
+ */
+ newStart = TYPEALIGN(32, newStart);
+ }
+
+ newFree = newStart + size;
if (newFree <= shmemseghdr->totalsize)
{
- newSpace = (void *) MAKE_PTR(shmemseghdr->freeoffset);
+ newSpace = (void *) MAKE_PTR(newStart);
shmemseghdr->freeoffset = newFree;
}
else
Manfred Spraul <manfred@colorfullife.com> writes:
> Attached is a patch that aligns large shared memory allocations beyond
> MAXIMUM_ALIGNOF. The reason for this is that Intel's cpus have a fast
> path for bulk memory copies that only works with aligned addresses.
This patch is missing a demonstration that it's actually worth anything.
What kind of performance gain do you get?
> One problem is the "32" - it's arbitrary, it probably belongs into an
> arch dependant header file. But where?
We don't really have arch-dependent header files. What I'd be inclined
to do is "#define ALIGNOF_BUFFER 32" in pg_config_manual.h, then
#define BUFFERALIGN(LEN) to parallel the other TYPEALIGN macros in c.h,
and finally use that in the ShmemAlloc code.
regards, tom lane
Tom Lane wrote:
>Manfred Spraul <manfred@colorfullife.com> writes:
>>Attached is a patch that aligns large shared memory allocations beyond
>>MAXIMUM_ALIGNOF. The reason for this is that Intel's cpus have a fast
>>path for bulk memory copies that only works with aligned addresses.
>
>This patch is missing a demonstration that it's actually worth anything.
>What kind of performance gain do you get?
7.4cvs on a 1.13 GHz Intel Celeron mobile, 384 MB RAM, "Severn" RedHat
Linux 2.4 beta, postmaster -N 30 -B 64, data directory on ramdisk,
pgbench -c 10 -s 11 -t 1000:
Without the patch: 124 tps
With the patch: 130 tps
I reduced the buffer setting to 64 because otherwise too large a part
of the database was cached by postgres. I expect that on all Intel
Pentium III chips it will be worth a 10-20% reduction in system time. I
saw around 30% system time after reducing the number of buffers, hence
the ~5% overall improvement.
>We don't really have arch-dependent header files. What I'd be inclined
>to do is "#define ALIGNOF_BUFFER 32" in pg_config_manual.h, then
>#define BUFFERALIGN(LEN) to parallel the other TYPEALIGN macros in c.h,
>and finally use that in the ShmemAlloc code.
Ok, new patch attached.
--
Manfred
Index: src/backend/storage/ipc/shmem.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/ipc/shmem.c,v
retrieving revision 1.70
diff -u -r1.70 shmem.c
--- src/backend/storage/ipc/shmem.c 4 Aug 2003 02:40:03 -0000 1.70
+++ src/backend/storage/ipc/shmem.c 21 Sep 2003 07:53:13 -0000
@@ -131,6 +131,7 @@
void *
ShmemAlloc(Size size)
{
+ uint32 newStart;
uint32 newFree;
void *newSpace;
@@ -146,10 +147,14 @@
SpinLockAcquire(ShmemLock);
- newFree = shmemseghdr->freeoffset + size;
+ newStart = shmemseghdr->freeoffset;
+ if (size >= BLCKSZ)
+ newStart = BUFFERALIGN(newStart);
+
+ newFree = newStart + size;
if (newFree <= shmemseghdr->totalsize)
{
- newSpace = (void *) MAKE_PTR(shmemseghdr->freeoffset);
+ newSpace = (void *) MAKE_PTR(newStart);
shmemseghdr->freeoffset = newFree;
}
else
Index: src/include/c.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/c.h,v
retrieving revision 1.152
diff -u -r1.152 c.h
--- src/include/c.h 4 Aug 2003 02:40:10 -0000 1.152
+++ src/include/c.h 21 Sep 2003 07:53:14 -0000
@@ -529,6 +529,7 @@
#define LONGALIGN(LEN) TYPEALIGN(ALIGNOF_LONG, (LEN))
#define DOUBLEALIGN(LEN) TYPEALIGN(ALIGNOF_DOUBLE, (LEN))
#define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))
+#define BUFFERALIGN(LEN) TYPEALIGN(ALIGNOF_BUFFER, (LEN))
/* ----------------------------------------------------------------
Index: src/include/pg_config_manual.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/pg_config_manual.h,v
retrieving revision 1.5
diff -u -r1.5 pg_config_manual.h
--- src/include/pg_config_manual.h 4 Aug 2003 00:43:29 -0000 1.5
+++ src/include/pg_config_manual.h 21 Sep 2003 07:53:14 -0000
@@ -176,6 +176,14 @@
*/
#define MAX_RANDOM_VALUE (0x7FFFFFFF)
+/*
+ * Alignment of the disk blocks in the shared memory area.
+ * A significant amount of the total system time is required for
+ * copying disk blocks between the os buffers and the cache in the
+ * shared memory area. Some cpus (most notably Intel Pentium III)
+ * prefer well-aligned addresses for memory copies.
+ */
+#define ALIGNOF_BUFFER 32
/*
*------------------------------------------------------------------------
Manfred Spraul <manfred@colorfullife.com> writes:
> Tom Lane wrote:
>> This patch is missing a demonstration that it's actually worth anything.
>> What kind of performance gain do you get?
>>
> 7.4cvs on a 1.13 GHz Intel Celeron mobile, 384 MB RAM, "Severn" RedHat
> Linux 2.4 beta, postmaster -N 30 -B 64, data directory on ramdisk,
> pgbench -c 10 -s 11 -t 1000:
> Without the patch: 124 tps
> with the patch: 130 tps.
I tried it on an Intel box here (P4 I think). Using postmaster -B 64 -N 30
and three tries of pgbench -s 10 -c 1 -t 1000 after creation of the test
tables, I get:
tps = 92.461144 (including connections establishing)
tps = 92.500572 (excluding connections establishing)
tps = 88.078814 (including connections establishing)
tps = 88.115905 (excluding connections establishing)
tps = 85.434473 (including connections establishing)
tps = 85.468807 (excluding connections establishing)
and with the patch:
tps = 122.927066 (including connections establishing)
tps = 122.998129 (excluding connections establishing)
tps = 110.716370 (including connections establishing)
tps = 110.773928 (excluding connections establishing)
tps = 138.155991 (including connections establishing)
tps = 138.245777 (excluding connections establishing)
So there's definitely a visible difference on recent Pentiums. It might
not help on other CPUs, but we can surely waste a couple dozen bytes in
the hope that it might.
Patch applied. Do you want to look at making it happen for local
buffers and buffile.c as well?
regards, tom lane