Re: tweaking MemSet() performance

From: Bruce Momjian
Subject: Re: tweaking MemSet() performance
Msg-id: 200208291937.g7TJbQC20180@candle.pha.pa.us
In reply to: tweaking MemSet() performance (Neil Conway <neilc@samurai.com>)
List: pgsql-hackers

I consider this a very good test.  As you can see from the date of my
last test, 1997/09/11, I think I may have had a dual Pentium Pro at that
point, and hardware has certainly changed since then.  I did try 128 at
that time and found it to be slower, but with newer hardware, it is very
possible it has improved.

I remember how surprised I was, when writing that macro, that there was
any improvement at all, but obviously there is a gain, and the gain is
getting bigger.

I tested the following program:

    #include <string.h>
    #include "postgres.h"

    #undef    MEMSET_LOOP_LIMIT
    #define   MEMSET_LOOP_LIMIT 1000000

    int
    main(int argc, char **argv)
    {
        int         len = atoi(argv[1]);
        char        buffer[len];
        long long   i;

        for (i = 0; i < 9900000; i++)
            MemSet(buffer, 0, len);
        return 0;
    }
 

and, yes, -O2 is significant!  Looks like we use -O2 on all platforms
that use GCC so we should be OK there.

I tested with the following script:
for TIME in 64 128 256 512 1024 2048 4096; do echo "*$TIME\c";time tst1 $TIME; done

and got for MemSet:

    bytes   real         user         sys
    64      0m1.001s     0m1.000s     0m0.003s
    128     0m1.578s     0m1.567s     0m0.013s
    256     0m2.723s     0m2.723s     0m0.003s
    512     0m5.044s     0m5.029s     0m0.013s
    1024    0m9.621s     0m9.621s     0m0.003s
    2048    0m18.821s    0m18.811s    0m0.013s
    4096    0m37.266s    0m37.266s    0m0.003s
 

and for memset():

    bytes   real         user         sys
    64      0m1.813s     0m1.801s     0m0.014s
    128     0m2.489s     0m2.499s     0m0.994s
    256     0m4.397s     0m5.389s     0m0.005s
    512     0m5.186s     0m6.170s     0m0.015s
    1024    0m6.676s     0m6.676s     0m0.003s
    2048    0m9.766s     0m9.776s     0m0.994s
    4096    0m15.970s    0m15.954s    0m0.003s
 

so for BSD/OS, the break-even is 512.
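
To make the break-even reading concrete, here is a small sketch that scans the "real" times reported above and finds the largest tested size at which the macro is still no slower than memset(). The array and function names are made up for this illustration; the numbers are the ones from this message, and the per-call figure assumes the 9900000 iterations used in the test program.

```c
#include <stddef.h>

/* "real" seconds from the results above, one entry per tested size */
static const int    sizes[]      = {64, 128, 256, 512, 1024, 2048, 4096};
static const double macro_secs[] = {1.001, 1.578, 2.723, 5.044, 9.621, 18.821, 37.266};
static const double libc_secs[]  = {1.813, 2.489, 4.397, 5.186, 6.676, 9.766, 15.970};

/* Iteration count of the test loop (assumed to match the program above). */
#define ITERATIONS 9900000.0

/*
 * Return the largest tested size at which the macro run was no slower
 * than the memset() run, or 0 if the macro never wins.
 */
int
largest_size_where_macro_wins(void)
{
    int     best = 0;
    size_t  i;

    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
    {
        if (macro_secs[i] <= libc_secs[i])
            best = sizes[i];
    }
    return best;
}

/* Convert a total run time in seconds to nanoseconds per call. */
double
nanos_per_call(double total_secs)
{
    return total_secs / ITERATIONS * 1e9;
}
```

With these numbers the macro is still slightly ahead at 512 bytes (5.044s against 5.186s) and clearly behind at 1024 bytes, which is why 512 reads as the break-even on this machine.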

I am on a dual P3/550 using GCC 2.95.2.  I will tell you exactly why my
break-even is lower than most: I have assembly language memset()
functions in libc on BSD/OS.

I suggest changing the MEMSET_LOOP_LIMIT to 512.
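
For readers without c.h at hand, the following is a simplified sketch of the kind of macro under discussion, not the actual PostgreSQL definition. It uses a word-at-a-time loop only when the value is zero, the pointer and length are long-aligned, and the length is at most the limit; everything else goes to memset(). All names (SketchMemSet, SKETCH_LOOP_LIMIT, SKETCH_LONG_MASK, sketch_demo) are hypothetical, and the limit of 512 is just the value suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not the definitions in c.h. */
#define SKETCH_LOOP_LIMIT 512
#define SKETCH_LONG_MASK  (sizeof(long) - 1)

#define SketchMemSet(start, val, len) \
    do \
    { \
        void   *_start = (start); \
        int     _val = (val); \
        size_t  _len = (len); \
        \
        if (((uintptr_t) _start & SKETCH_LONG_MASK) == 0 && \
            (_len & SKETCH_LONG_MASK) == 0 && \
            _val == 0 && \
            _len <= SKETCH_LOOP_LIMIT) \
        { \
            /* fast path: store one long-sized zero per iteration */ \
            long *_p = (long *) _start; \
            long *_stop = (long *) ((char *) _start + _len); \
            \
            while (_p < _stop) \
                *_p++ = 0; \
        } \
        else \
            memset(_start, _val, _len); /* everything else goes to libc */ \
    } while (0)

/*
 * Fill a buffer with 0xFF, clear the first nbytes through the macro, and
 * check that those bytes are zero.  Returns 1 on success, 0 otherwise.
 * The buffer is declared as long[] so the word stores are well-defined.
 */
int
sketch_demo(size_t nbytes)
{
    long    buffer[256];
    size_t  i;

    assert(nbytes <= sizeof(buffer));
    memset(buffer, 0xFF, sizeof(buffer));
    SketchMemSet(buffer, 0, nbytes);
    for (i = 0; i < nbytes; i++)
    {
        if (((unsigned char *) buffer)[i] != 0)
            return 0;
    }
    return 1;
}
```

In this sketch, sketch_demo(64) takes the loop path, while sketch_demo(1024) exceeds the limit and is handled by memset(); raising SKETCH_LOOP_LIMIT moves that boundary.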

---------------------------------------------------------------------------

Neil Conway wrote:
> In include/c.h, MemSet() is defined to be different than the stock
> function memset() only when copying less than or equal to
> MEMSET_LOOP_LIMIT bytes (currently 64). The comments above the macro
> definition note:
> 
>  *    We got the 64 number by testing this against the stock memset() on
>  *    BSD/OS 3.0. Larger values were slower.    bjm 1997/09/11
>  *
>  *    I think the crossover point could be a good deal higher for
>  *    most platforms, actually.  tgl 2000-03-19
> 
> I decided to investigate Tom's suggestion and determine the
> performance of MemSet() versus memset() on my machine, for various
> values of MEMSET_LOOP_LIMIT. The machine this is being tested on is a
> Pentium 4 1.8 Ghz with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1
> and glibc 2.2.5 -- the results may or may not apply to other
> machines.
> 
> The test program was:
> 
> #include <string.h>
> #include "postgres.h"
> 
> #undef MEMSET_LOOP_LIMIT
> #define MEMSET_LOOP_LIMIT BUFFER_SIZE
> 
> int
> main(void)
> {
>     char buffer[BUFFER_SIZE];
>     long long i;
> 
>     for (i = 0; i < 99000000; i++)
>     {
>         MemSet(buffer, 0, sizeof(buffer));
>     }
> 
>     return 0;
> }
> 
> (I manually changed MemSet() to memset() when testing the performance
> of the latter function.)
> 
> It was compiled like so:
> 
>         gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c
> 
> (The -O2 optimization flag is important: the results are significantly
> different if it is not used.)
> 
> Here are the results (each timing is the 'total' listing from 'time
> ./a.out'):
> 
> BUFFER_SIZE = 64
>         MemSet() -> 2.756, 2.810, 2.789
>         memset() -> 13.844, 13.782, 13.778
> 
> BUFFER_SIZE = 128
>         MemSet() -> 5.848, 5.989, 5.861
>         memset() -> 15.637, 15.631, 15.631
> 
> BUFFER_SIZE = 256
>         MemSet() -> 9.602, 9.652, 9.633
>         memset() -> 19.305, 19.370, 19.302
> 
> BUFFER_SIZE = 512
>         MemSet() -> 17.416, 17.462, 17.353
>         memset() -> 26.657, 26.658, 26.678
> 
> BUFFER_SIZE = 1024
>         MemSet() -> 32.144, 32.179, 32.086
>         memset() -> 41.186, 41.115, 41.176
> 
> BUFFER_SIZE = 2048
>         MemSet() -> 60.39, 60.48, 60.32
>         memset() -> 71.19, 71.18, 71.17
> 
> BUFFER_SIZE = 4096
>         MemSet() -> 118.29, 120.07, 118.69
>         memset() -> 131.40, 131.41
> 
> ... at which point I stopped benchmarking.
> 
> Is the benchmark above a reasonable assessment of memset() / MemSet()
> performance when copying word-aligned amounts of memory? If so, what's
> a good value for MEMSET_LOOP_LIMIT (perhaps 512)?
> 
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.
> 
> Cheers,
> 
> Neil
> 
> -- 
> Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

