Re: tweaking MemSet() performance
| From | Bruce Momjian |
|---|---|
| Subject | Re: tweaking MemSet() performance |
| Date | |
| Msg-id | 200208291937.g7TJbQC20180@candle.pha.pa.us |
| In reply to | tweaking MemSet() performance (Neil Conway <neilc@samurai.com>) |
| List | pgsql-hackers |
I consider this a very good test. As you can see from the date of my
last test, 1997/09/11, I think I had a dual Pentium Pro at that
point, and hardware has certainly changed since then. I did try 128 at
that time and found it to be slower, but with newer hardware it is quite
possible that has changed.
I remember being surprised, when I wrote that macro, that there was any
improvement at all, but obviously there is a gain, and the gain is getting
bigger.
I tested the following program:

    #include <stdlib.h>
    #include <string.h>
    #include "postgres.h"

    #undef MEMSET_LOOP_LIMIT
    #define MEMSET_LOOP_LIMIT 1000000

    int
    main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;               /* usage: tst1 <length> */

        int         len = atoi(argv[1]);
        char        buffer[len];
        long long   i;

        for (i = 0; i < 9900000; i++)
            MemSet(buffer, 0, len);
        return 0;
    }

And yes, -O2 is significant! It looks like we use -O2 on all platforms
that use GCC, so we should be OK there.
I tested with the following script:

    for TIME in 64 128 256 512 1024 2048 4096
    do
        echo "*$TIME\c"
        time tst1 $TIME
    done

and got the following timings (in seconds) for MemSet() and memset():

| Size (bytes) | MemSet() real | MemSet() user | MemSet() sys | memset() real | memset() user | memset() sys |
|---|---|---|---|---|---|---|
| 64 | 1.001 | 1.000 | 0.003 | 1.813 | 1.801 | 0.014 |
| 128 | 1.578 | 1.567 | 0.013 | 2.489 | 2.499 | 0.994 |
| 256 | 2.723 | 2.723 | 0.003 | 4.397 | 5.389 | 0.005 |
| 512 | 5.044 | 5.029 | 0.013 | 5.186 | 6.170 | 0.015 |
| 1024 | 9.621 | 9.621 | 0.003 | 6.676 | 6.676 | 0.003 |
| 2048 | 18.821 | 18.811 | 0.013 | 9.766 | 9.776 | 0.994 |
| 4096 | 37.266 | 37.266 | 0.003 | 15.970 | 15.954 | 0.003 |

So on BSD/OS the break-even point is 512 bytes.
I am on a dual P3/550 using GCC 2.95.2. I can tell you exactly why my
break-even is lower than most: I have assembly-language memset()
functions in libc on BSD/OS.
I suggest changing the MEMSET_LOOP_LIMIT to 512.
---------------------------------------------------------------------------
Neil Conway wrote:
> In include/c.h, MemSet() is defined to be different from the stock
> function memset() only when setting less than or equal to
> MEMSET_LOOP_LIMIT bytes (currently 64). The comments above the macro
> definition note:
>
> * We got the 64 number by testing this against the stock memset() on
> * BSD/OS 3.0. Larger values were slower. bjm 1997/09/11
> *
> * I think the crossover point could be a good deal higher for
> * most platforms, actually. tgl 2000-03-19
>
> I decided to investigate Tom's suggestion and determine the
> performance of MemSet() versus memset() on my machine, for various
> values of MEMSET_LOOP_LIMIT. The machine this is being tested on is a
> Pentium 4 1.8 GHz with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1
> and glibc 2.2.5 -- the results may or may not apply to other
> machines.
>
> The test program was:
>
> #include <string.h>
> #include "postgres.h"
>
> #undef MEMSET_LOOP_LIMIT
> #define MEMSET_LOOP_LIMIT BUFFER_SIZE
>
> int
> main(void)
> {
> char buffer[BUFFER_SIZE];
> long long i;
>
> for (i = 0; i < 99000000; i++)
> {
> MemSet(buffer, 0, sizeof(buffer));
> }
>
> return 0;
> }
>
> (I manually changed MemSet() to memset() when testing the performance
> of the latter function.)
>
> It was compiled like so:
>
> gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c
>
> (The -O2 optimization flag is important: the results are significantly
> different if it is not used.)
>
> Here are the results (each timing is the 'total' listing from 'time
> ./a.out'):
>
> BUFFER_SIZE = 64
> MemSet() -> 2.756, 2.810, 2.789
> memset() -> 13.844, 13.782, 13.778
>
> BUFFER_SIZE = 128
> MemSet() -> 5.848, 5.989, 5.861
> memset() -> 15.637, 15.631, 15.631
>
> BUFFER_SIZE = 256
> MemSet() -> 9.602, 9.652, 9.633
> memset() -> 19.305, 19.370, 19.302
>
> BUFFER_SIZE = 512
> MemSet() -> 17.416, 17.462, 17.353
> memset() -> 26.657, 26.658, 26.678
>
> BUFFER_SIZE = 1024
> MemSet() -> 32.144, 32.179, 32.086
> memset() -> 41.186, 41.115, 41.176
>
> BUFFER_SIZE = 2048
> MemSet() -> 60.39, 60.48, 60.32
> memset() -> 71.19, 71.18, 71.17
>
> BUFFER_SIZE = 4096
> MemSet() -> 118.29, 120.07, 118.69
> memset() -> 131.40, 131.41
>
> ... at which point I stopped benchmarking.
>
> Is the benchmark above a reasonable assessment of memset() / MemSet()
> performance when setting word-aligned amounts of memory? If so, what's
> a good value for MEMSET_LOOP_LIMIT (perhaps 512)?
>
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073