Обсуждение: remove BufferBlockPointers for speed and space
It is said that the BufferBlockPointers is used to speedup the BufferGetBlock() macro. I compared three ways of getting block pointers. I.e., existing method (arrary method), calculating block pointer by adding base addr and offset*blockid method (mul method) and optimizing mul method by using bit shift (shift method). All of them calculate the block pointer 80000 times (i.e., the BufferBlockPointers array is of size 80000), and each take 3 rounds. The result is: SunOS/gcc 3.2 duration round 1 of array method: 4.179 ms duration round 2 of array method: 4.160 ms duration round 3 of array method: 4.143 ms duration round 1 of mul method: 3.311 ms duration round 2 of mul method: 3.233 ms duration round 3 of mul method: 3.233 ms duration round 1 of shift method: 3.554 ms duration round 2 of shift method: 3.235 ms duration round 3 of shift method: 3.233 ms Linux/gcc 3.2 duration round 1 of array method: 0.422 ms duration round 2 of array method: 0.324 ms duration round 3 of array method: 0.354 ms duration round 1 of mul method: 0.271 ms duration round 2 of mul method: 0.248 ms duration round 3 of mul method: 0.304 ms duration round 1 of shift method: 0.322 ms duration round 2 of shift method: 0.239 ms duration round 3 of shift method: 0.265 ms We can conclude that: (1) mul or shift are definitely better than array method; (2) mul and shift are comparable; So we can remove BufferBlockPointers for speed and space. Regards, Qingqing -------- Updated files: /include/storage/bufmgr.h /storage/buffer/buf_init.c /storage/buffer/bufmgr.c Index: bufmgr.h =================================================================== RCS file: /projects/cvsroot/pgsql/src/include/storage/bufmgr.h,v retrieving revision 1.94 diff -c -r1.94 bufmgr.h *** bufmgr.h 8 Aug 2005 03:12:16 -0000 1.94 --- bufmgr.h 11 Aug 2005 05:15:04 -0000 *************** *** 33,39 **** extern int bgwriter_all_maxpages; /* in buf_init.c */ ! extern DLLIMPORT Block *BufferBlockPointers; extern DLLIMPORT int32 *PrivateRefCount; /* in localbuf.c */ --- 33,39 ---- extern int bgwriter_all_maxpages; /* in buf_init.c */ ! extern DLLIMPORT char *BufferBlocks; extern DLLIMPORT int32 *PrivateRefCount; /* in localbuf.c */ *************** *** 107,113 **** BufferIsLocal(buffer) ? \ LocalBufferBlockPointers[-(buffer) - 1] \ : \ ! BufferBlockPointers[(buffer) - 1] \ ) /* --- 107,113 ---- BufferIsLocal(buffer) ? \ LocalBufferBlockPointers[-(buffer) - 1] \ : \ ! BufferBlocks + ((buffer) - 1)*BLCKSZ \ ) /* Index: buf_init.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/buf_init.c,v retrieving revision 1.74 diff -c -r1.74 buf_init.c *** buf_init.c 8 Aug 2005 03:11:44 -0000 1.74 --- buf_init.c 11 Aug 2005 05:17:38 -0000 *************** *** 19,28 **** BufferDesc *BufferDescriptors; - Block *BufferBlockPointers; int32 *PrivateRefCount; ! ! static char *BufferBlocks; /* statistics counters */ long int ReadBufferCount; --- 19,26 ---- BufferDesc *BufferDescriptors; int32 *PrivateRefCount; ! char *BufferBlocks; /* statistics counters */ long int ReadBufferCount; *************** *** 154,183 **** void InitBufferPoolAccess(void) { - char *block; - int i; - /* * Allocate and zero local arrays of per-buffer info. */ - BufferBlockPointers = (Block *) calloc(NBuffers, - sizeof(*BufferBlockPointers)); PrivateRefCount = (int32 *) calloc(NBuffers, sizeof(*PrivateRefCount)); - - /* - * Construct addresses for the individual buffer data blocks. We do - * this just to speed up the BufferGetBlock() macro. (Since the - * addresses should be the same in every backend, we could inherit - * this data from the postmaster --- but in the EXEC_BACKEND case - * that doesn't work.) - */ - block = BufferBlocks; - for (i = 0; i < NBuffers; i++) - { - BufferBlockPointers[i] = (Block) block; - block += BLCKSZ; - } } /* --- 152,162 ---- Index: bufmgr.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v retrieving revision 1.192 diff -c -r1.192 bufmgr.c *** bufmgr.c 8 Aug 2005 19:44:22 -0000 1.192 --- bufmgr.c 11 Aug 2005 05:18:35 -0000 *************** *** 54,60 **** /* Note: these two macros only work on shared buffers, not local ones! */ ! #define BufHdrGetBlock(bufHdr) BufferBlockPointers[(bufHdr)->buf_id] #define BufferGetLSN(bufHdr) (*((XLogRecPtr*) BufHdrGetBlock(bufHdr))) /* Note: this macro only works on local buffers, not shared ones! */ --- 54,60 ---- /* Note: these two macros only work on shared buffers, not local ones! */ ! #define BufHdrGetBlock(bufHdr) (BufferBlocks + ((bufHdr)->buf_id)*BLCKSZ) #define BufferGetLSN(bufHdr) (*((XLogRecPtr*) BufHdrGetBlock(bufHdr))) /* Note: this macro only works on local buffers, not shared ones! */
On Thu, 11 Aug 2005, Qingqing Zhou wrote: > It is said that the BufferBlockPointers is used to speedup the > BufferGetBlock() macro. I compared three ways of getting block pointers. > I.e., existing method (arrary method), calculating block pointer by adding > base addr and offset*blockid method (mul method) and optimizing mul method > by using bit shift (shift method). All of them calculate the block pointer > 80000 times (i.e., the BufferBlockPointers array is of size 80000), and each > take 3 rounds. > > The result is: > > SunOS/gcc 3.2 > duration round 1 of array method: 4.179 ms > duration round 2 of array method: 4.160 ms > duration round 3 of array method: 4.143 ms > duration round 1 of mul method: 3.311 ms > duration round 2 of mul method: 3.233 ms > duration round 3 of mul method: 3.233 ms > duration round 1 of shift method: 3.554 ms > duration round 2 of shift method: 3.235 ms > duration round 3 of shift method: 3.233 ms > > Linux/gcc 3.2 > duration round 1 of array method: 0.422 ms > duration round 2 of array method: 0.324 ms > duration round 3 of array method: 0.354 ms > duration round 1 of mul method: 0.271 ms > duration round 2 of mul method: 0.248 ms > duration round 3 of mul method: 0.304 ms > duration round 1 of shift method: 0.322 ms > duration round 2 of shift method: 0.239 ms > duration round 3 of shift method: 0.265 ms > > We can conclude that: > (1) mul or shift are definitely better than array method; > (2) mul and shift are comparable; Do you have results for more recent gcc releases? Thanks, Gavin
Gavin Sherry <swm@linuxworld.com.au> writes: > On Thu, 11 Aug 2005, Qingqing Zhou wrote: >> It is said that the BufferBlockPointers is used to speedup the >> BufferGetBlock() macro. I compared three ways of getting block pointers. > Do you have results for more recent gcc releases? Or more than one hardware architecture (which you didn't even say what you tested...) Also, I would like to see the actual test code. I wonder whether what you measured is the ability of the compiler to optimize references to successive elements of an array inside a loop; that has little or nothing to do with the typical usage of BufferGetBlock(). regards, tom lane
On Thu, 11 Aug 2005, Tom Lane wrote: > Gavin Sherry <swm@linuxworld.com.au> writes: > > On Thu, 11 Aug 2005, Qingqing Zhou wrote: > >> It is said that the BufferBlockPointers is used to speedup the > >> BufferGetBlock() macro. I compared three ways of getting block pointers. > > > Do you have results for more recent gcc releases? > > Or more than one hardware architecture (which you didn't even say what > you tested...) Well, he tested on SunOS (!) and Linux -- I presume that's two architectures. But yeah, that definately does mean that the results hold true for our user base. If we had the code (and if it does measure the right thing), then we can easily test on a lot more architectures. Gavin
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > > Also, I would like to see the actual test code. I wonder whether what > you measured is the ability of the compiler to optimize references to > successive elements of an array inside a loop; that has little or > nothing to do with the typical usage of BufferGetBlock(). > The source code is attached. compiled with "gcc testbuf.c". I tried -O2 actually, and it turns out that the timing is reduced a lot so not believable. --- /* * testbuf.c */ #include <stdio.h> #include <sys/file.h> #include <sys/param.h> #include <sys/stat.h> #include <sys/time.h> #include <unistd.h> #include <fcntl.h> #define BLCKSZ 8192 #define NBuffers 80000 typedef void* Block; int main(void) { int i, round, method; Block k, start; struct timeval start_t, stop_t; long usecs; Block *array = (Block *) calloc(NBuffers, sizeof(Block)); start = (Block)0xff3386; for (i = 0; i < NBuffers; i++) array[i] = start + BLCKSZ*i; for (method = 0; method < 3; method ++) { start = (Block)0xff3386; for (round = 0; round < 3; round ++) { gettimeofday(&start_t, NULL); if (method == 0) { for (i = 0; i < NBuffers; i++) k = array[i]; } if (method == 1) { for (i = 0; i < NBuffers; i++) k = start + i*BLCKSZ; } if (method == 2) { for (i = 0; i < NBuffers; i++) k = start + (i<<13); } gettimeofday(&stop_t, NULL); if (stop_t.tv_usec < start_t.tv_usec) { stop_t.tv_sec--; stop_t.tv_usec += 1000000; } usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000 + (long) (stop_t.tv_usec - start_t.tv_usec); fprintf (stdout, "duration round %d of %s method: %ld.%03ld ms\n", round + 1, method==0?"array":method==1?"mul":"shift", (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + (stop_t.tv_usec - start_t.tv_usec) / 1000), (long) (stop_t.tv_usec - start_t.tv_usec) % 1000); } } }
FreeBSD 5.4 RELEASE gcc 3.4.2 on Intel dual PIII 1Ghz [postgres:~/develop/c/testbuf]$ gcc -Wall -o testbuf testbuf.c [postgres:~/develop/c/testbuf]$ ./testbuf duration round 1 of array method: 1.737 ms duration round 2 of array method: 1.676 ms duration round 3 of array method: 1.527 ms duration round 1 of mul method: 0.548 ms duration round 2 of mul method: 0.548 ms duration round 3 of mul method: 0.546 ms duration round 1 of shift method: 0.593 ms duration round 2 of shift method: 0.592 ms duration round 3 of shift method: 0.575 ms [postgres:~/develop/c/testbuf]$ gcc -Wall -O2 -o testbuf testbuf.c [postgres:~/develop/c/testbuf]$ ./testbuf duration round 1 of array method: 0.169 ms duration round 2 of array method: 0.165 ms duration round 3 of array method: 0.165 ms duration round 1 of mul method: 0.164 ms duration round 2 of mul method: 0.164 ms duration round 3 of mul method: 0.165 ms duration round 1 of shift method: 0.164 ms duration round 2 of shift method: 0.164 ms duration round 3 of shift method: 0.165 ms Looks to me like -O2 makes the difference very small (on this platform/gcc combo) - is 5/169 worth doing? BTW - I patched the program to stop gcc whining: --- testbuf.c.orig Thu Aug 11 18:41:29 2005 +++ testbuf.c Thu Aug 11 18:37:43 2005 @@ -2,6 +2,7 @@ * testbuf.c */ #include <stdio.h> +#include <stdlib.h> #include <sys/file.h> #include <sys/param.h> #include <sys/stat.h> @@ -66,6 +67,8 @@ (long) (stop_t.tv_usec - start_t.tv_usec) % 1000); } } + + return 0; } Qingqing Zhou wrote: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > >>Also, I would like to see the actual test code. I wonder whether what >>you measured is the ability of the compiler to optimize references to >>successive elements of an array inside a loop; that has little or >>nothing to do with the typical usage of BufferGetBlock(). >> > > > The source code is attached. > > compiled with "gcc testbuf.c". I tried -O2 actually, and it turns out that > the timing is reduced a lot so not believable. > --- > > /* > * testbuf.c > */ > #include <stdio.h> > #include <sys/file.h> > #include <sys/param.h> > #include <sys/stat.h> > #include <sys/time.h> > #include <unistd.h> > #include <fcntl.h> > > #define BLCKSZ 8192 > #define NBuffers 80000 > > typedef void* Block; > > int main(void) > { > int i, round, method; > Block k, start; > struct timeval start_t, stop_t; > long usecs; > Block *array = (Block *) calloc(NBuffers, sizeof(Block)); > > start = (Block)0xff3386; > for (i = 0; i < NBuffers; i++) > array[i] = start + BLCKSZ*i; > > for (method = 0; method < 3; method ++) > { > start = (Block)0xff3386; > for (round = 0; round < 3; round ++) > { > gettimeofday(&start_t, NULL); > if (method == 0) > { > for (i = 0; i < NBuffers; i++) > k = array[i]; > } > if (method == 1) > { > for (i = 0; i < NBuffers; i++) > k = start + i*BLCKSZ; > } > if (method == 2) > { > for (i = 0; i < NBuffers; i++) > k = start + (i<<13); > } > gettimeofday(&stop_t, NULL); > > if (stop_t.tv_usec < start_t.tv_usec) > { > stop_t.tv_sec--; > stop_t.tv_usec += 1000000; > } > > usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000 > + (long) (stop_t.tv_usec - start_t.tv_usec); > > fprintf (stdout, "duration round %d of %s method: %ld.%03ld ms\n", > round + 1, > method==0?"array":method==1?"mul":"shift", > (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + > (stop_t.tv_usec - start_t.tv_usec) / 1000), > (long) (stop_t.tv_usec - start_t.tv_usec) % 1000); > } > } > } > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > >
Mark Kirkwood wrote: > Looks to me like -O2 makes the difference very small (on this > platform/gcc combo) - is 5/169 worth doing? Ahem - misunderstood your comment here, sorry. > Qingqing Zhou wrote: >> compiled with "gcc testbuf.c". I tried -O2 actually, and it turns out >> that >> the timing is reduced a lot so not believable.
Gavin Sherry wrote: >>Or more than one hardware architecture (which you didn't even say what >>you tested...) >> >> > >Well, he tested on SunOS (!) and Linux -- I presume that's two >architectures. > > Sun still calls Solaris SunOs - try doing uname -s on a Solaris box (or look at a buildfarm solaris build info) So what he said he tested could mean Solaris/x86 + Linux x86 or Solaris/Sparc + Linux Sparc among other possible combos. Given the results I assume they are at least not the same box ;-) cheers andrew
On Thu, 11 Aug 2005, Andrew Dunstan wrote: > > > Gavin Sherry wrote: > > >>Or more than one hardware architecture (which you didn't even say what > >>you tested...) > >> > >> > > > >Well, he tested on SunOS (!) and Linux -- I presume that's two > >architectures. > > > > > > Sun still calls Solaris SunOs - try doing uname -s on a Solaris box (or > look at a buildfarm solaris build info) True. But my previous experience in university environments is that SunOS usually refers to SunOS 2.6 -- and the performance indicates old hardware. The thing is, compilser optimised versions of the test reveal very little difference in performance. This may be because the compiler is very good at optimising sequential annd predictable access to the array. Instead, we should mimic what we see in the real world: random access. Gavin
Gavin Sherry wrote: >>> >>> >>Sun still calls Solaris SunOs - try doing uname -s on a Solaris box (or >>look at a buildfarm solaris build info) >> >> > >True. But my previous experience in university environments is that SunOS >usually refers to SunOS 2.6 > > I think I must be a lot older then you. To me SunOs means the old pre-Solaris BSD-based OS ;-) cheers andrew
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes: > The source code is attached. > gettimeofday(&start_t, NULL); > if (method == 0) > { > for (i = 0; i < NBuffers; i++) > k = array[i]; > } > if (method == 1) > { > for (i = 0; i < NBuffers; i++) > k = start + i*BLCKSZ; > } > if (method == 2) > { > for (i = 0; i < NBuffers; i++) > k = start + (i<<13); > } > gettimeofday(&stop_t, NULL); This is definitely going to tell us a lot more about the compiler's loop strength reduction algorithms than it is going to tell about the time to evaluate any one of these expressions in isolation. What's worse, the compiler would be quite within its rights to detect that k isn't used anywhere, and optimize the loop away completely. What I would suggest is first to fill another array with some large number of buffer numbers (randomly chosen from 1..N) and then time loops of the form for (i = 0; i < arraysize; i++) { bn = buffernumbers[i]; bufferlocs[i] = BufferGetBlock(bn); } for all three possible definitions of BufferGetBlock (where both arrays are global variables, just to be sure the compiler doesn't think it can discard the store). This should give some numbers worth trusting. regards, tom lane
Hello, With gcc-4 on Ubuntu AMD Athlon 2600 Barton: jd@jd:~$ ./a.out duration round 1 of array method: 0.539 ms duration round 2 of array method: 0.529 ms duration round 3 of array method: 0.530 ms duration round 1 of mul method: 0.258 ms duration round 2 of mul method: 5.563 ms duration round 3 of mul method: 0.258 ms duration round 1 of shift method: 0.240 ms duration round 2 of shift method: 0.248 ms duration round 3 of shift method: 0.240 ms jd@jd:~$ Sincerely, Joshua D. Drake Qingqing Zhou wrote: >"Tom Lane" <tgl@sss.pgh.pa.us> writes: > > >>Also, I would like to see the actual test code. I wonder whether what >>you measured is the ability of the compiler to optimize references to >>successive elements of an array inside a loop; that has little or >>nothing to do with the typical usage of BufferGetBlock(). >> >> >> > >The source code is attached. > >compiled with "gcc testbuf.c". I tried -O2 actually, and it turns out that >the timing is reduced a lot so not believable. >--- > >/* > * testbuf.c > */ >#include <stdio.h> >#include <sys/file.h> >#include <sys/param.h> >#include <sys/stat.h> >#include <sys/time.h> >#include <unistd.h> >#include <fcntl.h> > >#define BLCKSZ 8192 >#define NBuffers 80000 > >typedef void* Block; > >int main(void) >{ > int i, round, method; > Block k, start; > struct timeval start_t, stop_t; > long usecs; > Block *array = (Block *) calloc(NBuffers, sizeof(Block)); > > start = (Block)0xff3386; > for (i = 0; i < NBuffers; i++) > array[i] = start + BLCKSZ*i; > > for (method = 0; method < 3; method ++) > { > start = (Block)0xff3386; > for (round = 0; round < 3; round ++) > { > gettimeofday(&start_t, NULL); > if (method == 0) > { > for (i = 0; i < NBuffers; i++) > k = array[i]; > } > if (method == 1) > { > for (i = 0; i < NBuffers; i++) > k = start + i*BLCKSZ; > } > if (method == 2) > { > for (i = 0; i < NBuffers; i++) > k = start + (i<<13); > } > gettimeofday(&stop_t, NULL); > > if (stop_t.tv_usec < start_t.tv_usec) > { > stop_t.tv_sec--; > stop_t.tv_usec += 1000000; > } > > usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000 > + (long) (stop_t.tv_usec - start_t.tv_usec); > > fprintf (stdout, "duration round %d of %s method: %ld.%03ld ms\n", > round + 1, > method==0?"array":method==1?"mul":"shift", > (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + > (stop_t.tv_usec - start_t.tv_usec) / 1000), > (long) (stop_t.tv_usec - start_t.tv_usec) % 1000); > } > } >} > > > >---------------------------(end of broadcast)--------------------------- >TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > >
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > > This is definitely going to tell us a lot more about the compiler's loop > strength reduction algorithms than it is going to tell about the time to > evaluate any one of these expressions in isolation. What's worse, the > compiler would be quite within its rights to detect that k isn't used > anywhere, and optimize the loop away completely. > Yes, it is possible. A safter way is to examine the assembling code. And I found there is no such optimization in Linux/gcc3.2. > What I would suggest is first to fill another array with some large > number of buffer numbers (randomly chosen from 1..N) and then time loops > of the form > > for (i = 0; i < arraysize; i++) > { > bn = buffernumbers[i]; > bufferlocs[i] = BufferGetBlock(bn); > } > > for all three possible definitions of BufferGetBlock (where both arrays > are global variables, just to be sure the compiler doesn't think it can > discard the store). This should give some numbers worth trusting. > I've patched the code according to your suggestion. Result is: SunOS/gcc 3.2 verify: 1308873472 :duration round 1 of array method: 8.906 ms verify: 1308873472 :duration round 2 of array method: 8.775 ms verify: 1308873472 :duration round 3 of array method: 8.761 ms verify: 1308873472 :duration round 1 of mul method: 6.454 ms verify: 1308873472 :duration round 2 of mul method: 6.545 ms verify: 1308873472 :duration round 3 of mul method: 6.535 ms verify: 1308873472 :duration round 1 of shift method: 6.534 ms verify: 1308873472 :duration round 2 of shift method: 6.597 ms verify: 1308873472 :duration round 3 of shift method: 6.540 ms Linux/gcc 3.2 verify: -2141277440 :duration round 1 of array method: 2.385 ms verify: -2141277440 :duration round 2 of array method: 2.116 ms verify: -2141277440 :duration round 3 of array method: 2.057 ms verify: -2141277440 :duration round 1 of mul method: 0.943 ms verify: -2141277440 :duration round 2 of mul method: 0.983 ms verify: -2141277440 :duration round 3 of mul method: 0.840 ms verify: -2141277440 :duration round 1 of shift method: 0.864 ms verify: -2141277440 :duration round 2 of shift method: 0.931 ms verify: -2141277440 :duration round 3 of shift method: 0.968 ms The test file is attached: --- /* * buftest.c */ #include <stdio.h> #include <stdlib.h> #include <sys/file.h> #include <sys/time.h> #include <unistd.h> #define BLCKSZ 8192 #define NBuffers 80000 typedef void* Block; Block *array = NULL; Block *reslt = NULL; int *bufnm = NULL; static int grand(int min, int max) { return (min + (int) (max * 1.0 * rand() / (RAND_MAX + 1.0))); } int main(void) { int n, i, round, method; Block start; struct timeval start_t, stop_t; long usecs; srandom(getpid()); array = (Block *) calloc(NBuffers, sizeof(Block)); reslt = (Block *) calloc(NBuffers, sizeof(Block)); bufnm = (int *) calloc(NBuffers, sizeof(int)); start = (Block)0xff3386; for (i = 0; i < NBuffers; i++) array[i] = start + BLCKSZ*i; for (i = 0; i < NBuffers; i++) bufnm[i] = grand(0, NBuffers); for (method = 0; method < 3; method ++) { start = (Block)0xff3386; for (round = 0; round < 3; round ++) { gettimeofday(&start_t, NULL); if (method == 0) { for (i = 0; i < NBuffers; i++) { n = bufnm[i]; reslt[i] = array[n]; } } if (method == 1) { for (i = 0; i < NBuffers; i++) { n = bufnm[i]; reslt[i] = start + n*BLCKSZ; } } if (method == 2) { for (i = 0; i < NBuffers; i++) { n = bufnm[i]; reslt[i] = start + (n<<13); } } gettimeofday(&stop_t, NULL); usecs = 0; for (i = 0; i< NBuffers; i++) usecs += (long) reslt[i]; fprintf(stdout, "verify: %ld :", usecs ); if (stop_t.tv_usec < start_t.tv_usec) { stop_t.tv_sec--; stop_t.tv_usec += 1000000; } usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000 + (long) (stop_t.tv_usec - start_t.tv_usec); fprintf (stdout, "duration round %d of %s method: %ld.%03ld ms\n", round + 1, method==0?"array":method==1?"mul":"shift", (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 + (stop_t.tv_usec - start_t.tv_usec) / 1000), (long) (stop_t.tv_usec - start_t.tv_usec) % 1000); } } free(array); free(reslt); free(bufnm); return 0; }
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes: > I've patched the code according to your suggestion. Result is: > [ snip ] OK, that test seems a little more believable. One point you didn't consider is that on 64-bit machines, the integer bufnum really has to be coerced to size_t to avoid overflow if the buffer array exceeds 2Gb. (Which we don't support today, but might well by the end of day tomorrow, seeing that there's a patch in the queue about it.) But I ran the test case with the extra coercion on an IA64 machine at Red Hat, and got substantially the same results as you did: the array method is just slower. Another consideration is that the array is competing for L2 cache --- the test program can't really show that, since it has no other use for L2 cache, but in the context of the real database I suspect this is at least as much of a win as shaving a few nanoseconds off the BufferGetBlock macro itself. So ... patch applied, and thanks for the good idea! regards, tom lane