Re: FunctionCallN improvement.
From | Darcy Buskermolen |
---|---|
Subject | Re: FunctionCallN improvement. |
Date | |
Msg-id | 200502011410.35048.darcy@wavefire.com |
In response to | Re: FunctionCallN improvement. (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On February 1, 2005 01:23 pm, Tom Lane wrote:
> a_ogawa <a_ogawa@hi-ho.ne.jp> writes:
> > I made the test program to measure the effect of this macro.
>
> Well, if we're going to be tense about this, let's actually be tense
> about it. Your test program isn't a great model for what's going to
> happen in fmgr.c, because you've designed it so that Nargs cannot be
> known at compile time. In the fmgr routines, Nargs is certainly a
> compile-time constant, and so implementations that can exploit that
> will have an advantage.
>
> Also, we can take advantage of some improvements in the MemSet macro
> family that occurred since fmgr.c was last rewritten. I see no reason
> not to use MemSetLoop directly, since the fcinfo struct will have the
> correct size and correct alignment.
>
> In addition to your original macro, I tried two other variants: one
> that uses MemSetLoop with a loop length rounded to the next higher
> multiple of 4, and one that expects the argisnull settings to be written
> out directly, in the same style as is currently done in FunctionCall1
> and FunctionCall2. (This amounts to unrolling the loop in the original
> macro; something that could be done by the compiler given a constant
> Nargs, but it seems not to be done by the compilers I tested.)
>
> I tested two cases: NARGS = 2, which is certainly the single most
> critical case, and NARGS = 5, which is probably the largest number
> of arguments that we really care too much about. (You have to hand-edit
> the test program and recompile to adjust NARGS, since the point is to
> treat it as a compile-time constant.)
>
> Here are wall-clock timings on the architectures and compilers I have at
> hand:
>
> NARGS = 2
>                   MemSetLoop   OrigMacro   SetMacro   Unrolled
> i386, gcc -O2     37.655s      6.411s      7.060s     6.362s
> i386, gcc -O6     35.420s      1.129s      1.814s     0.567s
> PPC, gcc -O2      54.033s      6.754s      11.138s    6.438s
> HPPA, gcc -O2     58.82s       10.38s      9.79s      7.85s
> HPPA, cc +O2      60.39s       13.43s      8.40s      7.31s
>
> NARGS = 5
>                   MemSetLoop   OrigMacro   SetMacro   Unrolled
> i386, gcc -O2     37.566s      11.329s     7.688s     8.874s
> i386, gcc -O6     32.992s      5.928s      2.881s     0.566s
> PPC, gcc -O2      86.300s      19.048s     14.626s    8.751s
> HPPA, gcc -O2     58.28s       15.09s      13.42s     14.37s
> HPPA, cc +O2      58.23s       8.96s       12.88s     7.28s

I see similar comparative times on an UltraSparc running Solaris.

>
> (I used different loop counts on the different machines to get similar
> overall times for the memset case; so it's OK to compare numbers across
> a row but not down a column.)
>
> Based on this I think we ought to go with the "unrolled" approach, ie,
> we'll create a macro to initialize the fixed fields of fcinfo but fill
> in the arg and argisnull arrays with code like what's already in
> FunctionCall2:
>
>     fcinfo.arg[0] = arg1;
>     fcinfo.arg[1] = arg2;
>     fcinfo.argnull[0] = false;
>     fcinfo.argnull[1] = false;
>
> If anyone would like to try the results on other platforms, my test
> program is attached.
>
> regards, tom lane

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx: 250.763.1759
http://www.wavefire.com
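
For readers following the thread, here is a minimal C sketch of the two ends of the comparison quoted above: zeroing the whole call-info struct with memset, versus setting only the fixed fields through a macro and writing the per-argument slots out by hand. It is an illustration only and is not the code from fmgr.c or from the attached test program. The names DemoCallInfo, DEMO_MAX_ARGS, DEMO_INIT_FIXED, init2_memset, init2_unrolled, same_visible_state and demo_selfcheck are all hypothetical, and the struct layout is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real argument limit. */
#define DEMO_MAX_ARGS 32

/* Hypothetical, simplified call-info struct (layout is an assumption). */
typedef struct DemoCallInfo
{
    void   *flinfo;
    void   *context;
    void   *resultinfo;
    bool    isnull;
    short   nargs;
    long    arg[DEMO_MAX_ARGS];
    bool    argnull[DEMO_MAX_ARGS];
} DemoCallInfo;

/* Initialize only the fixed fields; the arrays are left untouched. */
#define DEMO_INIT_FIXED(fc, n) \
    do { \
        (fc)->flinfo = NULL; \
        (fc)->context = NULL; \
        (fc)->resultinfo = NULL; \
        (fc)->isnull = false; \
        (fc)->nargs = (n); \
    } while (0)

/* Baseline: clear the entire struct, then fill in the two arguments. */
void
init2_memset(DemoCallInfo *fc, long arg1, long arg2)
{
    memset(fc, 0, sizeof(*fc));
    fc->nargs = 2;
    fc->arg[0] = arg1;
    fc->arg[1] = arg2;
}

/* "Unrolled" style: fixed fields via macro, argument slots written out. */
void
init2_unrolled(DemoCallInfo *fc, long arg1, long arg2)
{
    DEMO_INIT_FIXED(fc, 2);
    fc->arg[0] = arg1;
    fc->arg[1] = arg2;
    fc->argnull[0] = false;
    fc->argnull[1] = false;
}

/* Compare only the fields a callee with nargs arguments would read. */
bool
same_visible_state(const DemoCallInfo *a, const DemoCallInfo *b)
{
    if (a->flinfo != b->flinfo || a->context != b->context ||
        a->resultinfo != b->resultinfo || a->isnull != b->isnull ||
        a->nargs != b->nargs)
        return false;
    for (int i = 0; i < a->nargs; i++)
    {
        if (a->arg[i] != b->arg[i] || a->argnull[i] != b->argnull[i])
            return false;
    }
    return true;
}

/* Start both structs from garbage and check the two styles agree. */
void
demo_selfcheck(void)
{
    DemoCallInfo a;
    DemoCallInfo b;

    memset(&a, 0xFF, sizeof(a));
    memset(&b, 0xFF, sizeof(b));
    init2_memset(&a, 10, 20);
    init2_unrolled(&b, 10, 20);
    assert(same_visible_state(&a, &b));
}
```

The unrolled version touches only the slots a two-argument callee reads, so the amount of work does not depend on the size of the arrays. Whether that is actually faster on a given compiler and platform is what the timings above measure; this sketch makes no performance claim of its own.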