Обсуждение: inlining
Let me add, I am not inlining all the functions, but only the top part of them that deals with cachoffsets and nulls. These are the easy ones, and the ones that get used most often. -- Bruce Momjian maillist@candle.pha.pa.us
Bruce Momjian wrote: > > Let me add, I am not inlining all the functions, but only the top part > of them that deals with cachoffsets and nulls. These are the easy ones, > and the ones that get used most often. fastgetattr() is called from a HUNDREDS places - I'm not sure that this is good idea. I suggest to inline _entire_ body of this func in the execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get data from tuples. (We could #define FASTGETATTR macro and re-write fastgetattr() as just this macro "call".) I don't know should we follow the same way for fastgetiattr() or not... Vadim
Sorry - this is with valid charset... Vadim B. Mikheev wrote: > > Bruce Momjian wrote: > > > > Let me add, I am not inlining all the functions, but only the top part > > of them that deals with cachoffsets and nulls. These are the easy ones, > > and the ones that get used most often. > > fastgetattr() is called from a HUNDREDS places - I'm not sure that > this is good idea. > > I suggest to inline _entire_ body of this func in the > execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get > data from tuples. > > (We could #define FASTGETATTR macro and re-write fastgetattr() as just > this macro "call".) > > I don't know should we follow the same way for fastgetiattr() or not... > > Vadim
>
> Bruce Momjian wrote:
> >
> > Let me add, I am not inlining all the functions, but only the top part
> > of them that deals with cachoffsets and nulls. These are the easy ones,
> > and the ones that get used most often.
>
> fastgetattr() is called from a HUNDREDS places - I'm not sure that
> this is good idea.
Here is the fastgetattr macro. Again, I just inlined the cacheoffset
and null handlling at the top. Doesn't look like much code, though the
?: macro format makes it look larger. What do you think?
I did the same with fastgetiattr, which in fact just called
index_getattr, so that is gone now. For getsysattr, I made an array of
offsetof(), and do a lookup into the array from heap_getattr, so that is
gone too.
---------------------------------------------------------------------------
#define fastgetattr(tup, attnum, tupleDesc, isnull) \
( \
AssertMacro((attnum) > 0) ? \
( \
((isnull) ? (*(isnull) = false) : (dummyret)NULL), \
HeapTupleNoNulls(tup) ? \
( \
((tupleDesc)->attrs[(attnum)-1]->attcacheoff > 0) ? \
( \
(Datum)fetchatt(&((tupleDesc)->attrs[(attnum)-1]), \
(char *) (tup) + (tup)->t_hoff + (tupleDesc)->attrs[(attnum)-1]->attcacheoff) \
) \
: \
( \
((attnum)-1 > 0) ? \
( \
(Datum)fetchatt(&((tupleDesc)->attrs[0]), (char *) (tup) + (tup)->t_hoff) \
) \
: \
( \
nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
) \
) \
) \
: \
( \
att_isnull((attnum)-1, (tup)->t_bits) ? \
( \
((isnull) ? (*(isnull) = true) : (dummyret)NULL), \
(Datum)NULL \
) \
: \
( \
nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
) \
) \
) \
: \
( \
(Datum)NULL \
) \
)
>
> I suggest to inline _entire_ body of this func in the
> execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
> data from tuples.
>
> (We could #define FASTGETATTR macro and re-write fastgetattr() as just
> this macro "call".)
>
> I don't know should we follow the same way for fastgetiattr() or not...
>
> Vadim
>
--
Bruce Momjian
maillist@candle.pha.pa.us
Bruce Momjian wrote: > > > > > Bruce Momjian wrote: > > > > > > Let me add, I am not inlining all the functions, but only the top part > > > of them that deals with cachoffsets and nulls. These are the easy ones, > > > and the ones that get used most often. > > > > fastgetattr() is called from a HUNDREDS places - I'm not sure that > > this is good idea. > > Here is the fastgetattr macro. Again, I just inlined the cacheoffset > and null handlling at the top. Doesn't look like much code, though the > ?: macro format makes it look larger. What do you think? Try to gmake clean and gmake... Please compare old/new sizes for debug version too. Vadim
>
> Bruce Momjian wrote:
> >
> > >
> > > Bruce Momjian wrote:
> > > >
> > > > Let me add, I am not inlining all the functions, but only the top part
> > > > of them that deals with cachoffsets and nulls. These are the easy ones,
> > > > and the ones that get used most often.
> > >
> > > fastgetattr() is called from a HUNDREDS places - I'm not sure that
> > > this is good idea.
> >
> > Here is the fastgetattr macro. Again, I just inlined the cacheoffset
> > and null handlling at the top. Doesn't look like much code, though the
> > ?: macro format makes it look larger. What do you think?
>
> Try to gmake clean and gmake... Please compare old/new sizes for
> debug version too.
OK, here it is, 'size' with two regression run timings:
OLD
text data bss dec hex
831488 155648 201524 1188660 122334
151.12 real 4.66 user 8.52 sys
141.70 real 1.28 user 7.44 sys
NEW
text data bss dec hex
864256 155648 201548 1221452 12a34c
143.52 real 3.48 user 9.08 sys
146.10 real 1.34 user 7.44 sys
These numbers are with assert and -g on.
Interesting that the 1st regression test is the greatest, and the 2nd is
the least, with the same no-inlining, but with standard optimizations.
Now, my test of startup times shows it saves 0.015 seconds on a 0.10
second test. This 0.015 is the equvalent to the fork() overhead time.
This speedup is reproducable.
The inlining is a 3% increase in size, but provides a 15% speed increase
on my startup test.
Looks good to me. I am going to apply the patch, and let people tell me
if they see a speedup worth a 3% binary size increase.
The only visible change is that heap_getattr() does not take a buffer
parameter anymore, thanks to the removal of time travel.
Vadim, I will send you the patch separately to look at.
--
Bruce Momjian
maillist@candle.pha.pa.us
>
> Bruce Momjian wrote:
> >
> > Let me add, I am not inlining all the functions, but only the top part
> > of them that deals with cachoffsets and nulls. These are the easy ones,
> > and the ones that get used most often.
>
> fastgetattr() is called from a HUNDREDS places - I'm not sure that
> this is good idea.
Here is the fastgetattr macro. Again, I just inlined the cacheoffset
and null handlling at the top. Doesn't look like much code, though the
?: macro format makes it look larger. What do you think?
I did the same with fastgetiattr, which in fact just called
index_getattr, so that is gone now. For getsysattr, I made an array of
offsetof(), and do a lookup into the array from heap_getattr, so that is
gone too.
---------------------------------------------------------------------------
#define fastgetattr(tup, attnum, tupleDesc, isnull) \
( \
AssertMacro((attnum) > 0) ? \
( \
((isnull) ? (*(isnull) = false) : (dummyret)NULL), \
HeapTupleNoNulls(tup) ? \
( \
((tupleDesc)->attrs[(attnum)-1]->attcacheoff > 0) ? \
( \
(Datum)fetchatt(&((tupleDesc)->attrs[(attnum)-1]), \
(char *) (tup) + (tup)->t_hoff + (tupleDesc)->attrs[(attnum)-1]->attcacheoff) \
) \
: \
( \
((attnum)-1 > 0) ? \
( \
(Datum)fetchatt(&((tupleDesc)->attrs[0]), (char *) (tup) + (tup)->t_hoff) \
) \
: \
( \
nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
) \
) \
) \
: \
( \
att_isnull((attnum)-1, (tup)->t_bits) ? \
( \
((isnull) ? (*(isnull) = true) : (dummyret)NULL), \
(Datum)NULL \
) \
: \
( \
nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
) \
) \
) \
: \
( \
(Datum)NULL \
) \
)
>
> I suggest to inline _entire_ body of this func in the
> execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
> data from tuples.
>
> (We could #define FASTGETATTR macro and re-write fastgetattr() as just
> this macro "call".)
>
> I don't know should we follow the same way for fastgetiattr() or not...
>
> Vadim
>
--
Bruce Momjian
maillist@candle.pha.pa.us
> OLD > text data bss dec hex > 831488 155648 201524 1188660 122334 > 151.12 real 4.66 user 8.52 sys > 141.70 real 1.28 user 7.44 sys > > NEW > text data bss dec hex > 864256 155648 201548 1221452 12a34c I have the new size down to 852000, which is only 2.5% increase. -- Bruce Momjian maillist@candle.pha.pa.us
> > Bruce Momjian wrote: > > > > Let me add, I am not inlining all the functions, but only the top part > > of them that deals with cachoffsets and nulls. These are the easy ones, > > and the ones that get used most often. > > fastgetattr() is called from a HUNDREDS places - I'm not sure that > this is good idea. > > I suggest to inline _entire_ body of this func in the > execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get > data from tuples. I don't think I can do that easily. Inlining the top of the the function that uses attcacheoff or gets NULL's is easy, but after that, lots of loops and stuff, which are hard to inline because you really can't define your own variables inside a macro that returns a value. Let's see that profiling shows after my changes, and how many times nocache_getattr(), the new name for the remaining part of the function, actually has to be called. Also, there is nocache_getiattr(), and get_sysattr() is gone. Just an array lookup for the offset now. -- Bruce Momjian maillist@candle.pha.pa.us
Bruce Momjian wrote: > > > > > Bruce Momjian wrote: > > > > > > Let me add, I am not inlining all the functions, but only the top part > > > of them that deals with cachoffsets and nulls. These are the easy ones, > > > and the ones that get used most often. > > > > fastgetattr() is called from a HUNDREDS places - I'm not sure that > > this is good idea. > > > > I suggest to inline _entire_ body of this func in the > > execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get > > data from tuples. > > I don't think I can do that easily. Inlining the top of the the > function that uses attcacheoff or gets NULL's is easy, but after that, > lots of loops and stuff, which are hard to inline because you really > can't define your own variables inside a macro that returns a value. Ok. > > Let's see that profiling shows after my changes, and how many times > nocache_getattr(), the new name for the remaining part of the function, > actually has to be called. Ok. > > Also, there is nocache_getiattr(), and get_sysattr() is gone. Just an > array lookup for the offset now. Nice. Vadim