Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
From | Steve Singer |
---|---|
Subject | Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4) |
Date | |
Msg-id | BLU0-SMTP78072D679E8BC80C7BD3C3DC110@phx.gbl |
In response to | Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4) (Andres Freund <andres@2ndquadrant.com>) |
List | pgsql-hackers |
On 13-01-09 03:07 PM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> Well, I *did* benchmark it as noted elsewhere in the thread, but thats
>> obviously just machine (E5520 x 2) with one rather restricted workload
>> (pgbench -S -jc 40 -T60). At least its rather palloc heavy.
>> Here are the numbers:
>> before:
>> #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
>> after:
>> #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
>> So on my system if there is a difference, its positive (0.12%).
> pgbench-based testing doesn't fill me with a lot of confidence for this
> --- its numbers contain a lot of communication overhead, not to mention
> that pgbench itself can be a bottleneck. It struck me that we have a
> recent test case that's known to be really palloc-intensive, namely
> Pavel's example here:
> http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com
>
> I set up a non-cassert build of commit
> 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
> reduced the data-copying overhead for that). On my Fedora 16 machine
> (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
> I get a runtime for Pavel's example of 17023 msec (average over five
> runs).
> I then applied oprofile and got a breakdown like this:
>
>   samples|      %|
> ------------------
>    108409 84.5083 /home/tgl/testversion/bin/postgres
>     13723 10.6975 /lib64/libc-2.14.90.so
>      3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10960    10.1495  AllocSetAlloc
> 6325      5.8572  MemoryContextAllocZeroAligned
> 6225      5.7646  base_yyparse
> 3765      3.4866  copyObject
> 2511      2.3253  MemoryContextAlloc
> 2292      2.1225  grouping_planner
> 2044      1.8928  SearchCatCache
> 1956      1.8113  core_yylex
> 1763      1.6326  expression_tree_walker
> 1347      1.2474  MemoryContextCreate
> 1340      1.2409  check_stack_depth
> 1276      1.1816  GetCachedPlan
> 1175      1.0881  AllocSetFree
> 1106      1.0242  GetSnapshotData
> 1106      1.0242  _SPI_execute_plan
> 1101      1.0196  extract_query_dependencies_walker
>
> I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> bit astonishing, particularly because the oprofile results haven't moved
> much:
>
>    107642 83.7427 /home/tgl/testversion/bin/postgres
>     14677 11.4183 /lib64/libc-2.14.90.so
>      3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10038     9.3537  AllocSetAlloc
> 6392      5.9562  MemoryContextAllocZeroAligned
> 5763      5.3701  base_yyparse
> 4810      4.4821  copyObject
> 2268      2.1134  grouping_planner
> 2178      2.0295  core_yylex
> 1963      1.8292  palloc
> 1867      1.7397  SearchCatCache
> 1835      1.7099  expression_tree_walker
> 1551      1.4453  check_stack_depth
> 1374      1.2803  _SPI_execute_plan
> 1282      1.1946  MemoryContextCreate
> 1187      1.1061  AllocSetFree
> ...
> 653       0.6085  palloc0
> ...
> 552       0.5144  MemoryContextAlloc
>
> The number of calls of AllocSetAlloc certainly hasn't changed at all, so
> how did that get faster?
>
> I notice that the postgres executable is about 0.2% smaller, presumably
> because a whole lot of inlined fetches of CurrentMemoryContext are gone.
> This makes me wonder if my result is due to chance improvements of cache
> line alignment for inner loops.
>
> I would like to know if other people get comparable results on other
> hardware (non-Intel hardware would be especially interesting). If this
> result holds up across a range of platforms, I'll withdraw my objection
> to making palloc a plain function.
>
> regards, tom lane
>

Sorry for the delay; I only read this thread today.

I just tried Pavel's test on a POWER5 machine with an older version of
gcc (see the grebe buildfarm animal for details):

78a5e738e: 37874.855 (average of 6 runs)
78a5e738 + palloc.h + mcxt.c: 38076.8035

The functions do seem to slightly slow things down on POWER. I haven't
bothered to run oprofile or tprof to get a breakdown of the functions,
since Andres has already removed this from his patch.

Steve
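For anyone comparing the three sets of figures in this thread, the relative differences can be recomputed with a few lines of Python. This is only an illustrative sketch, not something posted in the thread: the helper name `pct_change` is made up here, and the inputs are the numbers quoted above (Andres's pgbench tps samples, Tom's average runtimes, and the POWER5 averages).

```python
from statistics import mean


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# Andres: pgbench -S throughput in tps (higher is better), six samples each.
tps_before = [101646.763208, 101350.361595, 101421.425668,
              101571.211688, 101862.172051, 101449.857665]
tps_after = [101553.596257, 102132.277795, 101528.816229,
             101733.541792, 101438.531618, 101673.400992]
andres = pct_change(mean(tps_before), mean(tps_after))

# Tom: average runtime of Pavel's example in msec (lower is better).
tom = pct_change(17023, 16666)

# Steve: average runtime on POWER5 in msec (lower is better).
steve = pct_change(37874.855, 38076.8035)

print(f"pgbench tps (E5520):    {andres:+.2f}%")
print(f"runtime (Xeon E5503):   {tom:+.2f}%")
print(f"runtime (POWER5):       {steve:+.2f}%")
```

Note the sign convention: a positive change is an improvement for tps but a slowdown for runtimes. With these inputs the differences come out to roughly a 0.12% tps gain, a 2.1% speedup on the Xeon, and a 0.5% slowdown on POWER5, all small enough that run-to-run noise and code-alignment effects could plausibly account for them.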