Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From: Steve Singer
Subject: Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
Date:
Msg-id: BLU0-SMTP78072D679E8BC80C7BD3C3DC110@phx.gbl
In reply to: Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On 13-01-09 03:07 PM, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> Well, I *did* benchmark it as noted elsewhere in the thread, but that's
>> obviously just one machine (E5520 x 2) with one rather restricted workload
>> (pgbench -S -jc 40 -T60). At least it's rather palloc heavy.
>> Here are the numbers:
>> before:
>> #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
>> after:
>> #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
>> So on my system, if there is a difference, it's positive (0.12%).
> pgbench-based testing doesn't fill me with a lot of confidence for this
> --- its numbers contain a lot of communication overhead, not to mention
> that pgbench itself can be a bottleneck.  It struck me that we have a
> recent test case that's known to be really palloc-intensive, namely
> Pavel's example here:
> http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com
>
> I set up a non-cassert build of commit
> 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
> reduced the data-copying overhead for that).  On my Fedora 16 machine
> (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
> I get a runtime for Pavel's example of 17023 msec (average over five
> runs).  I then applied oprofile and got a breakdown like this:
>
>    samples|      %|
> ------------------
>     108409 84.5083 /home/tgl/testversion/bin/postgres
>      13723 10.6975 /lib64/libc-2.14.90.so
>       3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10960    10.1495  AllocSetAlloc
> 6325      5.8572  MemoryContextAllocZeroAligned
> 6225      5.7646  base_yyparse
> 3765      3.4866  copyObject
> 2511      2.3253  MemoryContextAlloc
> 2292      2.1225  grouping_planner
> 2044      1.8928  SearchCatCache
> 1956      1.8113  core_yylex
> 1763      1.6326  expression_tree_walker
> 1347      1.2474  MemoryContextCreate
> 1340      1.2409  check_stack_depth
> 1276      1.1816  GetCachedPlan
> 1175      1.0881  AllocSetFree
> 1106      1.0242  GetSnapshotData
> 1106      1.0242  _SPI_execute_plan
> 1101      1.0196  extract_query_dependencies_walker
>
> I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> bit astonishing, particularly because the oprofile results haven't moved
> much:
>
>     107642 83.7427 /home/tgl/testversion/bin/postgres
>      14677 11.4183 /lib64/libc-2.14.90.so
>       3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10038     9.3537  AllocSetAlloc
> 6392      5.9562  MemoryContextAllocZeroAligned
> 5763      5.3701  base_yyparse
> 4810      4.4821  copyObject
> 2268      2.1134  grouping_planner
> 2178      2.0295  core_yylex
> 1963      1.8292  palloc
> 1867      1.7397  SearchCatCache
> 1835      1.7099  expression_tree_walker
> 1551      1.4453  check_stack_depth
> 1374      1.2803  _SPI_execute_plan
> 1282      1.1946  MemoryContextCreate
> 1187      1.1061  AllocSetFree
> ...
> 653       0.6085  palloc0
> ...
> 552       0.5144  MemoryContextAlloc
>
> The number of calls of AllocSetAlloc certainly hasn't changed at all, so
> how did that get faster?
>
> I notice that the postgres executable is about 0.2% smaller, presumably
> because a whole lot of inlined fetches of CurrentMemoryContext are gone.
> This makes me wonder if my result is due to chance improvements of cache
> line alignment for inner loops.
>
> I would like to know if other people get comparable results on other
> hardware (non-Intel hardware would be especially interesting).  If this
> result holds up across a range of platforms, I'll withdraw my objection
> to making palloc a plain function.
>
>             regards, tom lane
>

Sorry for the delay; I only read this thread today.


I just tried Pavel's test on a POWER5 machine with an older version of
gcc (see the grebe buildfarm animal for details).

78a5e738e:                     37874.855 (average of 6 runs)
78a5e738e + palloc.h + mcxt.c: 38076.8035

The functions do seem to slow things down slightly on POWER (the patched
build is roughly 0.5% slower in this test). I haven't bothered to run
oprofile or tprof to get a breakdown of the functions, since Andres has
already removed this from his patch.
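
For anyone skimming the thread, the change being measured is roughly of
this shape (a simplified sketch of the macro-to-function switch discussed
upthread, not the actual palloc.h/mcxt.c hunks; the identifiers are the
usual backend ones from utils/palloc.h):

/*
 * Before: palloc is a macro, so every call site inlines its own fetch
 * of CurrentMemoryContext (and adds a little to the size of the binary).
 */
#define palloc(sz)  MemoryContextAlloc(CurrentMemoryContext, (sz))

/*
 * After: palloc is a plain out-of-line function in mcxt.c; the
 * CurrentMemoryContext fetch happens in exactly one place, at the cost
 * of an extra function call at each palloc().
 */
void *
palloc(Size size)
{
    return MemoryContextAlloc(CurrentMemoryContext, size);
}

That matches Tom's observation above that the postgres binary gets
slightly smaller once the inlined CurrentMemoryContext fetches are gone.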
 

Steve




