Обсуждение: 9.3 regression with dbt2
<div class="WordSection1"><p class="MsoPlainText">Hi,<p class="MsoPlainText"> <p class="MsoPlainText">We recently observed~15% performance regression with dbt2 from PG 9.3.<p class="MsoPlainText">We narrowed down on testing master between9.2 cut and 9.3 cut.<p class="MsoPlainText">It seems that 0ac5ad5134f2769ccbaefec73844f8504c4d6182 is the culpritcommit.<p class="MsoPlainText">We did several runs and perf profiling comparing it against its parent (f925c79b9f36c54b67053ade5ad225a75b8dc803).<pclass="MsoPlainText">Also tested a 12/16 commit on the master (3b97e6823b949624afdc3ce4c92b29a80429715f)once, it performed similarly as 0ac..<p class="MsoPlainText"> <p class="MsoPlainText">Regards,<pclass="MsoPlainText">Dong<p class="MsoPlainText"> <p class="MsoPlainText"> <p class="MsoPlainText">Results:<pclass="MsoPlainText">f92: 53k-56k'ish notpm<p class="MsoPlainText">0ac: 47k-48k'ish notpm<pclass="MsoPlainText"> <p class="MsoPlainText">Server SUT:<p class="MsoPlainText">HP ML350 G6<p class="MsoPlainText">Two Xeon E5520 (4c/p, 8 cores total, hyper-threading disabled)<p class="MsoPlainText">12GB DRAM<p class="MsoPlainText">HPP410i RAID controller (256MB battery-backed cache)<p class="MsoPlainText">- three 10k-rpm SAS: /<pclass="MsoPlainText">- three 10k-rpm SAS: /pgdata<p class="MsoPlainText">- one 15k-rpm SAS: /pgxlog<p class="MsoPlainText">-ext4 (rw,relatime,data=ordered) on all mounts. <p class="MsoPlainText"> <p class="MsoPlainText">Fedora19 (3.11.10-200.fc19.x86_64)<p class="MsoPlainText"> <p class="MsoPlainText">max_connections=100<pclass="MsoPlainText">shared_buffers=8192MB<p class="MsoPlainText">effective_cache_size=10GB<pclass="MsoPlainText">temp_buffers=8186kB<p class="MsoPlainText">work_mem=4093kB<pclass="MsoPlainText">maintenance_work_mem=399MB<p class="MsoPlainText">wal_buffers=-1<pclass="MsoPlainText">checkpoint_segments=300<p class="MsoPlainText">checkpoint_completion_target=0.9<pclass="MsoPlainText">logging_collector=on<p class="MsoPlainText">log_timezone=UTC<pclass="MsoPlainText">datestyle='iso, mdy'<p class="MsoPlainText">lc_messages=C<p class="MsoPlainText">lc_monetary=C<pclass="MsoPlainText">lc_numeric=C<p class="MsoPlainText">lc_time=C<p class="MsoPlainText">default_text_search_config='pg_catalog.english'<pclass="MsoPlainText">listen_addresses='*'<p class="MsoPlainText">log_destination=csvlog<pclass="MsoPlainText">log_directory=pg_log<p class="MsoPlainText">log_filename='pg-%a'<pclass="MsoPlainText">log_rotation_age=1440<p class="MsoPlainText">log_truncate_on_rotation=on<pclass="MsoPlainText"> <p class="MsoPlainText">Client and workload:<p class="MsoPlainText">Dell390. Two core. Direct connect with the Server SUT.<p class="MsoPlainText">dbt2 (ToT)<p class="MsoPlainText">40warehouse<p class="MsoPlainText">8 terminals, 8 connections<p class="MsoPlainText">zero think/keytime<p class="MsoPlainText">12-min run<p class="MsoPlainText"> <p class="MsoPlainText">Flat perf profiles of twosuch runs look like:<p class="MsoPlainText">f92:<p class="MsoPlainText">Samples: 608K of event 'cycles', Event count (approx.):6679607097416 <p class="MsoPlainText">+ 4.04% postgres postgres [.] heap_hot_search_buffer <p class="MsoPlainText">+ 3.63% postgres postgres [.] AllocSetAlloc <p class="MsoPlainText">+ 3.37% postgres postgres [.] hash_search_with_hash_value <p class="MsoPlainText">+ 2.85% postgres postgres [.] _bt_compare <p class="MsoPlainText">+ 2.67% postgres postgres [.] SearchCatCache <p class="MsoPlainText">+ 2.46% postgres postgres [.] LWLockAcquire <p class="MsoPlainText">+ 2.16% postgres postgres [.] XLogInsert <p class="MsoPlainText">+ 2.08% postgres postgres [.] PinBuffer <p class="MsoPlainText">+ 1.32% postgres postgres [.] ExecInitExpr <p class="MsoPlainText">+ 1.31% postgres libc-2.17.so [.] _int_malloc <p class="MsoPlainText">+ 1.29% swapper [kernel.kallsyms] [k] intel_idle <p class="MsoPlainText">+ 1.23% postgres postgres [.] MemoryContextAllocZeroAligned <p class="MsoPlainText">+ 1.13% postgres postgres [.] heap_page_prune_opt <p class="MsoPlainText">+ 1.06% postgres libc-2.17.so [.] __memcpy_ssse3_back <p class="MsoPlainText">+ 1.02% postgres postgres [.] LWLockRelease <p class="MsoPlainText">+ 0.94% postgres postgres [.] copyObject <p class="MsoPlainText">+ 0.89% postgres postgres [.] fmgr_info_cxt_security <p class="MsoPlainText">+ 0.82% postgres postgres [.] _bt_checkkeys <p class="MsoPlainText">+ 0.81% postgres postgres [.] hash_any <p class="MsoPlainText">+ 0.73% postgres postgres [.] FunctionCall2Coll <p class="MsoPlainText">+ 0.69% postgres libc-2.17.so [.] __strncpy_sse2_unaligned <p class="MsoPlainText">+ 0.67% postgres postgres [.] HeapTupleSatisfiesMVCC <p class="MsoPlainText">+ 0.66% postgres postgres [.] MemoryContextAlloc <p class="MsoPlainText">+ 0.65% postgres postgres [.] expression_tree_walker <p class="MsoPlainText">+ 0.59% postgres postgres [.] check_stack_depth <p class="MsoPlainText">+ 0.57% postgres libc-2.17.so [.] __printf_fp <p class="MsoPlainText">+ 0.56% postgres libc-2.17.so [.] _int_free <p class="MsoPlainText">+ 0.52% postgres postgres [.] base_yyparse<p class="MsoPlainText"> <p class="MsoPlainText">0ac:<p class="MsoPlainText">Samples:706K of event 'cycles', Event count (approx.): 6690377376522 <p class="MsoPlainText">+ 3.82% postgres postgres [.] GetMultiXactIdMembers <p class="MsoPlainText">+ 3.43% postgres postgres [.] LWLockAcquire <p class="MsoPlainText">+ 3.31% postgres postgres [.] hash_search_with_hash_value <p class="MsoPlainText">+ 3.09% postgres postgres [.] heap_hot_search_buffer <p class="MsoPlainText">+ 3.00% postgres postgres [.] AllocSetAlloc <p class="MsoPlainText">+ 2.56% postgres postgres [.] _bt_compare <p class="MsoPlainText">+ 2.19% postgres postgres [.] PinBuffer <p class="MsoPlainText">+ 2.13% postgres postgres [.] SearchCatCache <p class="MsoPlainText">+ 1.99% postgres postgres [.] XLogInsert <pclass="MsoPlainText">+ 1.53% postgres postgres [.] LWLockRelease <p class="MsoPlainText">+ 1.32% postgres postgres [.] HeapTupleSatisfiesMVCC <p class="MsoPlainText">+ 1.31% swapper [kernel.kallsyms] [k] intel_idle <p class="MsoPlainText">+ 1.25% postgres postgres [.] ExecInitExpr <p class="MsoPlainText">+ 1.21% postgres postgres [.] heap_page_prune_opt<p class="MsoPlainText">+ 1.03% postgres libc-2.17.so [.] _int_malloc<p class="MsoPlainText">+ 0.99% postgres postgres [.] MemoryContextAllocZeroAligned<p class="MsoPlainText">+ 0.97% postgres libc-2.17.so [.] __memcpy_ssse3_back<p class="MsoPlainText">+ 0.89% postgres postgres [.] heapgetpage<p class="MsoPlainText">+ 0.76% postgres postgres [.]hash_any<p class="MsoPlainText">+ 0.69% postgres postgres [.] fmgr_info_cxt_security<p class="MsoPlainText">+ 0.69% postgres postgres [.] _bt_checkkeys<p class="MsoPlainText">+ 0.65% postgres postgres [.] copyObject<p class="MsoPlainText">+ 0.63% postgres postgres [.] FunctionCall2Coll<p class="MsoPlainText">+ 0.58% postgres postgres [.] MemoryContextAlloc<p class="MsoPlainText">+ 0.56% postgres libc-2.17.so [.] __strncpy_sse2_unaligned<p class="MsoPlainText">+ 0.53% postgres postgres [.] XidInMVCCSnapshot<p class="MsoPlainText">+ 0.52% postgres [kernel.kallsyms] [k] copy_user_generic_string<p class="MsoPlainText">+ 0.51% postgres libc-2.17.so [.] __printf_fp<p class="MsoPlainText">+ 0.50% postgres libc-2.17.so [.]vfprintf<p class="MsoPlainText"> <p class="MsoPlainText"> <p class="MsoPlainText">P.S.<p class="MsoPlainText">The regressionwas found when testing vPostgres performance.<p class="MsoPlainText">During the narrow-down, we found the commit005f583ba4e6d4d19b62959ef8e70a3da4d188a5 (on REL9_2_STABLE) improves performance (versus its parent commit) by 5-10%.Perhaps serve an extra data point for the patch.<p class="MsoPlainText"> <p class="MsoNormal"> </div>
Hello, On 2013-12-18 10:24:56 -0800, Dong Ye wrote: > It seems that 0ac5ad5134f2769ccbaefec73844f8504c4d6182 is the culprit > commit. How long does a run take to verify the problem? Could you retry with the patch attached to http://www.postgresql.org/message-id/20131201114514.GG18793@alap2.anarazel.de ? Based on the theory that it creates many superflous multixacts. > Flat perf profiles of two such runs look like: Those aren't really flat profiles tho ;) > 0ac: > > Samples: 706K of event 'cycles', Event count (approx.): 6690377376522 > > > + 3.82% postgres postgres [.] > GetMultiXactIdMembers Could you expland that one some levels, so we see the callers? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
~20 minutes each run with binary.
Thanks,
Try your patch now..
You are right I used -g in perf record. But what I reported was flat (meant as a start).
Expand GetMultiXactIdMembers:
3.82% postgres postgres [.] GetMultiXactIdMembers
|
|--9.09%-- GetMultiXactIdMembers
|
|--0.84%-- 0x48fb894853f58948
| |
| |--0.74%-- 0x4296e0004296c
| | GetMultiXactIdMembers
| |
| |--0.64%-- 0x52f8d00052f8d
| | GetMultiXactIdMembers
| |
| |--0.64%-- 0xf6ce8000f6ce8
| | GetMultiXactIdMembers
| |
| |--0.62%-- 0x41de300041de1
| | GetMultiXactIdMembers
| |
| |--0.61%-- 0xf2c77000f2c71
| | GetMultiXactIdMembers
| |
| |--0.60%-- 0x3127700031275
| | GetMultiXactIdMembers
| |
| |--0.59%-- 0x10c98b0010c987
| | GetMultiXactIdMembers
| |
| |--0.59%-- 0x31df000031df0
| | GetMultiXactIdMembers
| |
| |--0.59%-- 0xbefbd000befbd
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0xfe97c000fe976
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x82501000824f9
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x3a4410003a43c
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x3b0cf0003b0c3
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x5325f0005325b
| | GetMultiXactIdMembers
| |
| |--0.58%-- 0x7b6b80007b6b8
| | GetMultiXactIdMembers
| |
| |--0.57%-- 0x52e9b00052e9b
| | GetMultiXactIdMembers
| |
| |--0.57%-- 0xf3d45000f3d40
| | GetMultiXactIdMembers
| |
| |--0.57%-- 0x27afd00027afa
| | GetMultiXactIdMembers
| |
| |--0.57%-- 0x3244d0003244d
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0x53e0d00053e06
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0xb64c6000b64bc
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0x423f1000423ef
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0xc18f2000c18ed
| | GetMultiXactIdMembers
| |
| |--0.56%-- 0x6bdcf0006bdcd
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0xc6d25000c6d25
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0xf6534000f6534
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0x10bba80010bba0
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0xb5a76000b5a6e
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0x2d3c10002d3b5
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0xcc095000cc095
| | GetMultiXactIdMembers
| |
| |--0.55%-- 0x2e3340002e334
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0xafffc000afff9
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0x68caf00068caf
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0x5ec090005ec09
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0x51eb100051eac
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0x5402000054020
| | GetMultiXactIdMembers
| |
| |--0.54%-- 0xe352a000e3526
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x6a5d30006a5d3
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x2860b00028606
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0xce9ad000ce9ab
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x42d4000042d40
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x27e6900027e69
| | GetMultiXactIdMembers
| |
| |--0.53%-- 0x7aaf20007aaef
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0x2b6150002b615
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0x2969200029692
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0xc801e000c8016
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0xc72c5000c72bf
| | GetMultiXactIdMembers
| |
| |--0.52%-- 0xc2018000c2018
| | GetMultiXactIdMembers
| |
| |--0.51%-- 0x6ad840006ad7b
| | GetMultiXactIdMembers
| |
| |--0.51%-- 0x80b4600080b41
| | GetMultiXactIdMembers
| |
| |--0.51%-- 0xd627a000d6276
| | GetMultiXactIdMembers
| |
| |--0.51%-- 0xfafb9000fafb9
| | GetMultiXactIdMembers
| |
| |--0.50%-- 0xc6b3c000c6b38
| | GetMultiXactIdMembers
| |
| |--0.50%-- 0x7207700072072
| | GetMultiXactIdMembers
| --69.92%-- [...]
--90.08%-- [...]
Thanks,
Dong
On Wed, Dec 18, 2013 at 2:17 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hello,How long does a run take to verify the problem? Could you retry with the
On 2013-12-18 10:24:56 -0800, Dong Ye wrote:
> It seems that 0ac5ad5134f2769ccbaefec73844f8504c4d6182 is the culprit
> commit.
patch attached to
http://www.postgresql.org/message-id/20131201114514.GG18793@alap2.anarazel.de
? Based on the theory that it creates many superflous multixacts.Those aren't really flat profiles tho ;)
> Flat perf profiles of two such runs look like:Could you expland that one some levels, so we see the callers?
> 0ac:
>
> Samples: 706K of event 'cycles', Event count (approx.): 6690377376522
>
>
> + 3.82% postgres postgres [.]
> GetMultiXactIdMembers
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi, On 2013-12-18 14:59:45 -0500, Dong Ye wrote: > ~20 minutes each run with binary. > Try your patch now.. > You are right I used -g in perf record. But what I reported was flat (meant > as a start). > > Expand GetMultiXactIdMembers: > > 3.82% postgres postgres [.] > GetMultiXactIdMembers That looks like a postgres compiled without -fno-omit-frame-pointer. Without that hierarchical profiles are meaningless. Very new perfs can also do it using dwarf, but it's unusabl slow... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
<div dir="ltr">HI<br /><div class="gmail_extra"><br /><div class="gmail_quote">On Wed, Dec 18, 2013 at 3:03 PM, Andres Freund<span dir="ltr"><<a href="mailto:andres@2ndquadrant.com" target="_blank">andres@2ndquadrant.com</a>></span> wrote:<blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> That looks likea postgres compiled without<br /> -fno-omit-frame-pointer. Without that hierarchical profiles are<br /> meaningless.Very new perfs can also do it using dwarf, but it's unusabl<br /> slow...<br /><div class="HOEnZb"><div class="h5"><br/></div></div></blockquote><div style="style">Let me complete current test without the flag (to serve as acheckpoint as well). I will do a run with the flag after.</div><div style="style"><br /></div><div style="style">Thanks,</div><divstyle="style">Dong</div></div></div></div>
Hi,
FWIW, the profile is below. I will do a run with the flag..
Samples: 598K of event 'cycles', Event count (approx.): 6568957160059
+ 4.03% postgres postgres [.] hash_search_with_hash_value
+ 3.74% postgres postgres [.] heap_hot_search_buffer
+ 3.17% postgres postgres [.] LWLockAcquire
+ 2.92% postgres postgres [.] _bt_compare
+ 2.64% postgres postgres [.] PinBuffer
+ 2.62% postgres postgres [.] AllocSetAlloc
+ 2.34% postgres postgres [.] XLogInsert
+ 2.27% postgres postgres [.] SearchCatCache
+ 1.81% postgres postgres [.] HeapTupleSatisfiesMVCC
+ 1.52% postgres postgres [.] ExecInitExpr
+ 1.45% postgres postgres [.] heap_page_prune_opt
+ 1.41% swapper [kernel.kallsyms] [k] intel_idle
+ 1.30% postgres postgres [.] LWLockRelease
+ 1.12% postgres postgres [.] heapgetpage
+ 0.96% postgres libc-2.17.so [.] _int_malloc
+ 0.86% postgres libc-2.17.so [.] __memcpy_ssse3_back
+ 0.84% postgres postgres [.] hash_any
+ 0.81% postgres postgres [.] MemoryContextAllocZeroAligned
+ 0.76% postgres postgres [.] XidInMVCCSnapshot
+ 0.74% postgres postgres [.] _bt_checkkeys
+ 0.70% postgres postgres [.] fmgr_info_cxt_security
+ 0.70% postgres postgres [.] FunctionCall2Coll
+ 0.66% postgres libc-2.17.so [.] __strncpy_sse2_unaligned
+ 0.63% postgres postgres [.] tbm_iterate
+ 0.58% postgres postgres [.] base_yyparse
+ 0.58% postgres libc-2.17.so [.] __printf_fp
+ 0.55% postgres postgres [.] palloc
+ 0.51% postgres postgres [.] TransactionIdPrecedes
+ 0.51% postgres postgres [.] slot_deform_tuple
+ 0.50% postgres postgres [.] ReadBuffer_common
Applying your patch plus adding -fno-omit-frame-pointer, I got 54526.90 notpm.
The profile (part) below:
# Samples: 610K of event 'cycles'
# Event count (approx.): 6686532056450
#
# Overhead Command Shared Object Symbol
# ........ .......... ............. ..................................................
#
4.08% postgres postgres [.] hash_search_with_hash_value
|
--- hash_search_with_hash_value
|
--2.87%-- BufTableLookup
|
--2.86%-- ReadBuffer_common
ReadBufferExtended
|
|--0.86%-- index_fetch_heap
| |
| --0.62%-- index_getnext
| IndexNext
| ExecScan
| ExecProcNode
|
|--0.71%-- BitmapHeapNext
| ExecScan
| ExecProcNode
|
--0.57%-- _bt_relandgetbuf
|
--0.57%-- _bt_search
3.75% postgres postgres [.] heap_hot_search_buffer
|
--- heap_hot_search_buffer
|
|--1.92%-- BitmapHeapNext
| ExecScan
| ExecProcNode
| |
| --1.12%-- standard_ExecutorRun
| _SPI_execute_plan
| SPI_execute_plan
| |
| --0.70%-- payment
| ExecMakeFunctionResult
| ExecProject
| ExecResult
| ExecProcNode
| standard_ExecutorRun
| PortalRunSelect
| PortalRun
| PostgresMain
| ServerLoop
| PostmasterMain
| main
| __libc_start_main
|
--1.74%-- index_fetch_heap
|
--1.46%-- index_getnext
|
--1.45%-- IndexNext
ExecScan
ExecProcNode
|
--0.96%-- standard_ExecutorRun
_SPI_execute_plan
SPI_execute_plan
|
--0.50%-- new_order
ExecMakeFunctionResult
ExecProject
ExecResult
ExecProcNode
standard_ExecutorRun
PortalRunSelect
PortalRunFetch
PerformPortalFetch
standard_ProcessUtility
PortalRunUtility
FillPortalStore
PortalRun
PostgresMain
ServerLoop
PostmasterMain
main
__libc_start_main
3.15% postgres postgres [.] LWLockAcquire
|
--- LWLockAcquire
|
--1.65%-- ReadBuffer_common
ReadBufferExtended
2.74% postgres postgres [.] PinBuffer
|
--- PinBuffer
|
--2.72%-- ReadBuffer_common
ReadBufferExtended
|
|--0.71%-- index_fetch_heap
| |
| --0.51%-- index_getnext
| IndexNext
| ExecScan
| ExecProcNode
|
|--0.67%-- BitmapHeapNext
| ExecScan
| ExecProcNode
|
--0.60%-- heapgetpage
heapgettup_pagemode
heap_getnext
SeqNext
ExecScan
ExecProcNode
2.72% postgres postgres [.] _bt_compare
|
--- _bt_compare
|
--2.51%-- _bt_binsrch
|
|--1.18%-- _bt_search
| |
| --0.74%-- _bt_first
| |
| --0.72%-- btgettuple
| FunctionCall2Coll
| index_getnext_tid
| |
| --0.58%-- index_getnext
| |
| --0.58%-- IndexNext
| ExecScan
| ExecProcNode
|
--0.97%-- _bt_first
|
--0.94%-- btgettuple
FunctionCall2Coll
index_getnext_tid
|
--0.66%-- index_getnext
|
--0.66%-- IndexNext
ExecScan
ExecProcNode
|
--0.51%-- standard_ExecutorRun
_SPI_execute_plan
SPI_execute_plan
2.47% postgres postgres [.] AllocSetAlloc
|
--- AllocSetAlloc
|
--0.54%-- MemoryContextAllocZeroAligned