Re: Parser Cruft in gram.y

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Parser Cruft in gram.y
Дата
Msg-id 20121220150448.GG4303@awork2.anarazel.de
обсуждение исходный текст
Ответ на Re: Parser Cruft in gram.y  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: Parser Cruft in gram.y
Список pgsql-hackers
On 2012-12-20 15:51:37 +0100, Andres Freund wrote:
> On 2012-12-20 15:45:47 +0100, Andres Freund wrote:
> > On 2012-12-20 09:11:46 -0500, Robert Haas wrote:
> > > On Thu, Dec 20, 2012 at 8:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > > > On 18 December 2012 22:10, Robert Haas <robertmhaas@gmail.com> wrote:
> > > >> Well that would be nice, but the problem is that I see no way to
> > > >> implement it.  If, with a unified parser, the parser is 14% of our
> > > >> source code, then splitting it in two will probably crank that number
> > > >> up well over 20%, because there will be duplication between the two.
> > > >> That seems double-plus un-good.
> > > >
> > > > I don't think the size of the parser binary is that relevant. What is
> > > > relevant is how much of that is regularly accessed.
> > > >
> > > > Increasing parser cache misses for DDL and increasing size of binary
> > > > overall are acceptable costs if we are able to swap out the unneeded
> > > > areas and significantly reduce the cache misses on the well travelled
> > > > portions of the parser.
> > >
> > > I generally agree.  We don't want to bloat the size of the parser with
> > > wild abandon, but yeah if we can reduce the cache misses on the
> > > well-travelled portions that seems like it ought to help.  My previous
> > > hacky attempt to quantify the potential benefit in this area was:
> > >
> > > http://archives.postgresql.org/pgsql-hackers/2011-05/msg01008.php
> > >
> > > On my machine there seemed to be a small but consistent win; on a very
> > > old box Jeff Janes tried, it didn't seem like there was any benefit at
> > > all.  Somehow, I have a feeling we're missing a trick here.
> >
> > I don't think you will see too many cache misses on such a low number of
> > extremly simply statements, so its not too surprising not to see a that
> > big difference with that.
> >
> > Are we sure its really cache-misses and not just the actions performed
> > in the grammar that make bison code show up in profiles? I remember the
> > latter being the case...
>
> Hm. A very, very quick perf stat -dvvv of pgbench -S -c 20 -j 20 -T 20 later:
>
>      218350.885559 task-clock                #   10.095 CPUs utilized
>          1,676,479 context-switches          #    0.008 M/sec
>              2,396 cpu-migrations            #    0.011 K/sec
>            796,038 page-faults               #    0.004 M/sec
>    506,312,525,518 cycles                    #    2.319 GHz                     [20.00%]
>    405,944,435,754 stalled-cycles-frontend   #   80.18% frontend cycles idle    [30.32%]
>    236,712,872,641 stalled-cycles-backend    #   46.75% backend  cycles idle    [40.51%]
>    193,459,120,458 instructions              #    0.38  insns per cycle
>                                              #    2.10  stalled cycles per insn [50.70%]
>     36,433,144,472 branches                  #  166.856 M/sec                   [51.12%]
>      3,623,778,087 branch-misses             #    9.95% of all branches         [50.87%]
>     50,344,123,581 L1-dcache-loads           #  230.565 M/sec                   [50.33%]
>      5,548,192,686 L1-dcache-load-misses     #   11.02% of all L1-dcache hits   [49.69%]
>      2,666,461,361 LLC-loads                 #   12.212 M/sec                   [35.63%]
>        112,407,198 LLC-load-misses           #    4.22% of all LL-cache hits    [ 9.67%]
>
>       21.629396701 seconds time elapsed
>
> So there definitely a noticeable rate of cache misses...

L1 misses:
# Samples: 997K of event 'L1-dcache-load-misses'
# Overhead   Command       Shared Object Symbol
# ........  ........  ...............................................................    6.49%  postgres  postgres
     [.] SearchCatCache    3.65%  postgres  postgres            [.] base_yyparse    3.48%  postgres  postgres
[.] hash_search_with_hash_value    3.41%  postgres  postgres            [.] AllocSetAlloc    1.84%  postgres  postgres
         [.] LWLockAcquire    1.40%  postgres  postgres            [.] fmgr_info_cxt_security    1.36%  postgres
postgres           [.] nocachegetattr    1.23%  postgres  libc-2.13.so        [.] _int_malloc    1.20%  postgres
postgres           [.] core_yylex    1.15%  postgres  postgres            [.] MemoryContextAllocZeroAligned    0.94%
postgres postgres            [.] PostgresMain    0.94%  postgres  postgres            [.] MemoryContextAlloc    0.91%
postgres libc-2.13.so        [.] __memcpy_ssse3_back    0.89%  postgres  postgres            [.]
CatalogCacheComputeHashValue   0.86%  postgres  postgres            [.] PinBuffer    0.86%  postgres  [kernel.kallsyms]
 [k] __audit_syscall_entry    0.80%  postgres  libc-2.13.so        [.] __strcmp_sse42    0.80%  postgres  postgres
     [.] _bt_compare    0.78%  postgres  postgres            [.] FunctionCall2Coll    0.77%  postgres  libc-2.13.so
  [.] malloc    0.73%  postgres  libc-2.13.so        [.] __memset_sse2    0.72%  postgres  postgres            [.]
GetSnapshotData   0.69%  postgres  [kernel.kallsyms]   [k] fget_light    0.69%  postgres  postgres            [.]
DirectFunctionCall1Coll   0.67%  postgres  postgres            [.] hash_search    0.67%  postgres  libc-2.13.so
[.]0x000000000011a3a5    0.66%  postgres  postgres            [.] pgstat_initstats    0.66%  postgres  postgres
  [.] AllocSetFree    0.65%  postgres  libc-2.13.so        [.] __strlen_sse42    0.60%  postgres  libc-2.13.so
[.]_int_free    0.60%  postgres  [kernel.kallsyms]   [k] cpuacct_charge    0.59%  postgres  postgres            [.]
heap_getsysattr   0.59%  postgres  postgres            [.] MemoryContextAllocZero    0.58%  postgres  postgres
 [.] PopActiveSnapshot    0.53%  postgres  libc-2.13.so        [.] __memcmp_sse4_1    0.51%  postgres  postgres
  [.] ReadBuffer_common    0.49%  postgres  postgres            [.] ScanKeywordLookup    0.49%  postgres  postgres
     [.] LockAcquireExtended    0.47%  postgres  [kernel.kallsyms]   [k] update_cfs_shares    0.45%  postgres  postgres
          [.] SearchCatCacheList    0.45%  postgres  postgres            [.] new_list    0.44%  postgres  postgres
     [.] get_relation_info
 

LLC misses:
# Samples: 1M of event 'LLC-load-misses'
# Event count (approx.): 143379713
# Overhead   Command       Shared Object Symbol
# ........  ........  ...............................................................   25.08%  postgres  postgres
     [.] _bt_compare   12.84%  postgres  postgres            [.] PinBuffer    9.18%  postgres  postgres            [.]
LWLockAcquire   6.31%  postgres  postgres            [.] GetSnapshotData    6.08%  postgres  postgres            [.]
heap_hot_search_buffer   5.13%  postgres  postgres            [.] hash_search_with_hash_value    4.85%  postgres
postgres           [.] _bt_checkpage    3.95%  postgres  postgres            [.] _bt_moveright    3.09%  postgres
postgres           [.] heap_page_prune_opt    2.12%  postgres  postgres            [.] slot_deform_tuple    1.98%
postgres postgres            [.] LWLockRelease    1.82%  postgres  libc-2.13.so        [.] __memcmp_sse4_1    1.16%
postgres postgres            [.] ExecProject    1.16%  postgres  postgres            [.] FunctionCall2Coll    0.94%
postgres [kernel.kallsyms]   [k] copy_user_generic_string    0.94%  postgres  [kernel.kallsyms]   [k] tg_load_down
0.78% postgres  [kernel.kallsyms]   [k] find_get_page    0.73%  postgres  postgres            [.]
ProcArrayEndTransaction   0.73%  postgres  postgres            [.] pfree    0.71%  postgres  postgres            [.]
pgstat_report_xact_timestamp   0.69%  postgres  postgres            [.] index_fetch_heap    0.66%  postgres  postgres
        [.] LockAcquireExtended    0.56%  postgres  postgres            [.] LockBuffer    0.45%  postgres  postgres
      [.] slot_getsomeattrs    0.40%  postgres  postgres            [.] _bt_readpage 

So it seems L1 misses are the interesting thing wrt to parsing.

When doing a source/assembly annotation in SearchCatCache about half of
the misses are attributed to the memcpy() directly at the beginning.
In base_yyparse the three biggest offenders (15%, 10.5%, 5.58%) really
seem to be various kinds of table lookups.

So it seems to confirm the various suspicious that the table size might
be rather relevant.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Следующее
От: Josh Kupershmidt
Дата:
Сообщение: Re: discarding duplicate indexes