Обсуждение: Re: [PATCHES] Avg performance for int8/numeric

Поиск
Список
Период
Сортировка

Re: [PATCHES] Avg performance for int8/numeric

От
Mark Kirkwood
Дата:
(Blast - meant to send this to -hackers not -patches....)

Neil Conway wrote:

>
>> (it is still slower than doing sum/count - possibly due to the
>> construct/deconstruct overhead of the numeric transition array).
>
> This would indeed be worth profiling. If it turns out that array
> overhead is significant, I wonder if we could use a composite type for
> the transition variable instead. That might also make it easier to
> represent the "N" value as an int8 rather than a numeric.
>

I've profiled the 2nd patch using the setup indicated below. The first
64 lines of the flat graph are attached. The complete profile is here:

http://homepages.paradise.net.nz/markir/download/postgres/postgres-avg.gprof.gz

Setup:

avg=# \d avgtest
         Table "public.avgtest"
   Column |     Type      | Modifiers
--------+---------------+-----------
   id     | integer       |
   val0   | bigint        |
   val1   | numeric(12,2) |
   val2   | numeric(10,0) |

avg=# analyze verbose avgtest;
INFO:  analyzing "public.avgtest"
INFO:  "avgtest": scanned 3000 of 87689 pages, containing 342138 live
rows and 0 dead rows; 3000 rows in sample, 10000580 estimated total rows
ANALYZE
Time: 252.033 ms
avg=# select avg(val2) from avgtest;
           avg
---------------------
   714285.214285800000
(1 row)

Time: 35196.028 ms
avg=# \q

regards

Mark

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 14.42      2.16     2.16 100002977     0.00     0.00  AllocSetAlloc
  9.08      3.52     1.36 20000000     0.00     0.00  add_abs
  5.54      4.35     0.83 10000000     0.00     0.00  slot_deform_tuple
  5.41      5.16     0.81 60001673     0.00     0.00  AllocSetFree
  4.34      5.81     0.65 10000000     0.00     0.00  construct_md_array
  4.21      6.44     0.63 20000003     0.00     0.00  make_result
  3.54      6.97     0.53 10000000     0.00     0.00  numeric_add
  3.27      7.46     0.49 30000003     0.00     0.00  set_var_from_num
  3.00      7.91     0.45 100002652     0.00     0.00  MemoryContextAlloc
  2.74      8.32     0.41 10000001     0.00     0.00  heapgettup_pagemode
  2.54      8.70     0.38 10000000     0.00     0.00  advance_transition_function
  2.40      9.06     0.36 30000006     0.00     0.00  alloc_var
  2.27      9.40     0.34 10000000     0.00     0.00  do_numeric_avg_accum
  2.00      9.70     0.30 10000001     0.00     0.00  CopyArrayEls
  2.00     10.00     0.30 10000000     0.00     0.00  numeric_inc
  1.94     10.29     0.29 20000002     0.00     0.00  ArrayGetNItems
  1.94     10.58     0.29 10000001     0.00     0.00  deconstruct_array
  1.87     10.86     0.28 20000002     0.00     0.00  ArrayCastAndSet
  1.74     11.12     0.26 60001672     0.00     0.00  pfree
  1.67     11.37     0.25 10000001     0.00     0.00  slot_getattr
  1.60     11.61     0.24 10000000     0.00     0.00  advance_aggregates
  1.54     11.84     0.23 40000006     0.00     0.00  free_var
  1.54     12.07     0.23 10000001     0.00     0.00  datumCopy
  1.47     12.29     0.22 10000001     0.00     0.00  SeqNext
  1.40     12.50     0.21 20000000     0.00     0.00  add_var
  1.34     12.70     0.20 20000003     0.00     0.00  strip_var
  1.34     12.90     0.20 10000001     0.00     0.00  ExecScan
  1.27     13.09     0.19 10000003     0.00     0.00  AllocSetReset
  1.20     13.27     0.18 10000003     0.00     0.00  ExecProcNode
  1.13     13.44     0.17 70000010     0.00     0.00  pg_detoast_datum
  0.93     13.58     0.14 10000000     0.00     0.00  numeric_avg_accum
  0.93     13.72     0.14        2     0.07     6.61  ExecAgg
  0.87     13.85     0.13 10000001     0.00     0.00  datumGetSize
  0.87     13.98     0.13    87860     0.00     0.00  heapgetpage
  0.73     14.09     0.11 10000001     0.00     0.00  DirectFunctionCall2
  0.73     14.20     0.11 10000000     0.00     0.00  construct_array
  0.60     14.29     0.09 10000148     0.00     0.00  DirectFunctionCall1
  0.53     14.37     0.08 10000001     0.00     0.00  ExecStoreTuple
  0.53     14.45     0.08 10000000     0.00     0.00  HeapTupleSatisfiesSnapshot
  0.40     14.51     0.06 10000103     0.00     0.00  heap_getnext
  0.33     14.56     0.05   254419     0.00     0.00  hash_search_with_hash_value
  0.27     14.60     0.04 10000001     0.00     0.00  MemoryContextReset
  0.27     14.64     0.04 10000000     0.00     0.00  ExecEvalVar
  0.27     14.68     0.04 10000000     0.00     0.00  XidInSnapshot
  0.27     14.72     0.04   511482     0.00     0.00  LWLockRelease
  0.27     14.76     0.04   164939     0.00     0.00  hash_any
  0.27     14.80     0.04    87760     0.00     0.00  StrategyGetBuffer
  0.20     14.83     0.03 10000009     0.00     0.00  TransactionIdPrecedes
  0.20     14.86     0.03    87760     0.00     0.00  FileRead
  0.13     14.88     0.02 10000001     0.00     0.00  ExecSeqScan
  0.13     14.90     0.02   511481     0.00     0.00  LWLockAcquire
  0.13     14.92     0.02    88217     0.00     0.00  ReadBuffer
  0.13     14.94     0.02    87760     0.00     0.00  TerminateBufferIO
  0.07     14.95     0.01   175906     0.00     0.00  ResourceOwnerForgetBuffer
  0.07     14.96     0.01   163587     0.00     0.00  get_hash_value
  0.07     14.97     0.01    88019     0.00     0.00  ReleaseBuffer
  0.07     14.98     0.01    87760     0.00     0.00  PinBuffer_Locked
  0.00     14.98     0.00   176868     0.00     0.00  LockBuffer


Re: [PATCHES] Avg performance for int8/numeric

От
"Luke Lonergan"
Дата:
So, if I understand this correctly, we're calling Alloc and ContextAlloc 10
times for every row being summed?

There are approx 10M rows and the profile snippet below shows 100M calls to
each of those.

- Luke


On 11/24/06 4:46 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:

>  time   seconds   seconds    calls   s/call   s/call  name
>  14.42      2.16     2.16 100002977     0.00     0.00  AllocSetAlloc
>   9.08      3.52     1.36 20000000     0.00     0.00  add_abs
>   5.54      4.35     0.83 10000000     0.00     0.00  slot_deform_tuple
>   5.41      5.16     0.81 60001673     0.00     0.00  AllocSetFree
>   4.34      5.81     0.65 10000000     0.00     0.00  construct_md_array
>   4.21      6.44     0.63 20000003     0.00     0.00  make_result
>   3.54      6.97     0.53 10000000     0.00     0.00  numeric_add
>   3.27      7.46     0.49 30000003     0.00     0.00  set_var_from_num
>   3.00      7.91     0.45 100002652     0.00     0.00  MemoryContextAlloc






Re: [PATCHES] Avg performance for int8/numeric

От
Mark Kirkwood
Дата:
Mark Kirkwood wrote:

>>
>
> I've profiled the 2nd patch using the setup indicated below. The first
> 64 lines of the flat graph are attached.

By way of comparison, here is the first 63 lines for:

select sum(val2) from avgtest


Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 10.65      0.80     0.80 50002910     0.00     0.00  AllocSetAlloc
  8.66      1.45     0.65 10000000     0.00     0.00  slot_deform_tuple
  7.86      2.04     0.59 40001617     0.00     0.00  AllocSetFree
  7.46      2.60     0.56  9999999     0.00     0.00  add_abs
  6.13      3.06     0.46  9999999     0.00     0.00  numeric_add
  4.93      3.43     0.37  9999999     0.00     0.00  make_result
  4.53      3.77     0.34 10000000     0.00     0.00  advance_transition_function
  3.46      4.03     0.26 29999998     0.00     0.00  free_var
  3.46      4.29     0.26 10000003     0.00     0.00  AllocSetReset
  3.46      4.55     0.26 10000000     0.00     0.00  advance_aggregates
  3.33      4.80     0.25 50002592     0.00     0.00  MemoryContextAlloc
  3.20      5.04     0.24 10000001     0.00     0.00  heapgettup_pagemode
  3.06      5.27     0.23 19999999     0.00     0.00  set_var_from_num
  2.26      5.44     0.17 10000001     0.00     0.00  slot_getattr
  2.13      5.60     0.16 40001616     0.00     0.00  pfree
  2.13      5.76     0.16 10000003     0.00     0.00  ExecProcNode
  2.13      5.92     0.16 10000001     0.00     0.00  ExecScan
  1.73      6.05     0.13 10000103     0.00     0.00  heap_getnext
  1.60      6.17     0.12 10000001     0.00     0.00  SeqNext
  1.60      6.29     0.12 10000000     0.00     0.00  XidInSnapshot
  1.60      6.41     0.12  9999999     0.00     0.00  add_var
  1.46      6.52     0.11 19999999     0.00     0.00  alloc_var
  1.46      6.63     0.11 10000001     0.00     0.00  datumGetSize
  1.33      6.73     0.10 10000001     0.00     0.00  datumCopy
  1.20      6.82     0.09    87860     0.00     0.00  heapgetpage
  1.20      6.91     0.09        2     0.04     3.03  ExecAgg
  0.93      6.98     0.07 10000000     0.00     0.00  ExecEvalVar
  0.93      7.05     0.07 10000000     0.00     0.00  HeapTupleSatisfiesSnapshot
  0.80      7.11     0.06 20000000     0.00     0.00  pg_detoast_datum
  0.80      7.17     0.06   511486     0.00     0.00  LWLockRelease
  0.80      7.23     0.06   511485     0.00     0.00  LWLockAcquire
  0.67      7.28     0.05 10000001     0.00     0.00  ExecStoreTuple
  0.40      7.31     0.03 10000009     0.00     0.00  TransactionIdPrecedes
  0.40      7.34     0.03 10000001     0.00     0.00  MemoryContextReset
  0.40      7.37     0.03    87761     0.00     0.00  StrategyGetBuffer
  0.27      7.39     0.02 10000001     0.00     0.00  ExecSeqScan
  0.27      7.41     0.02  9999999     0.00     0.00  strip_var
  0.27      7.43     0.02   254355     0.00     0.00  hash_search_with_hash_value
  0.13      7.44     0.01   176862     0.00     0.00  LockBuffer
  0.13      7.45     0.01   175902     0.00     0.00  ResourceOwnerForgetBuffer
  0.13      7.46     0.01   175901     0.00     0.00  UnpinBuffer
  0.13      7.47     0.01   164908     0.00     0.00  hash_any
  0.13      7.48     0.01    88213     0.00     0.00  BufTableLookup
  0.13      7.49     0.01    88015     0.00     0.00  ReleaseBuffer
  0.13      7.50     0.01    87761     0.00     0.00  TerminateBufferIO
  0.13      7.51     0.01                             __divdi3
  0.00      7.51     0.00   175903     0.00     0.00  ResourceOwnerEnlargeBuffers
  0.00      7.51     0.00   175902     0.00     0.00  ResourceOwnerRememberBuffer
  0.00      7.51     0.00   164857     0.00     0.00  tag_hash
  0.00      7.51     0.00   163576     0.00     0.00  get_hash_value
  0.00      7.51     0.00   163174     0.00     0.00  BufTableHashCode
  0.00      7.51     0.00    88214     0.00     0.00  ReleaseAndReadBuffer
  0.00      7.51     0.00    88213     0.00     0.00  ReadBuffer
  0.00      7.51     0.00    87802     0.00     0.00  FileSeek
  0.00      7.51     0.00    87802     0.00     0.00  mdopen
  0.00      7.51     0.00    87761     0.00     0.00  BufTableInsert
  0.00      7.51     0.00    87761     0.00     0.00  FileAccess
  0.00      7.51     0.00    87761     0.00     0.00  FileRead

Re: [PATCHES] Avg performance for int8/numeric

От
"Luke Lonergan"
Дата:
Mark,

On 11/24/06 6:16 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:

> By way of comparison, here is the first 63 lines for:
> 
> select sum(val2) from avgtest

So, sum() is only alloc'ing 5 times for every row being processed, half of
what avg() is doing.

Seems like what we need to do is find a way to reuse the temp heaptuple
between calls.

- Luke




Re: [PATCHES] Avg performance for int8/numeric

От
Mark Kirkwood
Дата:
Luke Lonergan wrote:
> Mark,
> 
> On 11/24/06 6:16 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:
> 
>> By way of comparison, here is the first 63 lines for:
>>
>> select sum(val2) from avgtest
> 
> So, sum() is only alloc'ing 5 times for every row being processed, half of
> what avg() is doing.
> 
> Seems like what we need to do is find a way to reuse the temp heaptuple
> between calls.
> 
>

Yeah - and I'm *guessing* that its due to avg needing to 
deconstruct/construct a 2 element numeric array every call (whereas sum 
needs just a single numeric). So some delving into whether it is 
construct_md_array that can re-use a heaptuple or whether we need to 
look elsewhere...

Also Neil suggested investigating using a single composite type {int8, 
numeric} for the {N,sum(X)} transition values. This could well be a 
faster way to do this (not sure how to make it work yet... but it sounds 
promising...).

Cheers

Mark


Re: [PATCHES] Avg performance for int8/numeric

От
"Simon Riggs"
Дата:
On Sat, 2006-11-25 at 18:57 +1300, Mark Kirkwood wrote:
> Also Neil suggested investigating using a single composite type
> {int8, 
> numeric} for the {N,sum(X)} transition values. This could well be a 
> faster way to do this (not sure how to make it work yet... but it
> sounds 
> promising...).

If that is true it implies that any fixed length array is more expensive
than using a composite type. Is there something to be gained by changing
the basic representation of arrays, rather than rewriting all uses of
them?

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: [PATCHES] Avg performance for int8/numeric

От
Tom Lane
Дата:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Sat, 2006-11-25 at 18:57 +1300, Mark Kirkwood wrote:
>> Also Neil suggested investigating using a single composite type
>> {int8, 
>> numeric} for the {N,sum(X)} transition values. This could well be a 
>> faster way to do this (not sure how to make it work yet... but it
>> sounds 
>> promising...).

> If that is true it implies that any fixed length array is more expensive
> than using a composite type.

Not sure how you derived that conclusion from this statement, but it
doesn't appear to me to follow at all.  The reason for Neil's suggestion
was to avoid using numeric arithmetic to run a simple counter, and the
reason that this array stuff is expensive is that the array *components*
are variable-length, which is something that no amount of array
redesigning will eliminate.
        regards, tom lane


Re: [PATCHES] Avg performance for int8/numeric

От
Mark Kirkwood
Дата:
Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> On Sat, 2006-11-25 at 18:57 +1300, Mark Kirkwood wrote:
>>> Also Neil suggested investigating using a single composite type
>>> {int8, 
>>> numeric} for the {N,sum(X)} transition values. This could well be a 
>>> faster way to do this (not sure how to make it work yet... but it
>>> sounds 
>>> promising...).
> 
>> If that is true it implies that any fixed length array is more expensive
>> than using a composite type.
> 
> Not sure how you derived that conclusion from this statement, but it
> doesn't appear to me to follow at all.  The reason for Neil's suggestion
> was to avoid using numeric arithmetic to run a simple counter, and the
> reason that this array stuff is expensive is that the array *components*
> are variable-length, which is something that no amount of array
> redesigning will eliminate.
> 

Here is what I think the major contributors to the time spent in avg are:

1/ maintaining sum of squares in the transition array ~ 33%
2/ calling numeric_inc on a numeric counter ~ 10%
3/ deconstruct/construct array of 2 numerics ~ 16%

I derived these by constructing a (deliberately inefficient) version of 
sum that used an array of numerics and calculated extra stuff in its 
transaction array, and then started removing code a bit at a time to see 
what happened (I'm sure there are smarter ways... but this worked ok...).

The current patch does 1/, and doing a composite type of {int8, numeric} 
would let us use a an int8 counter instead of numeric, which would 
pretty much sort out 2/.

The array cost is more tricky - as Tom mentioned the issue is related to 
the variable length nature of the array components, so just changing to 
a composite type may not in itself save any of the (so-called) 'array 
cost'. Having said that - the profiles suggest that we are perhaps doing 
a whole lot more alloc'ing (i.e copying? detoasting?) of memory for 
numerics than perhaps we need... I'm not sure how deeply buried the 
decision about alloc'ing is being made, so doing anything about this may 
be hard.

It looks to me like trying out a composite type is the next obvious step 
to do, and then (once I've figured out how so that) we can check its 
performance again!

Cheers

Mark


Re: [PATCHES] Avg performance for int8/numeric

От
Tom Lane
Дата:
Mark Kirkwood <markir@paradise.net.nz> writes:
> ... Having said that - the profiles suggest that we are perhaps doing 
> a whole lot more alloc'ing (i.e copying? detoasting?) of memory for 
> numerics than perhaps we need...

Yeah.  We've looked at this in the past and not figured out any
particularly desirable answer for variable-size accumulator state.
There is a hack in there for fixed-size pass-by-reference state (see
count(*)).  It's possible that you could get some traction by only doing
a palloc when the state size has to increase --- with a custom state
type it'd be possible to keep track of the allocated size as well as the
actual current length of the numeric sum.  Another idea to consider in
this context is avoiding repeated numeric pack/unpack overhead by
storing the running sum in the "calculation variable" format, but I'm
not sure how deep you'd need to burrow into numeric.c to allow that.

Most of this depends on being able to have a transition state value
that isn't any standard SQL datatype.  We've discussed that recently
in I-forget-what-context, and didn't find a good answer yet.  I think
you can find the thread by searching for a discussion of using type
"internal" as the state datatype --- which is an idea that doesn't work
by itself because of the security holes it'd open in the type system.
        regards, tom lane


Re: [PATCHES] Avg performance for int8/numeric

От
Mark Kirkwood
Дата:
Tom Lane wrote:
> Mark Kirkwood <markir@paradise.net.nz> writes:
>> ... Having said that - the profiles suggest that we are perhaps doing 
>> a whole lot more alloc'ing (i.e copying? detoasting?) of memory for 
>> numerics than perhaps we need...
> 
> Yeah.  We've looked at this in the past and not figured out any
> particularly desirable answer for variable-size accumulator state.
> There is a hack in there for fixed-size pass-by-reference state (see
> count(*)).  It's possible that you could get some traction by only doing
> a palloc when the state size has to increase --- with a custom state
> type it'd be possible to keep track of the allocated size as well as the
> actual current length of the numeric sum.  Another idea to consider in
> this context is avoiding repeated numeric pack/unpack overhead by
> storing the running sum in the "calculation variable" format, but I'm
> not sure how deep you'd need to burrow into numeric.c to allow that.
>

Right - I figured it would probably be hard :-(. It looks worth 
investigating, but might be a more longer term project (for me anyway, 
lots of stuff to read in there!).

> Most of this depends on being able to have a transition state value
> that isn't any standard SQL datatype.  We've discussed that recently
> in I-forget-what-context, and didn't find a good answer yet.  I think
> you can find the thread by searching for a discussion of using type
> "internal" as the state datatype --- which is an idea that doesn't work
> by itself because of the security holes it'd open in the type system.
> 

Interesting, I didn't think of doing that - was considering creating a 
suitable SQL composite type - a bit crass I guess, but I might just try 
that out anyway and see what sort of improvement it gives (we can then 
discuss how to do it properly in the advent that it's good....).

Cheers

Mark



Re: [PATCHES] Avg performance for int8/numeric

От
Tom Lane
Дата:
Mark Kirkwood <markir@paradise.net.nz> writes:
> Tom Lane wrote:
>> Most of this depends on being able to have a transition state value
>> that isn't any standard SQL datatype.  We've discussed that recently
>> in I-forget-what-context, and didn't find a good answer yet.

> Interesting, I didn't think of doing that - was considering creating a 
> suitable SQL composite type - a bit crass I guess, but I might just try 
> that out anyway and see what sort of improvement it gives (we can then 
> discuss how to do it properly in the advent that it's good....).

The thing is that (a) composite types have *at least* as much overhead
as arrays, probably rather more; and (b) a composite type in itself
doesn't allow non-SQL types as components, so still doesn't let you
tackle the idea of keeping the running sum in numeric.c's internal
calculation format.  So I don't think this will prove much --- the only
gain you can get is the count-in-int8-instead-of-numeric bit, which is
interesting but there is much left on the table.
        regards, tom lane


Re: [PATCHES] Avg performance for int8/numeric

От
Mark Kirkwood
Дата:
Tom Lane wrote:
> Mark Kirkwood <markir@paradise.net.nz> writes:
>> Tom Lane wrote:
>>> Most of this depends on being able to have a transition state value
>>> that isn't any standard SQL datatype.  We've discussed that recently
>>> in I-forget-what-context, and didn't find a good answer yet.
> 
>> Interesting, I didn't think of doing that - was considering creating a 
>> suitable SQL composite type - a bit crass I guess, but I might just try 
>> that out anyway and see what sort of improvement it gives (we can then 
>> discuss how to do it properly in the advent that it's good....).
> 
> The thing is that (a) composite types have *at least* as much overhead
> as arrays, probably rather more; and (b) a composite type in itself
> doesn't allow non-SQL types as components, so still doesn't let you
> tackle the idea of keeping the running sum in numeric.c's internal
> calculation format.  So I don't think this will prove much --- the only
> gain you can get is the count-in-int8-instead-of-numeric bit, which is
> interesting but there is much left on the table.
> 

Right - I spent this afternoon coming to pretty much the same conclusion...

So I guess the best way forward is to make do for the time being with 
the savings gained by not calculating sumX2, and revisit avg (and 
variance etc) when we know how to do non-SQL state types.

Cheers

Mark


Re: [PATCHES] Avg performance for int8/numeric

От
"Simon Riggs"
Дата:
On Mon, 2006-11-27 at 18:22 +1300, Mark Kirkwood wrote:
> Tom Lane wrote:
> > Mark Kirkwood <markir@paradise.net.nz> writes:
> >> Tom Lane wrote:
> >>> Most of this depends on being able to have a transition state value
> >>> that isn't any standard SQL datatype.  We've discussed that recently
> >>> in I-forget-what-context, and didn't find a good answer yet.
> > 
> >> Interesting, I didn't think of doing that - was considering creating a 
> >> suitable SQL composite type - a bit crass I guess, but I might just try 
> >> that out anyway and see what sort of improvement it gives (we can then 
> >> discuss how to do it properly in the advent that it's good....).
> > 
> > The thing is that (a) composite types have *at least* as much overhead
> > as arrays, probably rather more; and (b) a composite type in itself
> > doesn't allow non-SQL types as components, so still doesn't let you
> > tackle the idea of keeping the running sum in numeric.c's internal
> > calculation format.  So I don't think this will prove much --- the only
> > gain you can get is the count-in-int8-instead-of-numeric bit, which is
> > interesting but there is much left on the table.
> > 
> 
> Right - I spent this afternoon coming to pretty much the same conclusion...
> 
> So I guess the best way forward is to make do for the time being with 
> the savings gained by not calculating sumX2, and revisit avg (and 
> variance etc) when we know how to do non-SQL state types.

IIRC the calculation format for NUMERIC is simply less padded than the
on-disk version. It should be possible to create it as a normal type
that never gets used apart from this situation.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com