Re: [PATCH] Negative Transition Aggregate Functions (WIP)

Поиск
Список
Период
Сортировка
От Florian Pflug
Тема Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Дата
Msg-id 7553725C-8BB1-4DAA-9815-0A5F963B4988@phlo.org
обсуждение исходный текст
Ответ на Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Ответы Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Re: [PATCH] Negative Transition Aggregate Functions (WIP)  (Florian Pflug <fgp@phlo.org>)
Список pgsql-hackers
On Apr9, 2014, at 23:17 , Florian Pflug <fgp@phlo.org> wrote:

> On Apr9, 2014, at 21:35 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A quick test says that avg(int4)
>> is about five percent slower than sum(int4), so that's the kind of hit
>> we'd be taking on non-windowed aggregations if we do it like this.
> 
> That's rather surprising though. AFAICS, there's isn't much difference
> between the two transfer functions int4_sum and int4_avg_accum at all.
> 
> The differences come down to (+ denoting things which ought to make
> int4_avg_accum slower compared to int4_sum, - denoting the opposite)
> 
> 1. +) int4_avg_accum calls AggCheckCallContext
> 2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
> 3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS
>       to verify that the state is a 2-element array without NULL entries
> 4. -) int4_sum checks if the input is NULL

I've done a bit of profiling on this (using Instruments.app on OSX). One
thing I missed is that inv4_avg_accum also calls pg_detoast_datum - that
calls comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly,
since we know that the datum cannot possibly be toasted I think (or if it
was, nodeAgg.c should do the de-toasting).

The profile also attributes a rather large percent of the total runtime of
int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting
rid of that would help quite a few transition functions, invertible or not.
That certainly seems doable - we'd need a way to mark functions as internal
support functions, and prevent user-initiated calls of such functions. 
Transition functions marked that way could then safely scribble over their
state arguments without having to consult AggCheckCallContext() first.

I've also compared the disassemblies of int4_sum and int4_avg_accum, and
apart from these differences, they are pretty similar.

Also, the *only* reason that SUM(int2|int4) cannot use int8 as it's
transition type is that it needs to return NULL, not 0, if zero rows
were aggregates. It might seems that it could just use int8 as state
with initial value NULL, but that only works if the transition functions
are non-strict (since the special case of state type == input type doesn't
apply here). And for non-strict transition functions need to deal with
NULL inputs themselves, which means counting the number of non-NULL inputs..

That problem would go away though if we had support for an initalfunction,
which would receive the first input value and initialize the state. In
the case of SUM(), the initial function would be strict, and thus would be
called on the first non-NULL input value, which it'd convert to int8 and
return as the new state. 

However, I still believe the best approach at this point is to just work
on making int4_avg_accum faster. I still see no principal reason what it
has to be noticeably slower - the only additional work it absolutely *has*
to perform is *one* 64-bit increment.

best regards,
Florian Pflug




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Per table autovacuum vacuum cost limit behaviour strange
Следующее
От: Dean Rasheed
Дата:
Сообщение: Re: [PATCH] Negative Transition Aggregate Functions (WIP)