Обсуждение: Idea for aggregates

Поиск
Список
Период
Сортировка

Idea for aggregates

От
Greg Stark
Дата:
Simon, Dmitri, Peter Eisentraut, and I were chatting at PGConfNYC and
we had an idea for something interesting to do with aggregates.
Interestingly Simon and I came at this from two different directions
but realized we needed the same functionality to implement what we
wanted.

The basic idea is to separate the all the properties of the aggregate
functions except the final function from the final function into a
separate object. Giving the optimizer the knowledge that multiple
aggregate functions use the share the same basic machinery and
semantics for the state is the magic sauce that's a prerequisite for
the several ideas we were each thinking of.

I'm imagining something like (though I'm really not wedded to this
syntax at all):

CREATE AGGREGATE CLASS class_name ( input_data_type [ , ... ] ) (   SFUNC = sfunc,   STYPE = state_data_type   [ ,
INITCOND= initial_condition ]   [ , SORTOP = sort_operator ]
 
)

CREATE AGGREGATE func_name ( class_name ) (        SFUNC = sfunc
)

The idea then is that this should enable a number of useful optimizations:

1) If the planner sees multiple aggregates in the target list
belonging to the same class and having the same arguments then it
knows it can keep just one transition state varible for all of them.

2) For materialized views the materialized view can keep just the
state variable rather than the final aggregate needed for any
aggregates in the materialized view. This would enable users to query
any other aggregate function in the same class on the same column even
if it wasn't included in the original materialized view definition.

This kind of thing is probably even more important down the road for
incrementally updating materialized views.

3) I envision treating the aggregate classes as real types so you
could do something like

CREATE TABLE user (userid integer, page_timing aggregate class
numeric_agg_class)
And later update the row by doing something like update user set
page_timing = page_timing + $1 where id = $2
And still later call select page_timing.count(), page_timing.sum(),
page_timing.stddev() from user where id = $1

Except of course this syntax wouldn't work quite right. I haven't
thought of a better syntax yet though.

-- 
greg



Re: Idea for aggregates

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> The basic idea is to separate the all the properties of the aggregate
> functions except the final function from the final function into a
> separate object. Giving the optimizer the knowledge that multiple
> aggregate functions use the share the same basic machinery and
> semantics for the state is the magic sauce that's a prerequisite for
> the several ideas we were each thinking of.

Why exactly do you need to invent an "aggregate class" concept for this?
Can't the planner just look in pg_aggregate to see that the
sfunc/stype/initcond are the same for two different aggregates?
        regards, tom lane



Re: Idea for aggregates

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> Well in many cases stype will just be internal for many of them. That
> doesn't mean they're the same.

> Hm, I suppose it might if they have the same sfunc.

> This is actually where I started but we concluded that we needed some
> declaration that the aggregates were actually related and would interpret
> the state the same way and not just that it happened to use the same
> storage format.

Well, in practice you'd need to also compare the input datatype (consider
polymorphic aggregates) and initcond.  But the sfunc isn't told which
finalfunc will be applied, so any aggregates that share the same sfunc and
have the other conditions the same *must* have the identical transition
data behavior.  I don't see any reason to invent new syntax, and there
are good reasons not to if we don't have to.
        regards, tom lane



Re: Idea for aggregates

От
Greg Stark
Дата:
<p dir="ltr">Well in many cases stype will just be internal for many of them. That doesn't mean they're the same.<p
dir="ltr">Hm,I suppose it might if they have the same sfunc.<p dir="ltr">This is actually where I started but we
concludedthat we needed some declaration that the aggregates were actually related and would interpret the state the
sameway and not just that it happened to use the same storage format. 

Re: Idea for aggregates

От
Simon Riggs
Дата:
On 5 April 2014 04:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> Well in many cases stype will just be internal for many of them. That
>> doesn't mean they're the same.
>
>> Hm, I suppose it might if they have the same sfunc.
>
>> This is actually where I started but we concluded that we needed some
>> declaration that the aggregates were actually related and would interpret
>> the state the same way and not just that it happened to use the same
>> storage format.
>
> Well, in practice you'd need to also compare the input datatype (consider
> polymorphic aggregates) and initcond.  But the sfunc isn't told which
> finalfunc will be applied, so any aggregates that share the same sfunc and
> have the other conditions the same *must* have the identical transition
> data behavior.  I don't see any reason to invent new syntax, and there
> are good reasons not to if we don't have to.

Definitely happy not to have additional syntax. So we can just
dynamically group the aggregates together.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services