Обсуждение: Idea for aggregates
Simon, Dmitri, Peter Eisentraut, and I were chatting at PGConfNYC and we had an idea for something interesting to do with aggregates. Interestingly Simon and I came at this from two different directions but realized we needed the same functionality to implement what we wanted. The basic idea is to separate the all the properties of the aggregate functions except the final function from the final function into a separate object. Giving the optimizer the knowledge that multiple aggregate functions use the share the same basic machinery and semantics for the state is the magic sauce that's a prerequisite for the several ideas we were each thinking of. I'm imagining something like (though I'm really not wedded to this syntax at all): CREATE AGGREGATE CLASS class_name ( input_data_type [ , ... ] ) ( SFUNC = sfunc, STYPE = state_data_type [ , INITCOND= initial_condition ] [ , SORTOP = sort_operator ] ) CREATE AGGREGATE func_name ( class_name ) ( SFUNC = sfunc ) The idea then is that this should enable a number of useful optimizations: 1) If the planner sees multiple aggregates in the target list belonging to the same class and having the same arguments then it knows it can keep just one transition state varible for all of them. 2) For materialized views the materialized view can keep just the state variable rather than the final aggregate needed for any aggregates in the materialized view. This would enable users to query any other aggregate function in the same class on the same column even if it wasn't included in the original materialized view definition. This kind of thing is probably even more important down the road for incrementally updating materialized views. 3) I envision treating the aggregate classes as real types so you could do something like CREATE TABLE user (userid integer, page_timing aggregate class numeric_agg_class) And later update the row by doing something like update user set page_timing = page_timing + $1 where id = $2 And still later call select page_timing.count(), page_timing.sum(), page_timing.stddev() from user where id = $1 Except of course this syntax wouldn't work quite right. I haven't thought of a better syntax yet though. -- greg
Greg Stark <stark@mit.edu> writes: > The basic idea is to separate the all the properties of the aggregate > functions except the final function from the final function into a > separate object. Giving the optimizer the knowledge that multiple > aggregate functions use the share the same basic machinery and > semantics for the state is the magic sauce that's a prerequisite for > the several ideas we were each thinking of. Why exactly do you need to invent an "aggregate class" concept for this? Can't the planner just look in pg_aggregate to see that the sfunc/stype/initcond are the same for two different aggregates? regards, tom lane
Greg Stark <stark@mit.edu> writes: > Well in many cases stype will just be internal for many of them. That > doesn't mean they're the same. > Hm, I suppose it might if they have the same sfunc. > This is actually where I started but we concluded that we needed some > declaration that the aggregates were actually related and would interpret > the state the same way and not just that it happened to use the same > storage format. Well, in practice you'd need to also compare the input datatype (consider polymorphic aggregates) and initcond. But the sfunc isn't told which finalfunc will be applied, so any aggregates that share the same sfunc and have the other conditions the same *must* have the identical transition data behavior. I don't see any reason to invent new syntax, and there are good reasons not to if we don't have to. regards, tom lane
<p dir="ltr">Well in many cases stype will just be internal for many of them. That doesn't mean they're the same.<p dir="ltr">Hm,I suppose it might if they have the same sfunc.<p dir="ltr">This is actually where I started but we concludedthat we needed some declaration that the aggregates were actually related and would interpret the state the sameway and not just that it happened to use the same storage format.
On 5 April 2014 04:18, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <stark@mit.edu> writes: >> Well in many cases stype will just be internal for many of them. That >> doesn't mean they're the same. > >> Hm, I suppose it might if they have the same sfunc. > >> This is actually where I started but we concluded that we needed some >> declaration that the aggregates were actually related and would interpret >> the state the same way and not just that it happened to use the same >> storage format. > > Well, in practice you'd need to also compare the input datatype (consider > polymorphic aggregates) and initcond. But the sfunc isn't told which > finalfunc will be applied, so any aggregates that share the same sfunc and > have the other conditions the same *must* have the identical transition > data behavior. I don't see any reason to invent new syntax, and there > are good reasons not to if we don't have to. Definitely happy not to have additional syntax. So we can just dynamically group the aggregates together. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services