Re: Combining Aggregates

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Combining Aggregates
Дата
Msg-id CAKJS1f8MB4E=NMzxcHK5Tz6jBX8avOr7v7hpx57LF-GJQwETsg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Combining Aggregates  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Re: Combining Aggregates  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers
On 21 December 2015 at 22:02, David Rowley <david.rowley@2ndquadrant.com> wrote:
 At the moment I plan to make changes as follows:

  1.  Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and aggserialtype These will only be required when aggtranstype is INTERNAL. Perhaps we should disallow CREATE AGGREAGET from accepting them for any other type... The return type of aggserialfn should be aggserialtype, and it should take a single parameter of aggtranstype. aggdeserialfn will be the reverse of that. 
  2. Add a new bool field to nodeAgg's state named serialStates. If this is field is set to true then when we're in finalizeAgg = false mode, we'll call the serialfn on the agg state instead of the finalfn. This will allow the serialized state to be stored in the tuple and sent off to the main backend.  The combine agg node should also be set to serialStates = true, so that it knows to deserialize instead of just assuming that the agg state is of type aggtranstype.

I've attached an updated patch which implements this. 

combine_aggs_test_v3.patch can be applied on top of the other patch to have the planner generate partial aggregate plans. It is also possible to further hack this test patch to remove the combine Agg node from planner.c in order to have the intermediate states output. I've also invented a serialize and deseriaize function for avg(NUMERIC), and SUM(NUMERIC). Quite possible this is what we'd want a remote server to send us if we get remote aggregation on a server grid one day in the future. With the patch modified to not generate the combine aggs node we get:

# select sum(x::numeric) from generate_series(1,1000) x(x);

----------------------
 1000 500500 0 1000 0
(1 row)

This is just my serial format that I've come up with for NumericAggState, which is basically: "N sumX maxScale maxScaleCount NaNcount". Perhaps we can come up with something better, maybe just packing the ints and int64s into a bytea type and putting the text version of sumX on the end... I'm sure we can think of something more efficient between us, but I think the serial state should definitely be cross platform e.g if we do the bytea thing, then the ints should be in network byte order so that a server cluster can have a mix of little and big-endian processors.  

I've also left a failing regression test because I'm not quite sure how to deal with it:

--- 283,292 ----
  WHERE p1.prorettype = 'internal'::regtype AND NOT
      'internal'::regtype = ANY (p1.proargtypes);
   oid  |         proname         
! ------+-------------------------
!  3319 | numeric_avg_deserialize
   2304 | internal_in

This is basically making sure that there's only 1 function that takes zero internal types as parameters, but returns an INTERNAL type. I've made it so that numeric_avg_deserialize() can only be called from an aggregate context, which hopefully makes that safe again, but I'm perhaps not imagining hard enough for ways that this could be abused, so I've held off for now in updated the expected output on that to include the new function.

One other part that I'm not too sure on how to deal with is how to set the data type for the Aggrefs when we're not performing finalization on the aggregate node. The return type for the Aggref in this case will be either the transtype, or the serialtype, depending on if we're serializing the states or not. To do this, I've so far just come up with set_partialagg_aggref_types() which is called during setrefs. The only other time that I can think to do this return type update would be when building the partial agg node's target list. I'm open to better ideas on this part.

Apart from the problems I listed above, I'm reasonably happy with the patch now, and I'm ready for someone else to take a serious look at it.

Many thanks

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: parallel joins, and better parallel explain
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Speed up Clog Access by increasing CLOG buffers