Re: Combining Aggregates

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Combining Aggregates
Дата
Msg-id CAKJS1f9hGtS0c1Zc_uGDvNARDXbC4Siie873BJ74Bhu1-YxEtQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Combining Aggregates  (Robert Haas <robertmhaas@gmail.com>)
Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
Re: Combining Aggregates  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 18 December 2015 at 01:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
# select sum(x::numeric) from generate_series(1,3) x(x);
ERROR:  invalid memory alloc request size 18446744072025250716

The reason that this happens is down to the executor always thinking that Aggref returns the aggtype, but in cases where we're not finalizing the aggregate the executor needs to know that we're actually returning aggtranstype instead. Hash aggregates appear to work as the trans value is just stuffed into a hash table, but with plain and sorted aggregates this gets formed into a Tuple again, and forming a tuple naturally requires the types to be set correctly! I'm not sure exactly what the best way to fix this will be. I've hacked something up in the attached test patch which gets around the problem by adding a new aggtranstype to Aggref and also an 'aggskipfinalize'  field which I manually set to true in a bit of a hacky way inside the grouping planner. Then in exprType() for Aggref I conditionally return the aggtype or aggtranstype based on the aggskipfinalize setting.  This is most likely not the way to properly fix this, but I'm running out of steam today to think of how it should be done, so I'm currently very open to ideas on this.

 Ok, so it seems that my mindset was not in parallel process space when I was thinking about this.  I think having the pointer in the Tuple is probably going to be fine for this multiple stage aggregation when that's occurring in a single backend process, but obviously if the memory that the pointer points to belongs to a worker process in a parallel aggregate situation, then bad things will happen.

Now, there has been talk of this previously, on various threads, but I don't believe any final decisions were made on how exactly it should be done. At the moment I plan to make changes as follows:

  1.  Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and aggserialtype These will only be required when aggtranstype is INTERNAL. Perhaps we should disallow CREATE AGGREAGET from accepting them for any other type... The return type of aggserialfn should be aggserialtype, and it should take a single parameter of aggtranstype. aggdeserialfn will be the reverse of that. 
  2. Add a new bool field to nodeAgg's state named serialStates. If this is field is set to true then when we're in finalizeAgg = false mode, we'll call the serialfn on the agg state instead of the finalfn. This will allow the serialized state to be stored in the tuple and sent off to the main backend.  The combine agg node should also be set to serialStates = true, so that it knows to deserialize instead of just assuming that the agg state is of type aggtranstype.
I believe this should allow us to not cause any performance regressions by moving away from INTERNAL agg states. It should also be very efficient for internal states such as Int8TransTypeData, as this struct merely has 2 int64 fields which should be very simple to stuff into a bytea varlena type. We don't need to mess around converting the ->count and ->sum into a strings or anything.

Then in order for the planner to allow parallel aggregation all aggregates must:
  1. Not have a DISTINCT or ORDER BY clause
  2. Have a combinefn
  3. If aggtranstype = INTERNAL, must have a aggserialfn and aggdeserialfn.
We can relax the requirement on 3 if we're using 2-stage aggregation, but not parallel aggregation.

Any objections, or better ideas?

That just leaves me to figure out how to set the correct Aggref->aggtype during planning, as now there's 3 possible types:

if (finalizeAggs == false)
{
   if (serialStates == true)
      type = aggserialtype;
  else
     type = aggtranstype;
}
else
  type = prorettype; /* normal case */

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Freeze avoidance of very large table.
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Optimizing away second VACUUM heap scan when only BRIN indexes on table