Re: Statistical aggregate functions are not working with PARTIAL aggregation
От | David Rowley |
---|---|
Тема | Re: Statistical aggregate functions are not working with PARTIAL aggregation |
Дата | |
Msg-id | CAKJS1f_Cc6D+Jm7oo=dQ0kdWqMuzDpWFE3PzR8kPy2DSnc8YsQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Statistical aggregate functions are not working with PARTIALaggregation (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Список | pgsql-hackers |
On Mon, 20 May 2019 at 19:59, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > - numArguments = get_aggregate_argtypes(aggref, inputTypes); > + numTransFnArgs = get_aggregate_argtypes(aggref, transFnInputTypes); > > If the function retrieves argument types of transform functions, > it would be better that the function name is > get_aggregate_transargtypes() and Aggref.aggargtypes has the name > like aggtransargtypes. Probably that would be a better name. > /* Detect how many arguments to pass to the finalfn */ > if (aggform->aggfinalextra) > - peragg->numFinalArgs = numArguments + 1; > + peragg->numFinalArgs = numTransFnArgs + 1; > else > peragg->numFinalArgs = numDirectArgs + 1; > > I can understand the aggfinalextra case, but cannot understand > another. As Andres said I think the code requires an explanation > of why the final args is not numTransFnArgs but *numDirectArgs > plus 1*. numDirectArgs will be 0 for anything apart from order-set aggregate, so in this case, numFinalArgs will become 1, since the final function just accepts the aggregate state. For ordered-set aggregates like, say percentile_cont the finalfn also needs the argument that was passed into the aggregate. e.g percentile_cont(0.5), the final function needs to know about 0.5. In this case 0.5 is the direct arg and the indirect args are what are in the order by clause, x in WITHIN GROUP (ORDER BY x). At least that's my understanding. > + /* > + * When combining there's only one input, the to-be-combined > + * added transition value from below (this node's transition > + * value is counted separately). > + */ > + pertrans->numTransInputs = 1; > > I believe this works but why the member has not been set > correctly by the creator of the aggsplit? Not quite sure what you mean here. We need to set this based on if we're dealing with a combine function or a trans function. > + /* Detect how many arguments to pass to the transfn */ > > I want to have a comment there that explains why what ordered-set > requires is not numTransFnArgs + (#sort cols?), but > "list_length(aggref->args)", or a comment that explanas why they > are compatible to be expplained. Normal aggregate ORDER BY clauses are handle in nodeAgg.c, but ordered-set aggregate's WITHIN GROUP (ORDER BY ..) args are handled in the aggregate's transition function. > > I ended up also renaming aggtransfn to transfn_oid in > > build_pertrans_for_aggref(). Having it called aggtranfn seems a bit > > too close to the pg_aggregate.aggtransfn column which is confusion > > given that we might pass it the value of the aggcombinefn column. > > Is Form_pg_aggregate->aggtransfn different thing from > transfn_oid? It seems very confusing to me apart from the naming. I don't think this is explained very well in the code, so I understand your confusion. Some of the renaming work I've been trying to do is to try to make this more clear. Basically when we're doing the "Finalize Aggregate" stage after having performed a previous "Partial Aggregate", we must re-aggregate all the aggregate states that were made in the Partial Aggregate stage. Some of these states might need to be combined together if they both belong to the same group. That's done with the function mentioned in pg_aggregate.aggcombinefn. Whether we're doing a normal "Aggregate" or a "Finalize Aggregate" the actual work to do is not all that different, the only real difference is that we're aggregating previously aggregated states rather than normal values. Since the rest of the work the same, we run it through the same code in nodeAgg.c, only we use the pg_aggregate.aggcombinefn for "Finalize Aggregate" and use pg_aggregate.aggtransfn for nodes shown as "Aggregate" and "Partial Aggregate" in explain. That's sort of simplified as really a node shown as "Partial Aggregate" in EXPLAIN is just not calling the aggfinalfn and "Finalize Aggregate" is. The code in nodeAgg.c technically supports re-combining previously aggregated states and not finalizing them. You might wonder why you might do that. It was partially a by-product of how the code was written, but also I had in mind about clustered servers aggregating large datasets on remote servers running parallel aggregates on each server then each server sending back the partially aggregated states back to the main server to be re-combined and finalized -- 3-stage aggregation. Changes were made after the initial partial aggregate commit to partially remove the ability to form paths in this shape this in the planner code, but that's mostly just removed support in explain.c and changing bool flags in favour of the AggSplit enum which lacks a combination of values to do that. We'd need an AGGSPLIT_COMBINE_SKIPFINAL_DESERIAL_SERIAL... or something, to make that work. (I'm surprised not to see more AggSplit values for partition-wise aggregate since that shouldn't need to serialize and deserialize the states...) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Matwey V. Kornilov"Дата:
Сообщение: [PATCH v2] Introduce spgist quadtree @<(point,circle) operator