Hi,

On 2017-11-27 16:31:21 -0800, Andres Freund wrote:
> this is part of my work to make expression evaluation JITable. In a lot
> of analytics queries the major bottleneck is transition function
> invocation (makes sense, hardly anyone wants to see billions of
> rows). Therefore for JITing to be really valuable transition function
> stuff needs to be JITable.
>
> Excerpt from the preliminary commit message:
>
> Previously aggregate transition and combination functions were invoked
> by special case code in nodeAgg.c, evaluating input and filters
> separately using the expression evaluation machinery. That turns out
> to not be great for performance for several reasons:
> - repeated expression evaluations have some cost
> - the transition function invocations are poorly predicted
> - filter and input computation had to be done separately
> - the special case code made it hard to implement JITing of the whole
> transition function invocation
>
> Address this by building one large expression that computes input,
> evaluates filters, and invokes transition functions.
>
> This leads to moderate speedups in queries bottlenecked by aggregate
> computations, and enables large speedups for similar cases once JITing
> is done.
> While this gets rid of a substantial amount of duplication between the
> infrastructure for transition and combine functions, it still increases
> codesize a bit.

There are still two callers of advance_transition_function() left, namely
process_ordered_aggregate_{single,multi}. Rearchitecting this so they
also go through expression-ified transition invocation seems like
material for a separate patch; this one is complicated enough already.
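
As a rough illustration of the approach described in the quoted commit
message (one expression that computes input, evaluates filters, and
invokes transition functions), here is a small self-contained sketch.
It is not code from the patch: the step kinds, the Step struct,
advance_row() and run_example() are all hypothetical names made up for
this example, and rows are simplified to arrays of long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step kinds; these are NOT the opcodes used by the patch. */
typedef enum { STEP_FILTER, STEP_INPUT, STEP_TRANS, STEP_DONE } StepKind;

typedef long (*TransFn)(long state, long input);

typedef struct Step {
    StepKind kind;
    int      column;   /* STEP_FILTER / STEP_INPUT: row column to read */
    int      jump;     /* STEP_FILTER: step to continue at if filtered out */
    int      transno;  /* STEP_TRANS: index of the transition state */
    TransFn  fn;       /* STEP_TRANS: transition function to call */
} Step;

static long trans_sum(long state, long input) { return state + input; }
static long trans_count(long state, long input) { (void) input; return state + 1; }

/* Process one row: filters, input and transitions all run in one pass. */
static void advance_row(const Step *steps, const long *row, long *states)
{
    long input = 0;
    int  pc = 0;

    assert(steps != NULL && row != NULL && states != NULL);
    for (;;) {
        const Step *s = &steps[pc];

        switch (s->kind) {
        case STEP_FILTER:
            /* skip this aggregate's remaining steps if the filter is false */
            pc = (row[s->column] != 0) ? pc + 1 : s->jump;
            break;
        case STEP_INPUT:
            input = row[s->column];
            pc++;
            break;
        case STEP_TRANS:
            states[s->transno] = s->fn(states[s->transno], input);
            pc++;
            break;
        case STEP_DONE:
            return;
        }
    }
}

/* Roughly: SELECT sum(a) FILTER (WHERE f), count(*) over rows of (a, f). */
void run_example(long *states)
{
    static const Step program[] = {
        { STEP_FILTER, 1, 3, 0, NULL },        /* 0: if !f, go to step 3 */
        { STEP_INPUT,  0, 0, 0, NULL },        /* 1: input = a */
        { STEP_TRANS,  0, 0, 0, trans_sum },   /* 2: states[0] += input */
        { STEP_TRANS,  0, 0, 1, trans_count }, /* 3: states[1] += 1 */
        { STEP_DONE,   0, 0, 0, NULL },        /* 4: end of row */
    };
    static const long rows[3][2] = { {10, 1}, {20, 0}, {5, 1} };

    states[0] = states[1] = 0;
    for (size_t i = 0; i < 3; i++)
        advance_row(program, rows[i], states);
}
```

The point of the flat step array is that a single loop handles filter,
input, and transition for every aggregate, which is the kind of
structure a JIT can later turn into straight-line code.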

> Todo / open Questions:
> - Location of transition function building functions. Currently they're
> in execExpr.c. That avoids having to expose a bunch of functions local
> to it, but requires exposing some aggregate structs to the world. We
> could go the other way round as well.

I've left this as is.

> - Right now we waste a bunch of time by having to access transition
> states indexed by both grouping set number and the transition state
> offset therein. It'd be nicer if we could cheaply reduce the number of
> indirections, but I can't quite see how without adding additional
> complications.

I've left this as is.
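
To make the indirection cost in the quoted item concrete, here is a
minimal sketch contrasting a two-level lookup (grouping set number, then
transition state offset) with a flattened one. Everything in it is
hypothetical: TransState, two_level_lookup(), flat_lookup() and
layouts_agree() are illustration-only names, not nodeAgg.c structures.
The flat variant assumes every grouping set has the same number of
transition states and that all states live in one contiguous allocation,
which may well not hold in the real executor (for example with hashed
grouping).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-aggregate transition state. */
typedef struct TransState { long value; } TransState;

/* Two-level layout: pergroups[setno] points at that set's state array. */
TransState *two_level_lookup(TransState **pergroups, int setno, int transno)
{
    /* two dependent loads: the per-set pointer, then the state itself */
    return &pergroups[setno][transno];
}

/* Flat layout: one allocation, offset computed from both indexes. */
TransState *flat_lookup(TransState *states, int ntrans, int setno, int transno)
{
    return &states[(size_t) setno * (size_t) ntrans + (size_t) transno];
}

/* Fill both layouts with the same values; return 1 if all lookups agree. */
int layouts_agree(int nsets, int ntrans)
{
    assert(nsets > 0 && ntrans > 0);

    int          ok = 1;
    TransState **pergroups = malloc((size_t) nsets * sizeof(*pergroups));
    TransState  *flat = malloc((size_t) nsets * (size_t) ntrans * sizeof(*flat));

    assert(pergroups != NULL && flat != NULL);
    for (int s = 0; s < nsets; s++) {
        pergroups[s] = malloc((size_t) ntrans * sizeof(**pergroups));
        assert(pergroups[s] != NULL);
        for (int t = 0; t < ntrans; t++) {
            two_level_lookup(pergroups, s, t)->value = s * 100 + t;
            flat_lookup(flat, ntrans, s, t)->value = s * 100 + t;
        }
    }

    for (int s = 0; s < nsets; s++)
        for (int t = 0; t < ntrans; t++)
            ok = ok && pergroups[s][t].value == flat[s * ntrans + t].value;

    for (int s = 0; s < nsets; s++)
        free(pergroups[s]);
    free(pergroups);
    free(flat);
    return ok;
}
```

The flat lookup trades a pointer load for a multiply-add, but it only
works under the assumptions stated above; that restriction is one
example of the kind of complication the quoted item alludes to.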

Here's a considerably polished variant of this patch. I plan to do
another round of polishing next week, and then push it, unless somebody
else has comments.

Regards,
Andres