Re: Deduplicate aggregates and transition functions in planner

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Deduplicate aggregates and transition functions in planner
Дата
Msg-id 16a1e564-1fa5-b4ea-c512-9ca133ea4bc0@iki.fi
обсуждение исходный текст
Ответ на Re: Deduplicate aggregates and transition functions in planner  (Andres Freund <andres@anarazel.de>)
Ответы Re: Deduplicate aggregates and transition functions in planner  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On 28/10/2020 21:59, Andres Freund wrote:
> On 2020-10-28 21:10:41 +0200, Heikki Linnakangas wrote:
>> Currently, ExecInitAgg() performs quite a lot of work, to deduplicate
>> identical Aggrefs, as well as Aggrefs that can share the same transition
>> state. That doesn't really belong in the executor, we should perform that
>> work in the planner. It doesn't change from one invocation of the plan to
>> another, and it would be nice to reflect the state-sharing in the plan
>> costs.
> 
> Woo! Very glad to see this tackled.
> 
> It wouldn't surprise me to see a small execution time speedup here -
> I've seen the load of the aggno show up in profiles.

I think you'd be hard-pressed to find a real-life query where it 
matters. But if you don't care about real life:

regression=# do $$
begin
   for i in 1..100000 loop
     perform sum(g), sum(g+0), sum(g+1), sum(g+2), sum(g+3), sum(g+4), 
sum(g+5), sum(g+6), sum(g+7), sum(g+8), sum(g+9), sum(g+10) from 
generate_series(1,1) g;
   end loop;
end;
$$;
DO
Time: 1282.701 ms (00:01.283)

vs.

Time: 860.323 ms

with the patch.

>> @@ -783,14 +783,13 @@ ExecInitExprRec(Expr *node, ExprState *state,
>>   
>>                   scratch.opcode = EEOP_AGGREF;
>>                   scratch.d.aggref.astate = astate;
>> -                astate->aggref = aggref;
>> +                astate->aggno = aggref->aggno;
>>   
>>                   if (state->parent && IsA(state->parent, AggState))
>>                   {
>>                       AggState   *aggstate = (AggState *) state->parent;
>>   
>> -                    aggstate->aggs = lappend(aggstate->aggs, astate);
>> -                    aggstate->numaggs++;
>> +                    aggstate->aggs = lappend(aggstate->aggs, aggref);
> 
> Hm. Why is aggstate->aggs still built during expression initialization?
> Imo that's a pretty huge wart that also introduces more
> order-of-operation brittleness to executor startup.

The Agg node itself doesn't include any information about the aggregates 
and transition functions. Because of that, ExecInitAgg needs a 
"representive" Aggref for each transition state and agg, to initialize 
the per-trans and per-agg structs. The expression initialization makes 
those Aggrefs available for ExecInitAgg.

Instead of collecting all the Aggrefs in a list, ExecInitExprRec() could 
set each representative Aggref directly in the right per-trans and 
per-agg struct, based on the 'aggno' and 'aggtransno' fields. That 
requires initializing the per-trans and per-agg arrays earlier, and for 
that, we would need to store the # of aggs and transition states in the 
Agg node, like you also suggested. Certainly doable, but on the whole, 
it didn't really seem better to me. Attached is a patch, to demonstrate 
what that looks like, on top of the main patch. It's not complete, 
there's at least one case with hash-DISTINCT for queries like "SELECT 
DISTINCT aggregate(x) ..." where the planner creates an Agg for the 
DISTINCT without aggregates, but the code currently passes numAggs=1 to 
the executor. Some further changes would be needed in the planner, to 
mark the AggPath generated for deduplication differently from the 
AggPaths created for aggregation. Again that's doable, but on the whole 
I prefer the approach to scan the Aggrefs in executor startup, for now.

I'd like to get rid of the "representative Aggrefs" altogether, and pass 
information about the transition and final functions from planner to 
executor in some other form. But that's exactly what got me into the 
refactoring that was ballooning out of hand that I mentioned.

- Heikki

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiro Ikeda
Дата:
Сообщение: Re: Add statistics to pg_stat_wal view for wal related parameter tuning
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Parallel copy