Re: Improve hash-agg performance

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	Re: Improve hash-agg performance
Дата	4 ноября 2016 г. 16:35:28
Msg-id	20161104133521.ud7dkcewhc6tigas@alap3.anarazel.de обсуждение исходный текст
Ответ на	Re: Improve hash-agg performance (Heikki Linnakangas <hlinnaka@iki.fi>)
Список	pgsql-hackers

Дерево обсуждения

On 2016-11-04 15:18:49 +0200, Heikki Linnakangas wrote:
> On 11/03/2016 01:07 PM, Andres Freund wrote:
> > Hi,
> > 
> > There's two things I found while working on faster expression
> > evaluation, slot deforming and batched execution. As those two issues
> > often seemed quite dominant cost-wise it seemed worthwhile to evaluate
> > them independently.
> > 
> > 1) We atm do one ExecProject() to compute each aggregate's
> >    arguments. Turns out it's noticeably faster to compute the argument
> >    for all aggregates in one go. Both because it reduces the amount of
> >    function call / moves more things into a relatively tight loop, and
> >    because it allows to deform all the required columns at once, rather
> >    than one-by-one.  For a single aggregate it'd be faster to avoid
> >    ExecProject alltogether (i.e. directly evaluate the expression as we
> >    used to), but as soon you have two the new approach is faster.
> 
> Makes sense. If we do your refactoring of ExecEvalExpr into an intermediate
> opcode representation, I assume the performance difference will go away
> anyway.

Precisely.


> > 2) For hash-aggs we right now we store the representative tuple using
> >    the input tuple's format, with unneeded columns set to NULL. That
> >    turns out to be expensive if the aggregated-on columns are not
> >    leading columns, because we have to skip over a potentially large
> >    number of NULLs.  The fix here is to simply use a different tuple
> >    format for the hashtable.  That doesn't cause overhead, because we
> >    already move columns in/out of the hashslot explicitly anyway.

> Heh, I came to the same conclusion a couple of months ago when I was
> profiling the aggregate code. I never got around to finish up and post the
> patch I wrote back then, but here you go, for comparison. It's pretty much
> the same as what you got here. So yeah, seems like a good idea.


> +        /*
> +         * Note that we don't deduplicate key columns. That would complicate
> +         * the comparisons. Don't write silly queries! (Not sure if that would get
> +         * through the parser and optimizer, though).

Hehe. You made me spill more coffee.


Thanks for looking!

Greetings,

Andres Freund

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Heikki Linnakangas
Дата: 04 ноября 2016 г., 16:27:10
Сообщение: Re: Improve hash-agg performance

Следующее

От: Tom Lane
Дата: 04 ноября 2016 г., 16:42:38
Сообщение: Re: Gather Merge

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Improve hash-agg performance

Предыдущее

Следующее