Re: Combining Aggregates

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Combining Aggregates
Дата
Msg-id CAKJS1f_HyKNDDhjB2ZAJoTTA4JnqfhUZUjA5xAfpqPsozuKK_g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Combining Aggregates  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
On 27 July 2015 at 04:58, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 04/01/2015 06:28 PM, Robert Haas wrote:
On Mon, Mar 30, 2015 at 1:28 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I've been thinking of bumping this patch to the June commitfest as the
patch only exists to provide the basic infrastructure for things like
parallel aggregation, aggregate before join, and perhaps auto updating
materialised views.

It seems unlikely that any of those things will happen for 9.5.

Yeah, I guess so...

Does anybody object to me moving this to June's commitfest?

Not from my side FWIW. I think it actually makes sense.

+1.  I'd like to devote some time to looking at this, but I don't have
the time right now.  The chances that we can do something useful with
it in 9.6 seem good.

And the June commitfest is now in progress.

This patch seems sane to me, as far as it goes. However, there's no planner or executor code to use the aggregate combining for anything. I'm not a big fan of dead code, I'd really like to see something to use this.

Thanks for looking. I partially agree with that, it is a little weird to put in code that's yet to be used. I'd certainly agree 95% if this was the final commitfest of 9.5, but we're in the first commitfest of 9.6 and I think there's a very high probability of this getting used in 9.6, and likely that probability would be even higher if the code is already in. Perhaps it's a little bit in the same situation as to Robert's parallel worker stuff?

 

The main use case people have been talking about is parallel query, but is there some other case this would be useful right now, without the parallel query feature? You and Simon talked about this case:

2. Queries such as:

SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id
= p.product_id GROUP BY p.name;

Such a query could be transformed into:

SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales
GROUP BY product_id) s
INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name;

Of course the outer query's SUM and GROUP BY would not be required if there
happened to be a UNIQUE index on product(name), but assuming there's not
then the above should produce the results faster. This of course works ok
for SUM(), but for something like AVG() or STDDEV() the combine/merge
aggregate functions would be required to process those intermediate
aggregate results that were produced by the sub-query.

Any chance you could implement that in the planner?


Yes! I'm actually working on it now and so far have it partially working. Quite likely I'll be able to submit for CF2. There's still some costing tweaks to do. So far it just works for GROUP BY with no aggs. I plan to fix that later using this patch.

I don't want to talk too much about it on this thread, but in a test query which is the one in my example, minus the SUM(qty), with 1 million sales records, and 100 products, performance goes from 350ms to 200ms on my machine, so looking good so far.

Regards

David Rowley
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: New functions
Следующее
От: David Rowley
Дата:
Сообщение: Re: Combining Aggregates