Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)

Поиск

Список

Период

Сортировка

От	Pavel Stehule
Тема	Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Дата	13 августа 2009 г. 18:09:54
Msg-id	162867790908131109o13d76a94h2f641f1c03599bdd@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) (Hitoshi Harada <umi.tanuki@gmail.com>)
Ответы	Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) (Hitoshi Harada <umi.tanuki@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

2009/8/13 Hitoshi Harada <umi.tanuki@gmail.com>:
> 2009/8/14 Pavel Stehule <pavel.stehule@gmail.com>:
>> I prefered using CTE, because this way was the most short to small
>> bugs less prototype - with full functionality.
>
> You could make it by query rewriting, but as you say the best cleanest
> way is total refactoring of existing nodeAgg. How easy to implement is
> not convincing.
>

I agree. Simply I am not have time and force do it. I would to
concentrate on finishing some plpgsql issues, and then I have to do
some other things than PostgreSQL. There are fully functional
prototype and everybody is welcome to continue in this work.


>>> When we want GROUPING SET(a, b), at first we sort by a and aggregate
>>> then sort by b and aggregate. This is the same as:
>>>
>>> select a, null, count(*) from x group by a
>>> union all
>>> select null, b, count(*) from x group by b
>>>
>>> so nothing better than query rewriting unless we invent something new.
>>>
>> the problem is when x is subquery. Then is better using CTE, because
>> we don't need repeat x evaluation twice. The most typical use case is,
>> so x isn't table.
>
> So we need single scan aggregate as far as possible. Buffering
> subquery's result is possible without CTE node. Tuplestore has that
> functionality but I found the buffered result will be sorted multiple
> times, one way might be to allow tuplesort to perform sort multiple
> times with different keys.
>

yes, I don't afraid multiple evaluation of aggregates. It's cheap.
Problem is multiple table scan. I though about some new version of
aggregates. Current aggregates process row by row with final
operation. Some new kind of aggregates should to work over tuple
store. Internally it get pointer to tupplestore and number of rows.
This should be very fast for functions like median, or array_agg. I
thing so it's similar to window functions - only it not window
function.

If you like to optimalize to speed, then the most faster solution will
be using hash tables - then you don't need tuplestore. For rollup is
possible maybe one single scan - but I am not sure - there are
important fakt - final function cannot modify intermediate data.

Pavel

>
>
> Regards,
>
>
> --
> Hitoshi Harada
>

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 13 августа 2009 г., 18:09:40
Сообщение: Re: Hot standby and synchronous replication status

Следующее

От: David Fetter
Дата: 13 августа 2009 г., 18:26:05
Сообщение: Re: FDW-based dblink

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)

Предыдущее

Следующее