Re: Parallel grouping sets

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: Parallel grouping sets
Дата	14 июня 2019 г. 03:02:52
Msg-id	CAKJS1f_rbmZWAiF8TdFM-9xr81PpoTuU5M_phEOd6rGugWeN0g@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Parallel grouping sets (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы	Re: Parallel grouping sets
Список	pgsql-hackers

Дерево обсуждения

On Fri, 14 Jun 2019 at 11:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Wed, Jun 12, 2019 at 10:58:44AM +0800, Richard Guo wrote:

> ># explain (costs off, verbose) select c1, c2, avg(c3) from t2 group by
> >grouping sets((c1,c2), (c1));
> >                          QUERY PLAN
> >--------------------------------------------------------------
> > Finalize GroupAggregate
> >   Output: c1, c2, avg(c3), (gset_id)
> >   Group Key: t2.c1, t2.c2, (gset_id)
> >   ->  Gather Merge
> >         Output: c1, c2, (gset_id), (PARTIAL avg(c3))
> >         Workers Planned: 2
> >         ->  Sort
> >               Output: c1, c2, (gset_id), (PARTIAL avg(c3))
> >               Sort Key: t2.c1, t2.c2, (gset_id)
> >               ->  Partial HashAggregate
> >                     Output: c1, c2, gset_id, PARTIAL avg(c3)
> >                     Hash Key: t2.c1, t2.c2
> >                     Hash Key: t2.c1
> >                     ->  Parallel Seq Scan on public.t2
> >                           Output: c1, c2, c3
> >(15 rows)
> >
>
> OK, I'm not sure I understand the point of this - can you give an
> example which is supposed to benefit from this? Where does the speedup
> came from?

I think this is a bad example since the first grouping set is a
superset of the 2nd. If those were independent and each grouping set
produced a reasonable number of groups then it may be better to do it
this way instead of grouping by all exprs in all grouping sets in the
first phase, as is done by #1.   To do #2 would require that we tag
the aggregate state with the grouping set that belong to, which seem
to be what gset_id is in Richard's output.

In my example upthread the first phase of aggregation produced a group
per input row. Method #2 would work better for that case since it
would only produce 2000 groups instead of 1 million.

Likely both methods would be good to consider, but since #1 seems much
easier than #2, then to me it seems to make sense to start there.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Bruce Momjian
Дата: 14 июня 2019 г., 02:49:48
Сообщение: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)

Следующее

От: "Tsunakawa, Takayuki"
Дата: 14 июня 2019 г., 03:10:07
Сообщение: RE: [PATCH] Speedup truncates of relation forks

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel grouping sets

Предыдущее

Следующее