Re: Parallel grouping sets
From: Tomas Vondra
Subject: Re: Parallel grouping sets
Msg-id: 20190614004438.e2y7ubdnpobq5t37@development
In reply to: Re: Parallel grouping sets (David Rowley <david.rowley@2ndquadrant.com>)
List: pgsql-hackers
On Fri, Jun 14, 2019 at 12:02:52PM +1200, David Rowley wrote:
>On Fri, 14 Jun 2019 at 11:45, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Wed, Jun 12, 2019 at 10:58:44AM +0800, Richard Guo wrote:
>> ># explain (costs off, verbose) select c1, c2, avg(c3) from t2 group by
>> >grouping sets((c1,c2), (c1));
>> >                          QUERY PLAN
>> >--------------------------------------------------------------
>> > Finalize GroupAggregate
>> >   Output: c1, c2, avg(c3), (gset_id)
>> >   Group Key: t2.c1, t2.c2, (gset_id)
>> >   ->  Gather Merge
>> >         Output: c1, c2, (gset_id), (PARTIAL avg(c3))
>> >         Workers Planned: 2
>> >         ->  Sort
>> >               Output: c1, c2, (gset_id), (PARTIAL avg(c3))
>> >               Sort Key: t2.c1, t2.c2, (gset_id)
>> >               ->  Partial HashAggregate
>> >                     Output: c1, c2, gset_id, PARTIAL avg(c3)
>> >                     Hash Key: t2.c1, t2.c2
>> >                     Hash Key: t2.c1
>> >                     ->  Parallel Seq Scan on public.t2
>> >                           Output: c1, c2, c3
>> >(15 rows)
>> >
>>
>> OK, I'm not sure I understand the point of this - can you give an
>> example which is supposed to benefit from this? Where does the speedup
>> come from?
>
>I think this is a bad example since the first grouping set is a
>superset of the 2nd. If those were independent and each grouping set
>produced a reasonable number of groups, then it may be better to do it
>this way instead of grouping by all exprs in all grouping sets in the
>first phase, as is done by #1. To do #2 would require that we tag
>the aggregate state with the grouping set it belongs to, which seems
>to be what gset_id is in Richard's output.
>

Aha! So if we have grouping sets (a,b) and (c,d), then with the first
approach we'd do a partial aggregate on (a,b,c,d) - which may produce
quite a few distinct groups, making it inefficient. But with the second
approach, we'd do just (a,b) and (c,d) and mark the rows with gset_id.
Neat!

>In my example upthread the first phase of aggregation produced a group
>per input row.
>Method #2 would work better for that case since it
>would only produce 2000 groups instead of 1 million.
>
>Likely both methods would be good to consider, but since #1 seems much
>easier than #2, then to me it seems to make sense to start there.
>

Yep. Thanks for the explanation.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
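[Editor's note: to make the trade-off above concrete, here is a toy Python sketch, not PostgreSQL code. The column names and cardinalities are made up so that the two grouping sets (a,b) and (c,d) are independent: each set alone has few distinct keys, but their combination has many, which is the case where method #2 wins.]

```python
# Compare how many partial groups each strategy produces for the
# grouping sets (a,b) and (c,d) on a synthetic 10,000-row table.
# Values are constructed so (a,b) has 1000 distinct keys, (c,d) has 77,
# but the combination (a,b,c,d) is distinct for every row.
rows = [(i % 100, i // 100 % 10, i % 7, i // 7 % 11) for i in range(10000)]

# Method #1: one partial pass grouping by the union of all columns.
union_groups = {(a, b, c, d) for (a, b, c, d) in rows}

# Method #2: one tagged partial group per grouping set and key; the
# leading 0/1 plays the role of gset_id in Richard's plan.
tagged_groups = {(0, a, b) for (a, b, _, _) in rows} | \
                {(1, c, d) for (_, _, c, d) in rows}

print(len(union_groups), len(tagged_groups))  # prints: 10000 1077
```

With this data, method #1 produces a partial group per input row (exactly the pathological case David describes), while method #2 produces only 1077 tagged groups to ship through the Gather node.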