Re: Final Patch for GROUPING SETS

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: Final Patch for GROUPING SETS
Дата
Msg-id 20150501011752.GA3416161@tornado.leadboat.com
обсуждение исходный текст
Ответ на Re: Final Patch for GROUPING SETS  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Список pgsql-hackers
On Thu, Apr 30, 2015 at 05:35:26AM +0100, Andrew Gierth wrote:
> >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

>  >> + *      TODO: AGG_HASHED doesn't support multiple grouping sets yet.
> 
>  Andres> Are you intending to resolve this before an eventual commit?
> 
> Original plan was to tackle AGG_HASHED after a working implementation
> was committed;

+1 for that plan.

>  Andres> Possibly after the 'structural' issues are resolved? Or do you
>  Andres> think this can safely be put of for another release?
> 
> I think the feature is useful even without AGG_HASHED, even though that
> means it can sometimes be beaten on performance by using UNION ALL of
> many separate GROUP BYs; but I'd defer to the opinions of others on that
> point.

It will be a tough call, and PostgreSQL has gone each way on some recent
feature.  I recommend considering both GroupAggregate and HashAggregate in all
design discussion but continuing to work toward a first commit implementing
GroupAggregate alone.  With that in the tree, we'll be in a better position to
decide whether to release a feature paused at that stage in its development.
Critical facts are uncertain, so a discussion today would be unproductive.

>  Andres> So, having quickly scanned through the patch, do I understand
>  Andres> correctly that the contentious problems are:
> 
>  Andres> * Arguably this converts the execution *tree* into a DAG. Tom
>  Andres> seems to be rather uncomfortable with that. I am wondering
>  Andres> whether this really is a big deal - essentially this only
>  Andres> happens in a relatively 'isolated' part of the tree right?
>  Andres> I.e. if those chained together nodes were considered one node,
>  Andres> there would not be any loops?  Additionally, the way
>  Andres> parametrized scans works already essentially "violates" the
>  Andres> tree paradigma somewhat.

I agree with your assessment.  That has been contentious.

> The major downsides as I see them with the current approach are:
> 
> 1. It makes plans (and hence explain output) nest very deeply if you
> have complex grouping sets (especially cubes with high dimensionality).
> 
> This can make explain very slow in the most extreme cases

I'm not worried about that.  If anything, the response is to modify explain to
more-quickly/compactly present affected plan trees.

> 2. A union-based approach would have a chance of including AGG_HASHED
> support without any significant code changes,

>   -> CTE x
>      -> entire input subplan here
>   -> Append
>      -> GroupAggregate
>         -> Sort
>            -> CTE Scan x
>      -> GroupAggregate
>         -> Sort
>            -> CTE Scan x
>      -> HashAggregate
>         -> CTE Scan x
>      [...]

This uses 50-67% more I/O than the current strategy, which makes it a dead end
from my standpoint.  Details:
http://www.postgresql.org/message-id/20141221210005.GA1864976@tornado.leadboat.com

>  Andres> Are those the two bigger controversial areas? Or are there
>  Andres> others in your respective views?

> Another controversial item was the introduction of GroupedVar.

I know of no additional controversies to add to this list.

Thanks,
nm



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: proposal: disallow operator "=>" and use it for named parameters
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: PATCH: adaptive ndistinct estimator v4