Re: Final Patch for GROUPING SETS
От | Noah Misch |
---|---|
Тема | Re: Final Patch for GROUPING SETS |
Дата | |
Msg-id | 20150501011752.GA3416161@tornado.leadboat.com обсуждение исходный текст |
Ответ на | Re: Final Patch for GROUPING SETS (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Список | pgsql-hackers |
On Thu, Apr 30, 2015 at 05:35:26AM +0100, Andrew Gierth wrote: > >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: > >> + * TODO: AGG_HASHED doesn't support multiple grouping sets yet. > > Andres> Are you intending to resolve this before an eventual commit? > > Original plan was to tackle AGG_HASHED after a working implementation > was committed; +1 for that plan. > Andres> Possibly after the 'structural' issues are resolved? Or do you > Andres> think this can safely be put of for another release? > > I think the feature is useful even without AGG_HASHED, even though that > means it can sometimes be beaten on performance by using UNION ALL of > many separate GROUP BYs; but I'd defer to the opinions of others on that > point. It will be a tough call, and PostgreSQL has gone each way on some recent feature. I recommend considering both GroupAggregate and HashAggregate in all design discussion but continuing to work toward a first commit implementing GroupAggregate alone. With that in the tree, we'll be in a better position to decide whether to release a feature paused at that stage in its development. Critical facts are uncertain, so a discussion today would be unproductive. > Andres> So, having quickly scanned through the patch, do I understand > Andres> correctly that the contentious problems are: > > Andres> * Arguably this converts the execution *tree* into a DAG. Tom > Andres> seems to be rather uncomfortable with that. I am wondering > Andres> whether this really is a big deal - essentially this only > Andres> happens in a relatively 'isolated' part of the tree right? > Andres> I.e. if those chained together nodes were considered one node, > Andres> there would not be any loops? Additionally, the way > Andres> parametrized scans works already essentially "violates" the > Andres> tree paradigma somewhat. I agree with your assessment. That has been contentious. > The major downsides as I see them with the current approach are: > > 1. It makes plans (and hence explain output) nest very deeply if you > have complex grouping sets (especially cubes with high dimensionality). > > This can make explain very slow in the most extreme cases I'm not worried about that. If anything, the response is to modify explain to more-quickly/compactly present affected plan trees. > 2. A union-based approach would have a chance of including AGG_HASHED > support without any significant code changes, > -> CTE x > -> entire input subplan here > -> Append > -> GroupAggregate > -> Sort > -> CTE Scan x > -> GroupAggregate > -> Sort > -> CTE Scan x > -> HashAggregate > -> CTE Scan x > [...] This uses 50-67% more I/O than the current strategy, which makes it a dead end from my standpoint. Details: http://www.postgresql.org/message-id/20141221210005.GA1864976@tornado.leadboat.com > Andres> Are those the two bigger controversial areas? Or are there > Andres> others in your respective views? > Another controversial item was the introduction of GroupedVar. I know of no additional controversies to add to this list. Thanks, nm
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Bruce MomjianДата:
Сообщение: Re: proposal: disallow operator "=>" and use it for named parameters