Re: hashagg slowdown due to spill changes
От | Tomas Vondra |
---|---|
Тема | Re: hashagg slowdown due to spill changes |
Дата | |
Msg-id | 20200625091024.dhslw2gqgqtgjsoh@development обсуждение исходный текст |
Ответ на | Re: hashagg slowdown due to spill changes (Melanie Plageman <melanieplageman@gmail.com>) |
Список | pgsql-hackers |
On Wed, Jun 24, 2020 at 05:26:07PM -0700, Melanie Plageman wrote: >On Tue, Jun 23, 2020 at 10:06 AM Andres Freund <andres@anarazel.de> wrote: > >> Hi, >> >> On 2020-06-23 09:23:57 -0700, Melanie Plageman wrote: >> > On Mon, Jun 22, 2020 at 9:02 PM Andres Freund <andres@anarazel.de> >> wrote: >> > > It's not this patch's fault, but none, really none, of this stuff >> should >> > > be in the executor. >> > > >> > > >> > Were you thinking it could be done in grouping_planner() and then the >> > bitmaps could be saved in the PlannedStmt? >> > Or would you have to wait until query_planner()? Or are you imagining >> > somewhere else entirely? >> >> I haven't thought about it in too much detail, but I would say >> create_agg_plan() et al. I guess there's some argument to be made to do >> it in setrefs.c, because we already do convert_combining_aggrefs() there >> (but I don't like that much). >> >> There's no reason to do it before we actually decided on one specific >> path, so doing it earlier than create_plan() seems unnecessary. And >> having it in agg specific code seems better than putting it into global >> routines. >> >> There's probably an argument for having a bit more shared code between >> create_agg_plan(), create_group_plan() and >> create_groupingsets_plan(). But even just adding a new extract_*_cols() >> call to each of those would probably be ok. >> >> >So, my summary of this point in the context of the other discussion >upthread is: > >Planner should extract the columns that hashagg will need later during >planning. Planner should not have HashAgg/MixedAgg nodes request smaller >targetlists from their children with CP_SMALL_TLIST to avoid unneeded >projection overhead. >Also, even this extraction should only be done when the number of groups >is large enough to suspect a spill. > IMO we should extract the columns irrespectedly of the estimates, otherwise we won't be able to handle underestimates efficiently. > >Not to stir the pot, but I did notice that hashjoin uses CP_SMALL_TLIST >in create_hashjoin_plan() for the inner side sub-tree and the outer side >one if there are multiple batches. I wondered what was different about >that vs hashagg (i.e. why it is okay to do that there). > Yeah. That means that if we have to start batching during execution, we may need to spill much more datai. I'd say that's a hashjoin issue that we should fix too (in v14). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: