Re: Parallel Aggregate

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Parallel Aggregate
Дата
Msg-id e283ddb2-ac35-c169-4c04-c3949d5f46a9@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Parallel Aggregate  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: Parallel Aggregate  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-hackers
Hi,

On 03/20/2016 09:58 AM, David Rowley wrote:
> On 20 March 2016 at 03:19, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> Updated patch is attached.
>>
>> I think this looks structurally correct now, and I think it's doing
>> the right thing as far as parallelism is concerned.  I don't see any
>> obvious problems in the rest of it, either, but I haven't thought
>> hugely deeply about the way you are doing the costing, nor have I
>> totally convinced myself that all of the PathTarget and setrefs stuff
>> is correct.  But I think it's probably pretty close.  I'll study it
>> some more next week.
>
> Thank you for the reviews. The only thing I can think to mention which
> I've not already is that I designed estimate_hashagg_tablesize() to be
> reusable in various places in planner.c, yet I've only made use of it
> in create_grouping_paths(). I would imagine that it might be nice to
> also modify the other places which perform a similar calculation to
> use that function, but I don't think that needs to be done for this
> patch... perhaps a follow-on cleanup.

Hmmm, so how many places that could use the new function are there? I've 
only found these two:
   create_distinct_paths (planner.c)   choose_hashed_setop (prepunion.c)

That doesn't seem like an extremely high number, so perhaps doing it in 
this patch would be fine. However if the function is defined as static, 
choose_hashed_setop won't be able to use it anyway (well, it'll have to 
move the prototype into a header and remove the static).

I wonder why we would need to allow cost_agg=NULL, though? I mean, if 
there are no costing information, wouldn't it be better to either raise 
an error, or at the very least do something like:
    } else        hashentrysize += hash_agg_entry_size(0);

which is what create_distinct_paths does?

I'm not sure changing the meaning of enable_hashagg like this is a good 
idea. It worked as a hard switch before, while with this change that 
would not be the case. Or more accurately - it would not be the case for 
aggregates, but it would still work the old way for other types of 
plans. Not sure that's a particularly good idea.

What about introducing a GUC to enable parallel aggregate, while still 
allowing other types of parallel nodes (I guess that would be handy for 
other types of parallel nodes - it's a bit blunt tool, but tweaking 
max_parallel_degree is even blumter)? e.g. enable_parallelagg?

I do also have a question about parallel aggregate vs. work_mem. 
Nowadays we mostly say to users a query may allocate a multiple of 
work_mem, up to one per node in the plan. Apparently with parallel 
aggregate we'll have to say "multiplied by number of workers", because 
each aggregate worker may allocate up to hashaggtablesize. Is that 
reasonable? Shouldn't we restrict the total size of hash tables in all 
workers somehow?

create_grouping_paths also contains this comment:
 /*  * Generate HashAgg Path providing the estimated hash table size is not  * too big, although if no other Paths were
generatedabove, then we'll  * begrudgingly generate one so that we actually have a Path to work  * with.  */
 

I'm not sure this is a particularly clear comment, I think the old one 
was way much informative despite being a single line:

/* Consider reasons to disable hashing, but only if we can sort instead */

BTW create_grouping_paths probably grew to a size when splitting it into 
smaller pieces would be helpful?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: VS 2015 support in src/tools/msvc
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: [patch] Proposal for \crosstabview in psql