Thread: Parallel Aggregate costs don't consider combine/serial/deserial funcs

Parallel Aggregate costs don't consider combine/serial/deserial funcs

From
David Rowley
Date:
Hi,

I realised a few days ago that the parallel aggregate code does not
cost for the combine, serialisation and deserialisation functions at
all.

I've attached a patch which fixes this.

One small point which I was a little unsure of in the attached is,
should the "if (aggref->aggdirectargs)" part of
count_agg_clauses_walker() be within the "if
(!context->combineStates)". I simply couldn't decide. We currently
have no aggregates which this affects anyway, per; select * from
pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
I've left it outwith.

Another thing I thought of is that it's not too nice that I have to
pass 3 bools to count_agg_clauses() in order to tell it what to do. I
was tempted to invent some bitmask flags for this, then modify
create_agg_path() to use the same flags, but I thought I'd better not
cause too much churn with this patch.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate costs don't consider combine/serial/deserial funcs

From
Noah Misch
Date:
On Mon, Apr 11, 2016 at 12:47:29AM +1200, David Rowley wrote:
> Hi,
> 
> I realised a few days ago that the parallel aggregate code does not
> cost for the combine, serialisation and deserialisation functions at
> all.
> 
> I've attached a patch which fixes this.
> 
> One small point which I was a little unsure of in the attached is,
> should the "if (aggref->aggdirectargs)" part of
> count_agg_clauses_walker() be within the "if
> (!context->combineStates)". I simply couldn't decide. We currently
> have no aggregates which this affects anyway, per; select * from
> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
> I've left it outwith.
> 
> Another thing I thought of is that it's not too nice that I have to
> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
> was tempted to invent some bitmask flags for this, then modify
> create_agg_path() to use the same flags, but I thought I'd better not
> cause too much churn with this patch.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.



Re: Parallel Aggregate costs don't consider combine/serial/deserial funcs

From
Robert Haas
Date:
On Sun, Apr 10, 2016 at 8:47 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I realised a few days ago that the parallel aggregate code does not
> cost for the combine, serialisation and deserialisation functions at
> all.

Oops.

> I've attached a patch which fixes this.

I've committed this patch.  I wonder if it's going to produce compiler
warnings for some people, complaining about possible use of an
uninitialized variable.  That would kind of suck.  I don't much mind
having to insert a dummy assignment to shut the compiler up; a smarter
compiler will just throw it out anyway.  I'm less enthused about a
dummy MemSet.  The compiler is less likely to be able to get rid of
that, and it's more expensive if it doesn't.  But let's see what
happens.

> One small point which I was a little unsure of in the attached is,
> should the "if (aggref->aggdirectargs)" part of
> count_agg_clauses_walker() be within the "if
> (!context->combineStates)". I simply couldn't decide. We currently
> have no aggregates which this affects anyway, per; select * from
> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
> I've left it outwith.

The direct arguments would be evaluated in the worker, but not in the
leader, right?  Or am I confused?

> Another thing I thought of is that it's not too nice that I have to
> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
> was tempted to invent some bitmask flags for this, then modify
> create_agg_path() to use the same flags, but I thought I'd better not
> cause too much churn with this patch.

I'm kinda tempted to say this should be using an enum.  I note that
serialStates has a subtly different meaning here than in some other
places where you have used the same term.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate costs don't consider combine/serial/deserial funcs

From
David Rowley
Date:
On 13 April 2016 at 08:52, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Apr 10, 2016 at 8:47 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> I realised a few days ago that the parallel aggregate code does not
>> cost for the combine, serialisation and deserialisation functions at
>> all.
>
> Oops.
>
>> I've attached a patch which fixes this.
>
> I've committed this patch.  I wonder if it's going to produce compiler
> warnings for some people, complaining about possible use of an
> uninitialized variable.  That would kind of suck.  I don't much mind
> having to insert a dummy assignment to shut the compiler up; a smarter
> compiler will just throw it out anyway.  I'm less enthused about a
> dummy MemSet.  The compiler is less likely to be able to get rid of
> that, and it's more expensive if it doesn't.  But let's see what
> happens.

Thanks for committing.

I wondered that too, so checked a couple of compilers and got no
warnings, but the buildfarm should let us know. The other option would
be to palloc() them, and have them set to NULL initially... that's not
very nice either... Another option would be to protect the final
parallel path generation with if (grouped_rel->partial_pathlist &&
grouped_rel->consider_parallel). I'd imagine any compiler smart enough
to work out that uninitialised is not possible would also be able to
remove the check for grouped_rel->consider_parallel, but *shrug*, I
don't often look at the assembly that compilers generate, so I might
be giving them too much credit.

>> One small point which I was a little unsure of in the attached is,
>> should the "if (aggref->aggdirectargs)" part of
>> count_agg_clauses_walker() be within the "if
>> (!context->combineStates)". I simply couldn't decide. We currently
>> have no aggregates which this affects anyway, per; select * from
>> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
>> I've left it outwith.
>
> The direct arguments would be evaluated in the worker, but not in the
> leader, right?  Or am I confused?

That seems right, but I just can't think of how it's possible to
parallelise these aggregates anyway.

>> Another thing I thought of is that it's not too nice that I have to
>> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
>> was tempted to invent some bitmask flags for this, then modify
>> create_agg_path() to use the same flags, but I thought I'd better not
>> cause too much churn with this patch.
>
> I'm kinda tempted to say this should be using an enum.  I note that
> serialStates has a subtly different meaning here than in some other
> places where you have used the same term.

hmm, I'm not sure how it's subtly different. Do you mean the
preference towards costing the finalfn when finalizeAggs is true, and
ignoring the serialfn in this case? nodeAgg.c should do the same,
although it'll deserialize in such a case. We can never finalize and
serialize in the same node.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate costs don't consider combine/serial/deserial funcs

From
Robert Haas
Date:
On Tue, Apr 12, 2016 at 5:38 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>>> One small point which I was a little unsure of in the attached is,
>>> should the "if (aggref->aggdirectargs)" part of
>>> count_agg_clauses_walker() be within the "if
>>> (!context->combineStates)". I simply couldn't decide. We currently
>>> have no aggregates which this affects anyway, per; select * from
>>> pg_aggregate where aggcombinefn <> 0 and aggkind <> 'n'; so for now
>>> I've left it outwith.
>>
>> The direct arguments would be evaluated in the worker, but not in the
>> leader, right?  Or am I confused?
>
> That seems right, but I just can't think of how it's possible to
> parallelise these aggregates anyway.

Well, if you could ensure that each worker would see a whole group,
you could do it, I think.  But it's probably fine to just leave this
for now.  It's not like it can't be changed if somebody figures out
some cool thing to do in this area.

>>> Another thing I thought of is that it's not too nice that I have to
>>> pass 3 bools to count_agg_clauses() in order to tell it what to do. I
>>> was tempted to invent some bitmask flags for this, then modify
>>> create_agg_path() to use the same flags, but I thought I'd better not
>>> cause too much churn with this patch.
>>
>> I'm kinda tempted to say this should be using an enum.  I note that
>> serialStates has a subtly different meaning here than in some other
>> places where you have used the same term.
>
> hmm, I'm not sure how it's subtly different. Do you mean the
> preference towards costing the finalfn when finalizeAggs is true, and
> ignoring the serialfn in this case? nodeAgg.c should do the same,
> although it'll deserialize in such a case. We can never finalize and
> serialize in the same node.

I mean that, IIUC, in some other places where you use serialStates,
true means that (de)serialization is known to be needed.  Here,
however, it only means it might be needed, contingent on whether the
serial/deserial functions are actually present.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company