Discussion: Parallel Aggregate


Parallel Aggregate

From:
Haribabu Kommi
Date:
Parallel aggregate is a feature that performs the aggregation work in parallel
with the help of Gather and Partial Seq Scan nodes. The following is a basic
overview of the parallel aggregate changes.

Decision phase:

Based on the following conditions, the parallel aggregate plan is generated.

- Check whether the plan below is a Gather + Partial Seq Scan only.

This check verifies that all the plan nodes that are present are
parallel-aware.

- Check whether any projection or qual condition is present in the
Gather node.

If there are any quals or projection expressions that must be
performed at the Gather node, because they contain functions that can
only be executed in the master backend, the parallel aggregate plan is
not chosen.

- Check whether the aggregate supports parallelism or not.

For the first patch, I thought of supporting only some aggregates.
The supported aggregates are mainly the aggregate functions that have
variable-length data types as final and transition types. This is to
avoid changing the target-list return types: because the types are
variable length, even the transition value can be returned to the
backend without applying the aggregate's final function. To identify
the aggregates that support parallelism, a new member is added to the
pg_aggregate system catalog table.

- Currently only Group and plain aggregates are supported, for simplicity.

This patch doesn't change anything in the aggregate plan decision. If
the planner chooses a group or plain aggregate as the best plan, we
then check whether it can be converted into a parallel aggregate.


Planning phase:

- Generate the target list items that need to be passed to the child
aggregate nodes, by separating bare aggregates and GROUP BY
expressions. This is required to take care of any expressions that are
involved in the target list.

Example:
Output: (sum(id1)), (3 + (sum((id2 - 3)))), (max(id1)), ((count(id1)) - (max(id1)))
  ->  Aggregate
        Output: sum(id1), sum((id2 - 3)), max(id1), count(id1)

- Don't push the HAVING clause down to the child aggregate node; it
needs to be executed at the Gather node only, after combining all
results from workers with the matching key (and also after the final
function is called for the aggregate, if one exists).

- Get the details of the Gather plan, remove its plan node from the
actual plan, and prepare the Gather plan on top of the aggregate plan.


Execution phase:

- By passing some execution flag like EXEC_PARALLEL or similar, the
aggregate node skips the final-function calculation on the worker
side.

- Set single_copy mode to true when the node below the Gather is a
parallel aggregate.

- Add support for getting a slot from a particular worker. This is
required to merge the slots from different workers based on the
grouping key.

- Merge the slots received from the workers based on the grouping key.
If there is no grouping key, then merge all slots without waiting to
receive slots from all workers.

- If there is a grouping key, the backend has to wait until it gets
slots from all running workers. Once all slots are received, they need
to be compared against the grouping key and merged accordingly. The
merged slot then needs to be processed further to apply the final
function, qualification, and projection.
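Reduced to plain arrays, the backend-side merge step above might look like the following C sketch. `PartialSlot` is a hypothetical flattened form of what a worker returns (one partial state per group, here an int key and a running sum); the real code would operate on TupleTableSlots and transition values.

```c
#include <stddef.h>

/* Hypothetical flattened form of a worker's per-group partial result. */
typedef struct { int group_key; long partial_sum; } PartialSlot;

/* Merge the partial slots from all workers into 'out' by grouping key,
 * combining the states of matching keys. Returns the number of merged
 * groups. This is the backend-side merge described above, reduced to
 * arrays and a linear scan for clarity. */
static size_t merge_slots(const PartialSlot *in, size_t n,
                          PartialSlot *out, size_t out_cap)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < n_out; j++) {
            if (out[j].group_key == in[i].group_key) {
                out[j].partial_sum += in[i].partial_sum; /* combine states */
                break;
            }
        }
        if (j == n_out && n_out < out_cap)
            out[n_out++] = in[i];  /* first slot seen for this key */
    }
    return n_out;
}
```

After this merge, each output entry would still need the final function, qualification, and projection applied, exactly as the last bullet describes.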

I will try to provide a POC patch by the next commitfest.

Comments?


Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From:
David Rowley
Date:
On 12 October 2015 at 15:07, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Parallel aggregate is the feature doing the aggregation job parallel
with the help of Gather and
partial seq scan nodes. The following is the basic overview of the
parallel aggregate changes.

Decision phase:

Based on the following conditions, the parallel aggregate plan is generated.

- check whether the below plan node is Gather + partial seq scan only.

This is because to check whether the plan nodes that are present are
aware of parallelism or not?

- check Are there any projection or qual condition is present in the
Gather node?

If there exists any quals and projection info that is required to
performed in the
Gather node because of the function that can only be executed in
master backends,
the parallel aggregate plan is not chosen.

- check whether the aggregate supports parallelism or not.

As for first patch, I thought of supporting only some aggregates for
this parallel aggregate.
The supported aggregates are mainly the aggregate functions that have
variable length data types as final and transition types. This is to
avoid changing the target list return types. Because of variable
lengths, even the transition type can be returned to backend without
applying the final function in aggregate. To identify the supported
aggregates for parallelism, a new member is added to pg_aggregate
system catalog table.

- currently Group and plain aggregates are only supported for simplicity.

This patch doesn't change anything in aggregate plan decision. If the
planner decides the group
or plain aggregates as the best plan, then we will check whether this
can be converted into
parallel aggregate or not?

Hi,

I've never previously proposed any implementation for parallel aggregation, but I have previously proposed infrastructure to allow aggregation to happen in multiple steps. Your plan sounds very different from what I've proposed.

I attempted to convey my idea on this to the community here http://www.postgresql.org/message-id/CAKJS1f-TmWi-4c5K6CBLRdTfGsVxOJhadefzjE7SWuVBgMSkXA@mail.gmail.com and Simon and I proposed an actual proof-of-concept patch here https://commitfest.postgresql.org/5/131/

I've since expanded on that work in the form of a WIP patch which implements GROUP BY before JOIN here http://www.postgresql.org/message-id/CAKJS1f9kw95K2pnCKAoPmNw==7fgjSjC-82cy1RB+-x-Jz0QHA@mail.gmail.com

It's pretty evident that we both need to align the way we plan to handle this multiple step aggregation, there's no sense at all in having 2 different ways of doing this. Perhaps you could look over my patch and let me know the parts which you disagree with, then we can resolve these together and come up with the best solution for each of us.

It may also be useful for you to glance at how Postgres-XL handles this partial aggregation problem: where possible, it will partially aggregate the results on each node and pass the partially aggregated state to the master node, which performs the final aggregate stage on each of the individual aggregate states from each node. Note that this requires giving the aggregates with internal aggregate states an SQL-level type, and it also means implementing an input and an output function for these types. I've noticed that XL mostly handles this by making the output function build a string along the lines of <count>:<sum> for aggregates such as AVG(). I believe you'll need something very similar to this to pass the partial states between worker and master process.
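The <count>:<sum> idea for AVG() can be sketched in a few lines of C. This is not the Postgres-XL or PostgreSQL source; `AvgState` and the four functions are invented names illustrating the in/out/combine/final quartet the paragraph describes:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical transition state for AVG(); serialized as "<count>:<sum>"
 * so it can cross the worker/master (or node) boundary as text. */
typedef struct { long count; double sum; } AvgState;

/* Output function: render the state as "<count>:<sum>". */
static void avg_state_out(const AvgState *s, char *buf, size_t len)
{
    snprintf(buf, len, "%ld:%g", s->count, s->sum);
}

/* Input function: parse "<count>:<sum>" back into a state.
 * Returns nonzero on success. */
static int avg_state_in(const char *str, AvgState *s)
{
    return sscanf(str, "%ld:%lf", &s->count, &s->sum) == 2;
}

/* Combine function: merge two partial states by adding counts and sums. */
static AvgState avg_state_combine(AvgState a, AvgState b)
{
    return (AvgState){ a.count + b.count, a.sum + b.sum };
}

/* Final function: compute the average from the combined state. */
static double avg_state_final(AvgState s)
{
    return s.count ? s.sum / s.count : 0.0;
}
```

The essential property is that only the input/output functions care about the textual form; combine and final work on the deserialized state.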

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From:
Haribabu Kommi
Date:
On Mon, Oct 12, 2015 at 2:25 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 October 2015 at 15:07, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> - check whether the aggregate supports parallelism or not.
>>
>> As for first patch, I thought of supporting only some aggregates for
>> this parallel aggregate.
>> The supported aggregates are mainly the aggregate functions that have
>> variable length data types as final and transition types. This is to
>> avoid changing the target list return types. Because of variable
>> lengths, even the transition type can be returned to backend without
>> applying the final function in aggregate. To identify the supported
>> aggregates for parallelism, a new member is added to pg_aggregate
>> system catalog table.
>>
>> - currently Group and plain aggregates are only supported for simplicity.
>>
>> This patch doesn't change anything in aggregate plan decision. If the
>> planner decides the group
>> or plain aggregates as the best plan, then we will check whether this
>> can be converted into
>> parallel aggregate or not?
>
>
> Hi,
>
> I've never previously proposed any implementation for parallel aggregation,
> but I have previously proposed infrastructure to allow aggregation to happen
> in multiple steps. It seems your plan sounds very different from what I've
> proposed.
>
> I attempted to convey my idea on this to the community here
> http://www.postgresql.org/message-id/CAKJS1f-TmWi-4c5K6CBLRdTfGsVxOJhadefzjE7SWuVBgMSkXA@mail.gmail.com
> which Simon and I proposed an actual proof of concept patch here
> https://commitfest.postgresql.org/5/131/

My plan is also to use combine_aggregate_state_v2.patch, or something
similar to what you have proposed, to merge the partial aggregate
results and combine them in the backend process. As a POC patch, I
just want to limit this functionality to aggregates that have
variable-length datatypes as transition and final arguments.

> I've since expanded on that work in the form of a WIP patch which implements
> GROUP BY before JOIN here
> http://www.postgresql.org/message-id/CAKJS1f9kw95K2pnCKAoPmNw==7fgjSjC-82cy1RB+-x-Jz0QHA@mail.gmail.com
>
> It's pretty evident that we both need to align the way we plan to handle
> this multiple step aggregation, there's no sense at all in having 2
> different ways of doing this. Perhaps you could look over my patch and let
> me know the parts which you disagree with, then we can resolve these
> together and come up with the best solution for each of us.

Thanks for the details. I will go through it. From a basic view, this
patch is an enhancement of combine_aggregate_state_v2.patch.

> It may also be useful for you to glance at how Postgres-XL handles this
> partial aggregation problem, as it, where possible, will partially aggregate
> the results on each node, pass the partially aggregates state to the master
> node to have it perform the final aggregate stage on each of the individual
> aggregate states from each node. Note that this requires giving the
> aggregates with internal aggregate states an SQL level type and it also
> means implementing an input and output function for these types. I've
> noticed that XL mostly handles this by making the output function build a
> string something along the lines of <count>:<sum> for aggregates such as
> AVG(). I believe you'll need something very similar to this to pass the
> partial states between worker and master process.

Yes, we may need something like this, or to add support for passing
internal datatypes between worker and backend processes, so that all
aggregate functions can be supported.

Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From:
Robert Haas
Date:
On Sun, Oct 11, 2015 at 10:07 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Parallel aggregate is the feature doing the aggregation job parallel
> with the help of Gather and
> partial seq scan nodes. The following is the basic overview of the
> parallel aggregate changes.
>
> Decision phase:
>
> Based on the following conditions, the parallel aggregate plan is generated.
>
> - check whether the below plan node is Gather + partial seq scan only.
>
> This is because to check whether the plan nodes that are present are
> aware of parallelism or not?

This is really not the right way of doing this.  We should do
something more general.  Most likely, parallel aggregate should wait
for Tom's work refactoring the upper planner to use paths.  But either
way, it's not a good idea to limit ourselves to parallel aggregation
only in the case where there is exactly one base table.

One of the things I want to do pretty early on, perhaps in time for
9.6, is create a general notion of partial paths.  A Partial Seq Scan
node creates a partial path.  A Gather node turns a partial path into
a complete path.  A join between a partial path and a complete path
creates a new partial path.  This concept lets us consider,
essentially, pushing joins below Gather nodes.  That's quite powerful
and could make Partial Seq Scan applicable to a much broader variety
of use cases.  If there are worthwhile partial paths for the final
joinrel, and aggregation of that joinrel is needed, we can consider
parallel aggregation using that partial path as an alternative to
sticking a Gather node on there and then aggregating.
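The composition rules in the paragraph above can be written down as a tiny C sketch, with "partial" reduced to a single boolean property of a path. This is a deliberate oversimplification for illustration; the real path structures carry costs, sort orders, and much more:

```c
#include <stdbool.h>

/* Hypothetical path, reduced to the one property the rules talk about. */
typedef struct { bool partial; } Path;

/* A Partial Seq Scan node creates a partial path. */
static Path partial_seq_scan(void) { return (Path){ true }; }

/* A Gather node turns a partial path into a complete path. */
static Path gather(Path p) { (void)p; return (Path){ false }; }

/* A join between a partial path and a complete path creates a new
 * partial path; this is what lets joins be pushed below Gather. */
static Path join_paths(Path a, Path b)
{
    return (Path){ a.partial || b.partial };
}
```

Under these rules, a parallel aggregation plan is simply a partial path aggregated before the Gather is applied, rather than after.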

> - Set the single_copy mode as true, in case if the below node of
> Gather is a parallel aggregate.

That sounds wrong.  Single-copy mode is for when we need to be certain
of running exactly one copy of the plan.  If you're trying to have
several workers aggregate in parallel, that's exactly what you don't
want.

Also, I think the path for parallel aggregation should probably be
something like FinalizeAgg -> Gather -> PartialAgg -> some partial
path here.  I'm not clear whether that is what you are thinking or
not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From:
Haribabu Kommi
Date:
On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 11, 2015 at 10:07 PM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> Parallel aggregate is the feature doing the aggregation job parallel
>> with the help of Gather and
>> partial seq scan nodes. The following is the basic overview of the
>> parallel aggregate changes.
>>
>> Decision phase:
>>
>> Based on the following conditions, the parallel aggregate plan is generated.
>>
>> - check whether the below plan node is Gather + partial seq scan only.
>>
>> This is because to check whether the plan nodes that are present are
>> aware of parallelism or not?
>
> This is really not the right way of doing this.  We should do
> something more general.  Most likely, parallel aggregate should wait
> for Tom's work refactoring the upper planner to use paths.  But either
> way, it's not a good idea to limit ourselves to parallel aggregation
> only in the case where there is exactly one base table.

Ok. Thanks for the details.

> One of the things I want to do pretty early on, perhaps in time for
> 9.6, is create a general notion of partial paths.  A Partial Seq Scan
> node creates a partial path.  A Gather node turns a partial path into
> a complete path.  A join between a partial path and a complete path
> creates a new partial path.  This concept lets us consider,
> essentially, pushing joins below Gather nodes.  That's quite powerful
> and could make Partial Seq Scan applicable to a much broader variety
> of use cases.  If there are worthwhile partial paths for the final
> joinrel, and aggregation of that joinrel is needed, we can consider
> parallel aggregation using that partial path as an alternative to
> sticking a Gather node on there and then aggregating.
>
>> - Set the single_copy mode as true, in case if the below node of
>> Gather is a parallel aggregate.
>
> That sounds wrong.  Single-copy mode is for when we need to be certain
> of running exactly one copy of the plan.  If you're trying to have
> several workers aggregate in parallel, that's exactly what you don't
> want.

My intent in setting the flag is to prevent the backend itself from executing the child plan.

> Also, I think the path for parallel aggregation should probably be
> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> path here.  I'm not clear whether that is what you are thinking or
> not.

No. I am thinking of the following way:
Gather -> PartialAgg -> some partial path

I want the Gather node to merge the results coming from all workers;
otherwise it may be difficult to merge at the parent of the Gather
node. If the partial group aggregate is under the Gather node and any
two workers return data for the same group key, we need to compare
them and combine them into a single group. At the Gather node, it is
possible to wait until we get slots from all workers. Once all workers
return their slots, we can compare and merge the necessary slots and
return the result. Am I missing something?

Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From:
David Rowley
Date:
On 13 October 2015 at 17:09, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Also, I think the path for parallel aggregation should probably be
> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> path here.  I'm not clear whether that is what you are thinking or
> not.

No. I am thinking of the following way.
Gather->partialagg->some partial path

I want the Gather node to merge the results coming from all workers, otherwise
it may be difficult to merge at parent of gather node. Because in case
the partial
group aggregate is under the Gather node, if any of two workers are returning
same group key data, we need to compare them and combine it to make it a
single group. If we are at Gather node, it is possible that we can
wait till we get
slots from all workers. Once all workers returns the slots we can compare
and merge the necessary slots and return the result. Am I missing something?

My assumption is the same as Robert's here.
Unless I've misunderstood, it sounds like you're proposing to add logic into the Gather node to handle final aggregation? That sounds like a modularity violation of the whole node concept. 

The handling of the final aggregate stage is not all that different from the initial aggregate stage. The primary difference is just that you're calling the combine function instead of the transition function, and the values being aggregated are aggregate states rather than the type of the values which were initially aggregated. The handling of GROUP BY is all the same, yet you only apply the HAVING clause during final aggregation. This is why I ended up implementing this in nodeAgg.c instead of inventing some new node type that's mostly a copy and paste of nodeAgg.c [1]

If you're performing a hash aggregate you need to wait until all the partially aggregated groups are received anyway. If you're doing a sort/agg then you'll need to sort again after the Gather node.
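The transition-vs-combine symmetry described above can be sketched for a SUM over ints. The names here are hypothetical stand-ins for the per-aggregate support functions; in PostgreSQL both loops would live in nodeAgg.c and operate on transition values, not longs:

```c
#include <stddef.h>

typedef long SumState;

/* Transition function: fold one input row into the state (worker side). */
static SumState sum_trans(SumState state, int value) { return state + value; }

/* Combine function: fold one partial *state* into the state (final stage). */
static SumState sum_combine(SumState state, SumState partial)
{
    return state + partial;
}

/* Partial aggregate: what each worker runs over its share of the rows. */
static SumState partial_agg(const int *rows, size_t n)
{
    SumState s = 0;
    for (size_t i = 0; i < n; i++)
        s = sum_trans(s, rows[i]);
    return s;
}

/* Finalize aggregate: the same loop shape, but advancing over partial
 * states with the combine function, and only here is a HAVING-style
 * qual applied. Returns nonzero if the group survives the qual. */
static int finalize_agg(const SumState *partials, size_t n,
                        long having_min, long *result)
{
    SumState s = 0;
    for (size_t i = 0; i < n; i++)
        s = sum_combine(s, partials[i]);
    *result = s;
    return s >= having_min;   /* HAVING applied at the final stage only */
}
```

Note how the two loops are structurally identical; only the advance function and the input type differ, which is the argument for reusing nodeAgg.c rather than inventing a new node type.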

 
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From:
Simon Riggs
Date:
On 13 October 2015 at 02:14, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Oct 11, 2015 at 10:07 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Parallel aggregate is the feature doing the aggregation job parallel
> with the help of Gather and
> partial seq scan nodes. The following is the basic overview of the
> parallel aggregate changes.
>
> Decision phase:
>
> Based on the following conditions, the parallel aggregate plan is generated.
>
> - check whether the below plan node is Gather + partial seq scan only.
>
> This is because to check whether the plan nodes that are present are
> aware of parallelism or not?

This is really not the right way of doing this.  We should do
something more general.  Most likely, parallel aggregate should wait
for Tom's work refactoring the upper planner to use paths.  But either
way, it's not a good idea to limit ourselves to parallel aggregation
only in the case where there is exactly one base table.

What we discussed at PgCon was this rough flow of work

* Pathify upper Planner (Tom) WIP
* Aggregation push down (David) Prototype
* Parallel Aggregates 

Parallel infrastructure is also required for aggregation, though that dependency looks further ahead than the above at present.

Parallel aggregates do look like they can make it into 9.6, but there's not much slack left in the critical path.
 
One of the things I want to do pretty early on, perhaps in time for
9.6, is create a general notion of partial paths.  A Partial Seq Scan
node creates a partial path.  A Gather node turns a partial path into
a complete path.  A join between a partial path and a complete path
creates a new partial path.  This concept lets us consider,
essentially, pushing joins below Gather nodes.  That's quite powerful
and could make Partial Seq Scan applicable to a much broader variety
of use cases.  If there are worthwhile partial paths for the final
joinrel, and aggregation of that joinrel is needed, we can consider
parallel aggregation using that partial path as an alternative to
sticking a Gather node on there and then aggregating.

Some form of partial plan makes sense. A better word might be "strand".
 
> - Set the single_copy mode as true, in case if the below node of
> Gather is a parallel aggregate.

That sounds wrong.  Single-copy mode is for when we need to be certain
of running exactly one copy of the plan.  If you're trying to have
several workers aggregate in parallel, that's exactly what you don't
want.

Also, I think the path for parallel aggregation should probably be
something like FinalizeAgg -> Gather -> PartialAgg -> some partial
path here.  I'm not clear whether that is what you are thinking or
not.

Yes, but not sure of names. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Parallel Aggregate

From:
Haribabu Kommi
Date:
On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 13 October 2015 at 17:09, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>> > Also, I think the path for parallel aggregation should probably be
>> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
>> > path here.  I'm not clear whether that is what you are thinking or
>> > not.
>>
>> No. I am thinking of the following way.
>> Gather->partialagg->some partial path
>>
>> I want the Gather node to merge the results coming from all workers,
>> otherwise
>> it may be difficult to merge at parent of gather node. Because in case
>> the partial
>> group aggregate is under the Gather node, if any of two workers are
>> returning
>> same group key data, we need to compare them and combine it to make it a
>> single group. If we are at Gather node, it is possible that we can
>> wait till we get
>> slots from all workers. Once all workers returns the slots we can compare
>> and merge the necessary slots and return the result. Am I missing
>> something?
>
>
> My assumption is the same as Robert's here.
> Unless I've misunderstood, it sounds like you're proposing to add logic into
> the Gather node to handle final aggregation? That sounds like a modularity
> violation of the whole node concept.
>
> The handling of the final aggregate stage is not all that different from the
> initial aggregate stage. The primary difference is just that your calling
> the combine function instead of the transition function, and the values

Yes, you are correct; until now I was thinking of using transition
types as the approach, and for that reason alone I proposed having the
Gather node handle the finalize aggregation.

> being aggregated are aggregates states rather than the type of the values
> which were initially aggregated. The handling of GROUP BY is all the same,
> yet you only apply the HAVING clause during final aggregation. This is why I
> ended up implementing this in nodeAgg.c instead of inventing some new node
> type that's mostly a copy and paste of nodeAgg.c [1]

After going through your Partial Aggregation / GROUP BY before JOIN patch,
the following is my understanding of parallel aggregate.

Finalize [hash] aggregate
        -> Gather
              -> Partial [hash] aggregate

The data that comes from the Gather node contains the group key and
the partial grouping results. Based on these, in the hash aggregate
case we can build another hash table at the finalize aggregate and
return the final results. This approach works for both plain and hash
aggregates.

For group aggregate support in parallel aggregate, the plan should be
as follows.

Finalize Group aggregate
    ->sort
        -> Gather
              -> Partial group aggregate
                   ->sort

The data that comes from the Gather node needs to be sorted again
based on the grouping key, then merged to generate the final grouping
result.

With this approach, we don't need to change anything in the Gather
node. Is my understanding correct?

Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From:
David Rowley
Date:
On 13 October 2015 at 20:57, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 13 October 2015 at 17:09, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>> > Also, I think the path for parallel aggregation should probably be
>> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
>> > path here.  I'm not clear whether that is what you are thinking or
>> > not.
>>
>> No. I am thinking of the following way.
>> Gather->partialagg->some partial path
>>
>> I want the Gather node to merge the results coming from all workers,
>> otherwise
>> it may be difficult to merge at parent of gather node. Because in case
>> the partial
>> group aggregate is under the Gather node, if any of two workers are
>> returning
>> same group key data, we need to compare them and combine it to make it a
>> single group. If we are at Gather node, it is possible that we can
>> wait till we get
>> slots from all workers. Once all workers returns the slots we can compare
>> and merge the necessary slots and return the result. Am I missing
>> something?
>
>
> My assumption is the same as Robert's here.
> Unless I've misunderstood, it sounds like you're proposing to add logic into
> the Gather node to handle final aggregation? That sounds like a modularity
> violation of the whole node concept.
>
> The handling of the final aggregate stage is not all that different from the
> initial aggregate stage. The primary difference is just that your calling
> the combine function instead of the transition function, and the values

Yes, you are correct, till now i am thinking of using transition types as the
approach, because of that reason only I proposed it as Gather node to handle
the finalize aggregation.

> being aggregated are aggregates states rather than the type of the values
> which were initially aggregated. The handling of GROUP BY is all the same,
> yet you only apply the HAVING clause during final aggregation. This is why I
> ended up implementing this in nodeAgg.c instead of inventing some new node
> type that's mostly a copy and paste of nodeAgg.c [1]

After going through your Partial Aggregation / GROUP BY before JOIN patch,
Following is my understanding of parallel aggregate.

Finalize [hash] aggregate
        -> Gather
              -> Partial [hash] aggregate

The data that comes from the Gather node contains the group key and
grouping results.
Based on these we can generate another hash table in case of hash aggregate at
finalize aggregate and return the final results. This approach works
for both plain and
hash aggregates.

For group aggregate support of parallel aggregate, the plan should be
as follows.

Finalize Group aggregate
    ->sort
        -> Gather
              -> Partial group aggregate
                   ->sort

The data that comes from Gather node needs to be sorted again based on
the grouping key,
merge the data and generates the final grouping result.

With this approach, we no need to change anything in Gather node. Is
my understanding correct?


Our understandings are aligned. 

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From:
David Rowley
Date:
On 20 October 2015 at 23:23, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 13 October 2015 at 20:57, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 13 October 2015 at 17:09, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>> > Also, I think the path for parallel aggregation should probably be
>> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
>> > path here.  I'm not clear whether that is what you are thinking or
>> > not.
>>
>> No. I am thinking of the following way.
>> Gather->partialagg->some partial path
>>
>> I want the Gather node to merge the results coming from all workers,
>> otherwise
>> it may be difficult to merge at parent of gather node. Because in case
>> the partial
>> group aggregate is under the Gather node, if any of two workers are
>> returning
>> same group key data, we need to compare them and combine it to make it a
>> single group. If we are at Gather node, it is possible that we can
>> wait till we get
>> slots from all workers. Once all workers returns the slots we can compare
>> and merge the necessary slots and return the result. Am I missing
>> something?
>
>
> My assumption is the same as Robert's here.
> Unless I've misunderstood, it sounds like you're proposing to add logic into
> the Gather node to handle final aggregation? That sounds like a modularity
> violation of the whole node concept.
>
> The handling of the final aggregate stage is not all that different from the
> initial aggregate stage. The primary difference is just that your calling
> the combine function instead of the transition function, and the values

Yes, you are correct, till now i am thinking of using transition types as the
approach, because of that reason only I proposed it as Gather node to handle
the finalize aggregation.

> being aggregated are aggregates states rather than the type of the values
> which were initially aggregated. The handling of GROUP BY is all the same,
> yet you only apply the HAVING clause during final aggregation. This is why I
> ended up implementing this in nodeAgg.c instead of inventing some new node
> type that's mostly a copy and paste of nodeAgg.c [1]

After going through your Partial Aggregation / GROUP BY before JOIN patch,
Following is my understanding of parallel aggregate.

Finalize [hash] aggregate
        -> Gather
              -> Partial [hash] aggregate

The data that comes from the Gather node contains the group key and
grouping results.
Based on these we can generate another hash table in case of hash aggregate at
finalize aggregate and return the final results. This approach works
for both plain and
hash aggregates.

For group aggregate support of parallel aggregate, the plan should be
as follows.

Finalize Group aggregate
    ->sort
        -> Gather
              -> Partial group aggregate
                   ->sort

The data that comes from Gather node needs to be sorted again based on
the grouping key,
merge the data and generates the final grouping result.

With this approach, we no need to change anything in Gather node. Is
my understanding correct?


Our understandings are aligned. 


Hi,

I just wanted to cross-post here to mark that I've posted an updated patch for combining aggregate states: http://www.postgresql.org/message-id/CAKJS1f9wfPKSYt8CG=T271xbyMZjRzWQBjEixiqRF-oLH_u-Zw@mail.gmail.com

I also wanted to check if you've managed to make any progress on Parallel Aggregation? I'm very interested in this myself and would like to progress with it, if you're not already doing so.

My current thinking is that most of the remaining changes required for parallel aggregation, after applying the combine aggregate state patch, will be in the exact area that Tom will be making changes for the upper planner path-ification work. I'm not all that certain if we should hold off for that or not. 

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Thu, Dec 3, 2015 at 4:18 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> Hi,
>
> I just wanted to cross post here to mark that I've posted an updated patch
> for combining aggregate states:
> http://www.postgresql.org/message-id/CAKJS1f9wfPKSYt8CG=T271xbyMZjRzWQBjEixiqRF-oLH_u-Zw@mail.gmail.com
>
> I also wanted to check if you've managed to make any progress on Parallel
> Aggregation? I'm very interested in this myself and would like to progress
> with it, if you're not already doing so.

Yes, the parallel aggregate basic patch is almost ready.
This patch is based on your earlier combine state patch.
I will post it to the community within a week or so.

Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From
David Rowley
Date:
On 3 December 2015 at 19:24, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Thu, Dec 3, 2015 at 4:18 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> Hi,
>
> I just wanted to cross post here to mark that I've posted an updated patch
> for combining aggregate states:
> http://www.postgresql.org/message-id/CAKJS1f9wfPKSYt8CG=T271xbyMZjRzWQBjEixiqRF-oLH_u-Zw@mail.gmail.com
>
> I also wanted to check if you've managed to make any progress on Parallel
> Aggregation? I'm very interested in this myself and would like to progress
> with it, if you're not already doing so.

Yes, the parallel aggregate basic patch is almost ready.
This patch is based on your earlier combine state patch.
I will post it to the community within a week or so.

That's great news!

Also note that there are some bug fixes in the patch I just posted on the other thread for combining aggregate states:

For example:  values[Anum_pg_aggregate_aggcombinefn - 1] = ObjectIdGetDatum(combinefn);
was missing from AggregateCreate().

It might be worth diffing against the updated patch just to pull in anything else that's changed.

 --
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Thu, Dec 3, 2015 at 6:06 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 3 December 2015 at 19:24, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Thu, Dec 3, 2015 at 4:18 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>> >
>> > Hi,
>> >
>> > I just wanted to cross post here to mark that I've posted an updated
>> > patch
>> > for combining aggregate states:
>> >
>> > http://www.postgresql.org/message-id/CAKJS1f9wfPKSYt8CG=T271xbyMZjRzWQBjEixiqRF-oLH_u-Zw@mail.gmail.com
>> >
>> > I also wanted to check if you've managed to make any progress on
>> > Parallel
>> > Aggregation? I'm very interested in this myself and would like to
>> > progress
>> > with it, if you're not already doing so.
>>
>> Yes, the parallel aggregate basic patch is almost ready.
>> This patch is based on your earlier combine state patch.
>> I will post it to the community within a week or so.
>
>
> That's great news!
>
> Also note that there's some bug fixes in the patch I just posted on the
> other thread for combining aggregate states:
>
> For example:  values[Anum_pg_aggregate_aggcombinefn - 1] =
> ObjectIdGetDatum(combinefn);
> was missing from AggregateCreate().
>
> It might be worth diffing to the updated patch just to pull in anything else
> that's changed.

Here I have attached a POC patch of parallel aggregate based on the combine
aggregate patch. This patch contains the combine aggregate changes as
well. It generates and executes the parallel aggregate plan as discussed
in earlier threads.

Changes:

1. The aggregate reference in the Finalize aggregate node gets overwritten
with an OUTER_VAR reference. But the final aggregate still needs the
aggregate reference here, so for now this is avoided by checking the
combine states.

2. Check whether the aggregate functions present in the targetlist
and qual can be executed in parallel. Based on this, the targetlist
that is passed to the partial aggregate is formed.

3. Replace the seq scan lefttree with a partial aggregate plan and
generate the full parallel aggregate plan.

Todo:
1. Code cleanup; it is just a prototype.
2. Explain plan with proper instrumentation data.
3. Performance tests to observe the effect of parallel aggregate.
4. Separate the combine aggregate patch from the additional changes
made.


Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Fri, Dec 11, 2015 at 1:42 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Here I attached a POC patch of parallel aggregate based on combine
> aggregate patch. This patch contains the combine aggregate changes
> also. This patch generates and executes the parallel aggregate plan
> as discussed in earlier threads.

Pretty cool.  I'm pretty sure there's some stuff in this patch that's
not right in detail, but I think this is an awfully exciting
direction.

I'd like to commit David Rowley's patch from the other thread first,
and then deal with this one afterwards.  The only thing I feel
strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
for clarity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 12 December 2015 at 04:00, Robert Haas <robertmhaas@gmail.com> wrote:
I'd like to commit David Rowley's patch from the other thread first,
and then deal with this one afterwards.  The only thing I feel
strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
for clarity.

I have addressed that in my local copy. I'm now just working on adding some test code which uses the new infrastructure. Perhaps I'll just experiment with the parallel aggregate stuff instead now.
 
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Sat, Dec 12, 2015 at 8:42 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 December 2015 at 04:00, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> I'd like to commit David Rowley's patch from the other thread first,
>> and then deal with this one afterwards.  The only thing I feel
>> strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
>> for clarity.
>
>
> I have addressed that in my local copy. I'm now just working on adding some
> test code which uses the new infrastructure. Perhaps I'll just experiment
> with the parallel aggregate stuff instead now.
>

Here I have attached a patch with the following changes; I feel it is
better to include them as part of the combine aggregate patch.

1. Added missing outfuncs.c changes for the newly added variables in the
Aggref structure.
2. Kept the aggregate function in the final aggregate stage to perform
the final aggregation on the tuples received from all workers.

The patch still needs a fix for the explain plan output issue.

postgres=# explain analyze verbose select count(*), sum(f1) from tbl
where f1 % 100 = 0 group by f3;
                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=1853.75..1853.76 rows=1 width=12) (actual time=92.428..92.429 rows=1 loops=1)
   Output: pg_catalog.count(*), pg_catalog.sum((sum(f1))), f3
   Group Key: tbl.f3
   ->  Gather  (cost=0.00..1850.00 rows=500 width=12) (actual time=92.408..92.416 rows=3 loops=1)
         Output: f3, (count(*)), (sum(f1))


Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Fri, Dec 11, 2015 at 4:42 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 December 2015 at 04:00, Robert Haas <robertmhaas@gmail.com> wrote:
>> I'd like to commit David Rowley's patch from the other thread first,
>> and then deal with this one afterwards.  The only thing I feel
>> strongly needs to be changed in that patch is CFUNC -> COMBINEFUNC,
>> for clarity.
>
>
> I have addressed that in my local copy. I'm now just working on adding some
> test code which uses the new infrastructure.

Excellent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Fri, Dec 11, 2015 at 5:42 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> 3. Performance test to observe the effect of parallel aggregate.

Here I have attached the performance test report for parallel aggregate.
A summary of the results:
1. In the low-selectivity case, parallel aggregate gives no improvement,
or has only a small overhead, compared to parallel scan.

2. Parallel aggregate performs more than 60% better than parallel scan
because of much lower data transfer overhead: the hash aggregate
operation reduces the number of tuples that need to be transferred
from the workers to the backend.

The parallel aggregate plan depends on the parallel seq scan below it.
If the parallel seq scan plan is not generated because of the higher
tuple transfer overhead at high selectivity, then parallel aggregate is
also not possible. But with parallel aggregate, the number of records
that need to be transferred from worker to backend may be smaller than
with parallel seq scan alone, so the overall cost of parallel aggregate
may still be better.

To handle this problem, how about the following approach?

Add one more member to RelOptInfo, called cheapest_parallel_path, to
store the parallel path where one is possible. Wherever a parallel plan
is possible, this member is set to that plan. If a parallel plan is not
possible at a parent node, it is set to NULL; otherwise the parallel
plan is recalculated at that node based on the parallel plan node below
it.

Once all the paths are finalized, in the grouping planner, prepare plans
for both the normal aggregate and the parallel aggregate, compare their
costs, and choose the cheaper plan.
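The cost comparison described above could be sketched roughly like this. This is a hypothetical toy model: the formulas, constants, and function names are invented for illustration and are not PostgreSQL's actual cost model.

```python
def serial_agg_cost(ntuples, per_tuple_cost=0.01):
    # one backend scans and aggregates every tuple itself
    return ntuples * per_tuple_cost

def parallel_agg_cost(ntuples, ngroups, nworkers,
                      per_tuple_cost=0.01, transfer_cost=0.1):
    # each worker aggregates ~1/Nth of the tuples, then up to one partial
    # group per (worker, key) is transferred and combined in the backend
    partial = (ntuples / nworkers) * per_tuple_cost
    gather = ngroups * nworkers * transfer_cost
    combine = ngroups * nworkers * per_tuple_cost
    return partial + gather + combine

serial = serial_agg_cost(1_000_000)
few_groups = parallel_agg_cost(1_000_000, ngroups=100, nworkers=4)
many_groups = parallel_agg_cost(1_000_000, ngroups=1_000_000, nworkers=4)
```

With few groups the transfer term is negligible and the parallel plan wins; with one group per tuple the transfer term dominates and the serial plan wins, which is exactly why the planner needs this comparison.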

I haven't yet evaluated the feasibility of the above solution. Suggestions?

Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
Paul Ramsey
Date:
On Thu, Dec 10, 2015 at 10:42 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
> Here I attached a POC patch of parallel aggregate based on combine
> aggregate patch. This patch contains the combine aggregate changes
> also. This patch generates and executes the parallel aggregate plan
> as discussed in earlier threads.

I tried this out using PostGIS with no great success. I used a very
simple aggregate for geometry union because the combine function is
just the same as the transition function in this case (I also marked
ST_Area() as parallel safe, so that the planner will attempt to
parallelize the query).

 CREATE AGGREGATE ST_MemUnion (
   basetype = geometry,
   sfunc = ST_Union,
   cfunc = ST_Union,
   stype = geometry
 );

Unfortunately attempting a test causes memory corruption and death.
 select riding, st_area(st_memunion(geom)) from vada group by riding;

The explain looks OK:
                                     QUERY PLAN
---------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=220629.47..240380.26 rows=79 width=1189)
   Group Key: riding
   ->  Gather  (cost=0.00..807.49 rows=8792 width=1189)
         Number of Workers: 1
         ->  Partial HashAggregate  (cost=220628.59..220629.38 rows=79 width=1189)
               Group Key: riding
               ->  Parallel Seq Scan on vada  (cost=0.00..806.61 rows=8792 width=1189)

But the run dies.

NOTICE:  SRID value -32897 converted to the officially unknown SRID value 0
ERROR:  Unknown geometry type: 2139062143 - Invalid type

From the message it looks like geometry gets corrupted at some point,
causing a read to fail on very screwed up metadata.

P.



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Tue, Dec 15, 2015 at 8:04 AM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> But the run dies.
>
> NOTICE:  SRID value -32897 converted to the officially unknown SRID value 0
> ERROR:  Unknown geometry type: 2139062143 - Invalid type
>
> From the message it looks like geometry gets corrupted at some point,
> causing a read to fail on very screwed up metadata.

Thanks for the test. There was a problem in advance_combination_function
with handling pass-by-reference data. Here I have attached an updated patch with the fix.


Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
David Rowley
Date:
On 16 December 2015 at 18:11, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Tue, Dec 15, 2015 at 8:04 AM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> But the run dies.
>
> NOTICE:  SRID value -32897 converted to the officially unknown SRID value 0
> ERROR:  Unknown geometry type: 2139062143 - Invalid type
>
> From the message it looks like geometry gets corrupted at some point,
> causing a read to fail on very screwed up metadata.

Thanks for the test. There was a problem in advance_combination_function
with handling pass-by-reference data. Here I have attached an updated patch with the fix.

Thanks for fixing this.

I've been playing around with this patch and looking at the code, and I just have a few general questions that I'd like to ask to see if there's any good answers for them yet.

One thing I noticed is that you're only enabling Parallel aggregation when there's already a Gather node in the plan. Perhaps this is fine for a proof of concept, but I'm wondering how we can move forward from this to something that can be committed. As of how the patch is today, it means that you don't get a parallel agg plan for;

Query 1: select count(*) from mybigtable;

but you do for;

Query 2: select count(*) from mybigtable where <some fairly selective clause>;

since the <some fairly selective clause> allows the parallel seq scan to win over a plain seq scan, as there is less cost in moving tuples from the worker process into the main process. If that Gather node is found, the code shuffles the plan around so that the partial agg node is below it and sticks a Finalize Agg node above the Gather. I imagine you wrote this with the intention of finding something better later, once we see that it all can work, and cool, it seems to work!

I'm not quite sure what the general solution is to improve on this is this yet, as it's not really that great if we can't get parallel aggregation on Query 1, but we can on Query 2.

In master today we seem to aim to parallelise at the path level, which seems fine if we only aim to have SeqScan as the only parallel enabled node, but once we have several other nodes parallel enabled, we might, for instance, want to start parallelising whole plan tree branches, if all nodes in that branch happen to support parallelism. 

I'm calling what we have today "keyhole parallelism", because we enable parallelism while looking at a tiny part of the picture. I get the impression that the bigger picture has been overlooked, perhaps because it's more complex, and keyhole parallelism at least allows us to get something in that's parallelised, but this patch indicates to me that we're already hitting the limits of that, so should we rethink? I'm concerned because I've come to learn that changing this sort of thing after a release is much harder, as people start to find cases where performance regresses, which makes it much more difficult to improve things.

Also my apologies if I've missed some key conversation about how all of the above is intended to work. Please feel free to kick me into line if that's the case.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From
Robert Haas
Date:
On Wed, Dec 16, 2015 at 5:59 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> One thing I noticed is that you're only enabling Parallel aggregation when
> there's already a Gather node in the plan. Perhaps this is fine for a proof
> of concept, but I'm wondering how we can move forward from this to something
> that can be committed.

As far as that goes, I think the infrastructure introduced by the
parallel join patch will be quite helpful here.  That introduces the
concept of a "partial path" - that is, a path that needs a Gather node
in order to be completed.  And that's exactly what you need here:
after join planning, if there's a partial path available for the final
rel, then you can consider
FinalizeAggregate->Gather->PartialAggregate->[the best partial path].
Of course, whether a partial path is available or not, you can
consider Aggregate->[the best regular old path].

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Sat, Dec 19, 2015 at 5:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 16, 2015 at 5:59 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> One thing I noticed is that you're only enabling Parallel aggregation when
>> there's already a Gather node in the plan. Perhaps this is fine for a proof
>> of concept, but I'm wondering how we can move forward from this to something
>> that can be committed.
>
> As far as that goes, I think the infrastructure introduced by the
> parallel join patch will be quite helpful here.  That introduces the
> concept of a "partial path" - that is, a path that needs a Gather node
> in order to be completed.  And that's exactly what you need here:
> after join planning, if there's a partial path available for the final
> rel, then you can consider
> FinalizeAggregate->Gather->PartialAggregate->[the best partial path].
> Of course, whether a partial path is available or not, you can
> consider Aggregate->[the best regular old path].

Thanks for the details.
I generated the partial aggregate plan on top of the partial path list
where one is available. The code changes are taken from the parallel
join patch for reference.

Instead of generating the parallel aggregate plan on top of the partial
path list whenever one exists, how about comparing the costs of the
normal and parallel aggregates and deciding which one is best?

The parallel aggregate patch is now separated from the combine aggregate
patch. The latest combine aggregate patch is also attached in this mail
for reference, as the parallel aggregate patch depends on it.

Also attached is the latest performance report. Parallel aggregate has
some overhead in the low-selectivity case. This can be avoided with a
cost comparison between the normal and parallel aggregates.

Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
David Rowley
Date:
On 21 December 2015 at 17:23, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

Attached is the latest performance report. Parallel aggregate has some overhead
in the low-selectivity case. This can be avoided with the help of a cost
comparison between the normal and parallel aggregates.


Hi, Thanks for posting an updated patch.

Would you be able to supply a bit more detail on your benchmark? I'm surprised by the slowdown reported with the high selectivity version. It gives me the impression that the benchmark might be producing lots of groups which need to be pushed through the tuple queue to the main process. I think it would be more interesting to see benchmarks with varying number of groups, rather than scan selectivity. Selectivity was important for parallel seqscan, but less so for this, as it's aggregated groups we're sending to main process, not individual tuples.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Mon, Dec 21, 2015 at 6:48 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 21 December 2015 at 17:23, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>>
>> Attached latest performance report. Parallel aggregate is having some
>> overhead
>> in case of low selectivity.This can be avoided with the help of cost
>> comparison
>> between normal and parallel aggregates.
>>
>
> Hi, Thanks for posting an updated patch.
>
> Would you be able to supply a bit more detail on your benchmark? I'm
> surprised by the slowdown reported with the high selectivity version. It
> gives me the impression that the benchmark might be producing lots of groups
> which need to be pushed through the tuple queue to the main process. I think
> it would be more interesting to see benchmarks with varying number of
> groups, rather than scan selectivity. Selectivity was important for parallel
> seqscan, but less so for this, as it's aggregated groups we're sending to
> main process, not individual tuples.

Yes, the query produces more groups according to the selectivity.
For example: scan selectivity 400000, number of groups 400.

Following is the query:

SELECT tenpoCord,
       SUM(yokinZandaka)      AS yokinZandakaxGOUKEI,
       SUM(kashikoshiZandaka) AS kashikoshiZandakaxGOUKEI,
       SUM(kouzasuu)          AS kouzasuuxGOUKEI,
       SUM(sougouKouzasuu)    AS sougouKouzasuuxGOUKEI
FROM public.test01
WHERE tenpoCord <= '001' AND
      kamokuCord = '01' AND
      kouzaKatujyoutaiCord = '0'
GROUP BY kinkoCord, tenpoCord;


Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From
Paul Ramsey
Date:
On December 21, 2015 at 2:33:56 AM, Haribabu Kommi (kommi.haribabu@gmail.com) wrote:
Yes the query is producing more groups according to the selectivity. 
For example - scan selectivity - 400000, the number of groups - 400 

Following is the query: 

SELECT tenpoCord, 
SUM(yokinZandaka) AS yokinZandakaxGOUKEI, 
SUM(kashikoshiZandaka) AS kashikoshiZandakaxGOUKEI, 
SUM(kouzasuu) AS kouzasuuxGOUKEI, 
SUM(sougouKouzasuu) AS sougouKouzasuuxGOUKEI 
FROM public.test01 
WHERE tenpoCord <= '001' AND 
kamokuCord = '01' AND 
kouzaKatujyoutaiCord = '0' 
GROUP BY kinkoCord,tenpoCord; 

Shouldn’t parallel aggregate come into play regardless of scan selectivity? I know in PostGIS land there’s a lot of stuff like:

SELECT ST_Union(geom) FROM t GROUP BY areacode;

Basically, in the BI case, there’s often no filter at all. Hoping that’s considered a prime case for parallel agg :)

P


Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Tue, Dec 22, 2015 at 2:16 AM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Shouldn’t parallel aggregate come into play regardless of scan selectivity?
> I know in PostGIS land there’s a lot of stuff like:
>
> SELECT ST_Union(geom) FROM t GROUP BY areacode;
>
> Basically, in the BI case, there’s often no filter at all. Hoping that’s
> considered a prime case for parallel agg :)

Yes, the latest patch attached in the thread addresses this issue.
But it still lacks proper cost calculation and comparison with the
original aggregate cost.

The parallel aggregate is selected only when the number of groups is
less than 1/4 of the rows being selected. Otherwise, performing the
aggregation twice over a larger number of records leads to a performance
drop compared to the original aggregate.

Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From
David Rowley
Date:
On 22 December 2015 at 04:16, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
Shouldn’t parallel aggregate come into play regardless of scan selectivity?

I'd say that the costing should take into account the estimated number of groups.

The more tuples that make it into each group, the more attractive parallel grouping should seem. In the extreme case, if there's 1 tuple per group, then it's not going to be of much use to use parallel agg; this would be similar to a scan with 100% selectivity. So perhaps the costings for it can be modeled around the parallel scan costing, but using the estimated groups instead of the estimated tuples.
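A toy illustration of this point (the formula and constants below are invented for illustration, not PostgreSQL's costing): modelling the benefit by estimated groups rather than scan selectivity makes the 1-tuple-per-group extreme come out unattractive:

```python
def parallel_grouping_benefit(ntuples, ngroups, nworkers=4, transfer_cost=5.0):
    """Ratio of serial to parallel work; > 1 means parallel looks attractive."""
    serial_work = ntuples
    # workers share the scan, but every (worker, group) state crosses the queue
    parallel_work = ntuples / nworkers + ngroups * nworkers * transfer_cost
    return serial_work / parallel_work

big_groups = parallel_grouping_benefit(1_000_000, ngroups=10)
one_tuple_per_group = parallel_grouping_benefit(1_000_000, ngroups=1_000_000)
```

Holding the tuple count fixed, the benefit collapses as the group count approaches the tuple count, mirroring the 100%-selectivity analogy above.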


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From
Robert Haas
Date:
On Mon, Dec 21, 2015 at 6:38 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 22 December 2015 at 04:16, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>>
>> Shouldn’t parallel aggregate come into play regardless of scan
>> selectivity?
>
> I'd say that the costing should take into account the estimated number of
> groups.
>
> The more tuples that make it into each group, the more attractive parallel
> grouping should seem. In the extreme case if there's 1 tuple per group, then
> it's not going to be of much use to use parallel agg, this would be similar
> to a scan with 100% selectivity. So perhaps the costings for it can be
> modeled around a the parallel scan costing, but using the estimated groups
> instead of the estimated tuples.

Generally, the way that parallel costing is supposed to work (with the
parallel join patch, anyway) is that you've got the same nodes costed
the same way you would otherwise, but the row counts are lower because
you're only processing 1/Nth of the rows.  That's probably not exactly
the whole story here, but it's something to think about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Thu, Dec 24, 2015 at 5:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 21, 2015 at 6:38 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 22 December 2015 at 04:16, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>>>
>>> Shouldn’t parallel aggregate come into play regardless of scan
>>> selectivity?
>>
>> I'd say that the costing should take into account the estimated number of
>> groups.
>>
>> The more tuples that make it into each group, the more attractive parallel
>> grouping should seem. In the extreme case if there's 1 tuple per group, then
>> it's not going to be of much use to use parallel agg, this would be similar
>> to a scan with 100% selectivity. So perhaps the costings for it can be
>> modeled around a the parallel scan costing, but using the estimated groups
>> instead of the estimated tuples.
>
> Generally, the way that parallel costing is supposed to work (with the
> parallel join patch, anyway) is that you've got the same nodes costed
> the same way you would otherwise, but the row counts are lower because
> you're only processing 1/Nth of the rows.  That's probably not exactly
> the whole story here, but it's something to think about.

Here I have attached an updated parallel aggregate patch on top of the
recent commits of the combine aggregate and parallel join patches. It
still lacks the cost comparison code to compare the parallel and normal
aggregates.


Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
David Rowley
Date:
On 21 January 2016 at 18:26, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> Here I attached update parallel aggregate patch on top of recent commits
> of combine aggregate and parallel join patch. It still lacks of cost comparison
> code to compare both parallel and normal aggregates.

Thanks for the updated patch.

I'm just starting to look over this now.

# create table t1 as select x from generate_series(1,1000000) x(x);
# vacuum ANALYZE t1;
# set max_parallel_degree =8;
# explain select sum(x) from t1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Aggregate  (cost=9633.33..9633.34 rows=1 width=4)
   ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667 width=4)
(2 rows)

I'm not quite sure what's happening here yet as I've not ran it
through my debugger, but how can we have a Parallel Seq Scan without a
Gather node? It appears to give correct results, so I can only assume
it's not actually a parallel scan at all.

Let's check:

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
---------+----------
 t1      |        0
(1 row)

# explain analyze select sum(x) from t1;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9633.33..9633.34 rows=1 width=4) (actual time=161.820..161.821 rows=1 loops=1)
   ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.051..85.348 rows=1000000 loops=1)
 Planning time: 0.040 ms
 Execution time: 161.861 ms
(4 rows)

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
---------+----------
 t1      |        1
(1 row)

Only 1 scan.


# explain analyze select * from t1 where x=1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..10633.43 rows=1 width=4) (actual time=0.231..49.105 rows=1 loops=1)
   Number of Workers: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..9633.33 rows=0 width=4) (actual time=29.060..45.302 rows=0 loops=3)
         Filter: (x = 1)
         Rows Removed by Filter: 333333
 Planning time: 0.049 ms
 Execution time: 51.438 ms
(7 rows)

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
---------+----------
 t1      |        4
(1 row)

3 more scans. This one seems to actually be parallel, and makes sense
based on "Number of Workers: 2"


Also looking at the patch:

+bool
+aggregates_allow_partial(Node *clause)
+{

In the latest patch that I sent on the combine aggregates thread:
http://www.postgresql.org/message-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=RzQ3Kg6RnPN-fzzhdDiyvg@mail.gmail.com
I made it so there's 3 possible return values from this function. As
your patch stands now, if I create an aggregate function with an
INTERNAL state with a combine function set, then this patch might try
to parallel aggregate that and pass around the pointer to the internal
state in the Tuple going from the worker to the main process, when the
main process dereferences this pointer we'll get a segmentation
violation. So I'd say you should maybe use a modified version of my
latest aggregates_allow_partial() and check for PAT_ANY, and only
parallelise the aggregate if you get that value.  If the use of
partial aggregate was within a single process then you could be quite
content with PAT_INTERNAL_ONLY. You'll just need to pull out the logic
that checks for serial and deserial functions, since that's not in
yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates
are found which have combine functions set.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Fri, Jan 22, 2016 at 7:44 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 21 January 2016 at 18:26, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> Here I attached update parallel aggregate patch on top of recent commits
>> of combine aggregate and parallel join patch. It still lacks of cost comparison
>> code to compare both parallel and normal aggregates.
>
> Thanks for the updated patch.
>
> I'm just starting to look over this now.
>
> # create table t1 as select x from generate_series(1,1000000) x(x);
> # vacuum ANALYZE t1;
> # set max_parallel_degree =8;
> # explain select sum(x) from t1;
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  Aggregate  (cost=9633.33..9633.34 rows=1 width=4)
>    ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667 width=4)
> (2 rows)
>
> I'm not quite sure what's happening here yet as I've not ran it
> through my debugger, but how can we have a Parallel Seq Scan without a
> Gather node? It appears to give correct results, so I can only assume
> it's not actually a parallel scan at all.
>
> Let's check:
>
> # select relname,seq_scan from pg_stat_user_tables where relname ='t1';
>  relname | seq_scan
> ---------+----------
>  t1      |        0
> (1 row)
>
> # explain analyze select sum(x) from t1;
>                                                         QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=9633.33..9633.34 rows=1 width=4) (actual
> time=161.820..161.821 rows=1 loops=1)
>    ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667
> width=4) (actual time=0.051..85.348 rows=1000000 loops=1)
>  Planning time: 0.040 ms
>  Execution time: 161.861 ms
> (4 rows)
>
> # select relname,seq_scan from pg_stat_user_tables where relname ='t1';
>  relname | seq_scan
> ---------+----------
>  t1      |        1
> (1 row)
>
> Only 1 scan.
>
>
> # explain analyze select * from t1 where x=1;
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Gather  (cost=1000.00..10633.43 rows=1 width=4) (actual
> time=0.231..49.105 rows=1 loops=1)
>    Number of Workers: 2
>    ->  Parallel Seq Scan on t1  (cost=0.00..9633.33 rows=0 width=4)
> (actual time=29.060..45.302 rows=0 loops=3)
>          Filter: (x = 1)
>          Rows Removed by Filter: 333333
>  Planning time: 0.049 ms
>  Execution time: 51.438 ms
> (7 rows)
>
> # select relname,seq_scan from pg_stat_user_tables where relname ='t1';
>  relname | seq_scan
> ---------+----------
>  t1      |        4
> (1 row)
>
> 3 more scans. This one seems to actually be parallel, and makes sense
> based on "Number of Workers: 2"


The problem was that the Gather path generated on the partial path list
was not being added to the path list; because of that, there was a
mismatch between the sorted path and cheapest_path, which led to a
wrong plan.

As a temporary fix, I marked sorted_path and cheapest_path as the same,
and it works fine.


> Also looking at the patch:
>
> +bool
> +aggregates_allow_partial(Node *clause)
> +{
>
> In the latest patch that I sent on the combine aggregates thread:
> http://www.postgresql.org/message-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=RzQ3Kg6RnPN-fzzhdDiyvg@mail.gmail.com
> I made it so there's 3 possible return values from this function. As
> your patch stands now, if I create an aggregate function with an
> INTERNAL state with a combine function set, then this patch might try
> to parallel aggregate that and pass around the pointer to the internal
> state in the Tuple going from the worker to the main process, when the
> main process dereferences this pointer we'll get a segmentation
> violation. So I'd say you should maybe use a modified version of my
> latest aggregates_allow_partial() and check for PAT_ANY, and only
> parallelise the aggregate if you get that value.  If the use of
> partial aggregate was within a single process then you could be quite
> content with PAT_INTERNAL_ONLY. You'll just need to pull out the logic
> that checks for serial and deserial functions, since that's not in
> yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates
> are found which have combine functions set.
>

I took the suggested code changes from the combine aggregate patch and
changed the code accordingly.

Along with these changes, I added a float8 combine function to see how
it works under parallel aggregation. It works fine for float4, but
gives a small data mismatch with the float8 data type.

postgres=# select avg(f3), avg(f4) from tbl;
       avg        |       avg
------------------+------------------
 1.10000002384186 | 100.123449999879
(1 row)

postgres=# set enable_parallelagg = true;
SET
postgres=# select avg(f3), avg(f4) from tbl;
       avg        |       avg
------------------+------------------
 1.10000002384186 | 100.123449999918
(1 row)

Column - f3 - float4
Column - f4 - float8

There is a similar problem for the float8 var_pop, var_samp, stddev_pop
and stddev_samp aggregates. Is any special care needed for the float8
data type?


Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
David Rowley
Date:
On 22 January 2016 at 17:25, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> Along with these changes, I added a float8 combine function to see
> how it works under parallel aggregate, it is working fine for float4, but
> giving small data mismatch with float8 data type.
>
> postgres=# select avg(f3), avg(f4) from tbl;
>        avg        |       avg
> ------------------+------------------
>  1.10000002384186 | 100.123449999879
> (1 row)
>
> postgres=# set enable_parallelagg = true;
> SET
> postgres=# select avg(f3), avg(f4) from tbl;
>        avg        |       avg
> ------------------+------------------
>  1.10000002384186 | 100.123449999918
> (1 row)
>
> Column - f3 - float4
> Column - f4 - float8
>
> similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp
> aggregates. Any special care is needed for float8 datatype?

I'm not sure if this is what's going on here, as I don't really know
the range of numbers that you've used to populate f4 with. It would be
good to know, does "f4" contain negative values too?

It's not all that hard to demonstrate the instability of addition with
float8. Take the following example:

create table d (d float8);
insert into d values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231);

# select sum(d order by random()) from d;
 sum
-----
   0
(1 row)

same query, once more.

# select sum(d order by random()) from d;
   sum
----------
 2.3e-013
(1 row)

Here the result just depends on the order which the numbers have been
added. You may need to execute a few more times to see the result
change.

Perhaps a good test would be to perform a sum(f4 order by random()) in
serial mode, and see if you're getting a stable result from the
numbers that you have populated the table with.

If that's the only problem at play here, then I for one am not worried
about it, as the instability already exists today depending on which
path is chosen to scan the relation. For example an index scan is
likely not to return rows in the same order as a seq scan.
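
The order dependence can also be reproduced outside the database; here is a minimal C sketch using the same three values (double corresponds to float8):

```c
#include <assert.h>

/*
 * float8 (double) addition is not associative, so summing the same
 * three values in a different order yields a different result.
 */
int float8_order_demo(void)
{
    double a = 1223123223412324.2231;
    double b = 0.00000000000023;
    double c = -1223123223412324.2231;

    double sum1 = (a + b) + c;   /* tiny value absorbed by the large one */
    double sum2 = (a + c) + b;   /* large values cancel first */

    return sum1 != sum2;         /* 1 means order changed the answer */
}
```

The same effect is what makes a parallel sum of float8 values depend on how the rows happen to be distributed among workers.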

We do also warn about this in the manual: "Inexact means that some
values cannot be converted exactly to the internal format and are
stored as approximations, so that storing and retrieving a value might
show slight discrepancies. Managing these errors and how they
propagate through calculations is the subject of an entire branch of
mathematics and computer science and will not be discussed here,
except for the following points:" [1]


[1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Fri, Jan 22, 2016 at 10:13 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 22 January 2016 at 17:25, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> Along with these changes, I added a float8 combine function to see
>> how it works under parallel aggregate, it is working fine for float4, but
>> giving small data mismatch with float8 data type.
>>
>> postgres=# select avg(f3), avg(f4) from tbl;
>>        avg        |       avg
>> ------------------+------------------
>>  1.10000002384186 | 100.123449999879
>> (1 row)
>>
>> postgres=# set enable_parallelagg = true;
>> SET
>> postgres=# select avg(f3), avg(f4) from tbl;
>>        avg        |       avg
>> ------------------+------------------
>>  1.10000002384186 | 100.123449999918
>> (1 row)
>>
>> Column - f3 - float4
>> Column - f4 - float8
>>
>> similar problem for all float8 var_pop, var_samp, stddev_pop and stddev_samp
>> aggregates. Any special care is needed for float8 datatype?
>
> I'm not sure if this is what's going on here, as I don't really know
> the range of numbers that you've used to populate f4 with. It would be
> good to know, does "f4" contain negative values too?

No negative values are present in the f4 column.
Following are the SQL statements,

create table tbl(f1 int, f2 char(100), f3 float4, f4 float8);
insert into tbl values(generate_series(1,100000), 'Fujitsu', 1.1, 100.12345);


> It's not all that hard to demonstrate the instability of addition with
> float8. Take the following example:
>
> create table d (d float8);
> insert into d values(1223123223412324.2231),(0.00000000000023),(-1223123223412324.2231);
>
> # select sum(d order by random()) from d;
>  sum
> -----
>    0
> (1 row)
>
> same query, once more.
>
> # select sum(d order by random()) from d;
>    sum
> ----------
>  2.3e-013
> (1 row)
>
> Here the result just depends on the order which the numbers have been
> added. You may need to execute a few more times to see the result
> change.
>
> Perhaps a good test would be to perform a sum(f4 order by random()) in
> serial mode, and see if you're getting a stable result from the
> numbers that you have populated the table with.
>
> If that's the only problem at play here, then I for one am not worried
> about it, as the instability already exists today depending on which
> path is chosen to scan the relation. For example an index scan is
> likely not to return rows in the same order as a seq scan.
>
> We do also warn about this in the manual: "Inexact means that some
> values cannot be converted exactly to the internal format and are
> stored as approximations, so that storing and retrieving a value might
> show slight discrepancies. Managing these errors and how they
> propagate through calculations is the subject of an entire branch of
> mathematics and computer science and will not be discussed here,
> except for the following points:" [1]
>
>
> [1] http://www.postgresql.org/docs/devel/static/datatype-numeric.html
>

Thanks for the detailed explanation. Now I understand.

Here I have attached an updated patch with additional combine functions
for two-stage aggregates as well.


Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
> Here I attached updated patch with additional combine function for
> two stage aggregates also.

A wrong combine function was added in pg_aggregate.h in the earlier
patch, leading to an initdb problem. A corrected patch is attached.

Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Sun, Jan 24, 2016 at 7:56 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> Here I attached updated patch with additional combine function for
>> two stage aggregates also.
>
> A wrong combine function was added in pg_aggregate.h in the earlier
> patch that leading to
> initdb problem. Corrected one is attached.

I'm not entirely sure I know what's going on here, but I'm pretty sure
that it makes no sense for the new float8_pl function to reject
non-aggregate callers at the beginning and then have a comment at the
end indicating what it does when not invoked as an aggregate.
Similarly for the other new function.

It would be a lot more clear what this patch was trying to accomplish
if the new functions had header comments explaining their purpose -
not what they do, but why they exist.

float8_regr_pl is labeled in pg_proc.h as an aggregate transition
function, but I'm wondering if it should say combine function.

The changes to pg_aggregate.h include a large number of
whitespace-only changes which are unacceptable.  Please change only
the lines that need to be changed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Robert Haas
Date:
 On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>  [ new patch ]

This patch contains a number of irrelevant hunks that really ought not
to be here and make the patch harder to understand, like this:

-                        * Generate appropriate target list for
scan/join subplan; may be
-                        * different from tlist if grouping or
aggregation is needed.
+                        * Generate appropriate target list for
subplan; may be different from
+                        * tlist if grouping or aggregation is needed.

Please make a habit of getting rid of that sort of thing before submitting.

Generally, I'm not quite sure I understand the code here.  It seems to
me that what we ought to be doing is that grouping_planner, right
after considering using a presorted path (that is, just after the if
(sorted_path) block between lines 1822-1850), ought to then consider
using a partial path.  For the moment, it need not consider the
possibility that there may be a presorted partial path, because we
don't have any way to generate those yet.  (I have plans to fix that,
but not in time for 9.6.)  So it can just consider doing a Partial
Aggregate on the cheapest partial path using an explicit sort, or
hashing; then, above the Gather, it can finalize either by hashing or
by sorting and grouping.

The trick is that there's no path representation of an aggregate, and
there won't be until Tom finishes his upper planner path-ification
work.  But it seems to me we can work around that.  Set best_path to
the cheapest partial path, add a partial aggregate rather than a
regular one around where it says "Insert AGG or GROUP node if needed,
plus an explicit sort step if necessary", and then push a Gather node
and a Finalize Aggregate onto the result.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Mon, Feb 8, 2016 at 2:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Jan 24, 2016 at 7:56 PM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> On Sat, Jan 23, 2016 at 12:59 PM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>>> Here I attached updated patch with additional combine function for
>>> two stage aggregates also.
>>
>> A wrong combine function was added in pg_aggregate.h in the earlier
>> patch that leading to
>> initdb problem. Corrected one is attached.
>
> I'm not entirely sure I know what's going on here, but I'm pretty sure
> that it makes no sense for the new float8_pl function to reject
> non-aggregate callers at the beginning and then have a comment at the
> end indicating what it does when not invoked as an aggregate.
> Similarly for the other new function.
>
> It would be a lot more clear what this patch was trying to accomplish
> if the new functions had header comments explaining their purpose -
> not what they do, but why they exist.

I added some header comments explaining the need for these functions
and when they will be used. These combine functions are necessary for
float4 and float8 to support parallel aggregation.
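
As a rough illustration of what such a combine function provides (this is not PostgreSQL's actual float8 aggregate code; the types and names here are invented): each worker accumulates a {count, sum} transition state over its rows, the combine step merges worker states, and the finalize step produces the average.

```c
#include <assert.h>

/* Minimal stand-in for an avg(float8) transition state: {N, sum(X)}. */
typedef struct AvgState
{
    long   count;
    double sum;
} AvgState;

/* Per-row transition: fold one value into a worker's state. */
void avg_trans(AvgState *state, double value)
{
    state->count += 1;
    state->sum += value;
}

/* Combine step: merge a worker's partial state into the leader's. */
void avg_combine(AvgState *leader, const AvgState *worker)
{
    leader->count += worker->count;
    leader->sum += worker->sum;
}

/* Finalize step: turn the merged state into the final result. */
double avg_final(const AvgState *state)
{
    return state->sum / state->count;
}
```

Two "workers" each aggregating half the rows, followed by combine and finalize, give the same answer (up to floating-point rounding) as one serial pass over all the rows.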

> float8_regr_pl is labeled in pg_proc.h as an aggregate transition
> function, but I'm wondering if it should say combine function.

corrected.

> The changes to pg_aggregate.h include a large number of
> whitespace-only changes which are unacceptable.  Please change only
> the lines that need to be changed.

I tried to align the other rows to match the new combine function
addition, and that led to the whitespace problem; I will take care of
such things in future. Here I have attached an updated patch with the
corrections.

Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Mon, Feb 8, 2016 at 9:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>  On Thu, Jan 21, 2016 at 11:25 PM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>>  [ new patch ]
>
> This patch contains a number of irrelevant hunks that really ought not
> to be here and make the patch harder to understand, like this:
>
> -                        * Generate appropriate target list for
> scan/join subplan; may be
> -                        * different from tlist if grouping or
> aggregation is needed.
> +                        * Generate appropriate target list for
> subplan; may be different from
> +                        * tlist if grouping or aggregation is needed.
>
> Please make a habit of getting rid of that sort of thing before submitting.

Sure. I will take care of such things in future.

> Generally, I'm not quite sure I understand the code here.  It seems to
> me that what we ought to be doing is that grouping_planner, right
> after considering using a presorted path (that is, just after the if
> (sorted_path) block between lines 1822-1850), ought to then consider
> using a partial path.  For the moment, it need not consider the
> possibility that there may be a presorted partial path, because we
> don't have any way to generate those yet.  (I have plans to fix that,
> but not in time for 9.6.)  So it can just consider doing a Partial
> Aggregate on the cheapest partial path using an explicit sort, or
> hashing; then, above the Gather, it can finalize either by hashing or
> by sorting and grouping.
>
> The trick is that there's no path representation of an aggregate, and
> there won't be until Tom finishes his upper planner path-ification
> work.  But it seems to me we can work around that.  Set best_path to
> the cheapest partial path, add a partial aggregate rather than a
> regular one around where it says "Insert AGG or GROUP node if needed,
> plus an explicit sort step if necessary", and then push a Gather node
> and a Finalize Aggregate onto the result.

Thanks, I will update the patch accordingly. Along with those changes,
I will try to calculate the cost of the normal aggregate without
generating its plan, and compare it against the parallel plan cost
before generating the actual plan, because with a small number of
groups the normal aggregate performs better than the parallel aggregate
in tests.


Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From
Robert Haas
Date:
On Sun, Feb 7, 2016 at 8:21 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> Here I attached updated patch with the corrections.

So, what about the main patch, for parallel aggregation itself?  I'm
reluctant to spend too much time massaging combine functions if we
don't have the infrastructure to use them.

This patch removes the comment from float8_regr_sxx in pg_proc.h for
no apparent reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Sat, Feb 13, 2016 at 3:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 7, 2016 at 8:21 PM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> Here I attached updated patch with the corrections.
>
> So, what about the main patch, for parallel aggregation itself?  I'm
> reluctant to spend too much time massaging combine functions if we
> don't have the infrastructure to use them.

Here I attached a draft patch based on previous discussions. It still needs
better comments and optimization.

Overview:
1. Before creating the plan for the best path, verify whether a
parallel aggregate plan is possible. If it is, check whether it is
cheaper than the normal aggregate plan; if so, replace the best path
with the cheapest_partial_path.

2. While generating the parallel aggregate plan, first generate the
target list of the partial aggregate from the bare aggregate references
and GROUP BY expressions.

3. Replace aggref->aggtype with aggtranstype in the partial aggregate
target list so that proper tuple data is returned from the workers.

4. Generate the partial aggregate node using the generated target list.

5. Add Gather and finalize aggregate nodes on top of the partial
aggregate plan.
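
The steps above can be modeled loosely in miniature. This is a conceptual sketch of partial and finalize grouping for a count(*) ... GROUP BY query, with invented names; it is not planner or executor code:

```c
#include <assert.h>

#define MAX_GROUPS 8

/* A per-group aggregation state: group key plus a running count. */
typedef struct GroupState
{
    int  key;
    long count;   /* partial or finalized count for this group */
} GroupState;

/* Worker side: partial aggregation over this worker's share of rows. */
int partial_count(const int *keys, int nrows, GroupState *out)
{
    int ngroups = 0;
    for (int i = 0; i < nrows; i++)
    {
        int g;
        for (g = 0; g < ngroups; g++)
            if (out[g].key == keys[i])
                break;
        if (g == ngroups)
            out[ngroups++] = (GroupState){keys[i], 0};
        out[g].count++;
    }
    return ngroups;
}

/* Leader side: combine gathered partial states that share a key. */
int finalize_count(const GroupState *in, int nin, GroupState *out)
{
    int ngroups = 0;
    for (int i = 0; i < nin; i++)
    {
        int g;
        for (g = 0; g < ngroups; g++)
            if (out[g].key == in[i].key)
                break;
        if (g == ngroups)
            out[ngroups++] = (GroupState){in[i].key, 0};
        out[g].count += in[i].count;
    }
    return ngroups;
}
```

Here the partial_count() calls stand in for the worker-side partial aggregate nodes, concatenating their outputs stands in for Gather, and finalize_count() plays the role of the finalize aggregate node merging states that share a group key.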

To do:
1. Optimize the aggregate cost calculation mechanism; currently it is
invoked many times.
2. Better comments, etc.

Please verify whether the patch is heading in the right direction, per
your expectations.

Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
David Rowley
Date:
On 17 February 2016 at 17:50, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> Here I attached a draft patch based on previous discussions. It still needs
> better comments and optimization.

Over in [1] Tom posted a large change to the grouping planner which
causes large conflict with the parallel aggregation patch. I've been
looking over Tom's patch and reading the related thread and I've
observed 3 things:

1. Parallel Aggregate will be much easier to write, and will need less
code, if based on top of Tom's upper planner changes. The latest patch
does add a bit of cruft (e.g. create_gather_plan_from_subplan()) which
won't be required after Tom pushes the changes to the upper planner.
2. If we apply parallel aggregate before Tom's upper planner changes
go in, then Tom needs to reinvent it when rebasing his patch. That
seems senseless, which is why I did this work.
3. Based on the thread, most people are leaning towards getting Tom's
changes in early to allow a bit more settling time before beta, and
perhaps also to allow other patches (e.g. this one) to go in afterwards.

So, I've done a bit of work and I've rewritten the parallel aggregate
code to base it on top of Tom's patch posted in [1]. There's a few
things that are left unsolved at this stage.

1. exprType() for Aggref still returns the aggtype, where for partial
agg nodes it needs to return the trans type instead. I had thought I
might fix this by adding a proxy node type that sits in the targetlist
until setrefs.c, where it can be plucked out and replaced by the
Aggref. I need to investigate this further.
2. There's an outstanding bug relating to the HAVING clause not seeing
the right state of aggregation and returning wrong results. I've not
had much time to look into this yet, but I suspect it's an existing bug
that's already in master from my combine aggregate patch. I will
investigate this on Sunday.

Regarding the patch, there are a few things worth mentioning here:

1. I've had to add a parallel_degree parameter to create_group_path()
and create_agg_path(). I think Tom is going to make changes to his
patch so that the Path's parallel_degree is propagated to subnodes,
which should allow me to remove this parameter and just use the
parallel_degree from the subpath.
2. I had to add a new parameter to pass an optional row estimate to
cost_gather() as I don't have a RelOptInfo available to get a row
estimate from which represents the state after partial aggregation. I
thought this change was ok, but I'll listen to anyone who thinks of a
better way to do it.
3. The code never attempts to mix and match Grouping Agg and Hash Agg
plans. e.g it could be an idea to perform Partial Hash Aggregate ->
Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
stage. I just thought doing this is more complex than what's really
needed, but if someone can think of a case where this would be a great
win then I'll listen, but you have to remember we don't have any
pre-sorted partial paths at this stage, so an explicit sort is
required *always*. This might change if someone invented partial btree
index scans... but until then...

Due to the existence of the outstanding issues above, I feel like I
might be posting the patch a little earlier, but wanted to do so since
this is quite a hot area in the code at the moment and I wanted to
post for transparency.

To apply the patch please apply [1] first.

[1] http://www.postgresql.org/message-id/3795.1456689808@sss.pgh.pa.us

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 17 February 2016 at 17:50, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> Here I attached a draft patch based on previous discussions. It still needs
>> better comments and optimization.
>
> Over in [1] Tom posted a large change to the grouping planner which
> causes large conflict with the parallel aggregation patch. I've been
> looking over Tom's patch and reading the related thread and I've
> observed 3 things:
>
> 1. Parallel Aggregate will be much easier to write and less code to
> base it up top of Tom's upper planner changes. The latest patch does
> add a bit of cruft (e.g create_gather_plan_from_subplan()) which won't
> be required after Tom pushes the changes to the upper planner.
> 2. If we apply parallel aggregate before Tom's upper planner changes
> go in, then Tom needs to reinvent it again when rebasing his patch.
> This seems senseless, so this is why I did this work.
> 3. Based on the thread, most people are leaning towards getting Tom's
> changes in early to allow a bit more settle time before beta, and
> perhaps also to allow other patches to go in after (e.g this)
>
> So, I've done a bit of work and I've rewritten the parallel aggregate
> code to base it on top of Tom's patch posted in [1].

Great!

> 3. The code never attempts to mix and match Grouping Agg and Hash Agg
> plans. e.g it could be an idea to perform Partial Hash Aggregate ->
> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
> stage. I just thought doing this is more complex than what's really
> needed, but if someone can think of a case where this would be a great
> win then I'll listen, but you have to remember we don't have any
> pre-sorted partial paths at this stage, so an explicit sort is
> required *always*. This might change if someone invented partial btree
> index scans... but until then...

Actually, Rahila Syed is working on that.  But it's not done yet, so
presumably will not go into 9.6.

I don't really see the logic of this, though.  Currently, Gather
destroys the input ordering, so it seems preferable for the
finalize-aggregates stage to use a hash aggregate whenever possible,
whatever the partial-aggregate stage did.  Otherwise, we need an
explicit sort.  Anyway, it seems like the two stages should be costed
and decided on their own merits - there's no reason to chain the two
decisions together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Fri, Mar 4, 2016 at 3:00 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 17 February 2016 at 17:50, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
>> Here I attached a draft patch based on previous discussions. It still needs
>> better comments and optimization.
>
> Over in [1] Tom posted a large change to the grouping planner which
> causes large conflict with the parallel aggregation patch. I've been
> looking over Tom's patch and reading the related thread and I've
> observed 3 things:
>
> 1. Parallel Aggregate will be much easier to write and less code to
> base it up top of Tom's upper planner changes. The latest patch does
> add a bit of cruft (e.g create_gather_plan_from_subplan()) which won't
> be required after Tom pushes the changes to the upper planner.
> 2. If we apply parallel aggregate before Tom's upper planner changes
> go in, then Tom needs to reinvent it again when rebasing his patch.
> This seems senseless, so this is why I did this work.
> 3. Based on the thread, most people are leaning towards getting Tom's
> changes in early to allow a bit more settle time before beta, and
> perhaps also to allow other patches to go in after (e.g this)
>
> So, I've done a bit of work and I've rewritten the parallel aggregate
> code to base it on top of Tom's patch posted in [1]. There's a few
> things that are left unsolved at this stage.
>
> 1. exprType() for Aggref still returns the aggtype, where it needs to
> return the trans type for partial agg nodes, this need to return the
> trans type rather than the aggtype. I had thought I might fix this by
> adding a proxy node type that sits in the targetlist until setrefs.c
> where it can be plucked out and replaced by the Aggref. I need to
> investigate this further.
> 2. There's an outstanding bug relating to HAVING clause not seeing the
> right state of aggregation and returning wrong results. I've not had
> much time to look into this yet, but I suspect its an existing bug
> that's already in master from my combine aggregate patch. I will
> investigate this on Sunday.
>

Thanks for updating the patch. Here I have attached an updated patch
with the following additional changes:

1. Parallel aggregation now works with expressions containing aggregate
functions, not just bare aggregates.
2. Aggref returns the trans type instead of the agg type; this change
adds parallel aggregate support for the float aggregates, but it still
needs a fix in the _equalAggref function.

Pending:
1. The EXPLAIN plan needs to be corrected for parallel grouping,
similar to parallel aggregate.

To apply this patch, first apply the patch in [1]

[1] - http://www.postgresql.org/message-id/14172.1457228315@sss.pgh.pa.us

Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Sun, Mar 6, 2016 at 10:21 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
> Pending:
> 1. Explain plan needs to be corrected for parallel grouping similar like
> parallel aggregate.

Here I have attached an updated patch with further changes:

1. EXPLAIN plan changes for parallel grouping.

2. A temporary fix for the float aggregate types in _equalAggref,
needed because of the change from aggtype to trans type; otherwise the
parallel aggregation plan fails in set_plan_references. Whenever the
aggtype does not match, it also checks against the trans type.

3. Generates a parallel path for every partial path and adds it to the
path_list; the plan is then chosen based on the cheapest_path.


To apply this patch, first apply the patch in [1]

[1] - http://www.postgresql.org/message-id/14172.1457228315@sss.pgh.pa.us


Regards,
Hari Babu
Fujitsu Australia

Attachments

Re: Parallel Aggregate

From
Tom Lane
Date:
Haribabu Kommi <kommi.haribabu@gmail.com> writes:
> 2. Temporary fix for float aggregate types in _equalAggref because of
> a change in aggtype to trans type, otherwise the parallel aggregation
> plan failure in set_plan_references. whenever the aggtype is not matched,
> it verifies with the trans type also.

That is a completely unacceptable kluge.  Quite aside from being ugly as
sin, it probably breaks more things than it fixes, first because it breaks
the fundamental semantics of equal() across the board, and second because
it puts catalog lookups into equal(), which *will* cause problems.  You
can not expect that this will get committed, not even as a "temporary fix".
        regards, tom lane



Re: Parallel Aggregate

From
David Rowley
Date:
On 7 March 2016 at 18:19, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> Here I attached update patch with further changes,
> 1. Explain plan changes for parallel grouping

Perhaps someone might disagree with me, but I'm not at all sure I
really see the need for that. With nodeAgg.c we're doing something
fundamentally different in Partial mode than in Finalize mode, which is
why I originally wanted to give an indication of that in explain.c. For
a query with no aggregate functions that uses nodeGroup.c, there is no
special handling in the executor for the partial and final stages, so I
really don't see why we should give the impression in EXPLAIN that
there is.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 5 March 2016 at 07:25, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
>> 3. The code never attempts to mix and match Grouping Agg and Hash Agg
>> plans. e.g it could be an idea to perform Partial Hash Aggregate ->
>> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
>> stage. I just thought doing this is more complex than what's really
>> needed, but if someone can think of a case where this would be a great
>> win then I'll listen, but you have to remember we don't have any
>> pre-sorted partial paths at this stage, so an explicit sort is
>> required *always*. This might change if someone invented partial btree
>> index scans... but until then...
>
> Actually, Rahila Syed is working on that.  But it's not done yet, so
> presumably will not go into 9.6.
>
> I don't really see the logic of this, though.  Currently, Gather
> destroys the input ordering, so it seems preferable for the
> finalize-aggregates stage to use a hash aggregate whenever possible,
> whatever the partial-aggregate stage did.  Otherwise, we need an
> explicit sort.  Anyway, it seems like the two stages should be costed
> and decided on their own merits - there's no reason to chain the two
> decisions together.

Thanks for looking at this.
I've attached an updated patch which re-bases the whole patch on top
of the upper planner changes which have just been committed.
In this version create_grouping_paths() does now consider mixed
strategies of hashed and sorted, although I have a few concerns with
the code that I've written. I'm solely posting this early to minimise
any duplicate work.

My concerns are:
1. Since there's no cheapest_partial_path in RelOptInfo the code is
currently considering every partial_path for parallel hash aggregate.
With normal aggregation we only ever use the cheapest path, so this
may not be future proof. As of today we do only have at most one
partial path in the list, but there's no reason to code this with that
assumption. I didn't put in much effort to improve this as I see code
in generate_gather_paths() which also makes assumptions about there
just being 1 partial path. Perhaps we should expand RelOptInfo to
track the cheapest partial path? or maybe allpaths.c should have a
function to fetch the cheapest out of the list?

2. In mixed parallel aggregate mode, when the query has no aggregate
functions, the code currently will use a nodeAgg for AGG_SORTED
strategy rather than a nodeGroup, as it would in serial agg mode. This
probably needs to be changed.

3. Nothing in create_grouping_paths() looks at the force_parallel_mode
GUC. I had a quick look at this GUC and was a bit surprised to see 3
possible states, but no explanation of what they do, so I've not added
code which pays attention to this setting yet. I'd imagine this is
just a matter of skipping serial path generation when parallel is
possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
what FORCE_PARALLEL_REGRESS is for yet.

The setrefs.c parts of the patch remain completely broken. I've not
had time to look at this again yet, sorry.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> My concerns are:
> 1. Since there's no cheapest_partial_path in RelOptInfo the code is
> currently considering every partial_path for parallel hash aggregate.
> With normal aggregation we only ever use the cheapest path, so this
> may not be future proof. As of today we do only have at most one
> partial path in the list, but there's no reason to code this with that
> assumption. I didn't put in much effort to improve this as I see code
> in generate_gather_paths() which also makes assumptions about there
> just being 1 partial path. Perhaps we should expand RelOptInfo to
> track the cheapest partial path? or maybe allpaths.c should have a
> function to fetch the cheapest out of the list?

The first one in the list will be the cheapest; why not just look at
that?  Sorted partial paths are interesting if some subsequent path
construction step can make use of that sort ordering, but they're
never interesting from the point of view of matching the query's
pathkeys because of the fact that Gather is order-destroying.

> 3. Nothing in create_grouping_paths() looks at the force_parallel_mode
> GUC. I had a quick look at this GUC and was a bit surprised to see 3
> possible states, but no explanation of what they do, so I've not added
> code which pays attention to this setting yet. I'd imagine this is
> just a matter of skipping serial path generation when parallel is
> possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
> what FORCE_PARALLEL_REGRESS is for yet.

The GUC is documented just like all the other GUCs are documented.
Maybe that's not enough, but I don't think "no explanation of what
they do" is accurate.  But I don't see why this patch should need to
care about force_parallel_mode at all.  force_parallel_mode is about
making queries that wouldn't choose to run in parallel on their own do
so anyway, whereas this patch is about making more queries able to
do more work in parallel.

> The setrefs.c parts of the patch remain completely broken. I've not
> had time to look at this again yet, sorry.

I hope you get time soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 9 March 2016 at 04:06, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> My concerns are:
>> 1. Since there's no cheapest_partial_path in RelOptInfo the code is
>> currently considering every partial_path for parallel hash aggregate.
>> With normal aggregation we only ever use the cheapest path, so this
>> may not be future proof. As of today we do only have at most one
>> partial path in the list, but there's no reason to code this with that
>> assumption. I didn't put in much effort to improve this as I see code
>> in generate_gather_paths() which also makes assumptions about there
>> just being 1 partial path. Perhaps we should expand RelOptInfo to
>> track the cheapest partial path? or maybe allpaths.c should have a
>> function to fetch the cheapest out of the list?
>
> The first one in the list will be the cheapest; why not just look at
> that?  Sorted partial paths are interesting if some subsequent path
> construction step can make use of that sort ordering, but they're
> never interesting from the point of view of matching the query's
> pathkeys because of the fact that Gather is order-destroying.

In this case a sorted partial path is useful as the partial agg node
sits below Gather. The sorted input is very interesting for the
partial agg node with a strategy of AGG_SORTED. In most cases with
parallel aggregate it's the partial stage that will take the most
time, so if we do get pre-sorted partial paths, this will be very good
indeed for parallel agg.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Tue, Mar 8, 2016 at 4:26 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> The first one in the list will be the cheapest; why not just look at
>> that?  Sorted partial paths are interesting if some subsequent path
>> construction step can make use of that sort ordering, but they're
>> never interesting from the point of view of matching the query's
>> pathkeys because of the fact that Gather is order-destroying.
>
> In this case a sorted partial path is useful as the partial agg node
> sits below Gather. The sorted input is very interesting for the
> partial agg node with a strategy of AGG_SORTED. In most cases with
> parallel aggregate it's the partial stage that will take the most
> time, so if we do get pre-sorted partial paths, this will be very good
> indeed for parallel agg.

OK.  So then you probably want to consider the cheapest one, which
will be first.  And then, if there's one that has the pathkeys you
want, also consider that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Mon, Mar 7, 2016 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Haribabu Kommi <kommi.haribabu@gmail.com> writes:
>> 2. Temporary fix for float aggregate types in _equalAggref because of
>> a change in aggtype to trans type, otherwise the parallel aggregation
>> plan failure in set_plan_references. whenever the aggtype is not matched,
>> it verifies with the trans type also.
>
> That is a completely unacceptable kluge.  Quite aside from being ugly as
> sin, it probably breaks more things than it fixes, first because it breaks
> the fundamental semantics of equal() across the board, and second because
> it puts catalog lookups into equal(), which *will* cause problems.  You
> can not expect that this will get committed, not even as a "temporary fix".

I am not able to find a better solution to this problem, so I will
provide the details of the problem and why I made the change; if you
can point me at where to look, that would be helpful.

In parallel aggregate, the aggregate operation is divided into two
steps: a partial aggregate and a finalize aggregate. The partial
aggregate is executed in the worker and returns the transition data,
which is of type aggtranstype. This works fine even without changing
the targetlist Aggref return type from aggtype to aggtranstype for
aggregates whose aggtype is a variable-length data type: the output
slot is generated with a variable-length type, so sending the
aggtranstype data, which is also variable-length, still works.

But for the float aggregates, the aggtype is a fixed-length type while
the aggtranstype is a variable-length type. So if we try to replace
the aggtype of an Aggref with the aggtranstype in the
set_plan_references function, only the partial aggregate targetlist
gets changed, because set_plan_references works from the top of the
plan down.
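
The mismatch is visible in the catalogs. A quick sketch (assuming a
9.6-era catalog layout; the regtype casts are just for readability):

```sql
-- For avg(float8) the transition type is an array while the declared
-- result type is a plain float8, so the partial stage's output cannot
-- simply reuse the Aggref's aggtype. sum(float8), by contrast, keeps
-- float8 for both.
SELECT p.proname,
       a.aggtranstype::regtype AS trans_type,
       p.prorettype::regtype   AS result_type
FROM   pg_aggregate a
JOIN   pg_proc p ON p.oid = a.aggfnoid
WHERE  p.proname IN ('avg', 'sum')
AND    p.prorettype = 'float8'::regtype;
```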

To avoid this problem, I changed the targetlist type during partial
aggregate path generation itself, but that leads to a failure in the
_equalAggref function during set_plan_references, which is why I put
in the temporary fix.

Do you have any pointers on how to handle this problem?

Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From
David Rowley
Date:
On 9 March 2016 at 04:06, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> 3. Nothing in create_grouping_paths() looks at the force_parallel_mode
>> GUC. I had a quick look at this GUC and was a bit surprised to see 3
>> possible states, but no explanation of what they do, so I've not added
>> code which pays attention to this setting yet. I'd imagine this is
>> just a matter of skipping serial path generation when parallel is
>> possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
>> what FORCE_PARALLEL_REGRESS is for yet.
>
> The GUC is documented just like all the other GUCs are documented.
> Maybe that's not enough, but I don't think "no explanation of what
> they do" is accurate.  But I don't see why this patch should need to
> care about force_parallel_mode at all.  force_parallel_mode is about
> making queries that wouldn't choose to run in parallel on their own
> do so anyway, whereas this patch is about making more queries able to
> do more work in parallel.

Hmm, it appears I only looked as far as the enum declaration, which I
expected to have something. Perhaps I'm just not used to looking up
the manual for things relating to code.

The one reason that I asked about force_parallel_mode was that I
assumed there was some buildfarm member running somewhere that
switches this on and runs the regression tests. I figured that if it
exists for other parallel features, then it probably should for this
too. Can you explain why you think this should be handled differently?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

От
Robert Haas
Дата:
On Thu, Mar 10, 2016 at 6:42 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Hmm, it appears I only looked as far as the enum declaration, which I
> expected to have something. Perhaps I'm just not used to looking up
> the manual for things relating to code.

I don't mind adding some comments there, it just didn't seem all that
important to me.  Feel free to propose something.

> The one reason that I asked about force_parallel_mode was that I
> assumed there was some buildfarm member running somewhere that
> switches this on and runs the regression tests. I figured that if it
> exists for other parallel features, then it probably should for this
> too. Can you explain why you think this should be handled differently?

Yeah, I think Noah set up such a buildfarm member, but I can't
remember the name of it off-hand.

I think running the tests with this patch and
force_parallel_mode=regress, max_parallel_degree>0 is a good idea, but
I don't expect it to turn up too much.  That configuration is mostly
designed to test whether the basic parallelism infrastructure works or
breaks things.  It's not intended to test whether your parallel query
plans are any good - you have to write your own tests for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Amit Langote
Date:
On Thu, Mar 10, 2016 at 10:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 10, 2016 at 6:42 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> The one reason that I asked about force_parallel_mode was that I
>> assumed there was some buildfarm member running somewhere that
>> switches this on and runs the regression tests. I figured that if it
>> exists for other parallel features, then it probably should for this
>> too. Can you explain why you think this should be handled differently?
>
> Yeah, I think Noah set up such a buildfarm member, but I can't
> remember the name of it off-hand.

mandrill [1]?

Thanks,
Amit

[1] http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=mandrill&br=HEAD



Re: Parallel Aggregate

From
David Rowley
Date:
On 8 March 2016 at 11:15, David Rowley <david.rowley@2ndquadrant.com> wrote:
> The setrefs.c parts of the patch remain completely broken. I've not
> had time to look at this again yet, sorry.

Ok, so again, apologies for previously sending such a broken patch.
I've since managed to free up a bit of time to work on this, which now
consists of a good bit more than the sum total of my weekly lunch
hours.

The attached patch is a much more complete patch, and quite possibly
now review-worthy.

I set about solving the setrefs.c problem by inventing a PartialAggref
node type which wraps up Aggrefs when they're in an Agg node which has
finalizeAggs = false. This node type exists all the way until executor
init time, where it's then removed and replaced with the underlying
Aggref. This seems to solve the targetlist return type issue.
I'd really like some feedback on this method of solving that problem.

I've also fixed numerous other bugs, including the HAVING clause problem.

Things left to do:
1. Make a final pass of the patch not at 3am.
2. Write some tests.
3. I've probably missed a few places that should handle T_PartialAggref.

A couple of things which I'm not 100% happy with.

1. make_partialgroup_input_target() doing lookups to the syscache.
Perhaps this job can be offloaded to a new function in a more suitable
location. Ideally the Aggref would already store the required
information, but I don't see a great place to be looking that up.
2. I don't really like how I had to add tlist to
create_grouping_paths(), but I didn't want to go to the trouble of
calculating the partial agg PathTarget when parallel aggregation is
not possible; that calculation does syscache lookups, so it's not
free, and I'd rather only do it if we're actually going to add some
parallel paths.
3. Something about the force_parallel_mode GUC. I'll think about this
when I start to think about how to test this, as likely I'll need
that, else I'd have to create tables bigger than we'd want to in the
regression tests.

I've also attached an .sql file with an aggregate function aimed at
testing that the new PartialAggref stuff works properly, as previously
it seemed to work by accident with sum(int).
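
Something along these lines (the names are invented here; the actual
attached .sql file may differ) makes a missed final-function call
immediately visible, because the transition state is an array while
the result is a scalar:

```sql
-- Hypothetical test aggregate: the transition state is numeric[]
-- (running sum and row count), the final result a single numeric, so
-- a partial stage that leaks its transition value is easy to spot.
CREATE FUNCTION myavg_trans(state numeric[], val numeric)
RETURNS numeric[] LANGUAGE sql
AS $$ SELECT ARRAY[state[1] + val, state[2] + 1] $$;

CREATE FUNCTION myavg_final(state numeric[])
RETURNS numeric LANGUAGE sql
AS $$ SELECT state[1] / state[2] $$;

CREATE AGGREGATE myavg(numeric) (
    SFUNC     = myavg_trans,
    STYPE     = numeric[],
    FINALFUNC = myavg_final,
    INITCOND  = '{0,0}'
);
```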

Just some numbers to maybe make this more interesting:

create table t (num int not null, one int not null, ten int not null,
hundred int not null, thousand int not null, tenk int not null,
hundredk int not null, million int not null);
insert into t select
x,1,x%10+1,x%100+1,x%1000+1,x%10000+1,x%100000+1,x%1000000 from
generate_series(1,10000000)x(x);

-- Serial Plan
# explain select sum(num) from t;
                            QUERY PLAN
-------------------------------------------------------------------
 Aggregate  (cost=198530.00..198530.01 rows=1 width=8)
   ->  Seq Scan on t  (cost=0.00..173530.00 rows=10000000 width=4)

# select sum(num) from t;
      sum
----------------
 50000005000000
(1 row)
Time: 1036.119 ms

# set max_parallel_degree=4;

-- Parallel Plan
# explain select sum(num) from t;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=105780.52..105780.53 rows=1 width=8)
   ->  Gather  (cost=105780.00..105780.51 rows=5 width=8)
         Number of Workers: 4
         ->  Partial Aggregate  (cost=104780.00..104780.01 rows=1 width=8)
               ->  Parallel Seq Scan on t  (cost=0.00..98530.00
rows=2500000 width=4)
(5 rows)

# select sum(num) from t;
      sum
----------------
 50000005000000
(1 row)

Time: 379.117 ms
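
The two-stage arithmetic behind that plan can be mimicked in plain SQL
(the num % 4 split is a stand-in; the real split is by per-worker
block ranges, not by value):

```sql
-- Emulating what the workers and leader do for sum(num): each "worker"
-- produces a partial sum, and the finalize step sums the partials.
SELECT sum(partial_sum) AS total
FROM (SELECT sum(num) AS partial_sum
      FROM t
      GROUP BY num % 4) AS workers;
-- Matches SELECT sum(num) FROM t, because summing partial sums equals
-- summing everything; avg() instead has to carry sums and counts in
-- its transition state to be finalized correctly.
```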

I'll try and throw a bit of parallel aggregate work at a 4-socket /
64-core server which I have access to... just for fun.

Reviews are now welcome.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
David Rowley
Date:
On 11 March 2016 at 03:39, David Rowley <david.rowley@2ndquadrant.com> wrote:
> A couple of things which I'm not 100% happy with.
>
> 1. make_partialgroup_input_target() doing lookups to the syscache.
> Perhaps this job can be offloaded to a new function in a more suitable
> location. Ideally the Aggref would already store the required
> information, but I don't see a great place to be looking that up.

I've made some changes and moved this work off to a new function in tlist.c.

> 2. I don't really like how I had to add tlist to
> create_grouping_paths(), but I didn't want to go to the trouble of
> calculating the partial agg PathTarget when parallel aggregation is
> not possible; that calculation does syscache lookups, so it's not
> free, and I'd rather only do it if we're actually going to add some
> parallel paths.

This is now fixed. The solution was much easier after 49635d7b.

> 3. Something about the force_parallel_mode GUC. I'll think about this
> when I start to think about how to test this, as likely I'll need
> that, else I'd have to create tables bigger than we'd want to in the
> regression tests.

On further analysis it seems that this GUC does not do what I thought
it did, which must be why Robert said that I don't need to think about
it here. The GUC just seems to add a Gather node at the base of the
plan tree, when possible. Which leaves me a bit lost when it comes to
how to write tests for this... It seems like I need to add at least
136k rows to a test table to get a Parallel Aggregate plan, which I
think is a no-go for the regression test suite... that's with
parallel_setup_cost=0;

It would be nice if that GUC just, when enabled, preferred the
cheapest parallel plan (when available), rather than hacking a Gather
node into the plan's root. This should have the same result in
many cases anyway, and would allow me to test this without generating
oversized tables in the regression tests.

I've attached an updated patch which is based on commit 7087166,
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.

This patch also fixes a couple of bugs: one in the cost estimate for
the number of groups that will be produced by the final aggregate, and
a missing copyObject() in the setrefs.c code which caused a "Var not
found in targetlist" problem in setrefs.c for plans with more than one
partial aggregate node... I had to modify the planner to get it to add
an additional aggregate node to test this (a separate test patch for
this is attached).

Comments/Reviews/Testing all welcome.

Thanks

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
David Rowley
Date:
On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached an updated patch which is based on commit 7087166,
> things are really changing fast in the grouping path area at the
> moment, but hopefully the dust is starting to settle now.

The attached patch fixes a harmless compiler warning about a possible
uninitialised variable.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Mon, Mar 14, 2016 at 8:44 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.

The setrefs.c fix for updating the finalize-aggregate targetlist is
nice. I tested all the float aggregates and they are working fine.

Overall the patch looks fine. I will do some more testing and provide
an update later.

Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From
James Sewell
Date:
Hi,

I've done some testing with one of my data sets in an 8VPU virtual environment and this is looking really, really good.

My test query is:

SELECT pageview, sum(pageview_count)
FROM fact_agg_2015_12
GROUP BY date_trunc('DAY'::text, pageview);

The query returns 15 rows. The fact_agg table is 5398MB and holds around 25 million records.

Explain with a max_parallel_degree of 8 tells me that the query will only use 6 background workers. I have no indexes on the table currently.

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
   Group Key: (date_trunc('DAY'::text, pageview))
   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
         Number of Workers: 6
         ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16)
               Group Key: date_trunc('DAY'::text, pageview)
               ->  Parallel Seq Scan on fact_agg_2015_12  (cost=0.00..743769.76 rows=4221741 width=12)


I am getting the following timings (everything was cached before I started testing). I didn't average the runtime, but I ran each one three times and took the middle value.

max_parallel_degree    runtime
0                      11693.537 ms
1                       6387.937 ms
2                       4328.629 ms
3                       3292.376 ms
4                       2743.148 ms
5                       2278.449 ms
6                       2000.599 ms


I'm pretty happy!

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 

On Mon, Mar 14, 2016 at 8:44 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services






The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Parallel Aggregate

From
David Rowley
Date:
On 14 March 2016 at 14:16, James Sewell <james.sewell@lisasoft.com> wrote:
> I've done some testing with one of my data sets in an 8VPU virtual environment and this is looking really, really good.
>
> My test query is:
>
> SELECT pageview, sum(pageview_count)
> FROM fact_agg_2015_12
> GROUP BY date_trunc('DAY'::text, pageview);
>
> The query returns 15 rows. The fact_agg table is 5398MB and holds around 25 million records.
>
> Explain with a max_parallel_degree of 8 tells me that the query will only use 6 background workers. I have no indexes on the table currently.
>
> Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
>    Group Key: (date_trunc('DAY'::text, pageview))
>    ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
>          Number of Workers: 6
>          ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16)
>                Group Key: date_trunc('DAY'::text, pageview)
>                ->  Parallel Seq Scan on fact_agg_2015_12  (cost=0.00..743769.76 rows=4221741 width=12)

Great! Thanks for testing this.

If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual number of Gather rows come out at 105? I'd just like to get an idea of whether my cost estimates for the Gather are going to be accurate for real-world data sets.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel Aggregate

From
James Sewell
Date:
Hi,

Happy to test, really looking forward to seeing this stuff in core. 

The explain analyze is below:

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16) (actual time=2282.092..2282.202 rows=15 loops=1)
   Group Key: (date_trunc('DAY'::text, pageview_start_tstamp))
   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16) (actual time=2281.749..2282.060 rows=105 loops=1)
         Number of Workers: 6
         ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16) (actual time=2276.879..2277.030 rows=15 loops=7)
               Group Key: date_trunc('DAY'::text, pageview_start_tstamp)
               ->  Parallel Seq Scan on celebrus_fact_agg_1_p2015_12  (cost=0.00..743769.76 rows=4221741 width=12) (actual time=0.066..1631
.650 rows=3618887 loops=7)

One question - how is the upper limit of workers chosen?


James Sewell,
Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 

On Mon, Mar 14, 2016 at 12:30 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 14 March 2016 at 14:16, James Sewell <james.sewell@lisasoft.com> wrote:
>> I've done some testing with one of my data sets in an 8VPU virtual environment and this is looking really, really good.
>>
>> My test query is:
>>
>> SELECT pageview, sum(pageview_count)
>> FROM fact_agg_2015_12
>> GROUP BY date_trunc('DAY'::text, pageview);
>>
>> The query returns 15 rows. The fact_agg table is 5398MB and holds around 25 million records.
>>
>> Explain with a max_parallel_degree of 8 tells me that the query will only use 6 background workers. I have no indexes on the table currently.
>>
>> Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
>>    Group Key: (date_trunc('DAY'::text, pageview))
>>    ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
>>          Number of Workers: 6
>>          ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16)
>>                Group Key: date_trunc('DAY'::text, pageview)
>>                ->  Parallel Seq Scan on fact_agg_2015_12  (cost=0.00..743769.76 rows=4221741 width=12)
>
> Great! Thanks for testing this.
>
> If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual number of Gather rows come out at 105? I'd just like to get an idea of whether my cost estimates for the Gather are going to be accurate for real-world data sets.
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services




Re: Parallel Aggregate

From
David Rowley
Date:
On 14 March 2016 at 14:52, James Sewell <james.sewell@lisasoft.com> wrote:
> One question - how is the upper limit of workers chosen?

See create_parallel_paths() in allpaths.c. Basically the bigger the
relation (in pages) the more workers will be allocated, up until
max_parallel_degree.

There is also a comment in that function which states:
/*
* Limit the degree of parallelism logarithmically based on the size of the
* relation.  This probably needs to be a good deal more sophisticated, but we
* need something here for now.
*/

So this will likely see some revision at some point, after 9.6.
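
The heuristic can be sketched in SQL (the 1000-page threshold and the
factor of three are assumptions based on reading the 9.6-era code; the
authoritative version lives in allpaths.c):

```sql
-- One worker once the relation reaches the page threshold, plus one
-- more for each tripling of its size, capped by max_parallel_degree
-- (8 in this sketch).
SELECT pages,
       least(8, CASE WHEN pages < 1000 THEN 0
                     ELSE 1 + floor(log(3, pages / 1000.0))::int
                END) AS workers
FROM (VALUES (500), (1000), (3000), (9000), (27000), (1000000)) AS v(pages);
```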

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
James Sewell
Date:
Cool,

I've been testing how this works with partitioning (which seems to be strange, but I'll post separately about that) and something odd seems to be going on now with the parallel triggering:

postgres=# create table a as select * from base_p2015_11;
SELECT 20000000

postgres=# select * from a limit 1;
             ts             | count |  a  |  b   |  c   |  d   | e
----------------------------+-------+-----+------+------+------+---
 2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
(1 row)

postgres-# \d a
             Table "datamart_owner.a"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 ts     | timestamp without time zone |
 count  | integer                     |
 a      | integer                     |
 b      | integer                     |
 c      | integer                     |
 d      | integer                     |
 e      | integer                     |

postgres=# select pg_size_pretty(pg_relation_size('a'));
 pg_size_pretty
----------------
 1149 MB

postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
         Sort Key: (date_trunc('DAY'::text, ts))
         ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
               Number of Workers: 5
               ->  Partial HashAggregate  (cost=217059.08..217061.58 rows=200 width=16)
                     Group Key: date_trunc('DAY'::text, ts)
                     ->  Parallel Seq Scan on a  (cost=0.00..197059.06 rows=4000005 width=12)
(9 rows)

postgres=# analyze a;

postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
                                QUERY PLAN
--------------------------------------------------------------------------
 GroupAggregate  (cost=3164211.55..3564212.03 rows=20000024 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=3164211.55..3214211.61 rows=20000024 width=12)
         Sort Key: (date_trunc('DAY'::text, ts))
         ->  Seq Scan on a  (cost=0.00..397059.30 rows=20000024 width=12)
(5 rows)

Unsure what's happening here.



James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 

On Mon, Mar 14, 2016 at 1:31 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 14 March 2016 at 14:52, James Sewell <james.sewell@lisasoft.com> wrote:
> One question - how is the upper limit of workers chosen?

See create_parallel_paths() in allpaths.c. Basically the bigger the
relation (in pages) the more workers will be allocated, up until
max_parallel_degree.

There is also a comment in that function which states:
/*
* Limit the degree of parallelism logarithmically based on the size of the
* relation.  This probably needs to be a good deal more sophisticated, but we
* need something here for now.
*/

So this will likely see some revision at some point, after 9.6.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Parallel Aggregate

From
David Rowley
Date:
On 14 March 2016 at 16:39, James Sewell <james.sewell@lisasoft.com> wrote:
>
> I've been testing how this works with partitioning (which seems to be strange, but I'll post separately about that) and something odd seems to be going on now with the parallel triggering:
 
>
> postgres=# create table a as select * from base_p2015_11;
> SELECT 20000000
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------
>  Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
>    Group Key: (date_trunc('DAY'::text, ts))
>    ->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
>          Sort Key: (date_trunc('DAY'::text, ts))
>          ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
>                Number of Workers: 5
>                ->  Partial HashAggregate  (cost=217059.08..217061.58 rows=200 width=16)
>                      Group Key: date_trunc('DAY'::text, ts)
>                      ->  Parallel Seq Scan on a  (cost=0.00..197059.06 rows=4000005 width=12)
> (9 rows)
>
> postgres=# analyze a;
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>  GroupAggregate  (cost=3164211.55..3564212.03 rows=20000024 width=16)
>    Group Key: (date_trunc('DAY'::text, ts))
>    ->  Sort  (cost=3164211.55..3214211.61 rows=20000024 width=12)
>          Sort Key: (date_trunc('DAY'::text, ts))
>          ->  Seq Scan on a  (cost=0.00..397059.30 rows=20000024 width=12)
> (5 rows)
>
> Unsure what's happening here.

This just comes down to the fact that PostgreSQL is quite poor at
estimating the number of groups that will be produced by the
expression date_trunc('DAY',ts). Due to lack of stats when you run the
query before ANALYZE, PostgreSQL just uses a hardcoded guess of 200,
which it thinks will fit quite nicely in the HashAggregate node's hash
table. After you run ANALYZE this estimate goes up to 20000024, and
the grouping planner thinks that's a little too much to be storing in a
hash table, based on the size of your work_mem setting, so it
uses a Sort plan instead.

Things to try:
1. alter table a add column ts_date date; update a set ts_date =
date_trunc('DAY',ts); vacuum full analyze a;
2. or, create index on a (date_trunc('DAY',ts)); analyze a;
3. or for testing, set the work_mem higher.

So, basically, it's no fault of this patch. It's just there's no real
good way for the planner to go estimating something like
date_trunc('DAY',ts) without either adding a column which explicitly
stores that value (1), or collecting stats on the expression (2), or
teaching the planner about the internals of that function, which is
likely just never going to happen. (3) is just going to make the
outlook of a hash plan look a little brighter, although you'll likely
need a work_mem of over 1GB to make the plan change.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
James Sewell
Date:
Hi again,

I've been playing around with inheritance combined with this patch. Currently it looks like you are taking max(parallel_degree) from all the child tables and using that for the number of workers.

For large machines it makes much more sense to use sum(parallel_degree) - but I've just seen this comment in the code:

    /*
     * Decide what parallel degree to request for this append path.  For
     * now, we just use the maximum parallel degree of any member.  It
     * might be useful to use a higher number if the Append node were
     * smart enough to spread out the workers, but it currently isn't.
     */

Does this mean that even though we are aggregating in parallel, we are only operating on one child table at a time currently?

Cheers,

James Sewell,
 Solutions Architect


Re: Parallel Aggregate

From
James Sewell
Date:

On Mon, Mar 14, 2016 at 3:05 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:

Things to try:
1. alter table a add column ts_date date; update a set ts_date =
date_trunc('DAY',ts); vacuum full analyze a;
2. or, create index on a (date_trunc('DAY',ts)); analyze a;
3. or for testing, set the work_mem higher.


Ah, that makes sense.

Tried with a BTREE index, and it works perfectly, but the index is 428MB - which is a bit rough.

Removed that and tried a BRIN index instead: same result for only 48kB - perfect!

Thanks for the help,

James





Re: Parallel Aggregate

From
David Rowley
Date:
On 14 March 2016 at 17:05, James Sewell <james.sewell@lisasoft.com> wrote:
>
> Hi again,
>
> I've been playing around with inheritance combined with this patch. Currently it looks like you are taking max(parallel_degree) from all the child tables and using that for the number of workers.
 
>
> For large machines it makes much more sense to use sum(parallel_degree) - but I've just seen this comment in the code:
>
>     /*
>      * Decide what parallel degree to request for this append path.  For
>      * now, we just use the maximum parallel degree of any member.  It
>      * might be useful to use a higher number if the Append node were
>      * smart enough to spread out the workers, but it currently isn't.
>      */
>
> Does this mean that even though we are aggregating in parallel, we are only operating on one child table at a time currently?

There is nothing in the planner yet, or any patch that I know of to
push the Partial Aggregate node to below an Append node. That will
most likely come in 9.7.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Sun, Mar 13, 2016 at 5:44 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.

I haven't fully studied every line of this yet, but here are a few comments:

+               case T_PartialAggref:
+                       coll = InvalidOid; /* XXX is this correct? */
+                       break;

I doubt it.  More generally, why are we inventing PartialAggref
instead of reusing Aggref?  The code comments seem to contain no
indication as to why we shouldn't need all the same details for
PartialAggref that we do for Aggref, instead of only a small subset of
them.  Hmm... actually, it looks like PartialAggref is intended to
wrap Aggref, but that seems like a weird design.  Why not make Aggref
itself DTRT?   There's not enough explanation in the patch of what is
going on here and why.
               }
+               if (can_parallel)
+               {

Seems like a blank line would be in order.

I don't see where this applies has_parallel_hazard or anything
comparable to the aggregate functions.  I think it needs to do that.

+       /* XXX +1 ? do we expect the main process to actually do real work? */
+       numPartialGroups = Min(numGroups, subpath->rows) *
+                                               (subpath->parallel_degree + 1);

I'd leave out the + 1, but I don't think it matters much.

+                                       aggstate->finalizeAggs == true)

We usually just say if (a) not if (a == true) when it's a boolean.
Similarly !a rather than a == false.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Paul Ramsey
Date:
On Sun, Mar 13, 2016 at 7:31 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 14 March 2016 at 14:52, James Sewell <james.sewell@lisasoft.com> wrote:
>> One question - how is the upper limit of workers chosen?
>
> See create_parallel_paths() in allpaths.c. Basically the bigger the
> relation (in pages) the more workers will be allocated, up until
> max_parallel_degree.

Does the cost of the aggregate function come into this calculation at
all? In PostGIS land, much smaller numbers of rows can generate loads
that would be effective to parallelize (worker time much >> than
startup cost).

P



Re: Parallel Aggregate

From
Robert Haas
Date:
On Mon, Mar 14, 2016 at 3:56 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> On Sun, Mar 13, 2016 at 7:31 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 14 March 2016 at 14:52, James Sewell <james.sewell@lisasoft.com> wrote:
>>> One question - how is the upper limit of workers chosen?
>>
>> See create_parallel_paths() in allpaths.c. Basically the bigger the
>> relation (in pages) the more workers will be allocated, up until
>> max_parallel_degree.
>
> Does the cost of the aggregate function come into this calculation at
> all? In PostGIS land, much smaller numbers of rows can generate loads
> that would be effective to parallelize (worker time much >> than
> startup cost).

Unfortunately, no - only the table size.  This is a problem, and needs
to be fixed.  However, it's probably not going to get fixed for 9.6.
:-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
James Sewell
Date:
On Tuesday, 15 March 2016, Robert Haas <robertmhaas@gmail.com> wrote:
> Does the cost of the aggregate function come into this calculation at
> all? In PostGIS land, much smaller numbers of rows can generate loads
> that would be effective to parallelize (worker time much >> than
> startup cost).

Unfortunately, no - only the table size.  This is a problem, and needs
to be fixed.  However, it's probably not going to get fixed for 9.6.
:-(

Any chance of getting a GUC (say min_parallel_degree) added to allow setting the initial value of parallel_degree, then changing the small relation check to also pass if parallel_degree > 1? 

That way you could set min_parallel_degree on a query by query basis if you are running aggregates which you know will take a lot of CPU.

I suppose it wouldn't make much sense at all to set globally though, so it could just confuse matters.

Cheers,

 



Re: Parallel Aggregate

From
Robert Haas
Date:
On Mon, Mar 14, 2016 at 6:24 PM, James Sewell <james.sewell@lisasoft.com> wrote:
> Any chance of getting a GUC (say min_parallel_degree) added to allow setting
> the initial value of parallel_degree, then changing the small relation check
> to also pass if parallel_degree > 1?
>
> That way you could set min_parallel_degree on a query by query basis if you
> are running aggregates which you know will take a lot of CPU.
>
> I suppose it wouldn't make much sense at all to set globally though, so it
> could just confuse matters.

I kind of doubt this would work well, but somebody could write a patch
for it and try it out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Tomas Vondra
Date:
Hi,

On 03/13/2016 10:44 PM, David Rowley wrote:
> On 12 March 2016 at 16:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a
> possible uninitialised variable.

I've looked at this patch today. The patch seems quite solid, but I do 
have a few minor comments (or perhaps questions, given that this is the 
first time I looked at the patch).

1) exprCollation contains this bit:
-----------------------------------
    case T_PartialAggref:
        coll = InvalidOid; /* XXX is this correct? */
        break;

I doubt this is the right thing to do. Can we actually get to this piece 
of code? I haven't tried too hard, but regression tests don't seem to 
trigger this piece of code.

Moreover, if we're handling PartialAggref in exprCollation(), maybe we 
should handle it also in exprInputCollation and exprSetCollation?

And if we really need the collation there, why not to fetch the actual 
collation from the nested Aggref? Why should it be InvalidOid?


2) partial_aggregate_walker
---------------------------

I think this should follow the naming convention that clearly identifies 
the purpose of the walker, not what kind of nodes it is supposed to 
walk. So it should be:
    aggregates_allow_partial_walker


3) create_parallelagg_path
--------------------------

I do agree with the logic that under-estimates are more dangerous than 
over-estimates, so the current estimate is safer. But I think this would 
be a good place to apply the formula I proposed a few days ago (or 
rather the one Dean Rasheed proposed in response).

That is, we do know that there are numGroups in total, and each parallel 
worker sees subpath->rows then it's expected to see
    sel = (subpath->rows / rel->tuples);
    perGroup = (rel->tuples / numGroups);
    workerGroups = numGroups * (1 - powl(1 - sel, perGroup));
    numPartialGroups = numWorkers * workerGroups
 

It's probably better to see Dean's message from 13/3.

4) Is clauses.h the right place for PartialAggType?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregate

From
James Sewell
Date:
On Tue, Mar 15, 2016 at 9:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I kind of doubt this would work well, but somebody could write a patch
for it and try it out.

OK I'll give this a go today and report back.

Would the eventual plan be to use pg_proc.procost for the functions from each aggregate concerned? If so I might have a peek at that too, although I imagine I won't get far.

Cheers,



Re: Parallel Aggregate

From
David Rowley
Date:
On 15 March 2016 at 08:53, Robert Haas <robertmhaas@gmail.com> wrote:
> I haven't fully studied every line of this yet, but here are a few comments:
>
> +               case T_PartialAggref:
> +                       coll = InvalidOid; /* XXX is this correct? */
> +                       break;
>
> I doubt it.

Thanks for looking at this.

Yeah, I wasn't so sure of the collation thing either, so I stuck a
reminder on there. The way I'm seeing it at the moment is that since
the partial aggregate is never displayed to the user, and we never
perform equality comparison on them (since HAVING is applied in the
final aggregate stage), then my line of thought was that the collation
should not matter. Over on the combine aggregate states thread I'm
doing work to make the standard serialize functions use bytea, and
bytea doesn't allow collations:

# create table c (b bytea collate "en_NZ");
ERROR:  collations are not supported by type bytea
LINE 1: create table c (b bytea collate "en_NZ");

I previously did think of reusing the Aggref's collation, but I ended
up leaning towards the more "does not matter" side of the argument. Of
course, I may have completely failed to think of some important
reason, which is why I left that comment, so it might provoke some
thought with someone else with more collation knowledge.

> More generally, why are we inventing PartialAggref
> instead of reusing Aggref?  The code comments seem to contain no
> indication as to why we shouldn't need all the same details for
> PartialAggref that we do for Aggref, instead of only a small subset of
> them.  Hmm... actually, it looks like PartialAggref is intended to
> wrap Aggref, but that seems like a weird design.  Why not make Aggref
> itself DTRT?   There's not enough explanation in the patch of what is
> going on here and why.

A comment does explain this, but perhaps it's not good enough, so I've
rewritten it as follows:

/*
 * PartialAggref
 *
 * When partial aggregation is required in a plan, the nodes from the partial
 * aggregate node, up until the finalize aggregate node, must pass the
 * partially aggregated states up the plan tree. In regards to target list
 * construction in setrefs.c, this requires that exprType() returns the
 * state's type rather than the final aggregate value's type, and since
 * exprType() for Aggref is coded to return the aggtype, this is not correct
 * for us. We can't fix this by going around modifying the Aggref to change
 * its return type, as setrefs.c requires searching for that Aggref using
 * equals(), which compares all fields in Aggref, and changing the aggtype
 * would cause such a comparison to fail. To get around this problem we wrap
 * the Aggref up in a PartialAggref, which allows exprType() to return the
 * correct type and lets us handle a PartialAggref in setrefs.c by just
 * peeking inside it to check the underlying Aggref. The PartialAggref lives
 * until executor start-up, where it's removed and replaced with its
 * underlying Aggref.
 */
 
typedef struct PartialAggref

does that help explain it better?


>                 }
> +               if (can_parallel)
> +               {
>
> Seems like a blank line would be in order.

Fixed.

> I don't see where this applies has_parallel_hazard or anything
> comparable to the aggregate functions.  I think it needs to do that.

Not sure what you mean here.

>
> +       /* XXX +1 ? do we expect the main process to actually do real work? */
> +       numPartialGroups = Min(numGroups, subpath->rows) *
> +                                               (subpath->parallel_degree + 1);
> I'd leave out the + 1, but I don't think it matters much.

Actually I meant to ask you about this. I see that subpath->rows is
divided by the Path's parallel_degree, but it seems the main process
does some work too, which is why I added the + 1. During my tests,
using a query which produces 10 groups with 4 workers, I noticed
that Gather was getting 50 groups back rather than 40. I assumed this
was because the main process is helping too, but from my reading of the
parallel query threads I believe it's because the Gather, instead
of sitting around idle, tries to do a bit of work too if it appears
that nothing else is happening quickly enough. I should probably go
read nodeGather.c to learn that though.

In the meantime I've removed the + 1, as it's not correct to do
subpath->rows * (subpath->parallel_degree + 1), since it was divided
by subpath->parallel_degree in the first place, we'd end up with an
extra worker's worth of rows for queries which estimate a larger
number of groups than partial path rows.


> +                                       aggstate->finalizeAggs == true)
>
> We usually just say if (a) not if (a == true) when it's a boolean.
> Similarly !a rather than a == false.

hmm, thanks. It appears that I've not been all that consistent in that
area. I didn't know that was the convention. I see that some of my ways
have already crept into the explain.c changes :/

I will send an updated patch once I address Tomas' concerns too.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 15 March 2016 at 11:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> I've looked at this patch today. The patch seems quite solid, but I do have
> a few minor comments (or perhaps questions, given that this is the first
> time I looked at the patch).
>
> 1) exprCollation contains this bit:
> -----------------------------------
>
>     case T_PartialAggref:
>         coll = InvalidOid; /* XXX is this correct? */
>         break;
>
> I doubt this is the right thing to do. Can we actually get to this piece of
> code? I haven't tried too hard, but regression tests don't seem to trigger
> this piece of code.

Thanks for looking at this.

Yeah, it's there because it is being called from setrefs.c in
makeVarFromTargetEntry() via fix_combine_agg_expr_mutator(), so it's
required when building the new target list for Aggref->args to point
to the underlying Aggref's Var.

As for the collation, I'm still not convinced if it's right or wrong.
I know offlist you mentioned string_agg() and sorting, but
there's no code that'll sort the agg's state. The only possible thing
that gets sorted there is the group by key.

> Moreover, if we're handling PartialAggref in exprCollation(), maybe we
> should handle it also in exprInputCollation and exprSetCollation?

hmm, maybe, that I'm not sure about. I don't see where we'd call
exprSetCollation() for this, but I think I need to look at
exprInputCollation()

> And if we really need the collation there, why not to fetch the actual
> collation from the nested Aggref? Why should it be InvalidOid?

It seems quite random to me to do that. If the trans type is bytea,
why would it be useful to inherit the collation from the aggregate?
I'm not confident I'm right with InvalidOid... I just don't think we
can pretend the collation is the same as the Aggref's.


> 2) partial_aggregate_walker
> ---------------------------
>
> I think this should follow the naming convention that clearly identifies the
> purpose of the walker, not what kind of nodes it is supposed to walk. So it
> should be:
>
>     aggregates_allow_partial_walker

Agreed and changed.

> 3) create_parallelagg_path
> --------------------------
>
> I do agree with the logic that under-estimates are more dangerous than
> over-estimates, so the current estimate is safer. But I think this would be
> a good place to apply the formula I proposed a few days ago (or rather the
> one Dean Rasheed proposed in response).
>
> That is, we do know that there are numGroups in total, and each parallel
> worker sees subpath->rows then it's expected to see
>
>     sel = (subpath->rows / rel->tuples);
>     perGroup = (rel->tuples / numGroups);
>     workerGroups = numGroups * (1 - powl(1 - s, perGroup));
>     numPartialGroups = numWorkers * workerGroups
>
> It's probably better to see Dean's message from 13/3.

I think what I have works well when there's a small number of groups,
as there's a good chance that each worker will see at least 1 tuple
from each group. However I understand that will become increasingly
unlikely with a larger number of groups, which is why I capped it to
the total input rows, but even in cases before that cap is reached I
think it will still overestimate. I'd need to analyze the code above
to understand it better, but my initial reaction is that, you're
probably right, but I don't think I want to inherit the fight for
this. Perhaps it's better to wait until GROUP BY estimate improvement
patch gets in, and change this, or if this gets in first, then you can
include this change in your patch. I'm not trying to brush off the
work, I just would rather it didn't delay parallel aggregate.

> 4) Is clauses.h the right place for PartialAggType?

I'm not sure that it is to be honest. I just put it there because the
patch never persisted the value of a PartialAggType in any struct
field anywhere and checks it later in some other file. In all places
where we use PartialAggType we're also calling
aggregates_allow_partial(), which does require clauses.h. So that's
why it ended up there... I think I'll leave it there until someone
gives me a good reason to move it.

An updated patch will follow soon.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 15 March 2016 at 11:24, James Sewell <james.sewell@lisasoft.com> wrote:
> On Tuesday, 15 March 2016, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> > Does the cost of the aggregate function come into this calculation at
>> > all? In PostGIS land, much smaller numbers of rows can generate loads
>> > that would be effective to parallelize (worker time much >> than
>> > startup cost).
>>
>> Unfortunately, no - only the table size.  This is a problem, and needs
>> to be fixed.  However, it's probably not going to get fixed for 9.6.
>> :-(
>
>
> Any chance of getting a GUC (say min_parallel_degree) added to allow setting
> the initial value of parallel_degree, then changing the small relation check
> to also pass if parallel_degree > 1?
>
> That way you could set min_parallel_degree on a query by query basis if you
> are running aggregates which you know will take a lot of CPU.
>
> I suppose it wouldn't make much sense at all to set globally though, so it
> could just confuse matters.

I agree that it would be nice to have more influence on this decision,
but let's start a new thread for that. I don't want this one getting
bloated with debates on that. It's not code I'm planning on going
anywhere near for this patch.

I'll start a thread...

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 15 March 2016 at 13:48, David Rowley <david.rowley@2ndquadrant.com> wrote:
> An updated patch will follow soon.

I've attached an updated patch which addresses some of Robert's and
Tomas' concerns.
I've not done anything about the exprCollation() yet, as I'm still
unsure of what it should do. I just don't see why returning the
Aggref's collation is correct, and we have nothing else to return.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> More generally, why are we inventing PartialAggref
>> instead of reusing Aggref?  The code comments seem to contain no
>> indication as to why we shouldn't need all the same details for
>> PartialAggref that we do for Aggref, instead of only a small subset of
>> them.  Hmm... actually, it looks like PartialAggref is intended to
>> wrap Aggref, but that seems like a weird design.  Why not make Aggref
>> itself DTRT?   There's not enough explanation in the patch of what is
>> going on here and why.
>
> A comment does explain this, but perhaps it's not good enough, so I've
> rewritten it to become.
>
> /*
>  * PartialAggref
>  *
>  * When partial aggregation is required in a plan, the nodes from the partial
>  * aggregate node, up until the finalize aggregate node must pass the partially
>  * aggregated states up the plan tree. In regards to target list construction
>  * in setrefs.c, this requires that exprType() returns the state's type rather
>  * than the final aggregate value's type, and since exprType() for Aggref is
>  * coded to return the aggtype, this is not correct for us. We can't fix this
>  * by going around modifying the Aggref to change its return type as setrefs.c
>  * requires searching for that Aggref using equals() which compares all fields
>  * in Aggref, and changing the aggtype would cause such a comparison to fail.
>  * To get around this problem we wrap the Aggref up in a PartialAggref, this
>  * allows exprType() to return the correct type and we can handle a
>  * PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
>  * the underlying Aggref. The PartialAggref lives as long as executor start-up,
>  * where it's removed and replaced with its underlying Aggref.
>  */
> typedef struct PartialAggref
>
> does that help explain it better?

I still think that's solving the problem the wrong way.  Why can't
exprType(), when applied to the Aggref, do something like this?

{
    Aggref *aref = (Aggref *) expr;

    if (aref->aggpartial)
        return aref->aggtranstype;
    else
        return aref->aggtype;
}

The obvious answer is "well, because those fields don't exist in
Aggref".  But shouldn't they?  From here, it looks like PartialAggref
is a cheap hack around not having whacked Aggref around hard for
partial aggregation.

>> I don't see where this applies has_parallel_hazard or anything
>> comparable to the aggregate functions.  I think it needs to do that.
>
> Not sure what you mean here.

If the aggregate isn't parallel-safe, you can't do this optimization.
For example, imagine an aggregate function written in PL/pgsql that
for some reason writes data to a side table.  It's
has_parallel_hazard's job to check the parallel-safety properties of
the functions used in the query.

>> +       /* XXX +1 ? do we expect the main process to actually do real work? */
>> +       numPartialGroups = Min(numGroups, subpath->rows) *
>> +                                               (subpath->parallel_degree + 1);
>> I'd leave out the + 1, but I don't think it matters much.
>
> Actually I meant to ask you about this. I see that subpath->rows is
> divided by the Path's parallel_degree, but it seems the main process
> does some work too, so this is why I added + 1, as during my tests
> using a query which produces 10 groups, and had 4 workers, I noticed
> that Gather was getting 50 groups back, rather than 40, I assumed this
> is because the main process is helping too, but from my reading of the
> parallel query threads I believe this is because the Gather, instead
> of sitting around idle tries to do a bit of work too, if it appears
> that nothing else is happening quickly enough. I should probably go
> read nodeGather.c to learn that though.

Yes, the main process does do some work, but less and less as the
query gets more complicated.  See comments in cost_seqscan().
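The arithmetic being debated here is small enough to sketch directly. This is a toy reconstruction, not the patch's code: the function names are mine, and the 0.3-per-worker leader heuristic is recalled from the 9.6-era cost_seqscan() comments, so treat the constants as illustrative.

```c
#include <assert.h>

/* Each of the parallel_degree workers, plus the leader (the "+ 1"),
 * may emit up to Min(numGroups, rows) partially aggregated groups,
 * which is what the Gather node has to be costed for. */
double
num_partial_groups(double numGroups, double rows, int parallelDegree)
{
    double perWorker = numGroups < rows ? numGroups : rows;

    return perWorker * (parallelDegree + 1);
}

/* The leader scans too, but its contribution is assumed to shrink as
 * workers are added, until it is treated as fully busy gathering. */
double
parallel_divisor(int nworkers)
{
    double divisor = nworkers;
    double leader_contribution = 1.0 - 0.3 * nworkers;

    if (leader_contribution > 0)
        divisor += leader_contribution;
    return divisor;
}
```

With David's test case of 10 groups and 4 workers, num_partial_groups() gives 50 rather than 40, matching what he observed arriving at the Gather.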

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 16 March 2016 at 09:23, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> A comment does explain this, but perhaps it's not good enough, so I've
>> rewritten it to become.
>>
>> /*
>>  * PartialAggref
>>  *
>>  * When partial aggregation is required in a plan, the nodes from the partial
>>  * aggregate node, up until the finalize aggregate node must pass the partially
>>  * aggregated states up the plan tree. In regards to target list construction
>>  * in setrefs.c, this requires that exprType() returns the state's type rather
>>  * than the final aggregate value's type, and since exprType() for Aggref is
>>  * coded to return the aggtype, this is not correct for us. We can't fix this
>>  * by going around modifying the Aggref to change its return type as setrefs.c
>>  * requires searching for that Aggref using equals() which compares all fields
>>  * in Aggref, and changing the aggtype would cause such a comparison to fail.
>>  * To get around this problem we wrap the Aggref up in a PartialAggref, this
>>  * allows exprType() to return the correct type and we can handle a
>>  * PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
>>  * the underlying Aggref. The PartialAggref lives as long as executor start-up,
>>  * where it's removed and replaced with its underlying Aggref.
>>  */
>> typedef struct PartialAggref
>>
>> does that help explain it better?
>
> I still think that's solving the problem the wrong way.  Why can't
> exprType(), when applied to the Aggref, do something like this?
>
> {
>     Aggref *aref = (Aggref *) expr;
>     if (aref->aggpartial)
>         return aref->aggtranstype;
>     else
>         return aref->aggtype;
> }
>
> The obvious answer is "well, because those fields don't exist in
> Aggref".  But shouldn't they?  From here, it looks like PartialAggref
> is a cheap hack around not having whacked Aggref around hard for
> partial aggregation.

We could do it that way if we left the aggpartial field out of the
equals() check, but I think we go at length to not do that. Just look
at what's done for all of the location fields. In any case if we did
that then it might not actually be what we want all of the time...
Perhaps in some cases we'd want equals() to return false when the
aggpartial does not match, and in other cases we'd want it to return
true. There's no way to control that behaviour, so to get around the
setrefs.c problem I created the wrapper node type, which I happen to
think is quite clean. Just see Tom's comments about Haribabu's temp
fix for the problem where he put some hacks into the equals for aggref
in [1].

>>> I don't see where this applies has_parallel_hazard or anything
>>> comparable to the aggregate functions.  I think it needs to do that.
>>
>> Not sure what you mean here.
>
> If the aggregate isn't parallel-safe, you can't do this optimization.
> For example, imagine an aggregate function written in PL/pgsql that
> for some reason writes data to a side table.  It's
> has_parallel_hazard's job to check the parallel-safety properties of
> the functions used in the query.

Sorry, I do know what you mean by that. I might have been wrong to
assume that the parallelModeOK check did this. I will dig into how
that is set exactly.

>>> +       /* XXX +1 ? do we expect the main process to actually do real work? */
>>> +       numPartialGroups = Min(numGroups, subpath->rows) *
>>> +                                               (subpath->parallel_degree + 1);
>>> I'd leave out the + 1, but I don't think it matters much.
>>
>> Actually I meant to ask you about this. I see that subpath->rows is
>> divided by the Path's parallel_degree, but it seems the main process
>> does some work too, so this is why I added + 1, as during my tests
>> using a query which produces 10 groups, and had 4 workers, I noticed
>> that Gather was getting 50 groups back, rather than 40, I assumed this
>> is because the main process is helping too, but from my reading of the
>> parallel query threads I believe this is because the Gather, instead
>> of sitting around idle tries to do a bit of work too, if it appears
>> that nothing else is happening quickly enough. I should probably go
>> read nodeGather.c to learn that though.
>
> Yes, the main process does do some work, but less and less as the
> query gets more complicated.  See comments in cost_seqscan().

Thanks

[1] http://www.postgresql.org/message-id/10158.1457329140@sss.pgh.pa.us

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Tue, Mar 15, 2016 at 5:50 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> I still think that's solving the problem the wrong way.  Why can't
>> exprType(), when applied to the Aggref, do something like this?
>>
>> {
>>     Aggref *aref = (Aggref *) expr;
>>     if (aref->aggpartial)
>>         return aref->aggtranstype;
>>     else
>>         return aref->aggtype;
>> }
>>
>> The obvious answer is "well, because those fields don't exist in
>> Aggref".  But shouldn't they?  From here, it looks like PartialAggref
>> is a cheap hack around not having whacked Aggref around hard for
>> partial aggregation.
>
> We could do it that way if we left the aggpartial field out of the
> equals() check, but I think we go at length to not do that. Just look
> at what's done for all of the location fields. In any case if we did
> that then it might not actually be what we want all of the time...
> Perhaps in some cases we'd want equals() to return false when the
> aggpartial does not match, and in other cases we'd want it to return
> true. There's no way to control that behaviour, so to get around the
> setrefs.c problem I created the wrapper node type, which I happen to
> think is quite clean. Just see Tom's comments about Haribabu's temp
> fix for the problem where he put some hacks into the equals for aggref
> in [1].

I don't see why we would need to leave aggpartial out of the equals()
check.  I must be missing something.

>>>> I don't see where this applies has_parallel_hazard or anything
>>>> comparable to the aggregate functions.  I think it needs to do that.
>>>
>>> Not sure what you mean here.
>>
>> If the aggregate isn't parallel-safe, you can't do this optimization.
>> For example, imagine an aggregate function written in PL/pgsql that
>> for some reason writes data to a side table.  It's
>> has_parallel_hazard's job to check the parallel-safety properties of
>> the functions used in the query.
>
> Sorry, I do know what you mean by that. I might have been wrong to
> assume that the parallelModeOK check did this. I will dig into how
> that is set exactly.

Hmm, sorry, I wasn't very accurate there.  The parallelModeOK check
will indeed handle the case where there are parallel-unsafe functions,
but it will not handle the case where there are parallel-restricted
functions.  In that latter case, the query can still use parallelism
someplace, but the parallel-restricted functions cannot be executed
beneath the Gather.
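The distinction can be sketched as two separate predicates. This is a toy model with names of my own invention; PostgreSQL actually records the safety level per function in the pg_proc catalog.

```c
#include <assert.h>
#include <stdbool.h>

/* The three safety levels a function can have. */
typedef enum
{
    PARALLEL_SAFE,        /* fine anywhere in a parallel query */
    PARALLEL_RESTRICTED,  /* only in the leader, above the Gather */
    PARALLEL_UNSAFE       /* disables parallelism for the whole query */
} ToyParallelSafety;

/* parallelModeOK-style question: may the query use parallelism at all? */
bool
query_may_use_parallelism(ToyParallelSafety s)
{
    return s != PARALLEL_UNSAFE;
}

/* has_parallel_hazard-style question: may this expression run in a worker? */
bool
may_run_below_gather(ToyParallelSafety s)
{
    return s == PARALLEL_SAFE;
}
```

A parallel-restricted aggregate passes the first check but fails the second, which is why the parallelModeOK test alone isn't sufficient for parallel aggregation.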

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 16 March 2016 at 11:00, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't see why we would need to leave aggpartial out of the equals()
> check.  I must be missing something.

See fix_combine_agg_expr_mutator()

This piece of code:

/*
 * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
 * we need to look into the PartialAggref to find the Aggref within.
 */
foreach(lc, context->subplan_itlist->tlist)
{
    PartialAggref *paggref;

    tle = (TargetEntry *) lfirst(lc);
    paggref = (PartialAggref *) tle->expr;

    if (IsA(paggref, PartialAggref) &&
        equal(paggref->aggref, aggref))
        break;
}

if equals() compared the aggpartial then this code would fail to find
the Aggref in the subnode due to the aggpartial field being true on
one and false on the other Aggref.
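A stripped-down illustration of that lookup failure (a toy stand-in, not the real Aggref, which has many more fields):

```c
#include <assert.h>
#include <stdbool.h>

/* Just enough of Aggref to show the problem. */
typedef struct ToyAggref
{
    unsigned int aggfnoid;   /* OID of the aggregate function */
    unsigned int aggtype;    /* result type OID */
    bool         aggpartial; /* is this the partial stage? */
} ToyAggref;

/* equal()-style comparison: every field participates. */
bool
toy_equal(const ToyAggref *a, const ToyAggref *b)
{
    return a->aggfnoid == b->aggfnoid &&
           a->aggtype == b->aggtype &&
           a->aggpartial == b->aggpartial;
}
```

The finalize-stage node carries aggpartial = false while the matching target-list entry below the Gather carries aggpartial = true, so a setrefs.c-style search using this comparison never finds its match.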


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 16 March 2016 at 11:00, Robert Haas <robertmhaas@gmail.com> wrote:
>> I don't see why we would need to leave aggpartial out of the equals()
>> check.  I must be missing something.
>
> See fix_combine_agg_expr_mutator()
>
> This piece of code:
>
> /*
> * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
> * we need to look into the PartialAggref to find the Aggref within.
> */
> foreach(lc, context->subplan_itlist->tlist)
> {
> PartialAggref *paggref;
> tle = (TargetEntry *) lfirst(lc);
> paggref = (PartialAggref *) tle->expr;
>
> if (IsA(paggref, PartialAggref) &&
> equal(paggref->aggref, aggref))
> break;
> }
>
> if equals() compared the aggpartial then this code would fail to find
> the Aggref in the subnode due to the aggpartial field being true on
> one and false on the other Aggref.

...and why would one be true and the other false?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 16 March 2016 at 11:00, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I don't see why we would need to leave aggpartial out of the equals()
>>> check.  I must be missing something.
>>
>> See fix_combine_agg_expr_mutator()
>>
>> This piece of code:
>>
>> /*
>> * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
>> * we need to look into the PartialAggref to find the Aggref within.
>> */
>> foreach(lc, context->subplan_itlist->tlist)
>> {
>> PartialAggref *paggref;
>> tle = (TargetEntry *) lfirst(lc);
>> paggref = (PartialAggref *) tle->expr;
>>
>> if (IsA(paggref, PartialAggref) &&
>> equal(paggref->aggref, aggref))
>> break;
>> }
>>
>> if equals() compared the aggpartial then this code would fail to find
>> the Aggref in the subnode due to the aggpartial field being true on
>> one and false on the other Aggref.
>
> ...and why would one be true and the other false?

One would be the combine aggregate (having aggpartial = false), and
the one in the subnode would be the partial aggregate (having
aggpartial = true).
Notice in create_grouping_paths() I build a partial aggregate version
of the PathTarget named partial_group_target, this one goes into the
partial agg node, and Gather node. In this case the aggpartial will be
set differently for the Aggrefs in each of the two PathTargets, so it
would not be possible in setrefs.c to find the correct target list
entry in the subnode by using equal(). It'll just end up triggering
the elog(ERROR, "Aggref not found in subplan target list"); error.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Mar 15, 2016 at 6:55 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> On 16 March 2016 at 11:00, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> I don't see why we would need to leave aggpartial out of the equals()
>>>> check.  I must be missing something.
>>>
>>> See fix_combine_agg_expr_mutator()
>>>
>>> This piece of code:
>>>
>>> /*
>>> * Aggrefs for partial aggregates are wrapped up in a PartialAggref,
>>> * we need to look into the PartialAggref to find the Aggref within.
>>> */
>>> foreach(lc, context->subplan_itlist->tlist)
>>> {
>>> PartialAggref *paggref;
>>> tle = (TargetEntry *) lfirst(lc);
>>> paggref = (PartialAggref *) tle->expr;
>>>
>>> if (IsA(paggref, PartialAggref) &&
>>> equal(paggref->aggref, aggref))
>>> break;
>>> }
>>>
>>> if equals() compared the aggpartial then this code would fail to find
>>> the Aggref in the subnode due to the aggpartial field being true on
>>> one and false on the other Aggref.
>>
>> ...and why would one be true and the other false?
>
> One would be the combine aggregate (having aggpartial = false), and
> the one in the subnode would be the partial aggregate (having
> aggpartial = true)
> Notice in create_grouping_paths() I build a partial aggregate version
> of the PathTarget named partial_group_target, this one goes into the
> partial agg node, and Gather node. In this case the aggpartial will be
> set differently for the Aggrefs in each of the two PathTargets, so it
> would not be possible in setrefs.c to find the correct target list
> entry in the subnode by using equal(). It'll just end up triggering
> the elog(ERROR, "Aggref not found in subplan target list"); error.

OK, I get it now.  I still don't like it very much.  There's no
ironclad requirement that we use equal() here as opposed to some
bespoke comparison function with the exact semantics we need, and ISTM
that getting rid of PartialAggref would shrink this patch down quite a
bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 16 March 2016 at 13:42, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
>>> ...and why would one be true and the other false?
>>
>> One would be the combine aggregate (having aggpartial = false), and
>> the one in the subnode would be the partial aggregate (having
>> aggpartial = true)
>> Notice in create_grouping_paths() I build a partial aggregate version
>> of the PathTarget named partial_group_target, this one goes into the
>> partial agg node, and Gather node. In this case the aggpartial will be
>> set differently for the Aggrefs in each of the two PathTargets, so it
>> would not be possible in setrefs.c to find the correct target list
>> entry in the subnode by using equal(). It'll just end up triggering
>> the elog(ERROR, "Aggref not found in subplan target list"); error.
>
> OK, I get it now.  I still don't like it very much.  There's no
> ironclad requirement that we use equal() here as opposed to some
> bespoke comparison function with the exact semantics we need, and ISTM
> that getting rid of PartialAggref would shrink this patch down quite a
> bit.

Well that might work. I'd not thought of doing it that way. The only
issue that I can foresee with that is that when new fields are added
to Aggref in the future, we might miss updating that custom comparison
function to include them.

Should I update the patch to use the method you describe?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Tue, Mar 15, 2016 at 8:55 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 16 March 2016 at 13:42, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> ...and why would one be true and the other false?
>>>
>>> One would be the combine aggregate (having aggpartial = false), and
>>> the one in the subnode would be the partial aggregate (having
>>> aggpartial = true)
>>> Notice in create_grouping_paths() I build a partial aggregate version
>>> of the PathTarget named partial_group_target, this one goes into the
>>> partial agg node, and Gather node. In this case the aggpartial will be
>>> set differently for the Aggrefs in each of the two PathTargets, so it
>>> would not be possible in setrefs.c to find the correct target list
>>> entry in the subnode by using equal(). It'll just end up triggering
>>> the elog(ERROR, "Aggref not found in subplan target list"); error.
>>
>> OK, I get it now.  I still don't like it very much.  There's no
>> ironclad requirement that we use equal() here as opposed to some
>> bespoke comparison function with the exact semantics we need, and ISTM
>> that getting rid of PartialAggref would shrink this patch down quite a
>> bit.
>
> Well that might work. I'd not thought of doing it that way. The only
> issue that I can foresee with that is that when new fields are added
> to Aggref in the future, we might miss updating that custom comparison
> function to include them.

That's true, but it doesn't seem like that big a deal.  A code comment
in the Aggref definition seems like sufficient insurance against such
a mistake.

> Should I update the patch to use the method you describe?

Well, my feeling is that is going to make this a lot smaller and
simpler, so I think so.  But if you disagree strongly, let's discuss
further.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 16 March 2016 at 14:08, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 15, 2016 at 8:55 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 16 March 2016 at 13:42, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Tue, Mar 15, 2016 at 8:04 PM, David Rowley
>>> <david.rowley@2ndquadrant.com> wrote:
>>>> On 16 March 2016 at 12:58, Robert Haas <robertmhaas@gmail.com> wrote:
>>>>> ...and why would one be true and the other false?
>>>>
>>>> One would be the combine aggregate (having aggpartial = false), and
>>>> the one in the subnode would be the partial aggregate (having
>>>> aggpartial = true)
>>>> Notice in create_grouping_paths() I build a partial aggregate version
>>>> of the PathTarget named partial_group_target, this one goes into the
>>>> partial agg node, and Gather node. In this case the aggpartial will be
>>>> set differently for the Aggrefs in each of the two PathTargets, so it
>>>> would not be possible in setrefs.c to find the correct target list
>>>> entry in the subnode by using equal(). It'll just end up triggering
>>>> the elog(ERROR, "Aggref not found in subplan target list"); error.
>>>
>>> OK, I get it now.  I still don't like it very much.  There's no
>>> ironclad requirement that we use equal() here as opposed to some
>>> bespoke comparison function with the exact semantics we need, and ISTM
>>> that getting rid of PartialAggref would shrink this patch down quite a
>>> bit.
>>
>> Well that might work. I'd not thought of doing it that way. The only
>> issue that I can foresee with that is that when new fields are added
>> to Aggref in the future, we might miss updating that custom comparison
>> function to include them.
>
> That's true, but it doesn't seem like that big a deal.  A code comment
> in the Aggref definition seems like sufficient insurance against such
> a mistake.
>
>> Should I update the patch to use the method you describe?
>
> Well, my feeling is that is going to make this a lot smaller and
> simpler, so I think so.  But if you disagree strongly, let's discuss
> further.

Not strongly. It means that Aggref will need another field to store
the transtype and/or the serialtype (for the follow-on patches in the
Combining Aggregates thread).
The only other issue which I don't think I've addressed yet is target
list width estimates. Probably that can just pay attention to the
aggpartial flag too, and if I fix that then likely the Aggref needs to
carry around an aggtransspace field too, which we really only need to
populate when the Aggref is in partial mode... it would be wasteful to
bother looking that up from the catalogs if we're never going to use
the Aggref in partial mode, and it seems weird to leave it as zero and
only populate it when we need it. So... if I put on my object-oriented
hat and think about this.... My brain says that Aggref should inherit
from PartialAggref.... and that's what I have now... or at least some
C variant. So I really do think it's cleaner and easier to understand
by keeping the PartialAggref.

I see on looking at exprType() that we do have other node types which
conditionally return a different type, but seeing that does not fill
me with joy.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Tue, Mar 15, 2016 at 9:23 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>>> Should I update the patch to use the method you describe?
>>
>> Well, my feeling is that is going to make this a lot smaller and
>> simpler, so I think so.  But if you disagree strongly, let's discuss
>> further.
>
> Not strongly. It means that Aggref will need another field to store
> the transtype and/or the serialtype (for the follow-on patches in
> Combining Aggregates thread)
> The only other issue which I don't think I've addressed yet is target
> list width estimates. Probably that can just pay attention to the
> aggpartial flag too, and if I fix that then likely the Aggref needs to
> carry around an aggtransspace field too, which we really only need to
> populate when the Aggref is in partial mode... it would be wasteful to
> bother looking that up from the catalogs if we're never going to use
> the Aggref in partial mode, and it seems weird to leave it as zero and
> only populate it when we need it. So... if I put on my object oriented
> hat and think about this.... My brain says that Aggref should inherit
> from PartialAggref.... and that's what I have now... or at least some
> C variant. So I really do think it's cleaner and easier to understand
> by keeping the PartialAggref.
>
> I see on looking at exprType() that we do have other node types which
> conditionally return a different type, but seeing that does not fill
> me with joy.

I don't think I'd be objecting if you made PartialAggref a real
alternative to Aggref.  But that's not what you've got here.  A
PartialAggref is just a wrapper around an underlying Aggref that
changes the interpretation of it - and I think that's not a good idea.
If you want to have Aggref and PartialAggref as truly parallel node
types, that seems cool, and possibly better than what you've got here
now.  Alternatively, Aggref can do everything.  But I don't think we
should go with this wrapper concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
> I don't think I'd be objecting if you made PartialAggref a real
> alternative to Aggref.  But that's not what you've got here.  A
> PartialAggref is just a wrapper around an underlying Aggref that
> changes the interpretation of it - and I think that's not a good idea.
> If you want to have Aggref and PartialAggref as truly parallel node
> types, that seems cool, and possibly better than what you've got here
> now.  Alternatively, Aggref can do everything.  But I don't think we
> should go with this wrapper concept.

Ok, I've now gotten rid of the PartialAggref node, and I'm actually
quite happy with how it turned out. I made
search_indexed_tlist_for_partial_aggref() to follow-on the series of
other search_indexed_tlist_for_* functions and have made it behave the
same way, by returning the newly created Var instead of doing that in
fix_combine_agg_expr_mutator(), as the last version did.

Thanks for the suggestion.

New patch attached.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 6:49 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
>> I don't think I'd be objecting if you made PartialAggref a real
>> alternative to Aggref.  But that's not what you've got here.  A
>> PartialAggref is just a wrapper around an underlying Aggref that
>> changes the interpretation of it - and I think that's not a good idea.
>> If you want to have Aggref and PartialAggref as truly parallel node
>> types, that seems cool, and possibly better than what you've got here
>> now.  Alternatively, Aggref can do everything.  But I don't think we
>> should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.

Cool!  Why not initialize aggpartialtype always?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Amit Kapila
Date:
On Wed, Mar 16, 2016 at 4:19 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
> > I don't think I'd be objecting if you made PartialAggref a real
> > alternative to Aggref.  But that's not what you've got here.  A
> > PartialAggref is just a wrapper around an underlying Aggref that
> > changes the interpretation of it - and I think that's not a good idea.
> > If you want to have Aggref and PartialAggref as truly parallel node
> > types, that seems cool, and possibly better than what you've got here
> > now.  Alternatively, Aggref can do everything.  But I don't think we
> > should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.
>

A few assorted comments:

1.
  /*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+     input_rel->partial_pathlist != NIL &&
+     parse->groupingSets == NIL &&
+     root->glob->parallelModeOK)

I think here you need to use has_parallel_hazard() with second parameter as false to ensure expressions are parallel safe. glob->parallelModeOK flag indicates that there is no parallel unsafe expression, but it can still contain parallel restricted expression.


2.
 AggPath *
 create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
   List *groupClause,
   List *qual,
   const AggClauseCosts *aggcosts,
- double numGroups)
+ double numGroups,
+ bool combineStates,
+ bool finalizeAggs)

Don't you need to set parallel_aware flag in this function as we do for create_seqscan_path()?

3.
postgres=# explain select count(*) from t1;
                                      QUERY PLAN

--------------------------------------------------------------------------------
------
 Finalize Aggregate  (cost=45420.57..45420.58 rows=1 width=8)
   ->  Gather  (cost=45420.35..45420.56 rows=2 width=8)
         Number of Workers: 2
         ->  Partial Aggregate  (cost=44420.35..44420.36 rows=1 width=8)
               ->  Parallel Seq Scan on t1  (cost=0.00..44107.88 rows=124988 wid
th=0)
(5 rows)

Isn't it better to call it Parallel Aggregate instead of Partial Aggregate?  Initially, we kept Partial for seqscan, but later on we changed to Parallel Seq Scan, so I am not able to think why it is better to call it Partial in case of Aggregates.

4.
  /*
+ * Likewise for any partial paths, although this case is more simple as
+ * we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+ {
+ Path   *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info == NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+                                       subpath, scanjoin_target);
+ }
+


Can't we do this by teaching apply_projection_to_path(), as done in the latest patch posted by me to push down the target list beneath workers [1]?

5.
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+ context->allowedtype = PAT_INTERNAL_ONLY;

In the above comment, you have referred to maximum degree, which is not making much sense to me. If it is not a typo, then can you clarify the same?

6.
+ * fix_combine_agg_expr
+ *  Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ *  Aggrefs so that they reference the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo *root,
+   Node *node,
+   indexed_tlist *subplan_itlist,
+   Index newvarno,
+   int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root = root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+
+ return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)

Don't we want to handle the case of context->subplan_itlist->has_non_vars as it is handled in fix_upper_expr_mutator()?  If not, then I think adding the reason for the same in a comment above the function would be better.

7.
tlist.c

+}
\ No newline at end of file

There should be a new line at end of file.




With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Aggregate

From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Isn't it better to call it as Parallel Aggregate instead of Partial
> Aggregate.  Initialy, we have kept Partial for seqscan, but later on we
> changed to Parallel Seq Scan, so I am not able to think why it is better to
> call Partial incase of Aggregates.

I think partial is the right terminology.  Unlike a parallel
sequential scan, a partial aggregate isn't parallel-aware and could be
used in contexts having nothing to do with parallelism.  It's just
that it outputs transition values instead of a finalized value.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 7:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 6:49 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I don't think I'd be objecting if you made PartialAggref a real
>>> alternative to Aggref.  But that's not what you've got here.  A
>>> PartialAggref is just a wrapper around an underlying Aggref that
>>> changes the interpretation of it - and I think that's not a good idea.
>>> If you want to have Aggref and PartialAggref as truly parallel node
>>> types, that seems cool, and possibly better than what you've got here
>>> now.  Alternatively, Aggref can do everything.  But I don't think we
>>> should go with this wrapper concept.
>>
>> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
>> quite happy with how it turned out. I made
>> search_indexed_tlist_for_partial_aggref() to follow-on the series of
>> other search_indexed_tlist_for_* functions and have made it behave the
>> same way, by returning the newly created Var instead of doing that in
>> fix_combine_agg_expr_mutator(), as the last version did.
>>
>> Thanks for the suggestion.
>>
>> New patch attached.
>
> Cool!  Why not initialize aggpartialtype always?

More review comments:
                /*
+                * Likewise for any partial paths, although this case is more simple as
+                * we don't track the cheapest path.
+                */

I think in the process of getting rebased over the rapidly-evolving
underlying substructure, this comment no longer makes much sense where
it is in the file. IIUC, the comment is referring back to "Forcibly
apply that target to all the Paths for the scan/join rel", but there's
now enough other stuff in the middle that it doesn't really make sense
any more.  And actually, I think you should move the code up higher,
not change the comment.  This belongs before setting
root->upper_targets[foo].

The logic in create_grouping_paths() is too ad-hoc and, as Amit and I
have both complained about, wrong in detail because it doesn't call
has_parallel_hazard anywhere.  Basically, you have the wrong design.
There shouldn't be any need to check parallelModeOK here.  Rather,
what you should be doing is setting consider_parallel to true or false
on the upper rel.  See set_rel_consider_parallel for how this is set
for base relations, set_append_rel_size() for append relations, and
perhaps most illustratively build_join_rel() for join relations.  You
should have some equivalent of this logic for upper rels, or at least
the upper rels you care about:
    if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
        !has_parallel_hazard((Node *) restrictlist, false))
        joinrel->consider_parallel = true;

Then, you can consider parallel aggregation if consider_parallel is
true and any other conditions that you care about are also met.

I think that the way you are considering sorted aggregation, hashed
aggregation, and mixed strategies does not make very much sense.  It
seems to me that what you should do is:

1. Figure out the cheapest PartialAggregate path.  You will need to
compare the costs of (a) a hash aggregate, (b) an explicit sort +
group aggregate, and (c) grouping a presorted path.  (We can
technically skip (c) for right now since it can't occur.)  I would go
ahead and use add_partial_path() on these to stuff them into the
partial_pathlist for the upper rel.

2. Take the first (cheapest) path in the partial_pathlist and stick a
Gather node on top of it.  Store this in a local variable, say,
partial_aggregate_path.

3. Construct a finalize-hash-aggregate path for partial_aggregate_path
and also a sort+finalize-group/plain-aggregate path for
partial_aggregate_path, and add each of those to the upper rel.  They
will either beat out the non-parallel paths or they won't.

The point is that the decision as to whether to use hashing or sorting
below the Gather is completely independent from the choice of which
one to use above the Gather.  Pick the best strategy below the Gather;
then pick the best strategy to stick on top of that above the Gather.
 * This is very similar to make_group_input_target(), only we do not recurse
 * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
 * also add any Aggrefs which are located in the HAVING clause into the
 * PathTarget.
 *
 * Aggrefs are also setup into partial mode and the partial return types are
 * set to become the type of the aggregate transition state rather than the
 * aggregate function's return type.

This comment isn't very helpful because it tells you what the function
does, which you can find out anyway from reading the code.  What it
should do is explain why it does it.  Just to take one particular
point, why would we not want to recurse into Aggrefs in this case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 17 March 2016 at 00:57, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 6:49 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I don't think I'd be objecting if you made PartialAggref a real
>>> alternative to Aggref.  But that's not what you've got here.  A
>>> PartialAggref is just a wrapper around an underlying Aggref that
>>> changes the interpretation of it - and I think that's not a good idea.
>>> If you want to have Aggref and PartialAggref as truly parallel node
>>> types, that seems cool, and possibly better than what you've got here
>>> now.  Alternatively, Aggref can do everything.  But I don't think we
>>> should go with this wrapper concept.
>>
>> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
>> quite happy with how it turned out. I made
>> search_indexed_tlist_for_partial_aggref() to follow-on the series of
>> other search_indexed_tlist_for_* functions and have made it behave the
>> same way, by returning the newly created Var instead of doing that in
>> fix_combine_agg_expr_mutator(), as the last version did.
>>
>> Thanks for the suggestion.
>>
>> New patch attached.
>
> Cool!  Why not initialize aggpartialtype always?

Because the follow-on patch sets that to either the serialtype or the
aggtranstype, depending on whether serialisation is required.
Serialisation is required for parallel aggregate, but if we're
performing the partial agg in the main process, then we'd not need to
do that. This
could be solved by adding more fields to AggRef to cover the
aggserialtype and perhaps expanding aggpartial into an enum mode which
allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
attention to the mode and return 1 of the 3 possible types.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 17 March 2016 at 01:29, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Isn't it better to call it as Parallel Aggregate instead of Partial
>> Aggregate.  Initialy, we have kept Partial for seqscan, but later on we
>> changed to Parallel Seq Scan, so I am not able to think why it is better to
>> call Partial incase of Aggregates.
>
> I think partial is the right terminology.  Unlike a parallel
> sequential scan, a partial aggregate isn't parallel-aware and could be
> used in contexts having nothing to do with parallelism.  It's just
> that it outputs transition values instead of a finalized value.

+1. The reason the partial aggregate patches have been kept separate
from the parallel aggregate patches is that partial aggregate will
serve many other purposes. Parallel Aggregate is just one of many
possible use cases for this, so it makes little sense to give it a
name according to a single use case.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 17 March 2016 at 03:47, Robert Haas <robertmhaas@gmail.com> wrote:
> More review comments:
>
>                 /*
> +                * Likewise for any partial paths, although this case is more simple as
> +                * we don't track the cheapest path.
> +                */
>
> I think in the process of getting rebased over the rapidly-evolving
> underlying substructure, this comment no longer makes much sense where
> it is in the file. IIUC, the comment is referring back to "Forcibly
> apply that target to all the Paths for the scan/join rel", but there's
> now enough other stuff in the middle that it doesn't really make sense
> any more.  And actually, I think you should move the code up higher,
> not change the comment.  This belongs before setting
> root->upper_targets[foo].

Yes, that's not where it was originally... contents may settle during transit...

> The logic in create_grouping_paths() is too ad-hoc and, as Amit and I
> have both complained about, wrong in detail because it doesn't call
> has_parallel_hazard anywhere.  Basically, you have the wrong design.
> There shouldn't be any need to check parallelModeOK here.  Rather,
> what you should be doing is setting consider_parallel to true or false
> on the upper rel.  See set_rel_consider_parallel for how this is set
> for base relations, set_append_rel_size() for append relations, and
> perhaps most illustratively build_join_rel() for join relations.  You
> should have some equivalent of this logic for upper rels, or at least
> the upper rels you care about:
>
>    if (inner_rel->consider_parallel && outer_rel->consider_parallel &&
>         !has_parallel_hazard((Node *) restrictlist, false))
>         joinrel->consider_parallel = true;
>
> Then, you can consider parallel aggregation if consider_parallel is
> true and any other conditions that you care about are also met.
>
> I think that the way you are considering sorted aggregation, hashed
> aggregation, and mixed strategies does not make very much sense.  It
> seems to me that what you should do is:
>
> 1. Figure out the cheapest PartialAggregate path.  You will need to
> compare the costs of (a) a hash aggregate, (b) an explicit sort +
> group aggregate, and (c) grouping a presorted path.  (We can
> technically skip (c) for right now since it can't occur.)  I would go
> ahead and use add_partial_path() on these to stuff them into the
> partial_pathlist for the upper rel.
>
> 2. Take the first (cheapest) path in the partial_pathlist and stick a
> Gather node on top of it.  Store this in a local variable, say,
> partial_aggregate_path.
>
> 3. Construct a finalize-hash-aggregate path for partial_aggregate_path
> and also a sort+finalize-group/plain-aggregate path for
> partial_aggregate_path, and add each of those to the upper rel.  They
> will either beat out the non-parallel paths or they won't.
>
> The point is that the decision as to whether to use hashing or sorting
> below the Gather is completely independent from the choice of which
> one to use above the Gather.  Pick the best strategy below the Gather;
> then pick the best strategy to stick on top of that above the Gather.

Good point. I've made local alterations to the patch so that partial
paths are now generated on the grouped_rel.
I also got rid of the enable_hashagg test in create_grouping_paths();
its use seemed rather old-school planner style. Instead I altered
cost_agg() to add disable_cost appropriately, which simplifies the
logic of when and when not to consider hash aggregate paths. The only
downside of this that I can see is that the hash agg Path is still
generated when enable_hashagg is off, and that means a tiny bit more
work creating the path and calling add_path() for it.

This change also allows nodeGroup to be used for parallel aggregate
when there are no aggregate functions. This was a bit broken in the
old version.

>  * This is very similar to make_group_input_target(), only we do not recurse
>  * into Aggrefs. Aggrefs are left intact and added to the target list. Here we
>  * also add any Aggrefs which are located in the HAVING clause into the
>  * PathTarget.
>  *
>  * Aggrefs are also setup into partial mode and the partial return types are
>  * set to become the type of the aggregate transition state rather than the
>  * aggregate function's return type.
>
> This comment isn't very helpful because it tells you what the function
> does, which you can find out anyway from reading the code.  What it
> should do is explain why it does it.  Just to take one particular
> point, why would we not want to recurse into Aggrefs in this case?

Good point. I've updated it to:

/*
 * make_partialgroup_input_target
 *  Generate appropriate PathTarget for input for Partial Aggregate nodes.
 *
 * Similar to make_group_input_target(), only we don't recurse into Aggrefs, as
 * we need these to remain intact so that they can be found later in Combine
 * Aggregate nodes during setrefs. Vars will still be pulled out of
 * non-Aggref nodes as these will still be required by the combine aggregate
 * phase.
 *
 * We also convert any Aggrefs which we do find and put them into partial mode,
 * this adjusts the Aggref's return type so that the partially calculated
 * aggregate value can make its way up the execution tree up to the Finalize
 * Aggregate node.
 */

I will post an updated patch once I've addressed Amit's points.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
James Sewell
Date:
Hi again,

This is probably me missing something, but is there a reason parallel aggregate doesn't seem to ever create append nodes containing Index scans?

SET random_page_cost TO 0.2;
SET max_parallel_degree TO 8;

postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=310596.32..310598.03 rows=31 width=16)
   Group Key: view_time_day
   ->  Sort  (cost=310596.32..310596.79 rows=186 width=16)
         Sort Key: view_time_day
         ->  Gather  (cost=310589.00..310589.31 rows=186 width=16)
               Number of Workers: 5
               ->  Partial HashAggregate  (cost=310589.00..310589.31 rows=31 width=16)
                     Group Key: view_time_day
                     ->  Parallel Seq Scan on base  (cost=0.00..280589.00 rows=6000000 width=12)


SET max_parallel_degree TO 0;

postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..600085.92 rows=31 width=16)
   Group Key: view_time_day
   ->  Index Only Scan using base_view_time_day_count_i_idx on base  (cost=0.56..450085.61 rows=30000000 width=12)
(3 rows)


Cheers,


James Sewell,
Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 

On Thu, Mar 17, 2016 at 8:08 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 17 March 2016 at 01:29, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 8:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Isn't it better to call it as Parallel Aggregate instead of Partial
>> Aggregate.  Initialy, we have kept Partial for seqscan, but later on we
>> changed to Parallel Seq Scan, so I am not able to think why it is better to
>> call Partial incase of Aggregates.
>
> I think partial is the right terminology.  Unlike a parallel
> sequential scan, a partial aggregate isn't parallel-aware and could be
> used in contexts having nothing to do with parallelism.  It's just
> that it outputs transition values instead of a finalized value.

+1  the reason the partial aggregate patches have been kept separate
from the parallel aggregate patches is that partial aggregate will
serve for many other purposes. Parallel Aggregate is just one of many
possible use cases for this, so it makes little sense to give it a
name according to a single use case.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services






Re: Parallel Aggregate

From
Haribabu Kommi
Date:
On Thu, Mar 17, 2016 at 2:13 PM, James Sewell <james.sewell@lisasoft.com> wrote:
>
> Hi again,
>
> This is probably me missing something, but is there a reason parallel aggregate doesn't seem to ever create append nodes containing Index scans?
>
> SET random_page_cost TO 0.2;
> SET max_parallel_degree TO 8;
>
> postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
>                                            QUERY PLAN
> -------------------------------------------------------------------------------------------------
>  Finalize GroupAggregate  (cost=310596.32..310598.03 rows=31 width=16)
>    Group Key: view_time_day
>    ->  Sort  (cost=310596.32..310596.79 rows=186 width=16)
>          Sort Key: view_time_day
>          ->  Gather  (cost=310589.00..310589.31 rows=186 width=16)
>                Number of Workers: 5
>                ->  Partial HashAggregate  (cost=310589.00..310589.31 rows=31 width=16)
>                      Group Key: view_time_day
>                      ->  Parallel Seq Scan on base  (cost=0.00..280589.00 rows=6000000 width=12)
>
>
> SET max_parallel_degree TO 0;
>
> postgres=# explain SELECT sum(count_i) FROM base GROUP BY view_time_day;
>                                                     QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.56..600085.92 rows=31 width=16)
>    Group Key: view_time_day
>    ->  Index Only Scan using base_view_time_day_count_i_idx on base  (cost=0.56..450085.61 rows=30000000 width=12)
> (3 rows)


To get a good parallelism benefit, the workers have to execute most of
the plan in parallel. If we run only some part of the upper plan in
parallel, we may not get much parallelism benefit. At present only the
seq scan node supports parallelism at the scan node level; index scan
is not possible as of now. So for this reason, based on the overall
cost of the parallel aggregate + parallel seq scan, the plan is chosen.

If index scan is changed to make it parallel in future, it is possible
that a parallel aggregate + parallel index scan plan may be chosen.

Regards,
Hari Babu
Fujitsu Australia



Re: Parallel Aggregate

From
David Rowley
Date:
On 17 March 2016 at 01:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Few assorted comments:
>
> 1.
>   /*
> + * Determine if it's possible to perform aggregation in parallel using
> + * multiple worker processes. We can permit this when there's at least one
> + * partial_path in input_rel, but not if the query has grouping sets,
> + * (although this likely just requires a bit more thought). We must also
> + * ensure that any aggregate functions which are present in either the
> + * target list, or in the HAVING clause all support parallel mode.
> + */
> + can_parallel = false;
> +
> + if ((parse->hasAggs || parse->groupClause != NIL) &&
> + input_rel->partial_pathlist != NIL &&
> + parse->groupingSets == NIL &&
> + root->glob->parallelModeOK)
>
> I think here you need to use has_parallel_hazard() with second parameter as
> false to ensure expressions are parallel safe. glob->parallelModeOK flag
> indicates that there is no parallel unsafe expression, but it can still
> contain parallel restricted expression.

Yes, I'd not gotten to fixing that per Robert's original comment about
it, but I think I have now.

> 2.
>  AggPath *
>  create_agg_path(PlannerInfo *root,
> @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>   List *groupClause,
>   List *qual,
>   const AggClauseCosts *aggcosts,
> - double numGroups)
> + double numGroups,
> + bool combineStates,
> + bool finalizeAggs)
>
> Don't you need to set parallel_aware flag in this function as we do for
> create_seqscan_path()?

I don't really know the answer to that... I mean there's nothing
special done in nodeAgg.c if the node is running in a worker or in the
main process. So I guess the only difference is that EXPLAIN will read
"Parallel Partial (Hash|Group)Aggregate" instead of "Partial
(Hash|Group)Aggregate", is that desired? What's the logic behind
having "Parallel" in EXPLAIN?

> 3.
> postgres=# explain select count(*) from t1;
>                                       QUERY PLAN
>
> --------------------------------------------------------------------------------
> ------
>  Finalize Aggregate  (cost=45420.57..45420.58 rows=1 width=8)
>    ->  Gather  (cost=45420.35..45420.56 rows=2 width=8)
>          Number of Workers: 2
>          ->  Partial Aggregate  (cost=44420.35..44420.36 rows=1 width=8)
>                ->  Parallel Seq Scan on t1  (cost=0.00..44107.88 rows=124988
> wid
> th=0)
> (5 rows)
>
> Isn't it better to call it as Parallel Aggregate instead of Partial
> Aggregate.  Initialy, we have kept Partial for seqscan, but later on we
> changed to Parallel Seq Scan, so I am not able to think why it is better to
> call Partial incase of Aggregates.

I already commented on this.

> 4.
>   /*
> + * Likewise for any partial paths, although this case is more simple as
> + * we don't track the cheapest path.
> + */
> + foreach(lc, current_rel->partial_pathlist)
> + {
> + Path   *subpath = (Path *) lfirst(lc);
> +
> + Assert(subpath->param_info == NULL);
> + lfirst(lc) = apply_projection_to_path(root, current_rel,
> +                                       subpath, scanjoin_target);
> + }
> +
>
>
> Can't we do this by teaching apply_projection_to_path() as done in the
> latest patch posted by me to push down the target list beneath workers [1].

Probably, but I'm not sure I want to go changing that now. The patch
is useful without it, so perhaps it can be a follow-on fix.

> 5.
> + /*
> + * If we find any aggs with an internal transtype then we must ensure
> + * that pointers to aggregate states are not passed to other processes,
> + * therefore we set the maximum degree to PAT_INTERNAL_ONLY.
> + */
> + if (aggform->aggtranstype == INTERNALOID)
> + context->allowedtype = PAT_INTERNAL_ONLY;
>
> In the above comment, you have refered maximum degree which is not making
> much sense to me. If it is not a typo, then can you clarify the same.

hmm. I don't quite understand the confusion. Perhaps you're thinking
of "degree" in the parallel sense? This is talking about the degrees
of partial aggregation, which have nothing to do with parallel
aggregation; parallel aggregation just requires them to work. "Degree"
in this sense just means that PAT_ANY is the highest degree,
PAT_INTERNAL_ONLY lesser so, etc. I thought the "degree" thing was
explained ok in the comment for aggregates_allow_partial(), but
perhaps I should just remove it if it's confusing.

Changed to:
/*
* If we find any aggs with an internal transtype then we must ensure
* that pointers to aggregate states are not passed to other processes,
* therefore we set the maximum allowed type to PAT_INTERNAL_ONLY.
*/

> 6.
> + * fix_combine_agg_expr
> + *  Like fix_upper_expr() but additionally adjusts the Aggref->args of
> + *  Aggrefs so that they reference the corresponding Aggref in the subplan.
> + */
> +static Node *
> +fix_combine_agg_expr(PlannerInfo *root,
> +   Node *node,
> +   indexed_tlist *subplan_itlist,
> +   Index newvarno,
> +   int rtoffset)
> +{
> + fix_upper_expr_context context;
> +
> + context.root = root;
> + context.subplan_itlist = subplan_itlist;
> + context.newvarno = newvarno;
> + context.rtoffset = rtoffset;
> +
> + return fix_combine_agg_expr_mutator(node, &context);
> +}
> +
> +static Node *
> +fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
>
> Don't we want to handle the case of context->subplan_itlist->has_non_vars as
> it is handled in fix_upper_expr_mutator()?  If not then, I think adding the
> reason for same in comments above function would be better.

Yes, it should be doing the same as fix_upper_expr_mutator(), with the
exception of the handling of Aggrefs. Will fix. Thanks!

> 7.
> tlist.c
>
> +}
> \ No newline at end of file
>
> There should be a new line at end of file.

Thanks.

> [1] -
> http://www.postgresql.org/message-id/CAA4eK1Jk8hm-2j-CKjvdd0CZTsdPX=EdK_qhzc4689hq0xtfMQ@mail.gmail.com

Updated patch attached.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

От
David Rowley
Дата:
On 17 March 2016 at 18:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Updated patch attached.

Please disregard
0001-Allow-aggregation-to-happen-in-parallel_2016-03-17.patch. This
contained a badly thought through last minute change to how the Gather
path is generated and is broken.

I played around with ways of generating the Gather node as
create_gather_path() is not really geared up for what's needed here,
since grouped_rel cannot be passed into create_gather_path() since it
contains the final aggregate PathTarget rather than the partial
PathTarget, and also incorrect row estimates. I've ended up with an
extra double *rows argument in this function to make it possible to
override the rows from rel. I'm not sure how this'll go down... It
does not seem perfect.

In the end I've now added a new upper planner type to allow me to
create a RelOptInfo for the partial aggregate relation, so that I can
pass create_gather_path() a relation with the correct PathTarget. This
seemed better than borrowing grouped_rel, then overriding the
reltarget after create_gather_path() returned. Although I'm willing to
consider the option that someone will disagree with how I've done
things here.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Amit Kapila
Date:
On Thu, Mar 17, 2016 at 10:35 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 17 March 2016 at 01:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Few assorted comments:
> >
> > 2.
> >  AggPath *
> >  create_agg_path(PlannerInfo *root,
> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
> >
> > List *groupClause,
> >   List *qual,
> >   const AggClauseCosts
> > *aggcosts,
> > - double numGroups)
> > + double numGroups,
> > +
> > bool combineStates,
> > + bool finalizeAggs)
> >
> > Don't you need to set parallel_aware flag in this function as we do for
> > create_seqscan_path()?
>
> I don't really know the answer to that... I mean there's nothing
> special done in nodeAgg.c if the node is running in a worker or in the
> main process.
>

Thinking about it again, I think it is okay to set the parallel_aware flag to false.  This flag indicates whether that particular node has any parallelism behaviour of its own, which is true for a seqscan, but I think not for a partial aggregate node.

Few other comments on latest patch:

1.

+ /*
+ * XXX does this need estimated for each partial path, or are they
+ * all going to be the same anyway?
+ */
+ dNumPartialGroups = get_number_of_groups(root,
+                                          clamp_row_est(partial_aggregate_path->rows),
+                                          rollup_lists,
+                                          rollup_groupclauses);

For considering partial groups, do we need to rollup related lists?

2.
+ hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
+                                               &agg_costs,
+                                               dNumPartialGroups);
+
+ /*
+ * Generate a hashagg Path, if we can, but we'll skip this if the hash
+ * table looks like it'll exceed work_mem.
+ */
+ if (can_hash && hashaggtablesize < work_mem * 1024L)


hash table size should be estimated only if can_hash is true.

3.
+ foreach(lc, grouped_rel->partial_pathlist)
+ {
+ Path   *path = (Path *) lfirst(lc);
+ double total_groups;
+
+ total_groups = path->parallel_degree * path->rows;
+
+ path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
+                                    &total_groups);

Do you need to perform it for each partial path, or just for the first partial path?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Aggregate

From
David Rowley
Date:
On 18 March 2016 at 01:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>>
>> On 17 March 2016 at 01:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > Few assorted comments:
>> >
>> > 2.
>> >  AggPath *
>> >  create_agg_path(PlannerInfo *root,
>> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>> >
>> > List *groupClause,
>> >   List *qual,
>> >   const AggClauseCosts
>> > *aggcosts,
>> > - double numGroups)
>> > + double numGroups,
>> > +
>> > bool combineStates,
>> > + bool finalizeAggs)
>> >
>> > Don't you need to set parallel_aware flag in this function as we do for
>> > create_seqscan_path()?
>>
>> I don't really know the answer to that... I mean there's nothing
>> special done in nodeAgg.c if the node is running in a worker or in the
>> main process.
>>
>
> On again thinking about it, I think it is okay to set parallel_aware flag as
> false.  This flag means whether that particular node has any parallelism
> behaviour which is true for seqscan, but I think not for partial aggregate
> node.
>
> Few other comments on latest patch:
>
> 1.
>
> + /*
> + * XXX does this need estimated for each partial path, or are they
> + * all going to be the same anyway?
> + */
> + dNumPartialGroups = get_number_of_groups(root,
> +                                          clamp_row_est(partial_aggregate_path->rows),
> +                                          rollup_lists,
> +                                          rollup_groupclauses);
>
> For considering partial groups, do we need to rollup related lists?

No it doesn't, you're right. I did mean to remove these, but they're
NIL anyway. It seems better to remove them to prevent confusion.

> 2.
> + hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
> +                                               &agg_costs,
> +                                               dNumPartialGroups);
> +
> + /*
> + * Generate a hashagg Path, if we can, but we'll skip this if the hash
> + * table looks like it'll exceed work_mem.
> + */
> + if (can_hash && hashaggtablesize < work_mem * 1024L)
>
>
> hash table size should be estimated only if can_hash is true.

Good point. Changed.

>
> 3.
> + foreach(lc, grouped_rel->partial_pathlist)
> + {
> + Path   *path = (Path *) lfirst(lc);
> + double total_groups;
> +
> + total_groups = path->parallel_degree * path->rows;
> +
> + path = (Path *) create_gather_path(root, grouped_rel, path, NULL,
> +                                    &total_groups);
>
> Do you need to perform it for each partial path, or just for the first
> partial path?

That's true. The order here does not matter since we're passing
directly into a Gather node, so it's wasteful to consider anything
apart from the cheapest path. -- Fixed.

There was also a missing hash table size check on the Finalize
HashAggregate Path consideration. I've added that now.

Updated patch is attached. Thanks for the re-review.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Amit Kapila
Date:
On Thu, Mar 17, 2016 at 6:41 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 18 March 2016 at 01:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> > <david.rowley@2ndquadrant.com> wrote:
>
> Updated patch is attached. Thanks for the re-review.
>

Few more comments:

1.

+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);

For final path, why do you want to sort just for group by case?

2.
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+   NULL, &total_groups);
+
+ if (parse->groupClause)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ root->group_pathkeys,
+ -1.0);
+
+ if (parse->hasAggs)
+ add_path(grouped_rel, (Path *)
+ create_agg_path(root,
+ grouped_rel,
+ path,
+ target,
+ parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+ parse->groupClause,
+ (List *) parse->havingQual,
+ &agg_costs,
+ partial_grouped_rel->rows,
+ true,
+ true));
+ else
+ add_path(grouped_rel, (Path *)
+ create_group_path(root,
+  grouped_rel,
+  path,
+  target,
+  parse->groupClause,
+  (List *) parse->havingQual,
+  total_groups));

In the above part of the patch, it seems you are using the number of groups differently: for create_group_path() and create_gather_path() you have used total_groups, whereas for create_agg_path() partial_grouped_rel->rows is used. Is there a reason for this?


3.
+ if (grouped_rel->partial_pathlist)
+ {
+ Path   *path = (Path *) linitial(grouped_rel->partial_pathlist);
+ double total_groups;
+
+ total_groups = path->rows * path->parallel_degree;
+ path = (Path *) create_gather_path(root, partial_grouped_rel, path,
+                                    NULL, &total_groups);

A. Won't passing partial_grouped_rel lead to incomplete information required by create_gather_path() w.r.t the case of parameterized path info?
B. You have mentioned that passing grouped_rel will make gather path contain the information of final path target, but what is the problem with that? I mean to ask why Gather node is required to contain partial path target information instead of final path target.
C. Can we consider passing pathtarget to create_gather_path() as that seems to save us from inventing new UpperRelationKind? If you are worried about adding the new parameter (pathtarget) to create_gather_path(), then I think we are already passing it in many other path generation functions, so why not for gather path generation as well?

4A.
Overall, the function create_grouping_paths() looks better than before, but I think it is still difficult to read.  I think it can be improved by generating partial aggregate paths separately, as we do for nestloop joins; refer to consider_parallel_nestloop().

4B.
Rather than directly using create_gather_path(), can't we use generate_gather_paths()? In all other places where we generate a Gather node, generate_gather_paths() is used.

5.
+make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
{
..
..
+ foreach(lc, final_target->exprs)
+ {
+ Expr   *expr = (Expr *) lfirst(lc);
+
+ i++;
+
+ if (parse->groupClause)
+ {
+ Index sgref = final_target->sortgrouprefs[i];
+
+ if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)
+ {
+ /*
+  * It's a grouping column, so add it to the input target as-is.
+  */
+ add_column_to_pathtarget(input_target, expr, sgref);
+ continue;
+ }
+ }
+
+ /*
+  * Non-grouping column, so just remember the expression for later
+  * call to pull_var_clause.
+  */
+ non_group_cols = lappend(non_group_cols, expr);
+ }
..
}

Do we want to achieve something different in the above foreach loop than in the similar loop in make_group_input_target()? If not, why are they not exactly the same?

6.
+ /* XXX this causes some redundant cost calculation ... */
+ input_target = set_pathtarget_cost_width(root, input_target);
+ return input_target;

Can't we use return set_pathtarget_cost_width() directly rather than fetching it in input_target and then returning input_target?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel Aggregate

From
David Rowley
Date:
On 18 March 2016 at 20:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Few more comments:
>
> 1.
>
> + if (parse->groupClause)
> + path = (Path *) create_sort_path(root,
> + grouped_rel,
> + path,
> + root->group_pathkeys,
> + -1.0);
>
> For final path, why do you want to sort just for group by case?

If there's no GROUP BY then there will only be a single group, and this
does not require sorting, e.g. SELECT SUM(col) FROM sometable;

I added the comment:

/*
* Gather is always unsorted, so we'll need to sort, unless there's
* no GROUP BY clause, in which case there will only be a single
* group.
*/


> 2.
> + path = (Path *) create_gather_path(root, partial_grouped_rel, path,
> +   NULL, &total_groups);
> +
> + if (parse->groupClause)
> + path = (Path *) create_sort_path(root,
> + grouped_rel,
> + path,
> + root->group_pathkeys,
> + -1.0);
> +
> + if (parse->hasAggs)
> + add_path(grouped_rel, (Path *)
> + create_agg_path(root,
> + grouped_rel,
> + path,
> + target,
> + parse->groupClause ? AGG_SORTED : AGG_PLAIN,
> + parse->groupClause,
> + (List *) parse->havingQual,
> + &agg_costs,
> + partial_grouped_rel->rows,
> + true,
> + true));
> + else
> + add_path(grouped_rel, (Path *)
> + create_group_path(root,
> +  grouped_rel,
> +  path,
> +  target,
> +  parse->groupClause,
> +  (List *) parse->havingQual,
> +  total_groups));
>
> In the above part of the patch, it seems you are using the number of groups
> differently: for create_group_path() and create_gather_path() you have used
> total_groups, whereas for create_agg_path() partial_grouped_rel->rows is
> used. Is there a reason for this?

That's a mistake... too much code shuffling yesterday it seems.

> 3.
> + if (grouped_rel->partial_pathlist)
> + {
> + Path   *path = (Path *) linitial(grouped_rel->partial_pathlist);
> + double total_groups;
> +
> + total_groups = path->rows * path->parallel_degree;
> + path = (Path *) create_gather_path(root, partial_grouped_rel, path,
> +                                    NULL, &total_groups);
>
> A. Won't passing partial_grouped_rel lead to incomplete information required
> by create_gather_path() w.r.t the case of parameterized path info?

There should be no parameterized path info after joins are over, but
never-the-less I took your advice about passing PathTarget to
create_gather_path(), so this partial_grouped_rel no longer exists.

> B. You have mentioned that passing grouped_rel will make gather path contain
> the information of final path target, but what is the problem with that? I
> mean to ask why Gather node is required to contain partial path target
> information instead of final path target.

Imagine a query such as: SELECT col,SUM(this) FROM sometable GROUP BY
col HAVING SUM(somecolumn) > 0;

In this case SUM(somecolumn) won't be in the final PathTarget. The
partial grouping target will contain the Aggref from the HAVING
clause. The other difference with the partial aggregate PathTarget is
that the Aggrefs return the partial state in exprType() rather than
the final value's type, which is required so the executor knows how to
form and deform tuples, plus many other things.

> C. Can we consider passing pathtarget to create_gather_path() as that seems
> to save us from inventing new UpperRelationKind? If you are worried about
> adding the new parameter (pathtarget) to create_gather_path(), then I think
> we are already passing it in many other path generation functions, so why
> not for gather path generation as well?

That's a better idea... Changed to that...

> 4A.
> Overall function create_grouping_paths() looks better than previous, but I
> think still it is difficult to read.  I think it can be improved by
> generating partial aggregate paths separately as we do for nestloop join
> refer function consider_parallel_nestloop

hmm, perhaps the partial path generation could be moved off to another
static function, although we'd need to pass quite a few parameters to
it, like can_sort, can_hash, partial_grouping_target, grouped_rel,
root. Perhaps it's worth doing, but we still need the
partial_grouping_target for the Gather node, so it's not like that
other function can do all of the parallel stuff... We'd still need
some knowledge of that in create_grouping_paths()

> 4B.
> Rather than directly using create_gather_path(), can't we use
> generate_gather_paths as for all places where we generate gather node,
> generate_gather_paths() is used.

I don't think this is a good fit here, although it would be nice as it
would save having to special case generating the final aggregate paths
on the top of the partial paths. It does not seem that nice as it's
not really that clear if we need to make a combine aggregate node, or
a normal aggregate node on the path. The only way to determine that
would be by checking if it was a GatherPath or not, and that does not
seem like a nice way to go about doing that. Someone might go and
invent something new like MergeGather one day.


> 5.
> +make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
> {
> ..
> ..
> + foreach(lc, final_target->exprs)
> + {
> + Expr   *expr = (Expr *) lfirst(lc);
> +
> + i++;
> +
> + if (parse->groupClause)
> + {
> + Index sgref = final_target->sortgrouprefs[i];
> +
> + if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)
> + {
> + /*
> +  * It's a grouping column, so add it to the input target as-is.
> +  */
> + add_column_to_pathtarget(input_target, expr, sgref);
> + continue;
> + }
> + }
> +
> + /*
> +  * Non-grouping column, so just remember the expression for later
> +  * call to pull_var_clause.
> +  */
> + non_group_cols = lappend(non_group_cols, expr);
> + }
> +
> }
> ..
> }
>
> Do we want to achieve something different in the above foreach loop than in
> the similar loop in make_group_input_target()? If not, why are they not
> exactly the same?

It seems that the problem that causes me to change that around is now
gone. With the change reverted I'm unable to produce the original
crash that I was getting. I know that Tom has done quite a number of
changes to PathTargets while I've been working on this, so perhaps not
surprising. I've reverted that change now.

> 6.
> + /* XXX this causes some redundant cost calculation ... */
> + input_target = set_pathtarget_cost_width(root, input_target);
> + return input_target;
>
> Can't we use return set_pathtarget_cost_width() directly rather than
> fetching it in input_target and then returning input_target?

Yes, fixed.

Many thanks for the thorough review.

I've attached an updated patch.

I also tweaked the partial path generation in create_grouping_paths()
so that it only considers sorting the cheapest path, or using any
existing pre-sorted paths, rather than trying to sort every path.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 5:05 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> Cool!  Why not initialize aggpartialtype always?
>
> Because the follow-on patch sets that to either the serialtype or the
> aggtranstype, depending on if serialisation is required. Serialisation
> is required for parallel aggregate, but if we're performing the
> partial agg in the main process, then we'd not need to do that. This
> could be solved by adding more fields to AggRef to cover the
> aggserialtype and perhaps expanding aggpartial into an enum mode which
> allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
> attention to the mode and return 1 of the 3 possible types.

Urk.  That might still be better than what you have right now, but
it's obviously not great.  How about ditching aggpartialtype and
adding aggoutputtype instead?  Then you can always initialize that to
whatever it's supposed to be based on the type of aggregation you are
doing, and exprType() can simply return that field.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Alvaro Herrera
Date:
I read this a bit, as an exercise to try to follow parallel query a bit.
I think the most interesting thing I have to say is that the new error
messages in ExecInitExpr do not conform to our style.  Probably just
downcase everything except Aggref and you're done, since they're
can't-happen conditions anyway.  The comments below are mostly as I
learn how the whole thing works so if I'm talking nonsense, I'm happy to
be ignored or, better yet, educated.

I think the way we set the "consider_parallel" flag is a bit odd (we
just "return" out of the function in the cases where it mustn't be set);
but that mechanism is already part of set_rel_consider_parallel and
similar to (but not quite like) longstanding routines such as
set_rel_width, so nothing new in this patch.  I find this a bit funny
coding, but then this is the planner so maybe it's okay.

I think the comment on search_indexed_tlist_for_partial_aggref is a bit
bogus; it says it returns an existing Var, but what it does is
manufacture one itself.  I *think* the code is all right, but the
comment seems misleading.

In set_combineagg_references(), there are two calls to
fix_combine_agg_expr(); I think the one hanging on the
search_indexed_tlist_for_sortgroupref call is useless; you could just
have the "if newexpr != NULL" in the outer block (after initializing to
NULL before the ressortgroupref check).

set_combineagg_references's comment says it does something "similar to
set_upper_references, and additionally" some more stuff, but reading the
code for both functions I think that's not quite true.  I think the
comment should say that both functions are parallel, but one works for
partial aggs and the other doesn't.  Actually, what happens if you feed
an agg plan with combineStates to set_upper_references?  If it still
works but the result is not optimal, maybe we should check against that
case, so as to avoid the case where somebody hacks this further and the
plans are suddenly not agg-combined anymore.  What I actually expect to
happen is that something would explode during execution; in that case
perhaps it's better to add a comment?  (In further looking at other
setrefs.c similar functions, maybe it's fine the way you have it.)

Back at make_partialgroup_input_target, the comment says "so that they
can be found later in Combine Aggregate nodes during setrefs".  I think
it's better to be explicit and say ".. can be found later during
set_combineagg_references" or something.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Fri, Mar 18, 2016 at 9:16 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I've attached an updated patch.

This looks substantially better than earlier versions, although I
haven't studied every detail of it yet.

+ * Partial aggregation requires that each aggregate does not have a DISTINCT or
+ * ORDER BY clause, and that it also has a combine function set. Since partial

I understand why partial aggregation doesn't work if you have an ORDER
BY clause attached to the aggregate itself, but it's not so obvious to
me that using DISTINCT should rule it out.  I guess we can do it that
way for now, but it seems aggregate-specific - e.g. AVG() can't cope
with DISTINCT, but MIN() or MAX() wouldn't care.  Maybe MIN() and
MAX() are the outliers in this regard, but they are a pretty common
case.

+ * An Aggref can operate in one of two modes. Normally an aggregate function's
+ * value is calculated with a single executor Agg node, however there are
+ * times, such as parallel aggregation when we want to calculate the aggregate

I think you should adjust the punctuation to "with a single executor
Agg node; however, there are".  And maybe drop the word "executor".

And on the next line, I'd add a comma: "such as parallel aggregation,
when we want".
                                astate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalAggref;
-                               if (parent && IsA(parent, AggState))
+                               if (!aggstate || !IsA(aggstate, AggState))
                                {
-                                       AggState   *aggstate = (AggState *) parent;
-
-                                       aggstate->aggs = lcons(astate, aggstate->aggs);
-                                       aggstate->numaggs++;
+                                       /* planner messed up */
+                                       elog(ERROR, "Aggref found in non-Agg plan node");
                                }
-                               else
+                               if (aggref->aggpartial == aggstate->finalizeAggs)
                                {
                                        /* planner messed up */
-                                       elog(ERROR, "Aggref found in non-Agg plan node");
+                                       if (aggref->aggpartial)
+                                               elog(ERROR, "Partial type Aggref found in FinalizeAgg plan node");
+                                       else
+                                               elog(ERROR, "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
                                }
+                               aggstate->aggs = lcons(astate, aggstate->aggs);
+                               aggstate->numaggs++;

This seems like it involves more code rearrangement than is really
necessary here.

+        * Partial paths in the input rel could allow us to perform aggregation in
+        * parallel, set_grouped_rel_consider_parallel() will determine if it's
+        * going to be safe to do so.

Change comma to semicolon or period.
       /*
        * Generate a HashAgg Path atop of the cheapest partial path, once
        * again, we'll only do this if it looks as though the hash table won't
        * exceed work_mem.
        */

Same here.  Commas are not the way to connect two independent sentences.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 19 March 2016 at 05:46, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 5:05 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>>> Cool!  Why not initialize aggpartialtype always?
>>
>> Because the follow-on patch sets that to either the serialtype or the
>> aggtranstype, depending on if serialisation is required. Serialisation
>> is required for parallel aggregate, but if we're performing the
>> partial agg in the main process, then we'd not need to do that. This
>> could be solved by adding more fields to AggRef to cover the
>> aggserialtype and perhaps expanding aggpartial into an enum mode which
>> allows NORMAL, PARTIAL, PARTIAL_SERIALIZE, and have exprType() pay
>> attention to the mode and return 1 of the 3 possible types.
>
> Urk.  That might still be better than what you have right now, but
> it's obviously not great.  How about ditching aggpartialtype and
> adding aggoutputtype instead?  Then you can always initialize that to
> whatever it's supposed to be based on the type of aggregation you are
> doing, and exprType() can simply return that field.

hmm, that might be better, but it kinda leaves aggpartial without much
of a job to do. The only code which depends on that is the sanity
checks that I added in execQual.c, and it does not really seem worth
keeping it for that. The only sanity check that I can think to do here
is if (aggstate->finalizeAggs && aggref->aggoutputtype !=
aggref->aggtype) -- we have a problem. Obviously we can't check that
for non-finalize nodes since the aggtype can match the aggoutputtype
for legitimate reasons.



--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 19 March 2016 at 09:53, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I read this a bit, as an exercise to try to follow parallel query a bit.

Thanks for taking a look at this.

> I think the most interesting thing I have to say is that the new error
> messages in ExecInitExpr do not conform to our style.  Probably just
> downcase everything except Aggref and you're done, since they're
> can't-happen conditions anyway.  The comments below are mostly as I
> learn how the whole thing works so if I'm talking nonsense, I'm happy to
> be ignored or, better yet, educated.

Per a comment above from Robert I ended up just removing most of that
code due to the removal of Aggref->aggpartial. It did not seem worth
keeping aggpartial around just for these errors either, but let me
know if you think otherwise.


> I think the way we set the "consider_parallel" flag is a bit odd (we
> just "return" out of the function in the cases where it mustn't be set);
> but that mechanism is already part of set_rel_consider_parallel and
> similar to (but not quite like) longstanding routines such as
> set_rel_width, so nothing new in this patch.  I find this a bit funny
> coding, but then this is the planner so maybe it's okay.

hmm, I think it's very similar to set_rel_consider_parallel(), as that
just returns for unsupported cases, and if it makes it to the end of
the function, consider_parallel is set to true.

Would you rather see;

/* we can do nothing in parallel if there's no aggregates or group by */
if (!parse->hasAggs && parse->groupClause == NIL)
grouped_rel->consider_parallel = false;

/* grouping sets are currently not supported by parallel aggregate */
else if (parse->groupingSets)
grouped_rel->consider_parallel = false;

...?

> I think the comment on search_indexed_tlist_for_partial_aggref is a bit
> bogus; it says it returns an existing Var, but what it does is
> manufacture one itself.  I *think* the code is all right, but the
> comment seems misleading.

Ok, I've changed it to:

search_indexed_tlist_for_partial_aggref - find an Aggref in an indexed tlist

which seems to be equivalent to search_indexed_tlist_for_non_var()'s comment.

> In set_combineagg_references(), there are two calls to
> fix_combine_agg_expr(); I think the one hanging on the
> search_indexed_tlist_for_sortgroupref call is useless; you could just
> have the "if newexpr != NULL" in the outer block (after initializing to
> NULL before the ressortgroupref check).

Yes, but see set_upper_references(), it just follows the pattern there
but calls fix_combine_agg_expr() instead of fix_upper_expr(). For
simplicity of review it seems to be nice to keep it following the same
pattern.

> set_combineagg_references's comment says it does something "similar to
> set_upper_references, and additionally" some more stuff, but reading the
> code for both functions I think that's not quite true.  I think the
> comment should say that both functions are parallel, but one works for
> partial aggs and the other doesn't.

Ok, I've changed the comment to:
/*
 * set_combineagg_references
 *  This does a similar job as set_upper_references(), but treats Aggrefs
 *  in a different way. Here we transform the Aggref nodes' args to suit the
 *  combine aggregate phase. This means that the Aggref->args are converted
 *  to reference the corresponding aggregate function in the subplan rather
 *  than simple Var(s), as would be the case for a non-combine aggregate
 *  node.
 */

> Actually, what happens if you feed
> an agg plan with combineStates to set_upper_references?  If it still
> works but the result is not optimal, maybe we should check against that
> case, so as to avoid the case where somebody hacks this further and the
> plans are suddenly not agg-combined anymore.  What I actually expect to
> happen is that something would explode during execution; in that case
> perhaps it's better to add a comment?  (In further looking at other
> setrefs.c similar functions, maybe it's fine the way you have it.)

This simply won't work as this is the code which causes the
sum((sum(num))) in the combine aggregate's target list.
The normal code is trying to look for Vars, but this code is trying to
find that sum(num) inside the subplan target list.

Example EXPLAIN VERBOSE output:
Finalize Aggregate  (cost=105780.67..105780.68 rows=1 width=8)
  Output: pg_catalog.sum((sum(num)))

Try commenting out;

if (aggplan->combineStates)
set_combineagg_references(root, plan, rtoffset);
else

to see the horrors that ensue. It'll basically turn aggregate
functions into a rather inefficient random number generator. This is
because the combine Aggref->args still point to "num", instead of
sum(num).

> Back at make_partialgroup_input_target, the comment says "so that they
> can be found later in Combine Aggregate nodes during setrefs".  I think
> it's better to be explicit and say ".. can be found later during
> set_combineagg_references" or something.

Changed. Thanks for the review.

Updated patch is attached.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Robert Haas
Date:
On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Updated patch is attached.

I think this looks structurally correct now, and I think it's doing
the right thing as far as parallelism is concerned.  I don't see any
obvious problems in the rest of it, either, but I haven't thought
hugely deeply about the way you are doing the costing, nor have I
totally convinced myself that all of the PathTarget and setrefs stuff
is correct.  But I think it's probably pretty close.  I'll study it
some more next week.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 19 March 2016 at 06:15, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 18, 2016 at 9:16 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> I've attached an updated patch.
>
> This looks substantially better than earlier versions, although I
> haven't studied every detail of it yet.
>
> + * Partial aggregation requires that each aggregate does not have a DISTINCT or
> + * ORDER BY clause, and that it also has a combine function set. Since partial
>
> I understand why partial aggregation doesn't work if you have an ORDER
> BY clause attached to the aggregate itself, but it's not so obvious to
> me that using DISTINCT should rule it out.  I guess we can do it that
> way for now, but it seems aggregate-specific - e.g. AVG() can't cope
> with DISTINCT, but MIN() or MAX() wouldn't care.  Maybe MIN() and
> MAX() are the outliers in this regard, but they are a pretty common
> case.

hmm? We'd have no way to ensure that two worker processes aggregated
only values which other workers didn't.
Of course this just happens to be equivalent for MIN() and MAX(), but
today we don't attempt to transform MIN(DISTINCT col) to MIN(col), so
I see no reason at all why this patch should go and add something
along those lines. Perhaps it's something for the future though,
although it's certainly not anything specific to parallel aggregate.

> + * An Aggref can operate in one of two modes. Normally an aggregate function's
> + * value is calculated with a single executor Agg node, however there are
> + * times, such as parallel aggregation when we want to calculate the aggregate
>
> I think you should adjust the punctuation to "with a single executor
> Agg node; however, there are".  And maybe drop the word "executor".
>
> And on the next line, I'd add a comma: "such as parallel aggregation,
> when we want".

Fixed. Although I've revised that block a bit after getting rid of aggpartial.

>
>                                 astate->xprstate.evalfunc =
> (ExprStateEvalFunc) ExecEvalAggref;
> -                               if (parent && IsA(parent, AggState))
> +                               if (!aggstate || !IsA(aggstate, AggState))
>                                 {
> -                                       AggState   *aggstate =
> (AggState *) parent;
> -
> -                                       aggstate->aggs = lcons(astate,
> aggstate->aggs);
> -                                       aggstate->numaggs++;
> +                                       /* planner messed up */
> +                                       elog(ERROR, "Aggref found in
> non-Agg plan node");
>                                 }
> -                               else
> +                               if (aggref->aggpartial ==
> aggstate->finalizeAggs)
>                                 {
>                                         /* planner messed up */
> -                                       elog(ERROR, "Aggref found in
> non-Agg plan node");
> +                                       if (aggref->aggpartial)
> +                                               elog(ERROR, "Partial
> type Aggref found in FinalizeAgg plan node");
> +                                       else
> +                                               elog(ERROR,
> "Non-Partial type Aggref found in Non-FinalizeAgg plan node");
>                                 }
> +                               aggstate->aggs = lcons(astate, aggstate->aggs);
> +                               aggstate->numaggs++;
>
> This seems like it involves more code rearrangement than is really
> necessary here.

This is mostly gone, as after removing aggpartial some of these checks
are not possible. I just have some additional code:

Aggref   *aggref = (Aggref *) node;

if (aggstate->finalizeAggs &&
    aggref->aggoutputtype != aggref->aggtype)
{
    /* planner messed up */
    elog(ERROR, "Aggref aggoutputtype must match aggtype");
}

But nothing to sanity check non-finalize nodes.

>
> +        * Partial paths in the input rel could allow us to perform
> aggregation in
> +        * parallel, set_grouped_rel_consider_parallel() will determine if it's
> +        * going to be safe to do so.
>
> Change comma to semicolon or period.

Changed.

>         /*
>          * Generate a HashAgg Path atop of the cheapest partial path, once
>          * again, we'll only do this if it looks as though the hash table won't
>          * exceed work_mem.
>          */
>
> Same here.  Commas are not the way to connect two independent sentences.

Changed.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 20 March 2016 at 03:19, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> Updated patch is attached.
>
> I think this looks structurally correct now, and I think it's doing
> the right thing as far as parallelism is concerned.  I don't see any
> obvious problems in the rest of it, either, but I haven't thought
> hugely deeply about the way you are doing the costing, nor have I
> totally convinced myself that all of the PathTarget and setrefs stuff
> is correct.  But I think it's probably pretty close.  I'll study it
> some more next week.

Thank you for the reviews. The only thing I can think to mention which
I've not already is that I designed estimate_hashagg_tablesize() to be
reusable in various places in planner.c, yet I've only made use of it
in create_grouping_paths(). I would imagine that it might be nice to
also modify the other places which perform a similar calculation to
use that function, but I don't think that needs to be done for this
patch... perhaps a follow-on cleanup.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Tomas Vondra
Date:
Hi,

On 03/20/2016 09:58 AM, David Rowley wrote:
> On 20 March 2016 at 03:19, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 18, 2016 at 10:32 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> Updated patch is attached.
>>
>> I think this looks structurally correct now, and I think it's doing
>> the right thing as far as parallelism is concerned.  I don't see any
>> obvious problems in the rest of it, either, but I haven't thought
>> hugely deeply about the way you are doing the costing, nor have I
>> totally convinced myself that all of the PathTarget and setrefs stuff
>> is correct.  But I think it's probably pretty close.  I'll study it
>> some more next week.
>
> Thank you for the reviews. The only thing I can think to mention which
> I've not already is that I designed estimate_hashagg_tablesize() to be
> reusable in various places in planner.c, yet I've only made use of it
> in create_grouping_paths(). I would imagine that it might be nice to
> also modify the other places which perform a similar calculation to
> use that function, but I don't think that needs to be done for this
> patch... perhaps a follow-on cleanup.

Hmmm, so how many places that could use the new function are there? I've 
only found these two:

   create_distinct_paths (planner.c)
   choose_hashed_setop (prepunion.c)

That doesn't seem like an extremely high number, so perhaps doing it in 
this patch would be fine. However, if the function is defined as static, 
choose_hashed_setop won't be able to use it anyway (well, it'll have to 
move the prototype into a header and remove the static).

I wonder why we would need to allow cost_agg=NULL, though? I mean, if 
there is no costing information, wouldn't it be better to either raise 
an error, or at the very least do something like:

    } else
        hashentrysize += hash_agg_entry_size(0);

which is what create_distinct_paths does?

I'm not sure changing the meaning of enable_hashagg like this is a good 
idea. It worked as a hard switch before, while with this change that 
would not be the case. Or more accurately - it would not be the case for 
aggregates, but it would still work the old way for other types of 
plans. Not sure that's a particularly good idea.

What about introducing a GUC to enable parallel aggregate, while still 
allowing other types of parallel nodes (I guess that would be handy for 
other types of parallel nodes - it's a bit of a blunt tool, but tweaking 
max_parallel_degree is even blunter)? e.g. enable_parallelagg?

I do also have a question about parallel aggregate vs. work_mem. 
Nowadays we mostly say to users a query may allocate a multiple of 
work_mem, up to one per node in the plan. Apparently with parallel 
aggregate we'll have to say "multiplied by number of workers", because 
each aggregate worker may allocate up to hashaggtablesize. Is that 
reasonable? Shouldn't we restrict the total size of hash tables in all 
workers somehow?

create_grouping_paths also contains this comment:
 /*
  * Generate HashAgg Path providing the estimated hash table size is not
  * too big, although if no other Paths were generated above, then we'll
  * begrudgingly generate one so that we actually have a Path to work
  * with.
  */

I'm not sure this is a particularly clear comment; I think the old one 
was much more informative despite being a single line:

/* Consider reasons to disable hashing, but only if we can sort instead */

BTW create_grouping_paths probably grew to a size where splitting it into 
smaller pieces would be helpful?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 21 March 2016 at 09:47, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 03/20/2016 09:58 AM, David Rowley wrote:
>> Thank you for the reviews. The only thing I can think to mention which
>> I've not already is that I designed estimate_hashagg_tablesize() to be
>> reusable in various places in planner.c, yet I've only made use of it
>> in create_grouping_paths(). I would imagine that it might be nice to
>> also modify the other places which perform a similar calculation to
>> use that function, but I don't think that needs to be done for this
>> patch... perhaps a follow-on cleanup.
>
> Hmmm, so how many places that could use the new function are there? I've
> only found these two:
>
>    create_distinct_paths (planner.c)
>    choose_hashed_setop (prepunion.c)
>
> That doesn't seem like an extremely high number, so perhaps doing it in this
> patch would be fine. However if the function is defined as static,
> choose_hashed_setop won't be able to use it anyway (well, it'll have to move
> the prototype into a header and remove the static).
>
> I wonder why we would need to allow cost_agg=NULL, though? I mean, if there
> are no costing information, wouldn't it be better to either raise an error,
> or at the very least do something like:
>
>     } else
>         hashentrysize += hash_agg_entry_size(0);
>
> which is what create_distinct_paths does?

Yes, it should do that... My mistake. I've ended up just removing the
NULL check as I don't want to touch create_distinct_paths() in this
patch. I'd rather leave that as a small cleanup patch for later,
although the code in create_distinct_paths() is more simple without
the aggregate sizes being calculated, so there's perhaps less of a
need to use the helper function there. If that cleanup patch
materialises then the else hashentrysize += hash_agg_entry_size(0);
can be added with that patch.

> I'm not sure changing the meaning of enable_hashagg like this is a good
> idea. It worked as a hard switch before, while with this change that would
> not be the case. Or more accurately - it would not be the case for
> aggregates, but it would still work the old way for other types of plans.
> Not sure that's a particularly good idea.

Hmm, I don't see how it was a hard switch before. If we were unable to
sort by the group by clause then hashagg would magically be enabled.
The reason I did this was to simplify the logic in
create_grouping_paths(). What difference do you imagine that there
actually is here?

The only thing I can think of is; we now generate a hashagg path where
we previously didn't. This has disable_cost added to the startup_cost
so is quite unlikely to win. Perhaps there is some differences if
someone did SET enable_sort = false; SET enable_hashagg = false; I'm
not sure if we should be worried there though. Also maybe there's
going to be a difference if the plan costings were so high that
disable_cost was drowned out by the other costings.

Apart from that, it would actually be nice to be consistent with the
enable_* GUCs, as to my knowledge the others all just add disable_cost
to the startup_cost of the path.

> What about introducing a GUC to enable parallel aggregate, while still
> allowing other types of parallel nodes (I guess that would be handy for
> other types of parallel nodes - it's a bit of a blunt tool, but tweaking
> max_parallel_degree is even blunter)? e.g. enable_parallelagg?

Haribabu had this in his version of the patch and I didn't really
understand the need for it, I assumed it was for testing only. We
don't have enable_parallelseqscan, and would we plan on adding GUCs
each time we enable a node for parallelism? I really don't think so,
we already have parallel hash join and nested loop join without GUCs
to disable them. I see no reason to add them there, and I also don't
here.

> I do also have a question about parallel aggregate vs. work_mem. Nowadays we
> mostly say to users a query may allocate a multiple of work_mem, up to one
> per node in the plan. Apparently with parallel aggregate we'll have to say
> "multiplied by number of workers", because each aggregate worker may
> allocate up to hashaggtablesize. Is that reasonable? Shouldn't we restrict
> the total size of hash tables in all workers somehow?

I did think about this, but thought either:

1) that a project wide decision should be made on how to handle this,
not just for parallel aggregate, but parallel hash join too, which as
I understand it, for now it builds an identical hashtable per worker.
2) the limit is per node, per connection, and parallel aggregates have
multiple connections, so we might not be breaking our definition of
how to define work_mem, since we're still limited by max_connections
anyway.

> create_grouping_paths also contains this comment:
>
>  /*
>   * Generate HashAgg Path providing the estimated hash table size is not
>   * too big, although if no other Paths were generated above, then we'll
>   * begrudgingly generate one so that we actually have a Path to work
>   * with.
>   */
>
> I'm not sure this is a particularly clear comment, I think the old one was
> much more informative despite being a single line:
>
> /* Consider reasons to disable hashing, but only if we can sort instead */

hmm, I find it quite clear, but perhaps that's because I wrote the
code. I'm not really sure what you're not finding clear about it to be
honest. Tom's original comment was quite generic to allow for more
reasons, but I removed one of those reasons by simplifying the logic
around enable_hashagg, so I didn't think Tom's comment suited well
anymore.

I've rewritten the comment to become:

/*
* Providing that the estimated size of the hashtable does not exceed
* work_mem, we'll generate a HashAgg Path, although if we were unable
* to sort above, then we'd better generate a Path, so that we at least
* have one.
*/

How about that?

>> BTW create_grouping_paths probably grew to a size where splitting it into
> smaller pieces would be helpful?

I'd rather not. Amit mentioned this too [1]. See 4A. Robert has marked
it as ready for committer, so I really don't want to start hacking it
up too much at this stage unless Robert requests it.

An updated patch is attached. This hopefully addresses your concerns
with the comment, and also the estimate_hashagg_tablesize() NULL
checking.

[1] http://www.postgresql.org/message-id/CAKJS1f80=f-z1CUU7=QDmn0r=_yeU7paN2dZ6rQSnUpfEFOUNw@mail.gmail.com

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Parallel Aggregate

From
Tomas Vondra
Date:
Hi,

On 03/21/2016 12:30 AM, David Rowley wrote:
> On 21 March 2016 at 09:47, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>  ...
>> I'm not sure changing the meaning of enable_hashagg like this is a
>> good idea. It worked as a hard switch before, while with this
>> change that would not be the case. Or more accurately - it would
>> not be the case for aggregates, but it would still work the old way
>> for other types of plans. Not sure that's a particularly good
>> idea.
>
> Hmm, I don't see how it was a hard switch before. If we were unable
> to sort by the group by clause then hashagg would magically be
> enabled.

Sure, but that's not what I meant by the "hard switch" (sorry for the 
inaccuracy). Let's assume we can actually do the sorted aggregate. In 
that case, enable_hashagg=off entirely skipped the hashagg path, 
irrespective of the cost or whatever. With the new code we will add 
the path, and it may actually win on the basis of cost (e.g. if there's 
enable_sort=off at the same time).

But I'm not convinced this is actually wrong, I merely pointed out the 
behavior is not exactly the same and may have unintended consequences.

> The reason I did this was to simplify the logic in
> create_grouping_paths(). What difference do you imagine that there
> actually is here?

That the hashed path may win over the sorted path, as explained above.

> The only thing I can think of is; we now generate a hashagg path
> where we previously didn't. This has disable_cost added to the
> startup_cost so is quite unlikely to win. Perhaps there is some
> differences if someone did SET enable_sort = false; SET
> enable_hashagg = false; I'm not sure if we should be worried there
> though. Also maybe there's going to be a difference if the plan
> costings were so high that disable_cost was drowned out by the other
> costings.

Ah, I see you came to the same conclusion ...

>
> Apart from that, it would actually be nice to be consistent with
> the enable_* GUCs, as to my knowledge the others all just add
> disable_cost to the startup_cost of the path.

Perhaps.

>
>> What about introducing a GUC to enable parallel aggregate, while
>> still allowing other types of parallel nodes (I guess that would be
>> handy for other types of parallel nodes - it's a bit of a blunt tool,
>> but tweaking max_parallel_degree is even blunter)? e.g.
>> enable_parallelagg?
>
> Haribabu had this in his version of the patch and I didn't really
> understand the need for it, I assumed it was for testing only. We
> don't have enable_parallelseqscan, and would we plan on adding GUCs
> each time we enable a node for parallelism? I really don't think so,
> we already have parallel hash join and nested loop join without GUCs
> to disable them. I see no reason to add them there, and I also don't
> here.

I'm not so sure about that. I certainly think we'll be debugging queries 
in the not so distant future, wishing for such GUCs.

>> I do also have a question about parallel aggregate vs. work_mem.
>> Nowadays we mostly say to users a query may allocate a multiple of
>> work_mem, up to one per node in the plan. Apparently with parallel
>> aggregate we'll have to say "multiplied by number of workers", because
>> each aggregate worker may allocate up to hashaggtablesize. Is that
>> reasonable? Shouldn't we restrict the total size of hash tables in
>> all workers somehow?
>
> I did think about this, but thought, either;
>
> 1) that a project wide decision should be made on how to handle this,
> not just for parallel aggregate, but parallel hash join too, which as
> I understand it, for now it builds an identical hashtable per worker.
>
> 2) the limit is per node, per connection, and parallel aggregates have
> multiple connections, so we might not be breaking our definition of
> how to define work_mem, since we're still limited by max_connections
> anyway.
>

I do agree this is not just a question for parallel aggregate, and that 
perhaps we're not breaking the definition. But I think in practice we'll 
be hitting the memory limits much more often, because parallel queries 
are very synchronized (running the same node on all parallel workers at 
exactly the same time).

Not sure if we have to reflect that in the planner, but it will probably 
impact what work_mem values are considered safe.

>> create_grouping_paths also contains this comment:
>>
>>  /*
>>   * Generate HashAgg Path providing the estimated hash table size is not
>>   * too big, although if no other Paths were generated above, then we'll
>>   * begrudgingly generate one so that we actually have a Path to work
>>   * with.
>>   */
>>
>> I'm not sure this is a particularly clear comment, I think the old one was
>> much more informative despite being a single line:
>>
>> /* Consider reasons to disable hashing, but only if we can sort instead */
>
> hmm, I find it quite clear, but perhaps that's because I wrote the
> code. I'm not really sure what you're not finding clear about it to be
> honest. Tom's original comment was quite generic to allow for more
> reasons, but I removed one of those reasons by simplifying the logic
> around enable_hashagg, so I didn't think Tom's comment suited well
> anymore.
>
> I've rewritten the comment to become:
>
> /*
> * Providing that the estimated size of the hashtable does not exceed
> * work_mem, we'll generate a HashAgg Path, although if we were unable
> * to sort above, then we'd better generate a Path, so that we at least
> * have one.
> */
>
> How about that?

OK

>
>> BTW create_grouping_paths probably grew to a size where splitting it into
>> smaller pieces would be helpful?
>
> I'd rather not. Amit mentioned this too [1]. See 4A. Robert has marked
> it as ready for committer, so I really don't want to start hacking it
> up too much at this stage unless Robert requests so.

OK

>
> An updated patch is attached. This hopefully addresses your concerns
> with the comment, and also the estimate_hashagg_tablesize() NULL
> checking.
>
> [1] http://www.postgresql.org/message-id/CAKJS1f80=f-z1CUU7=QDmn0r=_yeU7paN2dZ6rQSnUpfEFOUNw@mail.gmail.com

thanks

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregate

From
Alvaro Herrera
Date:
David Rowley wrote:
> I've rewritten the comment to become:
> 
> /*
> * Providing that the estimated size of the hashtable does not exceed
> * work_mem, we'll generate a HashAgg Path, although if we were unable
> * to sort above, then we'd better generate a Path, so that we at least
> * have one.
> */
> 
> How about that?

I think "Providing" should be "Provided".

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregate

From
David Rowley
Date:
On 21 March 2016 at 15:48, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> David Rowley wrote:
>
>> I've rewritten the comment to become:
>>
>> /*
>> * Providing that the estimated size of the hashtable does not exceed
>> * work_mem, we'll generate a HashAgg Path, although if we were unable
>> * to sort above, then we'd better generate a Path, so that we at least
>> * have one.
>> */
>>
>> How about that?
>
> I think "Providing" should be "Provided".

Both make sense, although I only see instances of "Provided that"
in the source.
I'm not opposed to changing it, it just does not seem worth emailing a
complete patch to do that, but let me know if you feel differently.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
Alvaro Herrera
Date:
David Rowley wrote:
> On 21 March 2016 at 15:48, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > David Rowley wrote:
> >
> >> I've rewritten the comment to become:
> >>
> >> /*
> >> * Providing that the estimated size of the hashtable does not exceed
> >> * work_mem, we'll generate a HashAgg Path, although if we were unable
> >> * to sort above, then we'd better generate a Path, so that we at least
> >> * have one.
> >> */
> >>
> >> How about that?
> >
> > I think "Providing" should be "Provided".
> 
> Both make sense, although I only see instances of "Provided that"
> in the source.

Interesting.  "Providing that" seems awkward to me, and I had only seen
the other wording thus far, but
http://english.stackexchange.com/questions/149459/what-is-the-difference-between-providing-that-and-provided-that
explains that I'm wrong.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregate

From
Robert Haas
Date:
On Sun, Mar 20, 2016 at 11:24 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> David Rowley wrote:
>> On 21 March 2016 at 15:48, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> > David Rowley wrote:
>> >
>> >> I've rewritten the comment to become:
>> >>
>> >> /*
>> >> * Providing that the estimated size of the hashtable does not exceed
>> >> * work_mem, we'll generate a HashAgg Path, although if we were unable
>> >> * to sort above, then we'd better generate a Path, so that we at least
>> >> * have one.
>> >> */
>> >>
>> >> How about that?
>> >
>> > I think "Providing" should be "Provided".
>>
>> Both make sense, although I only see instances of "Provided that"
>> in the source.
>
> Interesting.  "Providing that" seems awkward to me, and I had only seen
> the other wording thus far, but
> http://english.stackexchange.com/questions/149459/what-is-the-difference-between-providing-that-and-provided-that
> explains that I'm wrong.

Well, my instincts are the same as yours, actually...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
Robert Haas
Date:
On Sun, Mar 20, 2016 at 7:30 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> An updated patch is attached. This hopefully addresses your concerns
> with the comment, and also the estimate_hashagg_tablesize() NULL
> checking.

I have committed this after changing some of the comments.

There might still be bugs ... but I don't see them.  And the speedups
look very impressive.

Really nice work, David.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Aggregate

From
David Rowley
Date:
On 22 March 2016 at 02:35, Robert Haas <robertmhaas@gmail.com> wrote:
> I have committed this after changing some of the comments.
>
> There might still be bugs ... but I don't see them.  And the speedups
> look very impressive.
>
> Really nice work, David.

Thanks for that, and thank you for taking the time to carefully review
it and commit it.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Parallel Aggregate

From
James Sewell
Date:
Good news!

On Tuesday, 22 March 2016, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 22 March 2016 at 02:35, Robert Haas <robertmhaas@gmail.com> wrote:
> > I have committed this after changing some of the comments.
> >
> > There might still be bugs ... but I don't see them.  And the speedups
> > look very impressive.
> >
> > Really nice work, David.
>
> Thanks for that, and thank you for taking the time to carefully review
> it and commit it.
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

--
James Sewell,
PostgreSQL Team Lead / Solutions Architect