Re: Parallel Aggregate

From David Rowley
Subject Re: Parallel Aggregate
Date
Msg-id CAKJS1f9k5Ej57dJ2oCJrht=ZzO8twpQsktO08K4103b3cpQsSg@mail.gmail.com
In response to Re: Parallel Aggregate  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Parallel Aggregate  (Haribabu Kommi <kommi.haribabu@gmail.com>)
List pgsql-hackers
On 20 October 2015 at 23:23, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 13 October 2015 at 20:57, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 13 October 2015 at 17:09, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>> > Also, I think the path for parallel aggregation should probably be
>> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
>> > path here.  I'm not clear whether that is what you are thinking or
>> > not.
>>
>> No. I am thinking of the following way.
>> Gather->partialagg->some partial path
>>
>> I want the Gather node to merge the results coming from all workers;
>> otherwise it may be difficult to merge at the parent of the Gather node.
>> Because the partial group aggregate is under the Gather node, if any two
>> workers return data for the same group key, we need to compare them and
>> combine them into a single group. At the Gather node, we can wait until
>> we get slots from all workers. Once all workers return their slots, we
>> can compare and merge the necessary slots and return the result. Am I
>> missing something?
>
>
> My assumption is the same as Robert's here.
> Unless I've misunderstood, it sounds like you're proposing to add logic into
> the Gather node to handle final aggregation? That sounds like a modularity
> violation of the whole node concept.
>
> The handling of the final aggregate stage is not all that different from the
> initial aggregate stage. The primary difference is just that you're calling
> the combine function instead of the transition function, and the values

Yes, you are correct. Until now I was thinking of using the transition types
as the approach, and for that reason alone I proposed having the Gather node
handle the final aggregation.

> being aggregated are aggregate states rather than the type of the values
> which were initially aggregated. The handling of GROUP BY is all the same,
> yet you only apply the HAVING clause during final aggregation. This is why I
> ended up implementing this in nodeAgg.c instead of inventing some new node
> type that's mostly a copy and paste of nodeAgg.c [1]
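
As a rough illustration of the point quoted above (a Python sketch, not PostgreSQL code): the partial stage folds raw input values with a transition function, while the final stage folds aggregate states with a combine function. The avg-style (total, count) state, the function names, and the three-worker split are all assumptions made for this example.

```python
from functools import reduce

# Hypothetical avg-style aggregate: the state is a (total, count) pair.
INITIAL = (0, 0)

def transition(state, value):
    """Transition function: fold one raw input value into a state."""
    total, count = state
    return (total + value, count + 1)

def combine(state_a, state_b):
    """Combine function: merge two partial states into one state."""
    return (state_a[0] + state_b[0], state_a[1] + state_b[1])

def finalize(state):
    """Final function: turn the accumulated state into the result."""
    total, count = state
    return total / count if count else None

# Partial aggregation: each (assumed) worker applies the transition
# function to its own share of the rows.
worker_rows = [[1, 2, 3], [4, 5], [6]]
partial_states = [reduce(transition, rows, INITIAL) for rows in worker_rows]

# Final aggregation: the inputs are now states rather than raw values,
# so the combine function is used in place of the transition function.
final_state = reduce(combine, partial_states, INITIAL)
result = finalize(final_state)
```

Both stages share the same folding loop and differ only in which function is applied, which is the similarity the quoted paragraph relies on.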

After going through your Partial Aggregation / GROUP BY before JOIN patch,
the following is my understanding of parallel aggregate.

Finalize [hash] aggregate
        -> Gather
              -> Partial [hash] aggregate

The data that comes from the Gather node contains the group key and the
grouping results. Based on these, the finalize aggregate can build another
hash table (in the hash aggregate case) and return the final results. This
approach works for both plain and hash aggregates.
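
The hash-based plan above can be sketched in Python as follows. This is only an illustration, not the patch itself: the (sum, count) state, the function names, and the `having` parameter are hypothetical. The Gather step merely concatenates worker output, and the finalize step builds a second hash table over the partial states, applying any HAVING filter only at that point.

```python
from collections import defaultdict

def partial_hash_agg(rows):
    """Partial aggregate in one worker: group key -> (sum, count) state."""
    table = defaultdict(lambda: (0, 0))
    for key, value in rows:
        total, count = table[key]
        table[key] = (total + value, count + 1)
    return list(table.items())  # (group key, state) pairs sent to Gather

def gather(worker_outputs):
    """Gather only concatenates worker output; it does no merging."""
    return [item for output in worker_outputs for item in output]

def finalize_hash_agg(gathered, having=lambda avg: True):
    """Finalize aggregate: second hash table built over partial states."""
    table = defaultdict(lambda: (0, 0))
    for key, (total, count) in gathered:
        t, c = table[key]
        table[key] = (t + total, c + count)  # combine states per group key
    results = {key: total / count for key, (total, count) in table.items()}
    # The HAVING filter is applied only here, on the final values.
    return {key: avg for key, avg in results.items() if having(avg)}

# Two assumed workers; groups "a" and "b" appear in both of them.
workers = [[("a", 1), ("b", 10)], [("a", 3), ("b", 20), ("c", 5)]]
gathered = gather(partial_hash_agg(rows) for rows in workers)
final = finalize_hash_agg(gathered)
filtered = finalize_hash_agg(gathered, having=lambda avg: avg > 4)
```

Filtering on a partial state would be wrong here: group "a" has a partial average of 3 in the second worker but a final average of 2.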

For group aggregate support of parallel aggregate, the plan should be
as follows.

Finalize Group aggregate
    ->sort
        -> Gather
              -> Partial group aggregate
                   ->sort

The data that comes from the Gather node needs to be sorted again on the
grouping key and then merged to generate the final grouping result.
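
A minimal Python sketch of why the second sort is needed, assuming a sum aggregate and made-up worker data (the function names are hypothetical): each worker's output is sorted, but rows for the same key arrive from different workers, so the gathered stream must be re-sorted before adjacent equal keys can be merged.

```python
from itertools import groupby
from operator import itemgetter

def partial_group_agg(rows):
    """Worker side: sort by group key, then emit one (key, sum) per group."""
    ordered = sorted(rows, key=itemgetter(0))
    return [(key, sum(value for _, value in group))
            for key, group in groupby(ordered, key=itemgetter(0))]

def finalize_group_agg(gathered):
    """Leader side: re-sort partial results, then merge adjacent equal keys."""
    ordered = sorted(gathered, key=itemgetter(0))
    return [(key, sum(partial for _, partial in group))
            for key, group in groupby(ordered, key=itemgetter(0))]

# Two assumed workers, each holding part of groups "a" and others.
workers = [[("b", 2), ("a", 1), ("b", 3)], [("c", 7), ("a", 4)]]

# Gather hands rows over in arrival order (here: worker 1 first), so the
# combined stream is no longer sorted on the group key.
gathered = partial_group_agg(workers[1]) + partial_group_agg(workers[0])
final = finalize_group_agg(gathered)
```

Without the sort in `finalize_group_agg`, the two partial rows for key "a" would not be adjacent and would be emitted as separate groups.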

With this approach, we do not need to change anything in the Gather node.
Is my understanding correct?


Our understandings are aligned. 


Hi,

I just wanted to cross post here to mark that I've posted an updated patch for combining aggregate states:

I also wanted to check if you've managed to make any progress on Parallel Aggregation? I'm very interested in this myself and would like to progress with it, if you're not already doing so.

My current thinking is that most of the remaining changes required for parallel aggregation, after applying the combine aggregate state patch, will be in the exact area where Tom will be making changes for the upper planner path-ification work. I'm not all that certain whether we should hold off for that or not.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
