Re: [HACKERS] Partition-wise aggregation/grouping

Поиск
Список
Период
Сортировка
От Jeevan Chalke
Тема Re: [HACKERS] Partition-wise aggregation/grouping
Дата
Msg-id CAM2+6=Wd3f6PKjNKgYxaY+NPvtuhk5_oRZhqLOp7-N4PG8mPHA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Partition-wise aggregation/grouping  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Partition-wise aggregation/grouping  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers


On Tue, Mar 6, 2018 at 2:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> However, to perform Gather or Gather Merge once we have all partial paths
> ready, and to avoid too many existing code rearrangement, I am calling
> try_partitionwise_grouping() before we do any aggregation/grouping on whole
> relation. By doing this, we will be having all partial paths in
> partially_grouped_rel and then existing code will do required finalization
> along with any Gather or Gather Merge, if required.
>
> Please have a look over attached patch-set and let me know if it needs
> further changes.

This does look better.

Thank you, Robert.
 

+      <term><varname>enable_partitionwise_agg</varname> (<type>boolean</type>)

Please don't abbreviate "aggregate" to "agg".

This is in-lined with enable_hashagg GUC. Do you think enable_partitionwise_aggregate seems better? But it will be not consistent with other GUC names like enable_hashagg then.
 

-       /* Build final grouping paths */
-       add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-
partially_grouped_rel, agg_costs,
-
&agg_final_costs, gd, can_sort, can_hash,
-                                                         dNumGroups,
(List *) parse->havingQual);
+       if (isPartialAgg)
+       {
+               Assert(agg_partial_costs && agg_final_costs);
+               add_paths_to_partial_grouping_rel(root, input_rel,
+
           partially_grouped_rel,
+
           agg_partial_costs,
+
           gd, can_sort, can_hash,
+
           false, true);
+       }
+       else
+       {
+               double          dNumGroups;
+
+               /* Estimate number of groups. */
+               dNumGroups = get_number_of_groups(root,
+
           cheapest_path->rows,
+
           gd,
+
           child_data ? make_tlist_from_pathtarget(target) :
parse->targetList);
+
+               /* Build final grouping paths */
+               add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+
partially_grouped_rel, agg_costs,
+
agg_final_costs, gd, can_sort, can_hash,
+
dNumGroups, (List *) havingQual);
+       }

This looks strange.  Why do we go down two completely different code
paths here?

It is because when isPartialAgg = true we need to create partially aggregated non-partial paths which should be added in partially_grouped_rel->pathlist. And when isPartialAgg = false, we are creating fully aggregated paths which goes into grouped_rel->pathlist.
 
  It seems to me that the set of paths we add to the
partial_pathlist shouldn't depend at all on isPartialAgg.  I might be
confused, but it seems to me that any aggregate path we construct that
is going to run in parallel must necessarily be partial, because even
if each group will occur only in one partition, it might still occur
in multiple workers for that partition, so finalization would be
needed.

Thats's true. We are creating partially aggregated partial paths for this and keeps them in partially_grouped_rel->partial_pathlist.
 
  On the other hand, for non-partial paths, we can add then to
partially_grouped_rel when isPartialAgg = true and to grouped_rel when
isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE
vs. AGGSPLIT_INITIAL_SERIAL. 

Yes. As explained above, they goes in pathlist of respective Rel.
However, PathTarget is different too, we need partial_pathtarget when isPartialAgg = true and also need agg_partial_costs.
 
But that doesn't appear to be what this
is doing.

So the code for doing partially aggregated partial paths and partially aggregated non-partial path is same except partial paths goes into partial_pathlist where as non-partial goes into pathlist of partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel() when isPartialAgg = true seems correct. Also as we have decided, this function is responsible to create all partially aggregated paths including both partial and non-partial.

Am I missing something?
 

+       /*
+        * If there are any fully aggregated partial paths present,
may be because
+        * of parallel Append over partitionwise aggregates, we must stick a
+        * Gather or Gather Merge path atop the cheapest partial path.
+        */
+       if (grouped_rel->partial_pathlist)

This comment is copied from someplace where the code does what the
comment says, but here it doesn't do any such thing.

Well, these comments are not present anywhere else than this place. With Parallel Append and Partitionwise aggregates, it is now possible to have fully aggregated partial paths now. And thus we need to stick a Gather and/or Gather Merge atop cheapest partial path. And I believe the code does the same. Am I missing something?
 

More tomorrow...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: constraint exclusion and nulls in IN (..) clause
Следующее
От: Anton Dignös
Дата:
Сообщение: Re: IndexJoin memory problem using spgist and boxes