Re: [HACKERS] Partition-wise aggregation/grouping

Поиск

Список

Период

Сортировка

От	Jeevan Chalke
Тема	Re: [HACKERS] Partition-wise aggregation/grouping
Дата	9 октября 2017 г. 15:10:04
Msg-id	CAM2+6=WSW2qFDf-3MDz1m6AnTAxQASLcmyBf7vmFf84CR2V3Pw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [HACKERS] Partition-wise aggregation/grouping (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы	Re: [HACKERS] Partition-wise aggregation/grouping
Список	pgsql-hackers

Дерево обсуждения

On Thu, Sep 28, 2017 at 3:12 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

Here are comments on 0004 from last patch set. But most the comments
still apply.

Thank you, Ashutosh for reviewing.

Patch 0001 adds functions create_hash_agg_path() and create_sort_agg_path().
Patch 0004 adds a new argument to those functions for conditions in HAVING
clause. We should move those changes to 0001 and pass parse->havingQual to
these functions in 0001 itself. That will keep all changes to those functions
together and also make 0003 small.

Done.

The prologue of try_partition_wise_grouping() mentions a restriction of
partition keys being leading group by clauses. This restriction is not
specified in the prologue of have_grouping_by_partkey(), which actually checks
for this restriction. The requirement per prologue of that function is just to
have partition keys in group clause. I think have_grouping_by_partkey() is
correct, and we should change prologue of try_partition_wise_grouping() to be
in sync with have_grouping_by_partkey().

Done.

The prologue explains why
partition-wise aggregation/grouping would be efficient with this restriction,
but it doesn't explain why partial aggregation/grouping per partition would be
efficient. May be it should do that as well. Similar is the case with partial
aggregation/grouping discussion in README.

I have tried updating it. Please check.

+ /* Do not handle grouping sets for now. */
Is this a real restriction or just restriction for first cut of this feature?
Can you please add more explanation? May be update README as well?

Grouping sets plan does not work with an inheritance subtree (see notes in
create_groupingsets_plan). Thus grouping sets are not handled here.

+ grouped_rel->part_scheme = input_rel->part_scheme;
Is this true even for partial aggregates? I think not. Since group by clause
does not contain partition keys, the rows from multiple partitions participate
in one group and thus the partition keys of input relation do not apply to the
grouped relation. In this case, it seems that the grouped rel will have
part_rels but will not be partitioned.

I have removed this as your analysis is correct. grouped_rel is not
partitioned.

+ /*
+ * If there is no path for the child relation, then we cannot add
+ * aggregate path too.
+ */
+ if (input_child_rel->pathlist == NIL)
+ return;
When can this happen? I think, similar to partition-wise join it happens when
there is a dummy parent relation. See [1]. If that's the case, you may want to
do things similar to what partition-wise join is doing. If there's some other
reason for doing this, returing from here half-way is actually waste of memory
and planning time. Instead, we may want to loop over the part_rels to find if
any of those have empty pathlist and return from there before doing any actual
work.

This is kind of can't happen scenario, so I have converted it to an Assert().
And yes, I am marking a grouped_rel as dummy rel when input rel is dummy.

+ extra.pathTarget = child_target;
+ extra.inputRows = input_child_rel->cheapest_startup_path->rows;
+ extra.havingQual = (Node *) adjust_appendrel_attrs(root,
+ (Node *)
query->havingQual,
+ nappinfos,
+ appinfos);
These lines are updating some fields of "extra" structure in every loop. The
structure is passed to create_child_grouping_paths() in the loop and to
add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel() only
gets some member values for the last child. Is that right?

No. Patch do update those fields before calling add_paths_to_append_rel().

Should we split
extra into two structures one to be used within the loop and one outside? Or
may be send the members being updated within the loop separately?

I don't see any point in splitting. We need almost all fields at child path
creation as well as at finalization step. The patch basically just re-using
the struct variable.

+ /*
+ * Forcibly apply scan/join target to all the Paths for the scan/join
+ * rel.
+ *
[ lines clipped ]
+ if (subpath == input_child_rel->cheapest_total_path)
+ input_child_rel->cheapest_total_path = path;
+ }
+ }
This code seems to be copied from grouping_planner() almost verbatim. Is there
a way we can refactor it into a function and use it in both the places.

Done.
Moved this in 0003-Refactor-code-applying-scanjoin-target-to-paths-into.patch

have_grouping_by_partkey() may use match_expr_to_partition_keys() to find
whether a given clause expression matches any of the partition keys. Or you
could use list_intersection() instead of following loop
+ foreach(lc, partexprs)
+ {
+ Expr *partexpr = lfirst(lc);
+
+ if (list_member(groupexprs, partexpr))
+ {
+ found = true;
+ break;
+ }
+ }
+ /*
+ * If none of the partition key matches with any of the GROUP BY
+ * expression, return false.
+ */
+ if (!found)
+ return false;

Well, the logic in match_expr_to_partition_keys() does not exactly match with
the scenarios here. It may match with few alterations but then it will become
complex. So better to have them separate.

list_intersection() is a good suggestion as it will reduce this block
altogether and will have less lines-of-code to maintain. However, it returns
a list of all matching cells from List1 which is done by comparing all
elements. But here in this case we don't need to match further after very
first match. Thus this logic saves on those unnecessary matching.

create_child_grouping_paths() and create_grouping_paths() has almost similar
code. Is there a way we could refactor the code to extract common code into a
function called by these two functions or reuse create_grouping_paths() for
children as well? I don't think we will be able to do the later.

After refactoring most of the code in create_grouping_paths() (0001-0003),
it is very little code remained which is duplicated. Refactoring those few
lines into another function looks odd.
Let me know, if you still think to refactor those few lines in a separate
function.

+ /* Nothing to do if there is an empty pathlist */
+ if (grouped_rel->pathlist == NIL)
+ return false;
When would that happen? Similar condition in case of parent grouped rel throws
an error, so when this code is called, we know for sure that parent had
non-empty pathlist. So, we would expect child to have non-empty pathlist as
well.

Yes and agree too. This is kind of not-reachable return.
Do you mean we should also throw an error here like in case of parent grouped
rel? I opted to not throw an error and instead go with the non partition-wise
path.

+ grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
+ input_rel->relids);
+
+ /* Mark this rel as "other upper rel" */
+ grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
I think we need to pass relopkind as an argument to fetch_upper_rel(), now that
we have "upper" relations and "other upper" relations. relids will still be a
"key" to find an upper relation but its reloptkind should match the given
reloptkind. fetch_upper_rel() is used to create the upper relation if it
doesn't exist. So, with the above code, if some other function calls
fetch_upper_rel() with given relids, it would get an upper rel with
RELOPT_UPPER_REL and then this code would change it to RELOPT_OTHER_UPPER_REL.
That looks odd. May be we should have written build_upper_rel() and
find_upper_rel() similar to build_join_rel() and find_join_rel() instead of
combining both the functionalities in one function.

Make sense. But I am reluctant to update fetch_upper_rel() and all it's
callers.
However, do you think having a separate function for other upper rel for this
is a good idea, named fetch_other_upper_rel() in-lined with fetch_upper_rel()?

+ /*
+ * create_append_path() sets the path target from the given relation.
+ * However, in this case grouped_rel doesn't have a target set. So we
+ * must set the pathtarget to the passed in target.
+ */
+ apath->pathtarget = target;
I think, we should change create_append_path() functions to accept target as an
additional argument. For append rels other than aggregate and grouping, target
will be same as relation's target. For agg/group append rels, we will pass
different targets for partial and non-partial grouping paths.

Done in 0005-Pass-pathtarget-to-create_-merge_-append_path.patch.

+ /*
+ * Since Append's output is always unsorted, we'll need to sort,
+ * unless there's no GROUP BY clause or a degenerate (constant) one,
+ * in which case there will only be a single group.
+ */
append path here can be output of either merge append or append. If it's output
of merge append, we don't need to sort it again, do we?

Yes, you are right, we don't need an explicit sort over merge-append.
Done those changes.

create_partition_agg_paths() creates append paths and then adds finalization
path if necessary. The code to add finalization path seems to be similar to the
code that adds finalization path for parallel query. May be we could take out
common code into a function and call that function in two places. I see this
function as accepting a partial aggregation/grouping path and returning a path
that finalizes partial aggregates/groups.

It seems that it will become messy. Per my understanding the only common code
is related to the add_path() call with appropriate create_agg_path() or
create_group_path(). Those are anyways function calls and I don't see any
reason to split them into the separate function.

Attached new patch set having HEAD at 84ad4b0 with all these review points
fixed. Let me know if I missed any thanks.

I have merged parallelism changes into main patch i.e. 0007 as most of the
changes in that patch are actual modifying same lines added by 0007.

Thanks

--

Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Вложения

partition-wise-agg-v5.tar.gz

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] Partition-wise aggregation/grouping

Вложения