Re: WIP: Aggregation push-down - take2

Поиск
Список
Период
Сортировка
От	vignesh C
Тема	Re: WIP: Aggregation push-down - take2
Дата	4 января 2023 г. 13:12:21
Msg-id	CALDaNm2KsNYEQ-T2TDPEiicHRVbbz8JinSzm6hXGvDswpfkqfQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: WIP: Aggregation push-down - take2 (Antonin Houska <ah@cybertec.at>)
Ответы	Re: WIP: Aggregation push-down - take2
Список	pgsql-hackers
Дерево обсуждения
On Thu, 17 Nov 2022 at 16:34, Antonin Houska <ah@cybertec.at> wrote:
>
> Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> > Hi,
> >
> > I did a quick initial review of the v20 patch series. I plan to do a
> > more thorough review over the next couple days, if time permits. In
> > general I think the patch is in pretty good shape.
>
> Thanks.
>
> > I've added a bunch of comments in a number of places - see the "review
> > comments" parts for each of the original parts. That should make it
> > easier to deal with all the items. I'll go through the main stuff here:
>
> Unless I miss something, all these items are covered in context below, except
> for this one:
>
> > 7) when I change enable_agg_pushdown to true and run regression tests, I
> > get a bunch of failures like
> >
> >    ERROR:  WindowFunc found where not expected
> >
> > Seems we don't handle window functions correctly somewhere, or maybe
> > setup_aggregate_pushdown should check/reject hasWindowFuncs too?
>
> We don't need to reject window functions, window functions are processed after
> grouping/aggregation. The problem I noticed in the regression tests was that a
> window function referenced a (non-window) aggregate. We just need to ensure
> that pull_var_clause() recurses into that window function in such cases:
>
> Besides the next version, v21-fixes.patch file is attached. It tries to
> summarize all the changes between v21 and v22. (I wonder if this attachment
> makes the cfbot fail.)
>
>
> diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
> index 8e913c92d8..8dc39765f2 100644
> --- a/src/backend/optimizer/plan/initsplan.c
> +++ b/src/backend/optimizer/plan/initsplan.c
> @@ -355,7 +355,8 @@ create_aggregate_grouped_var_infos(PlannerInfo *root)
>         Assert(root->grouped_var_list == NIL);
>
>         tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
> -                                                                 PVC_INCLUDE_AGGREGATES);
> +                                                                 PVC_INCLUDE_AGGREGATES |
> +                                                                 PVC_RECURSE_WINDOWFUNCS);
>
>         /*
>          * Although GroupingFunc is related to root->parse->groupingSets, this
>
>
> > ---
> >  src/backend/optimizer/util/relnode.c | 11 +++++++++++
> >  src/include/nodes/pathnodes.h        |  3 +++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
> > index 94720865f47..d4367ba14a5 100644
> > --- a/src/backend/optimizer/util/relnode.c
> > +++ b/src/backend/optimizer/util/relnode.c
> > @@ -382,6 +382,12 @@ find_base_rel(PlannerInfo *root, int relid)
> >  /*
> >   * build_rel_hash
> >   *     Construct the auxiliary hash table for relation specific data.
> > + *
> > + * XXX Why is this renamed, leaving out the "join" part? Are we going to use
> > + * it for other purposes?
>
> Yes, besides join relation, it's used to find the "grouped relation" by
> Relids. This change tries to follow the suggestion "Maybe an appropriate
> preliminary patch ..." in [1], but I haven't got any feedback whether my
> understanding was correct.
>
> > + * XXX Also, why change the API and not pass PlannerInfo? Seems pretty usual
> > + * for planner functions.
>
> I think that the reason was that, with the patch applied, PlannerInfo contains
> multiple fields of the RelInfoList type, so build_rel_hash() needs an
> information which one it should process. Passing the exact field is simpler
> than passing PlannerInfo plus some additional information.
>
> >   */
> >  static void
> >  build_rel_hash(RelInfoList *list)
> > @@ -422,6 +428,11 @@ build_rel_hash(RelInfoList *list)
> >  /*
> >   * find_rel_info
> >   *     Find a base or join relation entry.
> > + *
> > + * XXX Why change the API and not pass PlannerInfo? Seems pretty usual
> > + * for planner functions.
>
> For the same reason that build_rel_hash() receives the list explicitly, see
> above.
>
> > + * XXX I don't understand why we need both this and find_join_rel.
>
> Perhaps I just wanted to keep the call sites of find_join_rel() untouched. I
> think that
>
>     find_join_rel(root, relids);
>
> is a little bit easier to read than
>
>     (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
>
> >   */
> >  static void *
> >  find_rel_info(RelInfoList *list, Relids relids)
> > diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
> > index 0ca7d5ab51e..018ce755720 100644
> > --- a/src/include/nodes/pathnodes.h
> > +++ b/src/include/nodes/pathnodes.h
> > @@ -88,6 +88,9 @@ typedef enum UpperRelationKind
> >   * present and valid when rel_hash is not NULL.  Note that we still maintain
> >   * the list even when using the hash table for lookups; this simplifies life
> >   * for GEQO.
> > + *
> > + * XXX I wonder why we actually need a separate node, merely wrapping fields
> > + * that already existed ...
>
> This is so that the existing fields can still be printed out
> (nodes/outfuncs.c).
>
> > diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
> > index 2fd1a962699..6f6b7d0b93b 100644
> > --- a/src/backend/optimizer/README
> > +++ b/src/backend/optimizer/README
> > @@ -1168,6 +1168,12 @@ input of Agg node. However, if the groups are large enough, it may be more
> >  efficient to apply the partial aggregation to the output of base relation
> >  scan, and finalize it when we have all relations of the query joined:
> >
> > +XXX review: Hmm, do we need to push it all the way down to base relations? Or
> > +would it make sense to do the agg on an intermediate level? Say, we're joining
> > +three tables A, B and C. Maybe the agg could/should be evaluated on top of join
> > +A+B, before joining with C? Say, maybe the aggregate references columns from
> > +both base relations?
> > +
> >    EXPLAIN
> >    SELECT a.i, avg(b.y)
> >    FROM a JOIN b ON b.j = a.i
>
> Another example below does show the partial aggregates at join level.
>
> > +XXX Perhaps mention this may also mean the partial ggregate could be pushed
> > +to a remote server with FDW partitions?
>
> Even if it's not implemented in the current patch version?
>
> > +
> >  Note that there's often no GROUP BY expression to be used for the partial
> >  aggregation, so we use equivalence classes to derive grouping expression: in
> >  the example above, the grouping key "b.j" was derived from "a.i".
> >
> > +XXX I think this is slightly confusing - there is a GROUP BY expression for the
> > +partial aggregate, but as stated in the query it may not reference the side of
> > +a join explicitly.
>
> ok, changed.
>
> >  Also note that in this case the partial aggregate uses the "b.j" as grouping
> >  column although the column does not appear in the query target list. The point
> >  is that "b.j" is needed to evaluate the join condition, and there's no other
> >  way for the partial aggregate to emit its values.
> >
> > +XXX Not sure I understand what this is trying to say. Firstly, maybe it'd be
> > +helpful to show targetlists in the EXPLAIN, i.e. do it as VERBOSE. But more
> > +importantly, isn't this a direct consequence of the equivalence classes stuff
> > +mentioned in the preceding paragraph?
>
> The equivalence class is just a mechanism to derive expressions which are not
> explicitly mentioned in the query, but there's always a question whether you
> need to derive any expression for particular table or not. Here I tried to
> explain that the choice of join columns is related to the choice of grouping
> keys for the partial aggregate.
>
> I've deleted this paragraph and added a note to the previous one.
>
> >  Besides base relation, the aggregation can also be pushed down to join:
> >
> >    EXPLAIN
> > @@ -1217,6 +1235,10 @@ Besides base relation, the aggregation can also be pushed down to join:
> >         ->  Hash
> >               ->  Seq Scan on a
> >
> > +XXX Aha, so this is pretty-much an answer to my earlier comment, and matches
> > +my example with three tables. Maybe this suggests the initial reference to
> > +base relations is a bit confusing.
>
> I tried to use the simplest example to demonstrate the concepts, then extended
> it to the partially-aggregated joins.
>
> > +XXX I think this is a good explanation of the motivation for this patch, but
> > +maybe it'd be good to go into more details about how we decide if it's correct
> > +to actually do the pushdown, data structures etc. Similar to earlier parts of
> > +this README.
>
> Added two paragraphs, see "Regarding correctness...".
>
> > diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> > index f00f900ff41..6d2c2f4fc36 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -196,9 +196,10 @@ make_one_rel(PlannerInfo *root, List *joinlist)
> >       /*
> >        * Now that the sizes are known, we can estimate the sizes of the grouped
> >        * relations.
> > +      *
> > +      * XXX Seems more consistent with code nearby.
> >        */
> > -     if (root->grouped_var_list)
> > -             setup_base_grouped_rels(root);
> > +     setup_base_grouped_rels(root);
>
> In general I prefer not calling a function if it's obvious that it's not
> needed, but on the other hand the test of the 'grouped_var_list' field may be
> considered disturbing from the caller's perspective. I've got no strong
> opinion on this, so I can accept this proposal.
>
> >
> >  /*
> > - * setup_based_grouped_rels
> > + * setup_base_grouped_rels
> >   *     For each "plain" relation build a grouped relation if aggregate pushdown
> >   *    is possible and if this relation is suitable for partial aggregation.
> >   */
>
> Fixed, thanks.
>
> >  {
> >       Index           rti;
> >
> > +     /* If there are no grouped relations, estimate their sizes. */
> > +     if (!root->grouped_var_list)
> > +             return;
> > +
>
> Accepted, but with different wording (s/relations/expressions/).
>
> > +             /* XXX Shouldn't this check be earlier? Seems cheaper than the check
> > +              * calling bms_nonempty_difference, for example. */
> >               if (brel->reloptkind != RELOPT_BASEREL)
> >                       continue;
>
> Right, moved.
>
> >               rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
> > -             if (rel_grouped)
> > -             {
> > -                     /* Make the relation available for joining. */
> > -                     add_grouped_rel(root, rel_grouped, agg_info);
> > -             }
> > +
> > +             /* XXX When does this happen? */
> > +             if (!rel_grouped)
> > +                     continue;
> > +
> > +             /* Make the relation available for joining. */
> > +             add_grouped_rel(root, rel_grouped, agg_info);
>
> I'd use the "continue" statement if there was a lot of code in the "if
> (rel_grouped) {...}" branch, but no strong preference in this case, so
> accepted.
>
> >       }
> >  }
> >
> > @@ -560,6 +569,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> >                                       /* Plain relation */
> >                                       set_plain_rel_pathlist(root, rel, rte);
> >
> > +                                     /* XXX Shouldn't this really be part of set_plain_rel_pathlist? */
> > +
> >                                       /* Add paths to the grouped relation if one exists. */
> >                                       rel_grouped = find_grouped_rel(root, rel->relids,
>
> Yes, it can. Moved.
>
> > @@ -3382,6 +3393,11 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
> >
> >  /*
> >   * Apply partial aggregation to a subpath and add the AggPath to the pathlist.
> > + *
> > + * XXX I think this is potentially quite confusing, because the existing "add"
> > + * functions add_path and add_partial_path only check if the proposed path is
> > + * dominated by an existing path, pathkeys, etc. But this does more than that,
> > + * perhaps even constructing new path etc.
> >   */
> >  static void
> >  add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
>
> Maybe, but I don't have a good idea of an alternative name.
> create_group_path() already exists and the create_*_path() functions are
> rather low-level. Maybe generate_grouped_path(), and at the same time rename
> generate_grouping_paths() to generate_grouped_paths()? In general, the
> generate_*_path*() functions do non-trivial things and eventually call
> add_path().
>
> > @@ -3399,9 +3414,16 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >       else
> >               elog(ERROR, "unexpected strategy %d", aggstrategy);
> >
> > +     /*
> > +      * Bail out if we failed to create a suitable aggregated path. This can
> > +      * happen e.g. then the path does not support hashing (for AGG_HASHED),
> > +      * or when the input path is not sorted.
> > +      */
> > +     if (agg_path == NULL)
> > +             return;
> > +
> >       /* Add the grouped path to the list of grouped base paths. */
> > -     if (agg_path != NULL)
> > -             add_path(rel, (Path *) agg_path);
> > +     add_path(rel, (Path *) agg_path);
>
> ok, changed.
>
> >  }
> >
> >  /*
> > @@ -3545,7 +3567,6 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
> >
> >       for (lev = 2; lev <= levels_needed; lev++)
> >       {
> > -             RelOptInfo *rel_grouped;
> >               ListCell   *lc;
> >
> >               /*
> > @@ -3567,6 +3588,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
> >                */
> >               foreach(lc, root->join_rel_level[lev])
> >               {
> > +                     RelOptInfo *rel_grouped;
> > +
> >                       rel = (RelOptInfo *) lfirst(lc);
>
> Sure, fixed.
>
> > diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
> > index 8e913c92d8b..d7a9de9645e 100644
> > --- a/src/backend/optimizer/plan/initsplan.c
> > +++ b/src/backend/optimizer/plan/initsplan.c
> > @@ -278,6 +278,8 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
> >   * each possible grouping expression.
> >   *
> >   * root->group_pathkeys must be setup before this function is called.
> > + *
> > + * XXX Perhaps this should check/reject hasWindowFuncs too?
>
> create_window_paths() is called after create_grouping_paths() (see
> grouping_planner()), so it should not care whether the input (possibly
> grouped) paths involve the aggregate push-down or not.
>
> >   */
> >  extern void
> >  setup_aggregate_pushdown(PlannerInfo *root)
> > @@ -311,6 +313,12 @@ setup_aggregate_pushdown(PlannerInfo *root)
> >       if (root->parse->hasTargetSRFs)
> >               return;
> >
> > +     /*
> > +      * XXX Maybe it'd be better to move create_aggregate_grouped_var_infos and
> > +      * create_grouping_expr_grouped_var_infos to a function returning bool, and
> > +      * only check that here.
> > +      */
> > +
>
> Hm, it looks to me like too much "indirection", and also a decriptive function
> name would be tricky to invent.
>
> >       /* Create GroupedVarInfo per (distinct) aggregate. */
> >       create_aggregate_grouped_var_infos(root);
> >
> > @@ -329,6 +337,8 @@ setup_aggregate_pushdown(PlannerInfo *root)
> >        * Now that we know that grouping can be pushed down, search for the
> >        * maximum sortgroupref. The base relations may need it if extra grouping
> >        * expressions get added to them.
> > +      *
> > +      * XXX Shouldn't we do that only when adding extra grouping expressions?
> >        */
> >       Assert(root->max_sortgroupref == 0);
> >       foreach(lc, root->processed_tlist)
>
> We don't know at this (early) stage whether those "extra grouping expression"
> will be needed for at least one relation. (max_sortgroupref is used by
> create_rel_agg_info())
>
> > diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> > index 0ada3ba3ebe..2f4db69c1f9 100644
> > --- a/src/backend/optimizer/plan/planner.c
> > +++ b/src/backend/optimizer/plan/planner.c
> > @@ -3899,6 +3899,10 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
> >       /*
> >        * The non-partial paths can come either from the Gather above or from
> >        * aggregate push-down.
> > +      *
> > +      * XXX I can't quite convince myself this is correct. How come it's fine
> > +      * to check pathlist and then call set_cheapest() on partially_grouped_rel?
> > +      * Maybe it's correct and the comment merely needs to explain this.
>
> It's not clear to me what makes you confused. Without my patch, the code looks
> like this:
>
>     if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
>     {
>         gather_grouping_paths(root, partially_grouped_rel);
>         set_cheapest(partially_grouped_rel);
>     }
>
> Here gather_grouping_paths() adds paths to partially_grouped_rel->pathlist. My
> patch calls set_cheapest() independent from gather_grouping_paths() because
> the paths requiring the aggregate finalization can also be generated by the
> aggregate push-down feature.
>
> >        */
> >       if (partially_grouped_rel && partially_grouped_rel->pathlist)
> >               set_cheapest(partially_grouped_rel);
> > @@ -6847,6 +6851,12 @@ create_partial_grouping_paths(PlannerInfo *root,
> >        * push-down.
> >        */
> >       partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
> > +
> > +     /*
> > +      * If the relation already exists, it must have been created by aggregate
> > +      * pushdown. We can't check how exactly it got created, but we can at least
> > +      * check that aggregate pushdown is enabled.
> > +      */
> >       Assert(enable_agg_pushdown || partially_grouped_rel == NULL);
>
> ok, done.
>
> > @@ -6872,6 +6882,8 @@ create_partial_grouping_paths(PlannerInfo *root,
> >        * If we can't partially aggregate partial paths, and we can't partially
> >        * aggregate non-partial paths, then don't bother creating the new
> >        * RelOptInfo at all, unless the caller specified force_rel_creation.
> > +      *
> > +      * XXX Not sure why we're checking the partially_grouped_rel here?
> >        */
> >       if (cheapest_total_path == NULL &&
> >               cheapest_partial_path == NULL &&
>
> I think (but not verified yet) that without this test the function could
> return NULL for reasons unrelated to the aggregate push-down. Nevertheless, I
> realize now that there's no aggregate push-down specific processing in the
> function. I've adjusted it so that it does return, but the returned value is
> partially_grouped_rel rather than NULL.
>
> > @@ -6881,7 +6893,9 @@ create_partial_grouping_paths(PlannerInfo *root,
> >
> >       /*
> >        * Build a new upper relation to represent the result of partially
> > -      * aggregating the rows from the input relation.
> > +      * aggregating the rows from the input relation. The relation may
> > +      * already exist due to aggregate pushdown, in which case we don't
> > +      * need to create it.
> >        */
> >       if (partially_grouped_rel == NULL)
> >               partially_grouped_rel = fetch_upper_rel(root,
>
> ok, done.
>
> > @@ -6903,6 +6917,8 @@ create_partial_grouping_paths(PlannerInfo *root,
> >        *
> >        * If the target was already created for the sake of aggregate push-down,
> >        * it should be compatible with what we'd create here.
> > +      *
> > +      * XXX Why is this checking reltarget->exprs? What does that mean?
> >        */
> >       if (partially_grouped_rel->reltarget->exprs == NIL)
> >               partially_grouped_rel->reltarget =
>
> I've added this comment:
>
>          * XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
>          * push-down generated no paths), it created an empty target. Should we
>          * change the convention and have it assign NULL to reltarget instead?  Or
>          * should we introduce a function like is_pathtarget_empty()?
>
> > diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
> > index 7025ebf94be..395bd093d34 100644
> > --- a/src/backend/optimizer/util/pathnode.c
> > +++ b/src/backend/optimizer/util/pathnode.c
> > @@ -3163,9 +3163,21 @@ create_agg_path(PlannerInfo *root,
> >  }
> >
> >  /*
> > + * create_agg_sorted_path
> > + *           Creates a pathnode performing sorted aggregation/grouping
> > + *
> >   * Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
> >   *
> >   * NULL is returned if sorting of subpath output is not suitable.
> > + *
> > + * XXX I'm a bit confused why we need this? We now have create_agg_path and also
> > + * create_agg_sorted_path and create_agg_hashed_path.
>
> Do you mean that the function names are confusing? The functions
> create_agg_sorted_path() and create_agg_hashed_path() do some checks /
> preparation for the call of the existing function create_agg_path(), which is
> more low-level. Should the names be something like
> create_partial_agg_sorted_path() and create_partial_agg_hashed_path() ?
>
> > + *
> > + * XXX This assumes the input path to be sorted in a suitable way, but for
> > + * regular aggregation we check that separately and then perhaps add sort
> > + * if needed (possibly incremental one). That is, we don't do such checks
> > + * in create_agg_path. Shouldn't we do the same thing before calling this
> > + * new functions?
> >   */
> >  AggPath *
> >  create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> > @@ -3184,6 +3196,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >       agg_exprs = agg_info->agg_exprs;
> >       target = agg_info->target;
>
> Likewise, it seems that you'd like to see different function name and maybe
> different location of this function. Both create_agg_sorted_path() and
> create_agg_hashed_path() are rather wrappers for create_agg_path().
>
> >
> > +     /* Bail out if the input path is not sorted at all. */
> >       if (subpath->pathkeys == NIL)
> >               return NULL;
>
> ok, done.
>
> > @@ -3192,6 +3205,18 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >
> >       /*
> >        * Find all query pathkeys that our relation does affect.
> > +      *
> > +      * XXX Not sure what "that our relation does affect" means? Also, we
> > +      * are not looking at query_pathkeys but group_pathkeys, so that's a
> > +      * bit confusing. Perhaps something like this would be better:
> > +      *
>
> Indeed, the check of pathkeys was weird, I've reworked it.
>
> > @@ -3210,10 +3235,21 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >               }
> >       }
> >
> > +     /* Bail out if the subquery has no pathkeys for the grouping. */
> >       if (key_subset == NIL)
> >               return NULL;
> >
> > -     /* Check if AGG_SORTED is useful for the whole query.  */
> > +     /*
> > +      * Check if AGG_SORTED is useful for the whole query.
> > +      *
> > +      * XXX So this means we require the group pathkeys matched to the
> > +      * subpath have to be a prefix of subpath->pathkeys. Why is that
> > +      * necessary? We'll reduce the cardinality, and in the worst case
> > +      * we'll have to add a separate sort (full or incremental). Or we
> > +      * could finalize using hashed aggregate.
>
> Although with different arguments, pathkeys_contained_in() is still used in
> the new version of the patch. I've added a TODO comment about the incremental
> sort (it did not exist when I was writing the patch), but what do you mean by
> "reducing the cardinality"? Eventually the partial aggregate should reduce the
> cardinality, but for the AGG_SORT strategy to work, the input sorting must be
> such that the executor can recognize the group boundaries.
>
> > +      *
> > +      * XXX Doesn't seem to change any regression tests when disabled.
> > +      */
> >       if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
> >               return NULL;
>
> "disabled" means removal of this part (including the return statement), or
> returning NULL unconditionally? Whatever you mean, please check with the new
> version.
>
> > @@ -3231,7 +3267,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
> >       result = create_agg_path(root, rel, subpath, target,
> >                                                        AGG_SORTED, aggsplit,
> >                                                        agg_info->group_clauses,
> > -                                                      NIL,
> > +                                                      NIL,   /* qual for HAVING clause */
> >                                                        &agg_costs,
> >                                                        dNumGroups);
>
> ok, done here as well as in create_agg_hashed_path().
>
> > @@ -3283,6 +3319,9 @@ create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
> >
&agg_costs,
> >
dNumGroups);
> >
> > +             /*
> > +              * XXX But we can spill to disk in hashagg now, no?
> > +              */
> >               if (hashaggtablesize < work_mem * 1024L)
> >               {
>
> Yes, we can. It wasn't possible while I was writing the patch. Fixed.
>
> > diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
> > index 868d21c351e..6e87ada684b 100644
> > --- a/src/backend/utils/misc/postgresql.conf.sample
> > +++ b/src/backend/utils/misc/postgresql.conf.sample
> > @@ -388,6 +388,7 @@
> >  #enable_seqscan = on
> >  #enable_sort = on
> >  #enable_tidscan = on
> > +#enable_agg_pushdown = on
>
> Done.
>
> > diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> > index 1055ea70940..05192ca549a 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3352,7 +3352,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
> >                                               RelOptInfo *rel_plain, RelAggInfo *agg_info)
> >  {
> >       ListCell   *lc;
> > -     Path       *path;
> > +     Path       *path;       /* XXX why declare at this level, not in the loops */
> >
>
> I usually do it this way, not sure why. Perhaps because it's less typing :-) I
> changed that in the next version so that we don't waste time arguing about
> unimportant things.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
5212d447fa53518458cbe609092b347803a667c5 ===
=== applying patch ./v21-fixes.patch
patching file src/backend/optimizer/README
Hunk #1 FAILED at 1186.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/optimizer/README.rej
patching file src/backend/optimizer/path/allpaths.c
Hunk #1 FAILED at 197.
Hunk #2 FAILED at 341.
Hunk #3 succeeded at 339 with fuzz 1 (offset -11 lines).
Hunk #4 succeeded at 1014 with fuzz 2 (offset 647 lines).
Hunk #5 FAILED at 378.
Hunk #6 FAILED at 563.
Hunk #7 succeeded at 2793 with fuzz 1 (offset 1948 lines).
Hunk #8 FAILED at 867.
Hunk #9 FAILED at 3439.
Hunk #10 FAILED at 3590.
Hunk #11 succeeded at 3430 (offset -182 lines).
7 out of 11 hunks FAILED -- saving rejects to file
src/backend/optimizer/path/allpaths.c.rej
patching file src/backend/optimizer/path/costsize.c

[1] - http://cfbot.cputube.org/patch_41_3764.log

Regards,
Vignesh
В списке pgsql-hackers по дате отправления:
Предыдущее

От: vignesh C
Дата: 04 января 2023 г., 13:06:27
Сообщение: Re: Prefetch the next tuple's memory during seqscans
Следующее

От: vignesh C
Дата: 04 января 2023 г., 13:15:05
Сообщение: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: WIP: Aggregation push-down - take2

Предыдущее

Следующее