Re: [HACKERS] Improve OR conditions on joined columns (common star schema problem)

Поиск
Список
Период
Сортировка
От	Tom Lane
Тема	Re: [HACKERS] Improve OR conditions on joined columns (common star schema problem)
Дата	12 февраля 2017 г. 03:30:35
Msg-id	4343.1486859435@sss.pgh.pa.us обсуждение
Ответ на	Re: [HACKERS] Improve OR conditions on joined columns (common starschema problem) (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Ответы	Re: [HACKERS] Improve OR conditions on joined columns (common starschema problem) Re: [HACKERS] Improve OR conditions on joined columns (common star schema problem)
Список	pgsql-hackers
Дерево обсуждения
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 2/8/17 5:54 PM, Tom Lane wrote:
>> Maybe it'd be better to imagine this as something closer to planagg.c,
>> that is it knows how to apply a specific high-level optimization that
>> might or might not be a win, so it builds a path describing that and sees
>> if it looks cheaper than a path done the normal way.  The fact that
>> we could even build a path describing a union is something that wasn't
>> there before 9.6, but maybe there's enough infrastructure for that now.

> It's encouraging that there's at least a template to follow there... If 
> there is missing infrastructure, is it likely to be major? My uninformed 
> guess would be that the pathification was the major hurdle.

I wrote a POC patch for this on a long airplane ride.  It's not complete,
and I'm sure there are bugs as well, but it makes your example case
better.  What I did about the de-duplication issue is to de-dup using
the CTIDs of all baserels in the query.  This means the optimization
is only applicable to cases where all the rels have CTIDs ... but other
methods such as inspecting unique keys ain't gonna work for non-table
rels either, so I think this is about the best we can hope for.
However, I did not understand your point about:

> BTW, there's an important caveat here: users generally do NOT want 
> duplicate rows from the fact table if the dimension table results aren't 
> unique.

so maybe we should be thinking about some other way entirely?

Anyway, what's attached below produces this on your example query:

 Aggregate  (cost=38.12..38.13 rows=1 width=8) (actual time=0.158..0.158 rows=1 loops=1)
   ->  Result  (cost=38.03..38.11 rows=4 width=0) (actual time=0.151..0.155 rows=3 loops=1)
         ->  Unique  (cost=38.03..38.07 rows=4 width=18) (actual time=0.150..0.154 rows=3 loops=1)
               ->  Sort  (cost=38.03..38.04 rows=4 width=18) (actual time=0.150..0.151 rows=4 loops=1)
                     Sort Key: f.ctid, d1.ctid, d2.ctid
                     Sort Method: quicksort  Memory: 25kB
                     ->  Append  (cost=4.85..37.99 rows=4 width=18) (actual time=0.070..0.106 rows=4 loops=1)
                           ->  Nested Loop Left Join  (cost=4.85..19.00 rows=2 width=18) (actual time=0.069..0.075
rows=2loops=1)
 
                                 ->  Nested Loop  (cost=4.57..18.37 rows=2 width=16) (actual time=0.035..0.038 rows=2
loops=1)
                                       ->  Index Scan using dim_t_key on dim d1  (cost=0.28..8.29 rows=1 width=10)
(actualtime=0.009..0.009 rows=1 loops=1)
 
                                             Index Cond: ('1'::text = t)
                                       ->  Bitmap Heap Scan on fact f  (cost=4.30..10.05 rows=2 width=14) (actual
time=0.023..0.025rows=2 loops=1)
 
                                             Recheck Cond: (f1 = d1.s)
                                             Heap Blocks: exact=2
                                             ->  Bitmap Index Scan on f_f1  (cost=0.00..4.29 rows=2 width=0) (actual
time=0.016..0.016rows=2 loops=1)
 
                                                   Index Cond: (f1 = d1.s)
                                 ->  Index Scan using dim_pkey on dim d2  (cost=0.28..0.31 rows=1 width=10) (actual
time=0.016..0.017rows=0 loops=2)
 
                                       Index Cond: (f.f2 = s)
                           ->  Nested Loop Left Join  (cost=4.85..19.00 rows=2 width=18) (actual time=0.025..0.029
rows=2loops=1)
 
                                 ->  Nested Loop  (cost=4.57..18.37 rows=2 width=16) (actual time=0.022..0.025 rows=2
loops=1)
                                       ->  Index Scan using dim_t_key on dim d2  (cost=0.28..8.29 rows=1 width=10)
(actualtime=0.003..0.003 rows=1 loops=1)
 
                                             Index Cond: ('1'::text = t)
                                       ->  Bitmap Heap Scan on fact f  (cost=4.30..10.05 rows=2 width=14) (actual
time=0.017..0.019rows=2 loops=1)
 
                                             Recheck Cond: (f2 = d2.s)
                                             Heap Blocks: exact=2
                                             ->  Bitmap Index Scan on f_f2  (cost=0.00..4.29 rows=2 width=0) (actual
time=0.014..0.014rows=2 loops=1)
 
                                                   Index Cond: (f2 = d2.s)
                                 ->  Index Scan using dim_pkey on dim d1  (cost=0.28..0.31 rows=1 width=10) (actual
time=0.001..0.001rows=0 loops=2)
 
                                       Index Cond: (f.f1 = s)

The main remaining piece of work here is that, as you can see from the
above, it fails to eliminate joins to tables that we don't actually need
in a particular UNION arm.  This is because the references to those
tables' ctid columns prevent analyzejoins.c from removing the joins.
I've thought about ways to deal with that but haven't come up with
anything that wasn't pretty ugly and/or invasive.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/Makefile b/src/backend/optimizer/plan/Makefile
index 88a9f7ff8c0177b920b60900276335022ae0e813..1db6bd55278cd4e3072e9c1ea7b2e648a01e9eed 100644
*** a/src/backend/optimizer/plan/Makefile
--- b/src/backend/optimizer/plan/Makefile
*************** top_builddir = ../../../..
*** 13,18 ****
  include $(top_builddir)/src/Makefile.global

  OBJS = analyzejoins.o createplan.o initsplan.o planagg.o planmain.o planner.o \
!     setrefs.o subselect.o

  include $(top_srcdir)/src/backend/common.mk
--- 13,18 ----
  include $(top_builddir)/src/Makefile.global

  OBJS = analyzejoins.o createplan.o initsplan.o planagg.o planmain.o planner.o \
!     planunionor.o setrefs.o subselect.o

  include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 881742f46b66d7cfcdf5161f93277a57ff39307c..2a4dff6c87de50c7f43745df182399cb39f0da56 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** grouping_planner(PlannerInfo *root, bool
*** 1496,1501 ****
--- 1496,1502 ----
          List       *activeWindows = NIL;
          List       *rollup_lists = NIL;
          List       *rollup_groupclauses = NIL;
+         List       *union_or_subpaths;
          standard_qp_extra qp_extra;

          /* A recursive query should always have setOperations */
*************** grouping_planner(PlannerInfo *root, bool
*** 1656,1661 ****
--- 1657,1670 ----
              preprocess_minmax_aggregates(root, tlist);

          /*
+          * Preprocess join OR clauses that might be better handled as UNIONs.
+          * This likewise needs to be close to the query_planner() call.  But
+          * it doesn't matter which of preprocess_minmax_aggregates() and this
+          * function we call first, because they treat disjoint sets of cases.
+          */
+         union_or_subpaths = split_join_or_clauses(root);
+
+         /*
           * Figure out whether there's a hard limit on the number of rows that
           * query_planner's result subplan needs to return.  Even if we know a
           * hard limit overall, it doesn't apply if the query has any
*************** grouping_planner(PlannerInfo *root, bool
*** 1689,1694 ****
--- 1698,1711 ----
                                      standard_qp_callback, &qp_extra);

          /*
+          * If we found any way to convert a join OR clause to a union, finish
+          * up generating the path(s) for that, and add them into the topmost
+          * scan/join relation.
+          */
+         if (union_or_subpaths)
+             finish_union_or_paths(root, current_rel, union_or_subpaths);
+
+         /*
           * Convert the query's result tlist into PathTarget format.
           *
           * Note: it's desirable to not do this till after query_planner(),
diff --git a/src/backend/optimizer/plan/planunionor.c b/src/backend/optimizer/plan/planunionor.c
index ...ce744d6b47c5ca3eb119f42017e1513c3941b585 .
*** a/src/backend/optimizer/plan/planunionor.c
--- b/src/backend/optimizer/plan/planunionor.c
***************
*** 0 ****
--- 1,596 ----
+ /*-------------------------------------------------------------------------
+  *
+  * planunionor.c
+  *      Consider whether join OR clauses can be converted to UNION queries.
+  *
+  * This module looks for OR clauses whose arms each reference a single
+  * query relation (but not all the same rel), and tries to generate a path
+  * representing conversion of such an OR clause into a UNION operation.
+  * For example,
+  *        SELECT ... FROM a, b WHERE (cond-on-A OR cond-on-B) AND other-conds
+  * can be implemented as
+  *        SELECT ... FROM a, b WHERE cond-on-A AND other-conds
+  *        UNION
+  *        SELECT ... FROM a, b WHERE cond-on-B AND other-conds
+  * given a suitable definition of "UNION" (one that won't merge rows that
+  * would have been separate in the original query output).  Since this change
+  * converts join clauses into restriction clauses, the modified query can be
+  * much faster to run than the original, despite the duplication of effort
+  * involved and the extra UNION processing.  It's particularly useful for
+  * star-schema queries where the OR arms reference different dimension tables;
+  * each separated query may be able to remove joins to all but one dimension
+  * table.
+  *
+  * We must insist that the WHERE and JOIN/ON clauses contain no volatile
+  * functions, because of the likelihood that qual clauses will be evaluated
+  * more times than a naive programmer would expect.  We need not restrict
+  * the SELECT's tlist, because that will be evaluated after the UNION.
+  *
+  * The current implementation of the UNION step is to de-duplicate using
+  * row CTIDs.  A big limitation is that this only works on plain relations,
+  * and not for instance on foreign tables.  Another problem is that we can
+  * only de-duplicate by sort/unique, not hashing; but that could be fixed
+  * if we write a hash opclass for TID.
+  *
+  *
+  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *
+  * IDENTIFICATION
+  *      src/backend/optimizer/plan/planunionor.c
+  *
+  *-------------------------------------------------------------------------
+  */
+ #include "postgres.h"
+
+ #include "access/sysattr.h"
+ #include "catalog/pg_class.h"
+ #include "catalog/pg_operator.h"
+ #include "catalog/pg_type.h"
+ #include "miscadmin.h"
+ #include "nodes/makefuncs.h"
+ #include "optimizer/clauses.h"
+ #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
+ #include "optimizer/planmain.h"
+ #include "optimizer/prep.h"
+ #include "optimizer/subselect.h"
+ #include "optimizer/tlist.h"
+ #include "optimizer/var.h"
+
+
+ static bool is_suitable_join_or_clause(BoolExpr *orclause, List **relids);
+ static bool is_query_safe_for_union_or_transform(PlannerInfo *root,
+                                      Relids allbaserels);
+ static List *create_union_or_subpaths(PlannerInfo *root, BoolExpr *orclause,
+                          int n, List *armrelids, Relids allbaserels);
+ static void union_or_qp_callback(PlannerInfo *root, void *extra);
+
+
+ /*
+  * split_join_or_clauses - create paths based on splitting join OR clauses
+  *
+  * This should be called by grouping_planner() just before it's ready to call
+  * query_planner(), because we generate simplified join paths by cloning the
+  * planner's state and invoking query_planner() on a modified version of
+  * the query parsetree.  Thus, all preprocessing needed before query_planner()
+  * must already be done.  Note however that we repeat reduce_outer_joins()
+  * because of the possibility that the simplified WHERE clause allows reduction
+  * of an outer join to inner-join form.  That's okay for now, but maybe we
+  * should move the reduce_outer_joins() call into query_planner()?
+  *
+  * The result is a list (one entry per potential join OR path) of sublists of
+  * best paths for the inputs to the UNION step.  Adding the UNION processing
+  * is pretty mechanical, but we can't do it until we have a RelOptInfo for the
+  * top-level join rel.
+  */
+ List *
+ split_join_or_clauses(PlannerInfo *root)
+ {
+     List       *results = NIL;
+     Query       *parse = root->parse;
+     Relids        allbaserels = NULL;
+     bool        checked_query = false;
+     ListCell   *lc;
+     int            n;
+
+     /*
+      * At least for now, we restrict this optimization to plain SELECTs.
+      */
+     if (parse->commandType != CMD_SELECT ||
+         parse->rowMarks ||
+         parse->setOperations)
+         return NIL;
+
+     /*
+      * Reject if query contains any CTEs; copying them would break
+      * single-evaluation semantics.  (In principle we could arrange for all
+      * UNION arms to read from a single instance of a CTE, but that's an
+      * improvement for another day, especially since we have no way to de-dup
+      * CTE outputs anyway.)
+      */
+     if (parse->cteList)
+         return NIL;
+
+     /*
+      * The query must reference multiple tables, else we certainly aren't
+      * going to find any suitable OR clauses.  Do a quick check that there's
+      * more than one RTE.
+      */
+     if (list_length(parse->rtable) < 2)
+         return NIL;
+
+     /*
+      * Scan the top-level WHERE clause looking for suitable OR clauses, and
+      * for each one, generate paths for the UNION input sub-queries.  There
+      * might be more than one suitable OR clause, in which case we can try the
+      * transformation for each one of them separately and add that list of
+      * paths to the results.
+      *
+      * XXX we should also search the JOIN/ON clauses of any top-level inner
+      * JOIN nodes, since those are semantically equivalent to WHERE.  But it's
+      * hard to see how to do that without either copying the whole JOIN tree
+      * in advance or repeating the search after copying, and neither of those
+      * options seems attractive.
+      */
+     n = 0;
+     foreach(lc, (List *) parse->jointree->quals)
+     {
+         Node       *qual = (Node *) lfirst(lc);
+         List       *armrelids;
+
+         if (or_clause(qual) &&
+             is_suitable_join_or_clause((BoolExpr *) qual, &armrelids))
+         {
+             List       *subpaths;
+
+             /*
+              * Check that the query as a whole is safe for this optimization.
+              * We only need to do this once, but it's somewhat expensive, so
+              * don't do it till we find a candidate OR clause.  This is also a
+              * good time to compute the set of baserels in the query, which
+              * we'll need at each step.
+              */
+             if (!checked_query)
+             {
+                 allbaserels = get_relids_in_jointree((Node *) parse->jointree,
+                                                      false);
+                 if (!is_query_safe_for_union_or_transform(root, allbaserels))
+                     return NIL;
+                 checked_query = true;
+             }
+             /* OK, transform the query and create a list of sub-paths */
+             subpaths = create_union_or_subpaths(root, (BoolExpr *) qual,
+                                                 n, armrelids, allbaserels);
+             results = lappend(results, subpaths);
+         }
+         n++;
+     }
+
+     return results;
+ }
+
+ /*
+  * Finish constructing Paths representing the UNION implementation of join
+  * OR clause(s), and attach them to "joinrel", which is the final scan/join
+  * relation returned by query_planner() for the conventional implementation of
+  * the query.  "union_or_subpaths" is the output of split_join_or_clauses().
+  */
+ void
+ finish_union_or_paths(PlannerInfo *root, RelOptInfo *joinrel,
+                       List *union_or_subpaths)
+ {
+     Relids        allbaserels;
+     ListCell   *lc;
+
+     /*
+      * XXX better to pass this forward from split_join_or_clauses?    Seems like
+      * we could get a different answer if join removal happened in main query,
+      * and that would probably be bad.
+      */
+     allbaserels = get_relids_in_jointree((Node *) root->parse->jointree, false);
+
+     /* This loop iterates once per splittable OR clause */
+     foreach(lc, union_or_subpaths)
+     {
+         List       *subpaths = (List *) lfirst(lc);
+         Path       *appendpath;
+         List       *uniq_operators;
+         List       *uniq_exprs;
+         int            brelid;
+         UniquePath *pathnode;
+         Path        sort_path;    /* dummy for result of cost_sort */
+         int            numCols;
+
+         /* Generate Append path combining the sub-paths for one UNION */
+         appendpath = (Path *) create_append_path(joinrel, subpaths, NULL, 0);
+
+         /*
+          * The Append path's pathtarget has to match what is actually coming
+          * out of the subpaths, since Append can't project.  Assume that they
+          * all generate equivalent pathtargets.
+          *
+          * XXX this is going to be wrong when join removal removes some vars?
+          */
+         appendpath->pathtarget =
+             copy_pathtarget(((Path *) linitial(subpaths))->pathtarget);
+
+         /*
+          * Make the operator and expression lists needed for the Unique path.
+          */
+         uniq_operators = uniq_exprs = NIL;
+         brelid = -1;
+         while ((brelid = bms_next_member(allbaserels, brelid)) >= 0)
+         {
+             Var           *var;
+
+             var = makeVar(brelid,
+                           SelfItemPointerAttributeNumber,
+                           TIDOID,
+                           -1,
+                           InvalidOid,
+                           0);
+             uniq_operators = lappend_oid(uniq_operators, TIDEqualOperator);
+             uniq_exprs = lappend(uniq_exprs, var);
+         }
+
+         /*
+          * Generate a Unique path representing the de-duplication step. For
+          * now, we can only consider sort+unique implementation.
+          *
+          * XXX maybe refactor to share some code with create_unique_path()?
+          */
+         pathnode = makeNode(UniquePath);
+
+         pathnode->path.pathtype = T_Unique;
+         pathnode->path.parent = joinrel;
+         pathnode->path.pathtarget = appendpath->pathtarget;
+         pathnode->path.param_info = appendpath->param_info;
+         pathnode->path.parallel_aware = false;
+         pathnode->path.parallel_safe = joinrel->consider_parallel &&
+             appendpath->parallel_safe;
+         pathnode->path.parallel_workers = appendpath->parallel_workers;
+
+         /*
+          * Treat the output as unsorted, since it almost certainly doesn't
+          * match any useful pathkeys.
+          */
+         pathnode->path.pathkeys = NIL;
+
+         pathnode->subpath = appendpath;
+         pathnode->in_operators = uniq_operators;
+         pathnode->uniq_exprs = uniq_exprs;
+
+         /* Estimate number of output rows */
+         pathnode->path.rows = appendpath->rows;
+         numCols = list_length(uniq_exprs);
+
+         /*
+          * Estimate cost for sort+unique implementation
+          */
+         cost_sort(&sort_path, root, NIL,
+                   appendpath->total_cost,
+                   appendpath->rows,
+                   appendpath->pathtarget->width,
+                   0.0,
+                   work_mem,
+                   -1.0);
+
+         /*
+          * Charge one cpu_operator_cost per comparison per input tuple.  We
+          * assume all columns get compared at most of the tuples.  (XXX
+          * probably this is an overestimate.)  This should agree with
+          * create_unique_path.
+          */
+         sort_path.total_cost += cpu_operator_cost * appendpath->rows * numCols;
+
+         pathnode->umethod = UNIQUE_PATH_SORT;
+
+         pathnode->path.startup_cost = sort_path.startup_cost;
+         pathnode->path.total_cost = sort_path.total_cost;
+
+         /* Attach it to the joinrel */
+         add_path(joinrel, (Path *) pathnode);
+     }
+
+     /* We need to refigure which is the cheapest path */
+     set_cheapest(joinrel);
+ }
+
+ /*
+  * Is this OR clause a suitable clause for splitting?
+  *
+  * Each of its arms must reference just one rel, and they must not all be
+  * the same rel.
+  * On success, pass back a list of the relids referenced by each OR arm,
+  * so we don't have to repeat the pull_varnos() work later.
+  */
+ static bool
+ is_suitable_join_or_clause(BoolExpr *orclause, List **relids)
+ {
+     bool        ok = false;
+     List       *relidlist = NIL;
+     int            firstrelid = 0;
+     ListCell   *lc;
+
+     *relids = NIL;                /* prevent uninitialized-variable warnings */
+     foreach(lc, orclause->args)
+     {
+         Node       *qual = (Node *) lfirst(lc);
+         Relids        varnos = pull_varnos(qual);
+         int            relid;
+
+         if (!bms_get_singleton_member(varnos, &relid))
+             return false;        /* this arm fails the sine qua non */
+         if (relidlist == NIL)
+             firstrelid = relid;
+         else if (firstrelid != relid)
+             ok = true;            /* arms reference more than one relid */
+         relidlist = lappend_int(relidlist, relid);
+     }
+     *relids = relidlist;
+     return ok;
+ }
+
+ /*
+  * Is query as a whole safe to apply union OR transformation to?
+  * This checks relatively-expensive conditions that we don't want to
+  * worry about until we've found a candidate OR clause.
+  */
+ static bool
+ is_query_safe_for_union_or_transform(PlannerInfo *root, Relids allbaserels)
+ {
+     Query       *parse = root->parse;
+     ListCell   *lc;
+     int            relid;
+
+     /*
+      * Must not have any volatile functions in FROM or WHERE (see notes at
+      * head of file).
+      */
+     if (contain_volatile_functions((Node *) parse->jointree))
+         return false;
+
+     /*
+      * We insist that all baserels used in the query be plain relations, so
+      * that we can use their ctids as unique row identifiers in the UNION
+      * step.  One could imagine ways to relax this later, for instance by
+      * forcibly adding WITH ORDINALITY to function RTEs.  We'd have to examine
+      * each RTE anyway, though, to check for volatile functions.
+      */
+     relid = 0;
+     foreach(lc, parse->rtable)
+     {
+         RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+
+         relid++;
+
+         /* ignore RTEs that aren't referenced baserels */
+         if (!bms_is_member(relid, allbaserels))
+             continue;
+
+         /* fail if not plain rel */
+         if (rte->rtekind != RTE_RELATION)
+             return false;
+         /* fail if it doesn't have CTIDs */
+         if (rte->relkind != RELKIND_RELATION &&
+             rte->relkind != RELKIND_MATVIEW)
+             return false;
+
+         /* disallow TABLESAMPLE (might be okay if repeatable?) */
+         if (rte->tablesample)
+             return false;
+
+         /* check for volatiles in security barrier quals */
+         if (contain_volatile_functions((Node *) rte->securityQuals))
+             return false;
+     }
+
+     /* OK to proceed */
+     return true;
+ }
+
+ /*
+  * Split the query and the given OR clause into one UNION arm per relation
+  * mentioned in the OR clause, and make a list of best paths for the UNION
+  * arms.  (Since the UNION step will lose any ordering or fast-start
+  * properties of the paths, there's no need to consider any but the
+  * cheapest-total path for each arm; hence it's okay to return just one path
+  * per arm.)
+  * "n" is the OR clause's index in the query's WHERE list.
+  * "armrelids" is the OR-arm-to-referenced-rel mapping.
+  * "allbaserels" is the relids of all baserels used in the query.  We use
+  * this to build the list of CTID variables we'll need to de-duplicate.
+  */
+ static List *
+ create_union_or_subpaths(PlannerInfo *root, BoolExpr *orclause,
+                          int n, List *armrelids, Relids allbaserels)
+ {
+     List       *subpaths = NIL;
+     Relids        orrels;
+     int            relid;
+     ListCell   *lc;
+     ListCell   *lc2;
+
+     /*
+      * There might be multiple OR arms referring to the same rel, which we
+      * should combine into a restriction OR clause.  So first identify the set
+      * of rels used in the OR.
+      */
+     orrels = NULL;
+     foreach(lc, armrelids)
+         orrels = bms_add_member(orrels, lfirst_int(lc));
+
+     /* Now, for each such rel, generate a path for a UNION arm */
+     while ((relid = bms_first_member(orrels)) >= 0)
+     {
+         List       *orarms;
+         PlannerInfo *subroot;
+         Query       *parse;
+         List       *subquery_quals;
+         List       *tlist;
+         int            brelid;
+         bool        hasOuterJoins;
+         RelOptInfo *final_rel;
+         Path       *subpath;
+         int            k;
+         ListCell   *prev;
+
+         /* Extract the OR arms for this rel, making copies for safety */
+         orarms = NIL;
+         forboth(lc, orclause->args, lc2, armrelids)
+         {
+             Node       *qual = (Node *) lfirst(lc);
+             int            qualrelid = lfirst_int(lc2);
+
+             if (qualrelid == relid)
+                 orarms = lappend(orarms, copyObject(qual));
+         }
+         Assert(orarms != NIL);
+         if (list_length(orarms) == 1)
+         {
+             /*
+              * When there's just a single arm for this rel (the typical case),
+              * it goes directly into the subquery's WHERE list.  But it might
+              * be a sub-AND, in which case we must flatten it into a qual list
+              * to preserve AND/OR flatness.
+              */
+             orarms = make_ands_implicit((Expr *) linitial(orarms));
+         }
+         else
+         {
+             /*
+              * When there's more than one arm, convert back to an OR clause.
+              * No flatness worries here, the arms were already valid OR-list
+              * elements.
+              */
+             orarms = list_make1(make_orclause(orarms));
+         }
+
+         /* Clone the planner's state */
+         subroot = (PlannerInfo *) palloc(sizeof(PlannerInfo));
+         memcpy(subroot, root, sizeof(PlannerInfo));
+         subroot->parse = parse = (Query *) copyObject(root->parse);
+         /* XXX likely there's more to do here */
+
+         /* Probably should drop any ordering, limit clauses? */
+         subroot->tuple_fraction = 1.0;
+         subroot->limit_tuples = -1.0;
+
+         /*
+          * Remove the subquery's copy of the original OR clause, which we
+          * identify by its index in the WHERE clause list.
+          */
+         subquery_quals = (List *) parse->jointree->quals;
+         k = 0;
+         prev = NULL;
+         foreach(lc, subquery_quals)
+         {
+             if (k == n)
+             {
+                 subquery_quals = list_delete_cell(subquery_quals, lc, prev);
+                 break;
+             }
+             k++;
+             prev = lc;
+         }
+
+         /* And instead add the qual or quals we extracted from the OR clause */
+         subquery_quals = list_concat(subquery_quals, orarms);
+         parse->jointree->quals = (Node *) subquery_quals;
+
+         /*
+          * We must add all the baserels' CTID columns to the subquery's tlist
+          * so that they'll be available for de-duplication purposes.
+          *
+          * XXX need a way to prevent these from preventing join removal.
+          */
+         tlist = (List *) copyObject(root->processed_tlist);
+         brelid = -1;
+         while ((brelid = bms_next_member(allbaserels, brelid)) >= 0)
+         {
+             Var           *var;
+             char        resname[32];
+             TargetEntry *tle;
+
+             var = makeVar(brelid,
+                           SelfItemPointerAttributeNumber,
+                           TIDOID,
+                           -1,
+                           InvalidOid,
+                           0);
+             snprintf(resname, sizeof(resname), "union_ctid%u", brelid);
+             tle = makeTargetEntry((Expr *) var,
+                                   list_length(tlist) + 1,
+                                   pstrdup(resname),
+                                   true);
+             tlist = lappend(tlist, tle);
+
+         }
+         subroot->processed_tlist = tlist;
+
+         /* Re-apply reduce_outer_joins() in case we can now reduce some */
+         /* (XXX would be better if this just got moved into query_planner) */
+         hasOuterJoins = false;
+         foreach(lc, parse->rtable)
+         {
+             RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+
+             if (rte->rtekind == RTE_JOIN)
+             {
+                 if (IS_OUTER_JOIN(rte->jointype))
+                 {
+                     hasOuterJoins = true;
+                     break;
+                 }
+             }
+         }
+         if (hasOuterJoins)
+             reduce_outer_joins(subroot);
+
+         /* Plan the modified query */
+         final_rel = query_planner(subroot, subroot->processed_tlist,
+                                   union_or_qp_callback, NULL);
+
+         /*
+          * Since we didn't go through subquery_planner() to handle the
+          * subquery, we have to do some of the same cleanup it would do, in
+          * particular cope with params and initplans used within this
+          * subquery.  (This won't matter if we end up not using the subplan.)
+          */
+         SS_identify_outer_params(subroot);
+         SS_charge_for_initplans(subroot, final_rel);
+
+         /*
+          * Get the cheapest-total path for the subquery; there's little value
+          * in considering any others.
+          */
+         subpath = final_rel->cheapest_total_path;
+         Assert(subpath);
+
+         /* Add cheapest-total path to subpaths list */
+         subpaths = lappend(subpaths, subpath);
+     }
+
+     return subpaths;
+ }
+
+ /*
+  * Compute query_pathkeys and other pathkeys during plan generation
+  */
+ static void
+ union_or_qp_callback(PlannerInfo *root, void *extra)
+ {
+     /*
+      * Since the output of the subquery is going to go through a UNION step
+      * that destroys ordering, there's little need to worry about what its
+      * output order is.  Hence, don't bother telling it about pathkeys that
+      * might apply to these later execution steps.
+      */
+     root->group_pathkeys = NIL;
+     root->window_pathkeys = NIL;
+     root->distinct_pathkeys = NIL;
+     root->sort_pathkeys = NIL;
+     root->query_pathkeys = NIL;
+ }
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 94ef84bca9cba62d80187aa174947c74ace525f1..4743a0ca2c58946390d77f6c5e6b6870b0ce6720 100644
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern RelOptInfo *query_planner(Planner
*** 45,50 ****
--- 45,57 ----
  extern void preprocess_minmax_aggregates(PlannerInfo *root, List *tlist);

  /*
+  * prototypes for plan/planunionor.c
+  */
+ extern List *split_join_or_clauses(PlannerInfo *root);
+ extern void finish_union_or_paths(PlannerInfo *root, RelOptInfo *joinrel,
+                       List *union_or_subpaths);
+
+ /*
   * prototypes for plan/createplan.c
   */
  extern Plan *create_plan(PlannerInfo *root, Path *best_path);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
В списке pgsql-hackers по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] Improve OR conditions on joined columns (common star schema problem)