Re: UNION ALL on partitioned tables won't use indices.

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: UNION ALL on partitioned tables won't use indices.
Дата
Msg-id 20140114.180447.145186052.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: UNION ALL on partitioned tables won't use indices.  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: UNION ALL on partitioned tables won't use indices.  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
This is cont'd from previous CF3.

You'll see the overview and the discussion since in the thread
begins from there. The issue ramains as of current 9.4dev head.

http://www.postgresql.org/message-id/20131024.193953.233464126.horiguchi.kyotaro@lab.ntt.co.jp

The issue in brief is that UNION ALL on inheritance tables does
not make use of indices in contrast to that on bare tables. This
is because of appendrels with the depth more than 2 levels
brought by inheritance tables.

Some of UNION ALL operation - especially using ORDERBY and LIMIT
- will be accelerated by this patch. Details in the message
above.

I proposed 3 types of solution but the one of them - tweaking
Equivalence Class (Type 1)- was dismissed because of
inappropriate manipulation on Equivalence Class. The rest do
essentially the same thing - flattening appendrels - at different
timings.

In type 2, the process is implemented as a generic appendrel
flattening function and applied after expand_inherited_tables()
in subquery_planner.

In type 3, the equivelant process is combined in
expand_inherited_rtentry().

Type 2 loops once for whole appendrel list (in most cases) so it
might be more effective than type 3 which searches the parent
appendrel for each appended ones. Type 3 is more approprieate
design assuming that appendrel tree deeper than 2 levels will be
generated only by expand_inherited_tables().

The attached two patches are rebased to current 9.4dev HEAD and
make check at the topmost directory and src/test/isolation are
passed without error. One bug was found and fixed on the way. It
was an assertion failure caused by probably unexpected type
conversion introduced by collapse_appendrels() which leads
implicit whole-row cast from some valid reltype to invalid
reltype (0) into adjust_appendrel_attrs_mutator().

Attached files are the two versions of patches mentioned above.
- unionall_inh_idx_typ2_v4_20140114.patch- unionall_inh_idx_typ3_v4_20140114.patch

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..582e8c3 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index             target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1210,6 +1217,34 @@ expand_inherited_tables(PlannerInfo *root)    }}
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                    list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            
+            return (Node*)copyObject(
+                list_nth(ctx->child_appinfo->translated_vars,
+                         oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set.
@@ -1237,6 +1272,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)    List
*inhOIDs;   List       *appinfos;    ListCell   *l;
 
+    AppendRelInfo *parent_appinfo = NULL;    /* Does RT entry allow inheritance? */    if (!rte->inh)
@@ -1301,6 +1337,21 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
oldrc->isParent= true;    /*
 
+     * If parent relation is appearing in a subselect of UNION ALL, it has
+     * furthur parent appendrelinfo. Save it to pull up inheritance children
+     * later.
+     */
+    foreach(l, root->append_rel_list)
+    {
+        AppendRelInfo *appinfo = (AppendRelInfo *)lfirst(l);
+        if(appinfo->child_relid == rti)
+        {
+            parent_appinfo = appinfo;
+            break;
+        }
+    }
+    
+    /*     * Must open the parent relation to examine its tupdesc.  We need not lock     * it; we assume the rewriter
alreadydid.     */
 
@@ -1378,6 +1429,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)        }        /*
+         * Pull up this appinfo onto just above of the parent. The parent
+         * relation has its own parent when it appears as a subquery of UNION
+         * ALL. Pulling up these children gives a chance to consider
+         * MergeAppend on whole the UNION ALL tree.
+         */
+
+        if (parent_appinfo)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = appinfo;
+            ctx.target_rti      = rti;
+
+            appinfo->parent_relid = parent_appinfo->parent_relid;
+            appinfo->parent_reltype = parent_appinfo->parent_reltype;
+            appinfo->parent_reloid = parent_appinfo->parent_reloid;
+            appinfo->translated_vars = (List*)expression_tree_mutator(
+                parent_appinfo->translated_vars,
+                transvars_merge_mutator, &ctx);
+        }
+
+        /*         * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.         */        if (oldrc)
@@ -1662,7 +1735,8 @@ adjust_appendrel_attrs_mutator(Node *node,                 * step to convert the tuple layout to
theparent's rowtype.                 * Otherwise we have to generate a RowExpr.                 */
 
-                if (OidIsValid(appinfo->child_reltype))
+                if (OidIsValid(appinfo->child_reltype) &&
+                    OidIsValid(appinfo->parent_reltype))                {                    Assert(var->vartype ==
appinfo->parent_reltype);                   if (appinfo->parent_reltype != appinfo->child_reltype)
 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6f9ee5e..d254ff3 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -502,9 +502,40 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(7 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+NOTICE:  merging column "a" with inherited definition
+NOTICE:  merging column "b" with inherited definition
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+NOTICE:  merging column "ab" with inherited definition
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t1c_expr_idx on t1c
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t2c_pkey on t2c
+(7 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..7f9a9b1 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,29 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1da4b2f..f7873a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -404,6 +404,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse,    expand_inherited_tables(root);    /*
+     * Collapse multilevel inheritances into fewer levels.
+     *
+     * UNION ALL containing subselects on inherited tables leaves multilevel
+     * inheritance after calling expand_inherited_tables().  Fold them in
+     * order to make MergeAppend plan available for such queries.
+     */
+    collapse_appendrels(root);
+
+    /*     * Set hasHavingQual to remember if HAVING clause is present.  Needed     * because preprocess_expression
willreduce a constant-true condition to     * an empty qual list ... but "HAVING TRUE" is not a semantic no-op.
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index e249628..165c010 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -57,6 +57,11 @@ typedef struct    AppendRelInfo *appinfo;} adjust_appendrel_attrs_context;
+typedef struct {
+    AppendRelInfo    *child_appinfo;
+    Index              target_rti;
+} transvars_merge_context;
+static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root,                       double tuple_fraction,
               List *colTypes, List *colCollations,
 
@@ -98,6 +103,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,                      List
*input_plans,                     List *refnames_tlist);static List *generate_setop_grouplist(SetOperationStmt *op,
List*targetlist);
 
+static Node *transvars_merge_mutator(Node *node,
+                                     transvars_merge_context *ctx);static void expand_inherited_rtentry(PlannerInfo
*root,RangeTblEntry *rte,                         Index rti);static void make_inh_translation_list(Relation
oldrelation,
@@ -1211,6 +1218,131 @@ expand_inherited_tables(PlannerInfo *root)}/*
+ * collapse_appendrels
+ *      Pulling up the descendents by recursively combining successive
+ *      translations.
+ *
+ *      Although the same thing could be done in adjust_appendrel_attrs(),
+ *      doing it here all through at once is more efficient than individually
+ *      (and maybe repetitively) done there.
+ */
+void
+collapse_appendrels(PlannerInfo *root)
+{
+    Index               last_parent_relid = 1;
+    AppendRelInfo    *last_parent = NULL;
+    ListCell         *lcchild;
+    ListCell         *lcparent;
+    bool             full_search = false;
+
+    foreach(lcchild, root->append_rel_list)
+    {
+        AppendRelInfo    *ch_appinfo         = (AppendRelInfo *)lfirst(lcchild);
+        Index             ch_parent_relid = ch_appinfo->parent_relid;
+        
+        if (last_parent_relid != ch_parent_relid)
+        {
+            /*
+             * Find the parent for the current child appinfo if parent relid
+             * is different from that of preveous child.
+             */
+            do
+            {
+                /*
+                 * For most cases, the relations are referred to as the parent
+                 * in their apperarence order in rtable from
+                 * append_rel_list. So start searching for the parent appinfos
+                 * from just after the last parent. If the incremental search
+                 * was failed, retry searching the entire list and exit on
+                 * failure.
+                 */
+                if (!last_parent)
+                {
+                    /*
+                     * If this is the first time or the preveous search was
+                     * failed, set up for full search.
+                     */
+                    lcparent = list_head(root->append_rel_list);
+                    last_parent_relid = 1;
+                    full_search = true;
+                }
+
+                last_parent = NULL;
+                for_each_cell(lcparent, lcparent)
+                {
+                    /*
+                     * Children's and parents' apprelinfos are bonded via
+                     * rtable relations. So two apprelinfos are in
+                     * parent-child relationship when the child_relid of the
+                     * parent is equal to the parent_relid of the child.
+                     */
+                    AppendRelInfo *papp = (AppendRelInfo*)lfirst(lcparent);
+                    if (papp->child_relid == ch_parent_relid)
+                    {
+                        last_parent = papp;
+                        last_parent_relid = ch_parent_relid;
+                        
+                        /* Search from the next cell next time. */
+                        lcparent = lnext(lcparent);
+                        full_search = false;
+                        break;        /* Parent found */
+                    }
+                }
+                /* Retry only when incremental search was failed. */
+            } while (!full_search && !last_parent);
+        }
+
+        /*
+         * Replace child translated_vars so as to be a direct children of the
+         * topmost relation.
+         */
+        if (last_parent)
+        {
+            transvars_merge_context ctx;
+
+            ctx.child_appinfo = ch_appinfo;
+            ctx.target_rti  = last_parent->child_relid;
+
+            ch_appinfo->translated_vars =
+                (List*)expression_tree_mutator(
+                    (Node*)last_parent->translated_vars,
+                    transvars_merge_mutator, &ctx);
+            ch_appinfo->parent_relid = last_parent->parent_relid;
+            ch_appinfo->parent_reltype = last_parent->parent_reltype;
+        }
+    }
+}
+
+/*
+ * Replace Var nodes with corresponding child nodes.
+ */
+static Node *
+transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
+{
+    if (node == NULL)
+        return NULL;
+
+    if (IsA(node, Var))
+    {
+        Var *oldv = (Var*)node;
+
+        if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
+        {
+            if (oldv->varattno >
+                list_length(ctx->child_appinfo->translated_vars))
+                elog(ERROR,
+                     "attribute %d of relation \"%s\" does not exist",
+                     oldv->varattno,
+                     get_rel_name(ctx->child_appinfo->parent_reloid));
+            return (Node*)copyObject(list_nth(ctx->child_appinfo->translated_vars,
+                                              oldv->varattno - 1));
+        }
+    }
+    return expression_tree_mutator(node,
+                                   transvars_merge_mutator, (void*)ctx);
+}
+
+/* * expand_inherited_rtentry *        Check whether a rangetable entry represents an inheritance set. *        If so,
addentries for all the child tables to the query's
 
@@ -1662,7 +1794,8 @@ adjust_appendrel_attrs_mutator(Node *node,                 * step to convert the tuple layout to
theparent's rowtype.                 * Otherwise we have to generate a RowExpr.                 */
 
-                if (OidIsValid(appinfo->child_reltype))
+                if (OidIsValid(appinfo->child_reltype) &&
+                    OidIsValid(appinfo->parent_reltype))                {                    Assert(var->vartype ==
appinfo->parent_reltype);                   if (appinfo->parent_reltype != appinfo->child_reltype)
 
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 0934e63..7a83938 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -49,6 +49,7 @@ extern Plan *plan_set_operations(PlannerInfo *root, double tuple_fraction,                    List
**sortClauses);externvoid expand_inherited_tables(PlannerInfo *root);
 
+extern void collapse_appendrels(PlannerInfo *root);extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
                    AppendRelInfo *appinfo);
 
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 6f9ee5e..d254ff3 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -502,9 +502,40 @@ explain (costs off)               Index Cond: (ab = 'ab'::text)(7 rows)
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+NOTICE:  merging column "a" with inherited definition
+NOTICE:  merging column "b" with inherited definition
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+NOTICE:  merging column "ab" with inherited definition
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+                    QUERY PLAN                    
+--------------------------------------------------
+ Limit
+   ->  Merge Append
+         Sort Key: ((t1.a || t1.b))
+         ->  Index Scan using t1_ab_idx on t1
+         ->  Index Scan using t1c_expr_idx on t1c
+         ->  Index Scan using t2_pkey on t2
+         ->  Index Scan using t2c_pkey on t2c
+(7 rows)
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off) SELECT * FROM
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index d567cf1..7f9a9b1 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -198,9 +198,29 @@ explain (costs off)  SELECT * FROM t2) t WHERE ab = 'ab';
+--
+-- Test that ORDER BY for UNION ALL can be pushed down on inheritance
+-- tables.
+--
+
+CREATE TEMP TABLE t1c (like t1 including indexes) inherits (t1);      
+CREATE TEMP TABLE t2c (like t2 including indexes) inherits (t2);
+INSERT INTO t1c VALUES ('v', 'w'), ('c', 'd');
+INSERT INTO t2c VALUES ('vw'), ('cd');
+set enable_seqscan = on;
+set enable_indexonlyscan = off;
+
+explain (costs off)
+  SELECT * FROM    
+  (SELECT a || b AS ab FROM t1
+   UNION ALL
+   SELECT * FROM t2) t
+  ORDER BY ab LIMIT 1;
+reset enable_seqscan;reset enable_indexscan;reset enable_bitmapscan;
+reset enable_indexonlyscan;-- Test constraint exclusion of UNION ALL subqueriesexplain (costs off)

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: GIN improvements part 1: additional information
Следующее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: Get more from indices.