Re: Get more from indices.

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: Get more from indices.
Дата
Msg-id 20131126.204242.141708563.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Get more from indices.  ("Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp>)
Ответы Re: Get more from indices.  ("Etsuro Fujita" <fujita.etsuro@lab.ntt.co.jp>)
Список pgsql-hackers
Thank you for pointing out.

> > the attched pathkey_and_uniqueindx_v4_20131122.patch is changed as
> > follows.
> 
> The patch is compiled successfully and passes all regression tests.  Also,
> the patch works well for the tests shown in an earlier email from
> Horiguchi-san.  But in this version I found an incorrect behavior.
> 
> postgres=# CREATE TABLE t (a int not null, b int not null, c int, d text);
> CREATE TABLE
> postgres=# CREATE UNIQUE INDEX i_t_ab ON t (a, b);
> CREATE INDEX
> postgres=# INSERT INTO t (SELECT a / 5, 4 - (a % 5), a, 't' FROM
> generate_series(000000, 099999) a);
> INSERT 0 100000
> postgres=# ANALYZE t;
> ANALYZE
> postgres=# CREATE TABLE t2 (e text, f int);
> CREATE TABLE
> postgres=# INSERT INTO t2 VALUES ('t', 2);
> INSERT 0 1
> postgres=# INSERT INTO t2 VALUES ('t', 1);
> INSERT 0 1
> postgres=# ANALYZE t2;
> ANALYZE
> postgres=# EXPLAIN SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e ORDER
> BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
>                                    QUERY PLAN
> ----------------------------------------------------------------------------
> ----
>  Limit  (cost=0.29..9.30 rows=10 width=20)
-  ->  Sort  (cost=92.33..92.57 rows=94 width=20)
-        Sort Key: t.a, t.b, t.c, t.d, t2.f
>    ->  Nested Loop  (cost=0.29..129.99 rows=144 width=20)
>          Join Filter: (t.d = t2.e)
>          ->  Index Scan using i_t_ab on t  (cost=0.29..126.80 rows=72
> width=14)
>                Index Cond: (a < 10)
>          ->  Materialize  (cost=0.00..1.03 rows=2 width=6)
>                ->  Seq Scan on t2  (cost=0.00..1.02 rows=2 width=6)
> (7 rows)

Auch. Outermost sort is omitted but necessary (Prefixed by '-' above).

> ISTM this was brought by the following change.
> 
> > In truncate_useless_pathkeys, if query_pathkeys is longer than pathkeys
> > made from index columns old patch picked up the latter as IndexPath's
> > pathkeys. But the former is more suitable according to the context here.
> 
> >  - truncate_useless_pathkeys returns root->query_pathkeys when
> >    the index is fully-ordered and query_pathkeys contains the
> >    pathkeys made from index columns.

I implicitly put a wrong assumption that query_pathkeys is
dedicated for the relation under work, not whole subquery.

> I think it would be required to modify the patch so that the transformation
> of the pathkeys is performed only when each pathkey of query_pathkeys
> references the indexed relation.

What is wrong here is that IndexPath declares that it is ordered
in the pathkeys which includes the columns not belongs to the
table.

>  (The above change might have been made according to my comment
> in an earlier email I sent.  But I have to admit my explanation
> there was not adequate.  I'm sorry for that.)

Nothing to be sorry for. It's my fault.

Anyway, I've put restriction on pathkeys_useful_for_ordering that
pick up longest pathkeys consists only ec members which matches
the required rel bitmap. With the new patch the planner makes
plan with the omitted sort and the query returns the following
result.

uniontest=# SELECT * FROM t, t2 WHERE t.a < 10 AND t.d = t2.e
|  ORDER BY t.a, t.b, t.c, t.d, t2.f LIMIT 10;
|  a | b | c | d | e | f 
| ---+---+---+---+---+---
|  0 | 0 | 4 | t | t | 1  <-+-- Correct order by t2.f.
|  0 | 0 | 4 | t | t | 2  <-+
|  0 | 1 | 3 | t | t | 1
(snipped.)

> Here are random comments.
> 
> * In grouping_planner(), the patch resets the pathkeys of the cheapest
> presorted index-path to query_pathkeys through
> get_cheapest_fractional_path_for_pathkeys().  Is this necessary?  ISTM the
> transformation of the pathkeys after the scan/join optimization would be no
> longer necessary once it has been done at the index-path creation time, ie,
> build_index_paths().  Why does the patch do that thing?

You're appliciated for pointing out. It is utterly useless code
of older patch. ("Have you totally scrapped the older verions?" >me :-(

Removed it in this version.

> * In get_relation_info(), the patch determines the branch condition
> "keyattno != ObjectIdAttributeNumber".  I fail to understand the meaning of
> this branch condition.  Could you explain about it?

Literally answering, it means oid cannot be NULL (if it exists).

Precisely, it reflects that I believed (or wished) it cannot be
NULL if oid column exists. (Other sysattrs are dismissed because
they are hardly to be a unique index column:-)

Oid in a tuple is retrived using HeapTupleGetOid() in
heap_getsysattr().

|     result = ObjectIdGetDatum(HeapTupleGetOid(tup));

Then HeapTupleHeaderGetOid,

| #define HeapTupleGetOid(tuple) \
|            HeapTupleHeaderGetOid((tuple)->t_data)

Finally asking HeapTupleHeaderGetOid for the value.

htup_details.h:354
| #define HeapTupleHeaderGetOid(tup) \
| ( \
|     ((tup)->t_infomask & HEAP_HASOID) ? \
|         *((Oid *) ((char *)(tup) + (tup)->t_hoff - sizeof(Oid))) \
|     : \
|         InvalidOid \
| )

So oid cannot be NULL after all even though can be InvalidOid.

On the otherhand, it can be NULL according to the definition in
heap.c.

heap.c:146
| static FormData_pg_attribute a2 = {
|     0, {"oid"}, OIDOID, 0, sizeof(Oid),
|     ObjectIdAttributeNumber, 0, -1, -1,
|     true, 'p', 'i', true, false, false, true, 0
| };                    ~~~~

Can we set this to false safely?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 606734a..43be0a5 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -953,7 +953,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,        index_pathkeys =
build_index_pathkeys(root,index,                                              ForwardScanDirection);
useful_pathkeys= truncate_useless_pathkeys(root, rel,
 
-                                                    index_pathkeys);
+                                                    index_pathkeys,
+                                                    index->full_ordered);        orderbyclauses = NIL;
orderbyclausecols= NIL;    }
 
@@ -1015,7 +1016,8 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,        index_pathkeys =
build_index_pathkeys(root,index,                                              BackwardScanDirection);
useful_pathkeys= truncate_useless_pathkeys(root, rel,
 
-                                                    index_pathkeys);
+                                                    index_pathkeys,
+                                                    index->full_ordered);        if (useful_pathkeys != NIL)        {
         ipath = create_index_path(root, index,
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 9c8ede6..47d844b 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -369,6 +369,29 @@ get_cheapest_path_for_pathkeys(List *paths, List *pathkeys,}/*
+ * path_is_ordered
+ * Return true if the path is apparently ordered on the pathkeys.
+ */
+bool
+path_is_ordered(Path *path, List *pathkeys)
+{
+    if (pathkeys_contained_in(pathkeys, path->pathkeys))
+        return true;
+
+    /*
+     * If IndexPath is fully ordered, it is sufficiently ordered if index
+     * pathkeys is a subset of target pathkeys
+     */
+    if (pathkeys && path->pathkeys &&
+        IsA(path, IndexPath) &&
+        ((IndexPath*)path)->indexinfo->full_ordered &&
+        (pathkeys_contained_in(path->pathkeys, pathkeys)))
+        return true;
+
+    return false;
+}
+
+/* * get_cheapest_fractional_path_for_pathkeys *      Find the cheapest path (for retrieving a specified fraction of
all*      the tuples) that satisfies the given pathkeys and parameterization.
 
@@ -381,6 +404,8 @@ get_cheapest_path_for_pathkeys(List *paths, List *pathkeys, * 'pathkeys' represents a required
ordering(in canonical form!) * 'required_outer' denotes allowable outer relations for parameterized paths * 'fraction'
isthe fraction of the total tuples expected to be retrieved
 
+ * paths->pathkeys might be replaced with pathkeys so as to declare that the
+ * path is ordered on it. */Path *get_cheapest_fractional_path_for_pathkeys(List *paths,
@@ -403,7 +428,7 @@ get_cheapest_fractional_path_for_pathkeys(List *paths,
compare_fractional_path_costs(matched_path,path, fraction) <= 0)            continue;
 
-        if (pathkeys_contained_in(pathkeys, path->pathkeys) &&
+        if (path_is_ordered(path, pathkeys) &&            bms_is_subset(PATH_REQ_OUTER(path), required_outer))
  matched_path = path;    }
 
@@ -798,7 +823,7 @@ build_join_pathkeys(PlannerInfo *root,     * contain pathkeys that were useful for forming this
joinrelbut are     * uninteresting to higher levels.     */
 
-    return truncate_useless_pathkeys(root, joinrel, outer_pathkeys);
+    return truncate_useless_pathkeys(root, joinrel, outer_pathkeys,
false);}/****************************************************************************
@@ -1455,7 +1480,8 @@ right_merge_direction(PlannerInfo *root, PathKey *pathkey) * So the result is always either 0 or
list_length(root->query_pathkeys).*/static int
 
-pathkeys_useful_for_ordering(PlannerInfo *root, List *pathkeys)
+pathkeys_useful_for_ordering(PlannerInfo *root, List *pathkeys,
+                             bool pk_full_ordered, Bitmapset *rel){    if (root->query_pathkeys == NIL)        return
0;               /* no special ordering requested */
 
@@ -1469,23 +1495,71 @@ pathkeys_useful_for_ordering(PlannerInfo *root, List *pathkeys)        return
list_length(root->query_pathkeys);   }
 
+    /*
+     * Given that the pathkeys compose an unique key, they are useful for
+     * ordering when they are a sublist of query_pathkeys. Expand pathkeys to
+     * root->query_pathkeys in this case.
+     */
+    if (pk_full_ordered &&
+        pathkeys_contained_in(pathkeys, root->query_pathkeys))
+    {
+        int len = list_length(pathkeys);
+        Assert(!bms_is_empty(rel));
+        if (list_length(root->query_pathkeys) > len)
+        {
+            /*
+             * Extend useful_pathkeys up to as long as relids of EC has a
+             * matche for the target relations.
+             */
+            ListCell *lcpk = list_head(root->query_pathkeys);
+            int i;
+
+            /* Skip known section, list_nth_cell is not extern'ed */
+            for (i = 0 ; i < len ; i++)
+                lcpk = lnext(lcpk);
+
+            for_each_cell(lcpk, lcpk)
+            {
+                ListCell *lcm;
+                PathKey *pk = (PathKey *) lfirst(lcpk);
+                foreach (lcm, pk->pk_eclass->ec_members)
+                {
+                    EquivalenceMember *em = (EquivalenceMember *) lfirst(lcm);
+                    if (bms_is_subset(em->em_relids, rel))
+                        break;
+                }
+                if (!lcm)
+                    break;
+
+                len++;
+            }
+        }
+
+        return len;
+    }
+    return 0;                    /* path ordering not useful */}/* * truncate_useless_pathkeys *        Shorten the
givenpathkey list to just the useful pathkeys.
 
+ *
+ * When pk_full_ordered is true, reconsider counting the uniqueness even if
+ * the pathkeys are once judged as useless by the general criteria. */List *truncate_useless_pathkeys(PlannerInfo
*root,                         RelOptInfo *rel,
 
-                          List *pathkeys)
+                          List *pathkeys,
+                          bool pk_full_ordered){    int            nuseful;    int            nuseful2;    nuseful =
pathkeys_useful_for_merging(root,rel, pathkeys);
 
-    nuseful2 = pathkeys_useful_for_ordering(root, pathkeys);
+    nuseful2 = pathkeys_useful_for_ordering(root, pathkeys,
+                                            pk_full_ordered, rel->relids);    if (nuseful2 > nuseful)        nuseful =
nuseful2;
@@ -1498,7 +1572,7 @@ truncate_useless_pathkeys(PlannerInfo *root,    else if (nuseful == list_length(pathkeys))
returnpathkeys;    else
 
-        return list_truncate(list_copy(pathkeys), nuseful);
+        return list_truncate(list_copy(root->query_pathkeys), nuseful);}/*
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 954666c..c6d45e3 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -231,6 +231,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,             */
if (info->relam == BTREE_AM_OID)            {
 
+                bool has_nullable_keycol = false;
+                /*                 * If it's a btree index, we can use its opfamily OIDs                 * directly as
thesort ordering opfamily OIDs.
 
@@ -244,10 +246,26 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,                for (i
=0; i < ncolumns; i++)                {                    int16        opt = indexRelation->rd_indoption[i];
 
+                    int16        keyattno = index->indkey.values[i];                    info->reverse_sort[i] = (opt &
INDOPTION_DESC)!= 0;                    info->nulls_first[i] = (opt & INDOPTION_NULLS_FIRST) != 0;
 
-                }
+
+                    if (keyattno > 0)
+                    {
+                        has_nullable_keycol |= 
+                            !relation->rd_att->attrs[keyattno - 1]->attnotnull;
+                    }
+                    /* OID should be NULL if exists. */
+                    else if (keyattno != ObjectIdAttributeNumber)
+                        has_nullable_keycol = true;
+                 }
+
+                /* Check to see if it is a full-ordered index */
+                if (IndexIsValid(index) &&
+                    index->indisunique && index->indimmediate &&
+                    !has_nullable_keycol)
+                    info->full_ordered = true;            }            else if (indexRelation->rd_am->amcanorder)
     {
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6d7b594..b38c667 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -524,6 +524,7 @@ typedef struct IndexOptInfo    bool        predOK;            /* true if predicate matches query */
  bool        unique;            /* true if a unique index */    bool        immediate;        /* is uniqueness
enforcedimmediately? */
 
+    bool        full_ordered;    /* is ordered rows not duplicate ? */    bool        hypothetical;    /* true if
indexdoesn't really exist */    bool        canreturn;        /* can index return IndexTuples? */    bool
amcanorderbyop;/* does AM support order by operator result? */ 
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 999adaa..1987659 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -194,7 +194,8 @@ extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,                              List
*outer_pathkeys);externList *truncate_useless_pathkeys(PlannerInfo *root,                          RelOptInfo *rel,
 
-                          List *pathkeys);
+                          List *pathkeys,
+                          bool pk_full_ordered);extern bool has_useful_pathkeys(PlannerInfo *root, RelOptInfo
*rel);#endif  /* PATHS_H */
 
diff --git a/src/test/isolation/expected/drop-index-concurrently-1.out
b/src/test/isolation/expected/drop-index-concurrently-1.out
index 75dff56..ab96fa0 100644
--- a/src/test/isolation/expected/drop-index-concurrently-1.out
+++ b/src/test/isolation/expected/drop-index-concurrently-1.out
@@ -19,10 +19,8 @@ Sortstep explains: EXPLAIN (COSTS OFF) EXECUTE getrow_seq;QUERY PLAN     
-Sort           
-  Sort Key: id, data
-  ->  Seq Scan on test_dc
-        Filter: ((data)::text = '34'::text)
+Index Scan using test_dc_pkey on test_dc
+  Filter: ((data)::text = '34'::text)step select2: SELECT * FROM test_dc WHERE data=34 ORDER BY id,data;id
data           

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Sequence Access Method WIP
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Completing PL support for Event Triggers