[HACKERS] Fixing matching of boolean index columns to sort ordering

Поиск
Список
Период
Сортировка
От Tom Lane
Тема [HACKERS] Fixing matching of boolean index columns to sort ordering
Дата
Msg-id 1788.1481605684@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: [HACKERS] Fixing matching of boolean index columns to sort ordering  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-hackers
The attached patch addresses the complaint raised in
https://www.postgresql.org/message-id/CAHt_Luuao4gd6De61GryK=2ff-MTgHzjqffdjz02uSdVqYmKKQ@mail.gmail.com

namely, that if you have an index on, say, integer columns i and j,
then the planner will figure out that it can use an indexscan with
no additional sort for a query like
    select * from tab where i = 42 order by j;
but the same sort of thing does not work when the first column is
a boolean.  You would think that this query is entirely isomorphic
to the one above:
    select * from tab where b = true order by j;
but it isn't, because in expression preprocessing we simplify that to
    select * from tab where b order by j;
Now, there's no equality condition so no EquivalenceClass is created
containing "b", and it's the presence of the EquivalenceClass that
drives the code that recognizes that the first index column can be
ignored while deciding what sort order the index produces.

The patch fixes that through the expedient of matching boolean index
columns to the restriction clauses for "tab", and when it finds a
match, acting as though we'd found a match to an EquivalenceClass
containing a constant.  This is pretty ugly, but no more so than
several other existing special cases for boolean index columns.

Those special cases would largely go away if we were to canonicalize
"WHERE b" into "WHERE b = true" rather than the reverse, so you might
reasonably ask why we don't do that.  I've asked myself that every time
I had to add another one of these special cases :-(, but the answer is
the same every time: where would you stop?  Every WHERE clause is a
boolean expression, so there's no principled reason why such a rule
wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true",
wreaking havoc on every other optimization we have.  Restricting it
to only apply to simple boolean columns is no answer because (a) indexes
can be on boolean expressions not just simple columns, and (b) part
of the point of the canonicalization is to make logically-equivalent
expressions look alike, so declining to apply it in some cases would
ruin that.

So, for better or worse, our approach is to simplify out "= true"
and then do whatever pushups we have to do at lower levels to keep
useful cases working nicely.  This is another such pushup.

I'll add this to the next commitfest --- it could use some review
to see if I missed anything.

            regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2952bfb..7a1a498 100644
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
*************** relation_has_unique_index_for(PlannerInf
*** 3025,3030 ****
--- 3025,3076 ----
      return false;
  }

+ /*
+  * indexcol_is_bool_constant_for_query
+  *
+  * If an index column is constrained to have a constant value by the query's
+  * WHERE conditions, then it's irrelevant for sort-order considerations.
+  * Usually that means we have a restriction clause WHERE indexcol = constant,
+  * which gets turned into an EquivalenceClass containing a constant, which
+  * is recognized as redundant by build_index_pathkeys().  But if the index
+  * column is a boolean variable (or expression), then we are not going to
+  * see WHERE indexcol = constant, because expression preprocessing will have
+  * simplified that to "WHERE indexcol" or "WHERE NOT indexcol".  So we are not
+  * going to have a matching EquivalenceClass (unless the query also contains
+  * "ORDER BY indexcol").  To allow such cases to work the same as they would
+  * for non-boolean values, this function is provided to detect whether the
+  * specified index column matches a boolean restriction clause.
+  */
+ bool
+ indexcol_is_bool_constant_for_query(IndexOptInfo *index, int indexcol)
+ {
+     ListCell   *lc;
+
+     /* If the index isn't boolean, we can't possibly get a match */
+     if (!IsBooleanOpfamily(index->opfamily[indexcol]))
+         return false;
+
+     /* Check each restriction clause for the index's rel */
+     foreach(lc, index->rel->baserestrictinfo)
+     {
+         RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+         /*
+          * As in match_clause_to_indexcol, never match pseudoconstants to
+          * indexes.  (It might be semantically okay to do so here, but the
+          * odds of getting a match are negligible, so don't waste the cycles.)
+          */
+         if (rinfo->pseudoconstant)
+             continue;
+
+         /* See if we can match the clause's expression to the index column */
+         if (match_boolean_index_clause((Node *) rinfo->clause, indexcol, index))
+             return true;
+     }
+
+     return false;
+ }
+

  /****************************************************************************
   *                ----  ROUTINES TO CHECK OPERANDS  ----
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 4436ac1..459498c 100644
*** a/src/backend/optimizer/path/pathkeys.c
--- b/src/backend/optimizer/path/pathkeys.c
*************** build_index_pathkeys(PlannerInfo *root,
*** 480,496 ****
                                                index->rel->relids,
                                                false);

!         /*
!          * If the sort key isn't already present in any EquivalenceClass, then
!          * it's not an interesting sort order for this query.  So we can stop
!          * now --- lower-order sort keys aren't useful either.
!          */
!         if (!cpathkey)
!             break;
!
!         /* Add to list unless redundant */
!         if (!pathkey_is_redundant(cpathkey, retval))
!             retval = lappend(retval, cpathkey);

          i++;
      }
--- 480,509 ----
                                                index->rel->relids,
                                                false);

!         if (cpathkey)
!         {
!             /*
!              * We found the sort key in an EquivalenceClass, so it's relevant
!              * for this query.  Add it to list, unless it's redundant.
!              */
!             if (!pathkey_is_redundant(cpathkey, retval))
!                 retval = lappend(retval, cpathkey);
!         }
!         else
!         {
!             /*
!              * Boolean index keys might be redundant even if they do not
!              * appear in an EquivalenceClass, because of our special treatment
!              * of boolean equality conditions --- see the comment for
!              * indexcol_is_bool_constant_for_query().  If that applies, we can
!              * continue to examine lower-order index columns.  Otherwise, the
!              * sort key is not an interesting sort order for this query, so we
!              * should stop considering index columns; any lower-order sort
!              * keys won't be useful either.
!              */
!             if (!indexcol_is_bool_constant_for_query(index, i))
!                 break;
!         }

          i++;
      }
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 44abe83..dbbb686 100644
*** a/src/include/optimizer/paths.h
--- b/src/include/optimizer/paths.h
*************** extern void create_index_paths(PlannerIn
*** 66,71 ****
--- 66,73 ----
  extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
                                List *restrictlist,
                                List *exprlist, List *oprlist);
+ extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
+                                     int indexcol);
  extern bool match_index_to_operand(Node *operand, int indexcol,
                         IndexOptInfo *index);
  extern void expand_indexqual_conditions(IndexOptInfo *index,
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index e663f9a..c6dfb26 100644
*** a/src/test/regress/expected/create_index.out
--- b/src/test/regress/expected/create_index.out
*************** explain (costs off)
*** 2953,2958 ****
--- 2953,3000 ----
  (2 rows)

  --
+ -- Check matching of boolean index columns to WHERE conditions and sort keys
+ --
+ create temp table boolindex (b bool, i int, unique(b, i), junk float);
+ explain (costs off)
+   select * from boolindex order by b, i limit 10;
+                       QUERY PLAN
+ -------------------------------------------------------
+  Limit
+    ->  Index Scan using boolindex_b_i_key on boolindex
+ (2 rows)
+
+ explain (costs off)
+   select * from boolindex where b order by i limit 10;
+                       QUERY PLAN
+ -------------------------------------------------------
+  Limit
+    ->  Index Scan using boolindex_b_i_key on boolindex
+          Index Cond: (b = true)
+          Filter: b
+ (4 rows)
+
+ explain (costs off)
+   select * from boolindex where b = true order by i desc limit 10;
+                            QUERY PLAN
+ ----------------------------------------------------------------
+  Limit
+    ->  Index Scan Backward using boolindex_b_i_key on boolindex
+          Index Cond: (b = true)
+          Filter: b
+ (4 rows)
+
+ explain (costs off)
+   select * from boolindex where not b order by i limit 10;
+                       QUERY PLAN
+ -------------------------------------------------------
+  Limit
+    ->  Index Scan using boolindex_b_i_key on boolindex
+          Index Cond: (b = false)
+          Filter: (NOT b)
+ (4 rows)
+
+ --
  -- REINDEX (VERBOSE)
  --
  CREATE TABLE reindex_verbose(id integer primary key);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 71f4f54..822c34a 100644
*** a/src/test/regress/sql/create_index.sql
--- b/src/test/regress/sql/create_index.sql
*************** explain (costs off)
*** 1011,1016 ****
--- 1011,1031 ----
    select * from tenk1 where (thousand, tenthous) in ((1,1001), (null,null));

  --
+ -- Check matching of boolean index columns to WHERE conditions and sort keys
+ --
+
+ create temp table boolindex (b bool, i int, unique(b, i), junk float);
+
+ explain (costs off)
+   select * from boolindex order by b, i limit 10;
+ explain (costs off)
+   select * from boolindex where b order by i limit 10;
+ explain (costs off)
+   select * from boolindex where b = true order by i desc limit 10;
+ explain (costs off)
+   select * from boolindex where not b order by i limit 10;
+
+ --
  -- REINDEX (VERBOSE)
  --
  CREATE TABLE reindex_verbose(id integer primary key);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)",File: "xlog.c", Line: 10200)
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol