Re: Avoid full GIN index scan when possible

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Avoid full GIN index scan when possible
Дата
Msg-id 17590.1564685940@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Avoid full GIN index scan when possible  (Julien Rouhaud <rjuju123@gmail.com>)
Ответы Re: Avoid full GIN index scan when possible  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Список pgsql-hackers
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Thu, Aug 1, 2019 at 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Eyeing this a bit further ... doesn't scanPendingInsert also need
>> to honor so->forcedRecheck?  Something along the lines of

> I think so.

Yeah, it does --- the updated pg_trgm test attached fails if it doesn't.

Also, I found that Alexander's concern upthread:

>> What would happen when two-columns index have GIN_SEARCH_MODE_DEFAULT
>> scan on first column and GIN_SEARCH_MODE_ALL on second?  I think even
>> if triconsistent() for second column returns GIN_TRUE, we still need
>> to recheck to verify second columns is not NULL.

is entirely on-point.  This patch generates the wrong answer in the
case I added to gin.sql below.  (The expected output was generated
with HEAD and seems correct, but with these code changes, we incorrectly
report the row with NULL as matching.  So I expect the cfbot is going
to complain about the patch in this state.)

While I've not attempted to fix that here, I wonder whether we shouldn't
fix it by just forcing forcedRecheck to true in any case where we discard
an ALL qualifier.  That would get rid of all the ugliness around
ginFillScanKey, which I'd otherwise really want to refactor to avoid
this business of adding and then removing a scan key.  It would also
get rid of the bit about "XXX Need to use ALL mode instead of EVERYTHING
to skip NULLs if ALL mode has been seen", which aside from being ugly
seems to be dead wrong for multi-column-index cases.

BTW, it's not particularly the fault of this patch, but: what does it
even mean to specify GIN_SEARCH_MODE_ALL with a nonzero number of keys?
Should we decide to treat that as an error?  It doesn't look to me like
any of the in-tree opclasses will return such a case, and I'm not at
all convinced what the GIN scan code would actually do with it, except
that I doubt it matches the documentation.

Setting this back to Waiting on Author.

            regards, tom lane

diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index b3e709f..3e5ba9b 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -3498,6 +3498,68 @@ select count(*) from test_trgm where t ~ '[qwerty]{2}-?[qwerty]{2}';
   1000
 (1 row)

+-- check handling of indexquals that generate no searchable conditions
+explain (costs off)
+select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Bitmap Heap Scan on test_trgm
+         Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text))
+         ->  Bitmap Index Scan on trgm_idx
+               Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text))
+(5 rows)
+
+select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
+ count
+-------
+    19
+(1 row)
+
+explain (costs off)
+select count(*) from test_trgm where t like '%99%' and t like '%qw%';
+                               QUERY PLAN
+-------------------------------------------------------------------------
+ Aggregate
+   ->  Bitmap Heap Scan on test_trgm
+         Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qw%'::text))
+         ->  Bitmap Index Scan on trgm_idx
+               Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qw%'::text))
+(5 rows)
+
+select count(*) from test_trgm where t like '%99%' and t like '%qw%';
+ count
+-------
+    19
+(1 row)
+
+-- ensure that pending-list items are handled correctly, too
+create temp table t_test_trgm(t text COLLATE "C");
+create index t_trgm_idx on t_test_trgm using gin (t gin_trgm_ops);
+insert into t_test_trgm values ('qwerty99'), ('qwerty01');
+explain (costs off)
+select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%';
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
+ Aggregate
+   ->  Bitmap Heap Scan on t_test_trgm
+         Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text))
+         ->  Bitmap Index Scan on t_trgm_idx
+               Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text))
+(5 rows)
+
+select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%';
+ count
+-------
+     1
+(1 row)
+
+select count(*) from t_test_trgm where t like '%99%' and t like '%qw%';
+ count
+-------
+     1
+(1 row)
+
 create table test2(t text COLLATE "C");
 insert into test2 values ('abcdef');
 insert into test2 values ('quark');
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index 08459e6..dcfd3c2 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -55,6 +55,22 @@ select t,similarity(t,'gwertyu0988') as sml from test_trgm where t % 'gwertyu098
 select t,similarity(t,'gwertyu1988') as sml from test_trgm where t % 'gwertyu1988' order by sml desc, t;
 select count(*) from test_trgm where t ~ '[qwerty]{2}-?[qwerty]{2}';

+-- check handling of indexquals that generate no searchable conditions
+explain (costs off)
+select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
+select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
+explain (costs off)
+select count(*) from test_trgm where t like '%99%' and t like '%qw%';
+select count(*) from test_trgm where t like '%99%' and t like '%qw%';
+-- ensure that pending-list items are handled correctly, too
+create temp table t_test_trgm(t text COLLATE "C");
+create index t_trgm_idx on t_test_trgm using gin (t gin_trgm_ops);
+insert into t_test_trgm values ('qwerty99'), ('qwerty01');
+explain (costs off)
+select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%';
+select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%';
+select count(*) from t_test_trgm where t like '%99%' and t like '%qw%';
+
 create table test2(t text COLLATE "C");
 insert into test2 values ('abcdef');
 insert into test2 values ('quark');
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index b18ae2b..65ed8b2 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -1814,7 +1814,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
          * consistent functions.
          */
         oldCtx = MemoryContextSwitchTo(so->tempCtx);
-        recheck = false;
+        recheck = so->forcedRecheck;
         match = true;

         for (i = 0; i < so->nkeys; i++)
@@ -1888,9 +1888,14 @@ gingetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
     {
         CHECK_FOR_INTERRUPTS();

+        /* Get next item ... */
         if (!scanGetItem(scan, iptr, &iptr, &recheck))
             break;

+        /* ... apply forced recheck if required ... */
+        recheck |= so->forcedRecheck;
+
+        /* ... and transfer it into bitmap */
         if (ItemPointerIsLossyPage(&iptr))
             tbm_add_page(tbm, ItemPointerGetBlockNumber(&iptr));
         else
diff --git a/src/backend/access/gin/ginscan.c b/src/backend/access/gin/ginscan.c
index 74d9821..3d5c7e3 100644
--- a/src/backend/access/gin/ginscan.c
+++ b/src/backend/access/gin/ginscan.c
@@ -140,8 +140,12 @@ ginFillScanKey(GinScanOpaque so, OffsetNumber attnum,
     uint32        nUserQueryValues = nQueryValues;
     uint32        i;

-    /* Non-default search modes add one "hidden" entry to each key */
-    if (searchMode != GIN_SEARCH_MODE_DEFAULT)
+    /*
+     * Non-default search modes add one "hidden" entry to each key, unless ALL
+     * key with no entries as those only need to call triConsistentFn
+     */
+    if (searchMode != GIN_SEARCH_MODE_DEFAULT &&
+        !(searchMode == GIN_SEARCH_MODE_ALL && (nQueryValues <= 0)))
         nQueryValues++;
     key->nentries = nQueryValues;
     key->nuserentries = nUserQueryValues;
@@ -265,6 +269,7 @@ ginNewScanKey(IndexScanDesc scan)
     GinScanOpaque so = (GinScanOpaque) scan->opaque;
     int            i;
     bool        hasNullQuery = false;
+    bool        hasSearchAllMode = false;
     MemoryContext oldCtx;

     /*
@@ -286,6 +291,7 @@ ginNewScanKey(IndexScanDesc scan)
         palloc(so->allocentries * sizeof(GinScanEntry));

     so->isVoidRes = false;
+    so->forcedRecheck = false;

     for (i = 0; i < scan->numberOfKeys; i++)
     {
@@ -371,6 +377,18 @@ ginNewScanKey(IndexScanDesc scan)
                        skey->sk_argument, nQueryValues,
                        queryValues, categories,
                        partial_matches, extra_data);
+
+        if (searchMode == GIN_SEARCH_MODE_ALL && nQueryValues <= 0)
+        {
+            /*
+             * Don't emit ALL key with no entries, check only whether
+             * unconditional recheck is needed.
+             */
+            GinScanKey    key = &so->keys[--so->nkeys];
+
+            hasSearchAllMode = true;
+            so->forcedRecheck |= key->triConsistentFn(key) != GIN_TRUE;
+        }
     }

     /*
@@ -384,6 +402,13 @@ ginNewScanKey(IndexScanDesc scan)
                        InvalidStrategy, GIN_SEARCH_MODE_EVERYTHING,
                        (Datum) 0, 0,
                        NULL, NULL, NULL, NULL);
+
+        /*
+         * XXX Need to use ALL mode instead of EVERYTHING to skip NULLs if ALL
+         * mode has been seen.
+         */
+        if (hasSearchAllMode)
+            so->keys[so->nkeys - 1].scanEntry[0]->searchMode = GIN_SEARCH_MODE_ALL;
     }

     /*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 7eba59e..1a9d76d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6326,6 +6326,16 @@ gincost_pattern(IndexOptInfo *index, int indexcol,
         return false;
     }

+    if (nentries <= 0 && searchMode == GIN_SEARCH_MODE_ALL)
+    {
+        /*
+         * GIN does not emit scan entries for empty GIN_SEARCH_MODE_ALL keys,
+         * and it can avoid full index scan if there are entries from other
+         * keys, so we can skip setting of 'haveFullScan' flag.
+         */
+        return true;
+    }
+
     for (i = 0; i < nentries; i++)
     {
         /*
@@ -6709,7 +6719,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
         return;
     }

-    if (counts.haveFullScan || indexQuals == NIL)
+    if (counts.haveFullScan || indexQuals == NIL || counts.searchEntries <= 0)
     {
         /*
          * Full index scan will be required.  We treat this as if every key in
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index afb3e15..b0251f7 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -359,6 +359,7 @@ typedef struct GinScanOpaqueData
     MemoryContext keyCtx;        /* used to hold key and entry data */

     bool        isVoidRes;        /* true if query is unsatisfiable */
+    bool        forcedRecheck;    /* must recheck all returned tuples */
 } GinScanOpaqueData;

 typedef GinScanOpaqueData *GinScanOpaque;
diff --git a/src/test/regress/expected/gin.out b/src/test/regress/expected/gin.out
index a3911a6..fb0d29c 100644
--- a/src/test/regress/expected/gin.out
+++ b/src/test/regress/expected/gin.out
@@ -1,7 +1,7 @@
 --
 -- Test GIN indexes.
 --
--- There are other tests to test different GIN opclassed. This is for testing
+-- There are other tests to test different GIN opclasses. This is for testing
 -- GIN itself.
 -- Create and populate a test table with a GIN index.
 create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off);
@@ -35,3 +35,31 @@ insert into gin_test_tbl select array[1, 2, g] from generate_series(1, 1000) g;
 insert into gin_test_tbl select array[1, 3, g] from generate_series(1, 1000) g;
 delete from gin_test_tbl where i @> array[2];
 vacuum gin_test_tbl;
+-- Test optimization of empty queries
+create temp table t_gin_test_tbl(i int4[], j int4[]);
+create index on t_gin_test_tbl using gin (i, j);
+insert into t_gin_test_tbl select array[100,g], array[200,g]
+from generate_series(1, 10) g;
+insert into t_gin_test_tbl values(array[0,0], null);
+set enable_seqscan = off;
+explain
+select * from t_gin_test_tbl where array[0] <@ i;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Bitmap Heap Scan on t_gin_test_tbl  (cost=12.03..20.49 rows=4 width=64)
+   Recheck Cond: ('{0}'::integer[] <@ i)
+   ->  Bitmap Index Scan on t_gin_test_tbl_i_j_idx  (cost=0.00..12.03 rows=4 width=0)
+         Index Cond: (i @> '{0}'::integer[])
+(4 rows)
+
+select * from t_gin_test_tbl where array[0] <@ i;
+   i   | j
+-------+---
+ {0,0} |
+(1 row)
+
+select * from t_gin_test_tbl where array[0] <@ i and '{}'::int4[] <@ j;
+ i | j
+---+---
+(0 rows)
+
diff --git a/src/test/regress/sql/gin.sql b/src/test/regress/sql/gin.sql
index c566e9b..aaf9c19 100644
--- a/src/test/regress/sql/gin.sql
+++ b/src/test/regress/sql/gin.sql
@@ -1,7 +1,7 @@
 --
 -- Test GIN indexes.
 --
--- There are other tests to test different GIN opclassed. This is for testing
+-- There are other tests to test different GIN opclasses. This is for testing
 -- GIN itself.

 -- Create and populate a test table with a GIN index.
@@ -34,3 +34,15 @@ insert into gin_test_tbl select array[1, 3, g] from generate_series(1, 1000) g;

 delete from gin_test_tbl where i @> array[2];
 vacuum gin_test_tbl;
+
+-- Test optimization of empty queries
+create temp table t_gin_test_tbl(i int4[], j int4[]);
+create index on t_gin_test_tbl using gin (i, j);
+insert into t_gin_test_tbl select array[100,g], array[200,g]
+from generate_series(1, 10) g;
+insert into t_gin_test_tbl values(array[0,0], null);
+set enable_seqscan = off;
+explain
+select * from t_gin_test_tbl where array[0] <@ i;
+select * from t_gin_test_tbl where array[0] <@ i;
+select * from t_gin_test_tbl where array[0] <@ i and '{}'::int4[] <@ j;

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Corey Huinker
Дата:
Сообщение: Re: \describe*
Следующее
От: Tom Lane
Дата:
Сообщение: Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)