Обсуждение: BUG #16303: A condtion whether an index-only scan is possible includes a wrong

Поиск
Список
Период
Сортировка

BUG #16303: A condtion whether an index-only scan is possible includes a wrong

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16303
Logged by:          Horimoto Yasuhiro
Email address:      horimoto@clear-code.com
PostgreSQL version: 12.2
Operating system:   Debian 10.3
Description:

Hello, developers.

I think that the condition of whether an index-only scan is possible
includes a wrong.

For example, in the following case, the index has no data to return. Because
the query doesn't use specify columns.
However, the query planner choice index-only scan.

create table gist_count_tbl (tsv tsvector);
insert into gist_count_tbl values (null);
create index gist_count_tbl_index on gist_count_tbl using gist (tsv);

vacuum analyze gist_count_tbl;

set enable_seqscan=off;
set enable_bitmapscan=off;
set enable_indexonlyscan=on;

explain (costs off)
select count(*) from gist_count_tbl;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Aggregate
   ->  Index Only Scan using gist_count_tbl_index on gist_count_tbl
(2 rows)

In my opinion, we expected that the query planner doesn't choose an
index-only scan in the above case. 

In fact, index_canreturn_attrs of
https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/indxpath.c#L1951
is NULL in the above case.

thanks!


Re: BUG #16303: A condtion whether an index-only scan is possibleincludes a wrong

От
Horimoto Yasuhiro
Дата:
I send a patch for this problem.

thanks!

From: PG Bug reporting form <noreply@postgresql.org>
Subject: BUG #16303: A condtion whether an index-only scan is possible includes a wrong
Date: Mon, 16 Mar 2020 06:02:25 +0000

> The following bug has been logged on the website:
> 
> Bug reference:      16303
> Logged by:          Horimoto Yasuhiro
> Email address:      horimoto@clear-code.com
> PostgreSQL version: 12.2
> Operating system:   Debian 10.3
> Description:        
> 
> Hello, developers.
> 
> I think that the condition of whether an index-only scan is possible
> includes a wrong.
> 
> For example, in the following case, the index has no data to return. Because
> the query doesn't use specify columns.
> However, the query planner choice index-only scan.
> 
> create table gist_count_tbl (tsv tsvector);
> insert into gist_count_tbl values (null);
> create index gist_count_tbl_index on gist_count_tbl using gist (tsv);
> 
> vacuum analyze gist_count_tbl;
> 
> set enable_seqscan=off;
> set enable_bitmapscan=off;
> set enable_indexonlyscan=on;
> 
> explain (costs off)
> select count(*) from gist_count_tbl;
>                              QUERY PLAN                             
> --------------------------------------------------------------------
>  Aggregate
>    ->  Index Only Scan using gist_count_tbl_index on gist_count_tbl
> (2 rows)
> 
> In my opinion, we expected that the query planner doesn't choose an
> index-only scan in the above case. 
> 
> In fact, index_canreturn_attrs of
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/indxpath.c#L1951
> is NULL in the above case.
> 
> thanks!
>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..8fd3df8d69 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1960,8 +1960,12 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
     index_canreturn_attrs = bms_del_members(index_canreturn_attrs,
                                             index_cannotreturn_attrs);
 
-    /* Do we have all the necessary attributes? */
-    result = bms_is_subset(attrs_used, index_canreturn_attrs);
+    if (index_canreturn_attrs == NULL)
+        /* We don't have indexes that can return attributes. */
+        result = false;
+    else
+        /* Do we have all the necessary attributes? */
+        result = bms_is_subset(attrs_used, index_canreturn_attrs);
 
     bms_free(attrs_used);
     bms_free(index_canreturn_attrs);
diff --git a/src/test/regress/expected/gist.out b/src/test/regress/expected/gist.out
index 90edb4061d..a1b9fea2c2 100644
--- a/src/test/regress/expected/gist.out
+++ b/src/test/regress/expected/gist.out
@@ -317,3 +317,30 @@ reset enable_seqscan;
 reset enable_bitmapscan;
 reset enable_indexonlyscan;
 drop table gist_tbl;
+--
+-- Test Index-only plans on GiST indexes when columns needless query.
+--
+create table gist_count_tbl (tsv tsvector);
+insert into gist_count_tbl values (null);
+create index gist_count_tbl_index on gist_count_tbl using gist (tsv);
+vacuum analyze gist_count_tbl;
+set enable_seqscan=off;
+set enable_bitmapscan=off;
+set enable_indexonlyscan=on;
+-- Test that an index-only scan is not chosen,
+-- when the query doesn't use specify columns.
+-- Because the index has not data to return in this case.
+explain (costs off)
+select count(*) from gist_count_tbl;
+            QUERY PLAN            
+----------------------------------
+ Aggregate
+   ->  Seq Scan on gist_count_tbl
+(2 rows)
+
+-- Clean up
+reset enable_seqscan;
+reset enable_bitmapscan;
+reset enable_indexonlyscan;
+drop index gist_count_tbl_index;
+drop table gist_count_tbl;
diff --git a/src/test/regress/sql/gist.sql b/src/test/regress/sql/gist.sql
index b9d398ea94..09d4372a7f 100644
--- a/src/test/regress/sql/gist.sql
+++ b/src/test/regress/sql/gist.sql
@@ -148,3 +148,30 @@ reset enable_bitmapscan;
 reset enable_indexonlyscan;
 
 drop table gist_tbl;
+
+--
+-- Test Index-only plans on GiST indexes when columns needless query.
+--
+create table gist_count_tbl (tsv tsvector);
+insert into gist_count_tbl values (null);
+create index gist_count_tbl_index on gist_count_tbl using gist (tsv);
+
+vacuum analyze gist_count_tbl;
+
+set enable_seqscan=off;
+set enable_bitmapscan=off;
+set enable_indexonlyscan=on;
+
+-- Test that an index-only scan is not chosen,
+-- when the query doesn't use specify columns.
+-- Because the index has not data to return in this case.
+explain (costs off)
+select count(*) from gist_count_tbl;
+
+-- Clean up
+reset enable_seqscan;
+reset enable_bitmapscan;
+reset enable_indexonlyscan;
+
+drop index gist_count_tbl_index;
+drop table gist_count_tbl;

Re: BUG #16303: A condtion whether an index-only scan is possibleincludes a wrong

От
"David G. Johnston"
Дата:
On Sunday, March 15, 2020, Horimoto Yasuhiro <horimoto@clear-code.com> wrote:
I send a patch for this problem.!

From: PG Bug reporting form <noreply@postgresql.org>
Subject: BUG #16303: A condtion whether an index-only scan is possible includes a wrong
Date: Mon, 16 Mar 2020 06:02:25 +0000

> I think that the condition of whether an index-only scan is possible
> includes a wrong.
>
> For example, in the following case, the index has no data to return. Because
> the query doesn't use specify columns.
> However, the query planner choice index-only scan.

 
> In my opinion, we expected that the query planner doesn't choose an
> index-only scan in the above case.
>


I don't see a behavioral bug here. I would expect that any index would be an acceptable match for a query whose set of return columns is the empty set. The empty set is a subset of all sets.

David J.

Re: BUG #16303: A condtion whether an index-only scan is possibleincludes a wrong

От
Horimoto Yasuhiro
Дата:
Hello,
Thank you for your comments.

I should have explained much more about this case. Sorry.
In this case, gistcanreturn() returns false:

https://github.com/postgres/postgres/blob/master/src/backend/access/gist/gistget.c#L798-L799

Because tsvector_ops provides GIST_COMPRESS_PROC:

https://github.com/postgres/postgres/blob/master/src/include/access/gist.h#L31
https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_amproc.dat#L530-L532

and doesn't provide GIST_FETCH_PROC:and provides GIST_COMPRESS_PROC:

https://github.com/postgres/postgres/blob/master/src/include/access/gist.h#L37
https://github.com/postgres/postgres/blob/master/src/include/catalog/pg_amproc.dat#L524-L543

Reurning false from amcanreturn() function such as gistcanreturn() means the index can't support
index-only scans on the given column:

https://www.postgresql.org/docs/12/index-functions.html

    bool
    amcanreturn (Relation indexRelation, int attno);

    Check whether the index can support index-only scans on the given column, by returning the
    indexed column values for an index entry in the form of an IndexTuple.

So I think that choosing index-only scan for this case (false is returned by gistcanreturn() case)
isn't expected behavior.

Background:
We're developing a PostgreSQL index, PGroonga: https://pgroonga.github.io/
It supports index-only scans for most supported types but doesn't support
index-only scan for jsonb type. Because we can't reconstruct the original data
from data in indexed data.

We found the problem I reported with PGroonga:
https://github.com/pgroonga/pgroonga/issues/101

This problem causes an error because PGroonga can't return
IndexScanDesc::xs_itup but IndexScanDesc::xs_want_itup is true.
Because PostgreSQL requests index-only scan even when PGroonga
returns false by amcanreturn().

We've introduced the following workaround:
https://github.com/pgroonga/pgroonga/commit/0390f79f810c44422940c4ef701c186055b4899e

It returns a NULL tuple instead of real tuple data.

IMO, it's better that PostgreSQL doesn't choose index-only scan in this case.
Or index developers need to implement workaround in each index.

thanks!

From: "David G. Johnston" <david.g.johnston@gmail.com>
Subject: Re: BUG #16303: A condtion whether an index-only scan is possible includes a wrong
Date: Sun, 15 Mar 2020 23:35:53 -0700

> On Sunday, March 15, 2020, Horimoto Yasuhiro <horimoto@clear-code.com>
> wrote:
> 
>> I send a patch for this problem.!
>>
>> From: PG Bug reporting form <noreply@postgresql.org>
>> Subject: BUG #16303: A condtion whether an index-only scan is possible
>> includes a wrong
>> Date: Mon, 16 Mar 2020 06:02:25 +0000
>>
>> > I think that the condition of whether an index-only scan is possible
>> > includes a wrong.
>> >
>> > For example, in the following case, the index has no data to return.
>> Because
>> > the query doesn't use specify columns.
>> > However, the query planner choice index-only scan.
>>
>>
> 
>> > In my opinion, we expected that the query planner doesn't choose an
>> > index-only scan in the above case.
>> >
>>
>>
> I don't see a behavioral bug here. I would expect that any index would be
> an acceptable match for a query whose set of return columns is the empty
> set. The empty set is a subset of all sets.
> 
> David J.



Re: BUG #16303: A condtion whether an index-only scan is possible includes a wrong

От
Tom Lane
Дата:
Horimoto Yasuhiro <horimoto@clear-code.com> writes:
> So I think that choosing index-only scan for this case (false is returned by gistcanreturn() case)
> isn't expected behavior.

You're mistaken.  An index-only scan is not only useful but potentially
quite desirable for queries such as count(*); in the best case it ends
up just counting the index entries without ever visiting the heap at all.

It does look a bit weird that we might select an IOS for an index type
that can't ever return any column values, but so far as I can see it
should work fine.  The index isn't being asked to do anything that it
doesn't have to do anyway, ie, return the correct set of heap TIDs.
Testing locally with a GIST index gives the right answers, too.

> We found the problem I reported with PGroonga:
> https://github.com/pgroonga/pgroonga/issues/101

> This problem causes an error because PGroonga can't return
> IndexScanDesc::xs_itup but IndexScanDesc::xs_want_itup is true.
> Because PostgreSQL requests index-only scan even when PGroonga
> returns false by amcanreturn().

You should be returning an empty, zero-column tuple (either itup
or htup format) in that situation.  The limit of "I don't have
any columns I can return" is to form a tuple with no columns,
not to fail to form a tuple.

            regards, tom lane



Re: BUG #16303: A condtion whether an index-only scan is possibleincludes a wrong

От
Horimoto Yasuhiro
Дата:
Hello,

Thank you for your explanation.
I understand what you say.

Thank you,

From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: BUG #16303: A condtion whether an index-only scan is possible includes a wrong
Date: Mon, 16 Mar 2020 11:20:43 -0400

> Horimoto Yasuhiro <horimoto@clear-code.com> writes:
>> So I think that choosing index-only scan for this case (false is returned by gistcanreturn() case)
>> isn't expected behavior.
> 
> You're mistaken.  An index-only scan is not only useful but potentially
> quite desirable for queries such as count(*); in the best case it ends
> up just counting the index entries without ever visiting the heap at all.
> 
> It does look a bit weird that we might select an IOS for an index type
> that can't ever return any column values, but so far as I can see it
> should work fine.  The index isn't being asked to do anything that it
> doesn't have to do anyway, ie, return the correct set of heap TIDs.
> Testing locally with a GIST index gives the right answers, too.
> 
>> We found the problem I reported with PGroonga:
>> https://github.com/pgroonga/pgroonga/issues/101
> 
>> This problem causes an error because PGroonga can't return
>> IndexScanDesc::xs_itup but IndexScanDesc::xs_want_itup is true.
>> Because PostgreSQL requests index-only scan even when PGroonga
>> returns false by amcanreturn().
> 
> You should be returning an empty, zero-column tuple (either itup
> or htup format) in that situation.  The limit of "I don't have
> any columns I can return" is to form a tuple with no columns,
> not to fail to form a tuple.
> 
>             regards, tom lane
> 
>