Обсуждение: GIN vs. Partial Indexes

Поиск

Список

Период

Сортировка

GIN vs. Partial Indexes

От

Josh Berkus

Дата:

07 октября 2010 г., 21:49:38

All,

I thought we fixed this in 8.4.4, but apparently not.  In the event that
you have a GIN index containing a WHERE clause which is sufficiently
restrictive, PostgreSQL will attempt to use the index even though it
can't.  Since this is completely out of the control of the user, it
effectively prohibits using partial GIN indexes:

Setup:

postgres=# select version();                                                              version


--------------------------------------------------------------------------------------------------------------------------------------PostgreSQL
9.0.0on i386-apple-darwin9.8.0, compiled by GCC
 
i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465), 32-bit

postgres=# create table gin_test ( id serial not null primary key, type
CREATE TABLE

DO $$
DECLARE i INT := 1;r INT;qts tsvector;del BOOLEAN;
BEGIN

qts := to_tsvector('The most anticipated PostgreSQL version in five
years has been released. With built-in binary replication and over a
dozen new major features, PostgreSQL 9.0 has compelling reasons to
upgrade or migrate for every database user and developer.');

WHILE i < 1000 LOOPr := ( random() * 20 )::INT;INSERT INTO gin_test ( "type", deleted, some_ts )VALUES ( r,    ( r % 2
)= 0,    qts );i := i + 1;
 
END LOOP;

END;$$;

create index gin_test_type ON gin_test("type");
create index gin_test_text ON gin_test USING GIN ( some_ts)WHERE deleted = FALSE AND "type" = 1;

postgres=# SELECT COUNT(*) from gin_test WHERE deleted = FALSE and
"type" = 1;
ERROR:  GIN indexes do not support whole-index scans

postgres-# EXPLAIN SELECT COUNT(*) from gin_test WHERE deleted = FALSE
and "type" = 1;                                    QUERY PLAN

------------------------------------------------------------------------------------Aggregate  (cost=54.01..54.02
rows=1width=0)  ->  Bitmap Heap Scan on gin_test  (cost=12.38..53.95 rows=24 width=0)        Recheck Cond: ((NOT
deleted)AND (type = 1))        ->  Bitmap Index Scan on gin_test_text  (cost=0.00..12.37
 
rows=24 width=0)
(4 rows)


I find the above error interesting, because: (a) I didn't actually
select the some_ts column, and (b) I can do an actual TS search which
hits the whole index with no problem:

postgres=# SELECT COUNT(*) from gin_test WHERE some_ts @@
to_tsquery('replication');
count
-------  999


Note that if I add the perfect index for that query, it works:

postgres=# create index gin_test_type_undeleted on gin_test("type")
where not deleted;
CREATE INDEX

postgres=# SELECT COUNT(*) from gin_test WHERE deleted = FALSE and
"type" = 1;                                               count
-------   46
(1 row)

postgres=# EXPLAIN
                                         SELECT COUNT(*) from gin_test
WHERE deleted = FALSE and "type" = 1;                                        QUERY PLAN

---------------------------------------------------------------------------------------------Aggregate
(cost=46.23..46.24rows=1 width=0)  ->  Bitmap Heap Scan on gin_test  (cost=4.60..46.17 rows=24 width=0)        Recheck
Cond:((type = 1) AND (NOT deleted))        ->  Bitmap Index Scan on gin_test_type_undeleted
 
(cost=0.00..4.60 rows=24 width=0)              Index Cond: (type = 1)
(5 rows)


Clearly the answer here seems to be that our planner should not pick GIN
indexes for any query in which the indexed column is not referenced.  Is
that practical to implement?

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com

Re: GIN vs. Partial Indexes

От

Tom Lane

Дата:

07 октября 2010 г., 23:52:35

Josh Berkus <josh@agliodbs.com> writes:
> I thought we fixed this in 8.4.4, but apparently not.  In the event that
> you have a GIN index containing a WHERE clause which is sufficiently
> restrictive, PostgreSQL will attempt to use the index even though it
> can't.

We could probably kluge the planner to avoid that case, but in view
of the other issues explained here:
http://developer.postgresql.org/pgdocs/postgres/gin-limit.html
I'm not sure it's worth the trouble.  There's nothing the planner can do
to guard against the equivalent issue of non-restrictive queries, ie
there is a WHERE clause but it's something like "array-column contains
empty-array".  The fact that the comparison value is empty might not be
known until runtime.

IMO, what's needed is to fix GIN so it doesn't go insane for empty
values or non-restrictive queries, by ensuring there's at least one
index entry for every row.  This has been discussed before; see the TODO
section for GIN.
        regards, tom lane

Re: GIN vs. Partial Indexes

От

Josh Berkus

Дата:

08 октября 2010 г., 13:28:06

> IMO, what's needed is to fix GIN so it doesn't go insane for empty
> values or non-restrictive queries, by ensuring there's at least one
> index entry for every row.  This has been discussed before; see the TODO
> section for GIN.

Well, what is that fix waiting on, then?  Oleg, Teodor?  We may even 
have a modest budget for a patch, so if funding would help, I'll ask.

--                                   -- Josh Berkus                                     PostgreSQL Experts Inc.
                           http://www.pgexperts.com

Re: GIN vs. Partial Indexes

От

Robert Haas

Дата:

08 октября 2010 г., 14:38:04

On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> I thought we fixed this in 8.4.4, but apparently not.  In the event that
>> you have a GIN index containing a WHERE clause which is sufficiently
>> restrictive, PostgreSQL will attempt to use the index even though it
>> can't.
>
> We could probably kluge the planner to avoid that case, but in view
> of the other issues explained here:
> http://developer.postgresql.org/pgdocs/postgres/gin-limit.html
> I'm not sure it's worth the trouble.  There's nothing the planner can do
> to guard against the equivalent issue of non-restrictive queries, ie
> there is a WHERE clause but it's something like "array-column contains
> empty-array".  The fact that the comparison value is empty might not be
> known until runtime.
>
> IMO, what's needed is to fix GIN so it doesn't go insane for empty
> values or non-restrictive queries, by ensuring there's at least one
> index entry for every row.  This has been discussed before; see the TODO
> section for GIN.

That seems like it could waste an awful lot of disk space (and
therefore I/O, etc.).  No?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: GIN vs. Partial Indexes

От

Tom Lane

Дата:

08 октября 2010 г., 14:48:00

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> IMO, what's needed is to fix GIN so it doesn't go insane for empty
>> values or non-restrictive queries, by ensuring there's at least one
>> index entry for every row. �This has been discussed before; see the TODO
>> section for GIN.

> That seems like it could waste an awful lot of disk space (and
> therefore I/O, etc.).  No?

How so?  In a typical application, there would not likely be very many
such rows --- we're talking about cases like documents containing zero
indexable words.  In any case, the problem right now is that GIN has
significant functional limitations because it fails to make any index
entry at all for such rows.  Even if there are in fact no such rows
in a particular table, it has to fail on some queries because there
*might* be such rows.  There is no way to fix those limitations
unless it undertakes to have some index entry for every row.  That
will take disk space, but it's *necessary*.  (To adapt the old saw,
I can make this index arbitrarily small if it doesn't have to give
the right answers.)

In any case, I would expect that GIN could actually do this quite
efficiently.  What we'd probably want is a concept of a "null word",
with empty indexable rows entered in the index as if they contained the
null word.  So there'd be just one index entry with a posting list of
however many such rows there are.
        regards, tom lane

Re: GIN vs. Partial Indexes

От

Robert Haas

Дата:

08 октября 2010 г., 18:44:59

On Fri, Oct 8, 2010 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> IMO, what's needed is to fix GIN so it doesn't go insane for empty
>>> values or non-restrictive queries, by ensuring there's at least one
>>> index entry for every row.  This has been discussed before; see the TODO
>>> section for GIN.
>
>> That seems like it could waste an awful lot of disk space (and
>> therefore I/O, etc.).  No?
>
> How so?  In a typical application, there would not likely be very many
> such rows --- we're talking about cases like documents containing zero
> indexable words.  In any case, the problem right now is that GIN has
> significant functional limitations because it fails to make any index
> entry at all for such rows.  Even if there are in fact no such rows
> in a particular table, it has to fail on some queries because there
> *might* be such rows.  There is no way to fix those limitations
> unless it undertakes to have some index entry for every row.  That
> will take disk space, but it's *necessary*.  (To adapt the old saw,
> I can make this index arbitrarily small if it doesn't have to give
> the right answers.)
>
> In any case, I would expect that GIN could actually do this quite
> efficiently.  What we'd probably want is a concept of a "null word",
> with empty indexable rows entered in the index as if they contained the
> null word.  So there'd be just one index entry with a posting list of
> however many such rows there are.

<thinks about it more>

Yeah, I think you're right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: GIN vs. Partial Indexes

От

"David E. Wheeler"

Дата:

08 октября 2010 г., 23:23:33

On Oct 8, 2010, at 1:47 PM, Tom Lane wrote:

> How so?  In a typical application, there would not likely be very many
> such rows --- we're talking about cases like documents containing zero
> indexable words.  In any case, the problem right now is that GIN has
> significant functional limitations because it fails to make any index
> entry at all for such rows.  Even if there are in fact no such rows
> in a particular table, it has to fail on some queries because there
> *might* be such rows.  There is no way to fix those limitations
> unless it undertakes to have some index entry for every row.  That
> will take disk space, but it's *necessary*.  (To adapt the old saw,
> I can make this index arbitrarily small if it doesn't have to give
> the right answers.)

And could you not keep it the same with a partial index?

Best,

David

Re: GIN vs. Partial Indexes

От

Josh Berkus

Дата:

23 октября 2010 г., 16:01:47

On 10/08/2010 02:44 PM, Robert Haas wrote:
>>  In any case, I would expect that GIN could actually do this quite
>>  efficiently.  What we'd probably want is a concept of a "null word",
>>  with empty indexable rows entered in the index as if they contained the
>>  null word.  So there'd be just one index entry with a posting list of
>>  however many such rows there are.

So, given the lack of objections to this idea, do we have a plan for 
fixing GIN?

--                                   -- Josh Berkus                                     PostgreSQL Experts Inc.
                           http://www.pgexperts.com

Re: GIN vs. Partial Indexes

От

Bruce Momjian

Дата:

12 ноября 2010 г., 14:11:24

Josh Berkus wrote:
> On 10/08/2010 02:44 PM, Robert Haas wrote:
> >>  In any case, I would expect that GIN could actually do this quite
> >>  efficiently.  What we'd probably want is a concept of a "null word",
> >>  with empty indexable rows entered in the index as if they contained the
> >>  null word.  So there'd be just one index entry with a posting list of
> >>  however many such rows there are.
> 
> So, given the lack of objections to this idea, do we have a plan for 
> fixing GIN?

Is this a TODO?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: GIN vs. Partial Indexes

От

Andrew Dunstan

Дата:

12 ноября 2010 г., 14:15:01


On 11/12/2010 01:11 PM, Bruce Momjian wrote:
> Josh Berkus wrote:
>> On 10/08/2010 02:44 PM, Robert Haas wrote:
>>>>   In any case, I would expect that GIN could actually do this quite
>>>>   efficiently.  What we'd probably want is a concept of a "null word",
>>>>   with empty indexable rows entered in the index as if they contained the
>>>>   null word.  So there'd be just one index entry with a posting list of
>>>>   however many such rows there are.
>> So, given the lack of objections to this idea, do we have a plan for
>> fixing GIN?
> Is this a TODO?

Yes.

cheers

andrew

Re: GIN vs. Partial Indexes

От

Bruce Momjian

Дата:

12 ноября 2010 г., 14:16:18

Andrew Dunstan wrote:
> 
> 
> On 11/12/2010 01:11 PM, Bruce Momjian wrote:
> > Josh Berkus wrote:
> >> On 10/08/2010 02:44 PM, Robert Haas wrote:
> >>>>   In any case, I would expect that GIN could actually do this quite
> >>>>   efficiently.  What we'd probably want is a concept of a "null word",
> >>>>   with empty indexable rows entered in the index as if they contained the
> >>>>   null word.  So there'd be just one index entry with a posting list of
> >>>>   however many such rows there are.
> >> So, given the lack of objections to this idea, do we have a plan for
> >> fixing GIN?
> > Is this a TODO?
> 
> Yes.

OK, can you add it or give me wording, or it is already on the TODO
list?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: GIN vs. Partial Indexes

От

Tom Lane

Дата:

12 ноября 2010 г., 14:32:20

Bruce Momjian <bruce@momjian.us> writes:
> OK, can you add it or give me wording, or it is already on the TODO
> list?

It's already there.
        regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: GIN vs. Partial Indexes