Обсуждение: GIN vs. Partial Indexes
All, I thought we fixed this in 8.4.4, but apparently not. In the event that you have a GIN index containing a WHERE clause which is sufficiently restrictive, PostgreSQL will attempt to use the index even though it can't. Since this is completely out of the control of the user, it effectively prohibits using partial GIN indexes: Setup: postgres=# select version(); version --------------------------------------------------------------------------------------------------------------------------------------PostgreSQL 9.0.0on i386-apple-darwin9.8.0, compiled by GCC i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465), 32-bit postgres=# create table gin_test ( id serial not null primary key, type CREATE TABLE DO $$ DECLARE i INT := 1;r INT;qts tsvector;del BOOLEAN; BEGIN qts := to_tsvector('The most anticipated PostgreSQL version in five years has been released. With built-in binary replication and over a dozen new major features, PostgreSQL 9.0 has compelling reasons to upgrade or migrate for every database user and developer.'); WHILE i < 1000 LOOPr := ( random() * 20 )::INT;INSERT INTO gin_test ( "type", deleted, some_ts )VALUES ( r, ( r % 2 )= 0, qts );i := i + 1; END LOOP; END;$$; create index gin_test_type ON gin_test("type"); create index gin_test_text ON gin_test USING GIN ( some_ts)WHERE deleted = FALSE AND "type" = 1; postgres=# SELECT COUNT(*) from gin_test WHERE deleted = FALSE and "type" = 1; ERROR: GIN indexes do not support whole-index scans postgres-# EXPLAIN SELECT COUNT(*) from gin_test WHERE deleted = FALSE and "type" = 1; QUERY PLAN ------------------------------------------------------------------------------------Aggregate (cost=54.01..54.02 rows=1width=0) -> Bitmap Heap Scan on gin_test (cost=12.38..53.95 rows=24 width=0) Recheck Cond: ((NOT deleted)AND (type = 1)) -> Bitmap Index Scan on gin_test_text (cost=0.00..12.37 rows=24 width=0) (4 rows) I find the above error interesting, because: (a) I didn't actually select the some_ts column, and (b) I can do an actual TS search which hits the whole index with no problem: postgres=# SELECT COUNT(*) from gin_test WHERE some_ts @@ to_tsquery('replication'); count ------- 999 Note that if I add the perfect index for that query, it works: postgres=# create index gin_test_type_undeleted on gin_test("type") where not deleted; CREATE INDEX postgres=# SELECT COUNT(*) from gin_test WHERE deleted = FALSE and "type" = 1; count ------- 46 (1 row) postgres=# EXPLAIN SELECT COUNT(*) from gin_test WHERE deleted = FALSE and "type" = 1; QUERY PLAN ---------------------------------------------------------------------------------------------Aggregate (cost=46.23..46.24rows=1 width=0) -> Bitmap Heap Scan on gin_test (cost=4.60..46.17 rows=24 width=0) Recheck Cond:((type = 1) AND (NOT deleted)) -> Bitmap Index Scan on gin_test_type_undeleted (cost=0.00..4.60 rows=24 width=0) Index Cond: (type = 1) (5 rows) Clearly the answer here seems to be that our planner should not pick GIN indexes for any query in which the indexed column is not referenced. Is that practical to implement? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > I thought we fixed this in 8.4.4, but apparently not. In the event that > you have a GIN index containing a WHERE clause which is sufficiently > restrictive, PostgreSQL will attempt to use the index even though it > can't. We could probably kluge the planner to avoid that case, but in view of the other issues explained here: http://developer.postgresql.org/pgdocs/postgres/gin-limit.html I'm not sure it's worth the trouble. There's nothing the planner can do to guard against the equivalent issue of non-restrictive queries, ie there is a WHERE clause but it's something like "array-column contains empty-array". The fact that the comparison value is empty might not be known until runtime. IMO, what's needed is to fix GIN so it doesn't go insane for empty values or non-restrictive queries, by ensuring there's at least one index entry for every row. This has been discussed before; see the TODO section for GIN. regards, tom lane
> IMO, what's needed is to fix GIN so it doesn't go insane for empty > values or non-restrictive queries, by ensuring there's at least one > index entry for every row. This has been discussed before; see the TODO > section for GIN. Well, what is that fix waiting on, then? Oleg, Teodor? We may even have a modest budget for a patch, so if funding would help, I'll ask. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >> I thought we fixed this in 8.4.4, but apparently not. In the event that >> you have a GIN index containing a WHERE clause which is sufficiently >> restrictive, PostgreSQL will attempt to use the index even though it >> can't. > > We could probably kluge the planner to avoid that case, but in view > of the other issues explained here: > http://developer.postgresql.org/pgdocs/postgres/gin-limit.html > I'm not sure it's worth the trouble. There's nothing the planner can do > to guard against the equivalent issue of non-restrictive queries, ie > there is a WHERE clause but it's something like "array-column contains > empty-array". The fact that the comparison value is empty might not be > known until runtime. > > IMO, what's needed is to fix GIN so it doesn't go insane for empty > values or non-restrictive queries, by ensuring there's at least one > index entry for every row. This has been discussed before; see the TODO > section for GIN. That seems like it could waste an awful lot of disk space (and therefore I/O, etc.). No? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> IMO, what's needed is to fix GIN so it doesn't go insane for empty >> values or non-restrictive queries, by ensuring there's at least one >> index entry for every row. �This has been discussed before; see the TODO >> section for GIN. > That seems like it could waste an awful lot of disk space (and > therefore I/O, etc.). No? How so? In a typical application, there would not likely be very many such rows --- we're talking about cases like documents containing zero indexable words. In any case, the problem right now is that GIN has significant functional limitations because it fails to make any index entry at all for such rows. Even if there are in fact no such rows in a particular table, it has to fail on some queries because there *might* be such rows. There is no way to fix those limitations unless it undertakes to have some index entry for every row. That will take disk space, but it's *necessary*. (To adapt the old saw, I can make this index arbitrarily small if it doesn't have to give the right answers.) In any case, I would expect that GIN could actually do this quite efficiently. What we'd probably want is a concept of a "null word", with empty indexable rows entered in the index as if they contained the null word. So there'd be just one index entry with a posting list of however many such rows there are. regards, tom lane
On Fri, Oct 8, 2010 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> IMO, what's needed is to fix GIN so it doesn't go insane for empty >>> values or non-restrictive queries, by ensuring there's at least one >>> index entry for every row. This has been discussed before; see the TODO >>> section for GIN. > >> That seems like it could waste an awful lot of disk space (and >> therefore I/O, etc.). No? > > How so? In a typical application, there would not likely be very many > such rows --- we're talking about cases like documents containing zero > indexable words. In any case, the problem right now is that GIN has > significant functional limitations because it fails to make any index > entry at all for such rows. Even if there are in fact no such rows > in a particular table, it has to fail on some queries because there > *might* be such rows. There is no way to fix those limitations > unless it undertakes to have some index entry for every row. That > will take disk space, but it's *necessary*. (To adapt the old saw, > I can make this index arbitrarily small if it doesn't have to give > the right answers.) > > In any case, I would expect that GIN could actually do this quite > efficiently. What we'd probably want is a concept of a "null word", > with empty indexable rows entered in the index as if they contained the > null word. So there'd be just one index entry with a posting list of > however many such rows there are. <thinks about it more> Yeah, I think you're right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Oct 8, 2010, at 1:47 PM, Tom Lane wrote: > How so? In a typical application, there would not likely be very many > such rows --- we're talking about cases like documents containing zero > indexable words. In any case, the problem right now is that GIN has > significant functional limitations because it fails to make any index > entry at all for such rows. Even if there are in fact no such rows > in a particular table, it has to fail on some queries because there > *might* be such rows. There is no way to fix those limitations > unless it undertakes to have some index entry for every row. That > will take disk space, but it's *necessary*. (To adapt the old saw, > I can make this index arbitrarily small if it doesn't have to give > the right answers.) And could you not keep it the same with a partial index? Best, David
On 10/08/2010 02:44 PM, Robert Haas wrote: >> In any case, I would expect that GIN could actually do this quite >> efficiently. What we'd probably want is a concept of a "null word", >> with empty indexable rows entered in the index as if they contained the >> null word. So there'd be just one index entry with a posting list of >> however many such rows there are. So, given the lack of objections to this idea, do we have a plan for fixing GIN? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus wrote: > On 10/08/2010 02:44 PM, Robert Haas wrote: > >> In any case, I would expect that GIN could actually do this quite > >> efficiently. What we'd probably want is a concept of a "null word", > >> with empty indexable rows entered in the index as if they contained the > >> null word. So there'd be just one index entry with a posting list of > >> however many such rows there are. > > So, given the lack of objections to this idea, do we have a plan for > fixing GIN? Is this a TODO? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 11/12/2010 01:11 PM, Bruce Momjian wrote: > Josh Berkus wrote: >> On 10/08/2010 02:44 PM, Robert Haas wrote: >>>> In any case, I would expect that GIN could actually do this quite >>>> efficiently. What we'd probably want is a concept of a "null word", >>>> with empty indexable rows entered in the index as if they contained the >>>> null word. So there'd be just one index entry with a posting list of >>>> however many such rows there are. >> So, given the lack of objections to this idea, do we have a plan for >> fixing GIN? > Is this a TODO? Yes. cheers andrew
Andrew Dunstan wrote: > > > On 11/12/2010 01:11 PM, Bruce Momjian wrote: > > Josh Berkus wrote: > >> On 10/08/2010 02:44 PM, Robert Haas wrote: > >>>> In any case, I would expect that GIN could actually do this quite > >>>> efficiently. What we'd probably want is a concept of a "null word", > >>>> with empty indexable rows entered in the index as if they contained the > >>>> null word. So there'd be just one index entry with a posting list of > >>>> however many such rows there are. > >> So, given the lack of objections to this idea, do we have a plan for > >> fixing GIN? > > Is this a TODO? > > Yes. OK, can you add it or give me wording, or it is already on the TODO list? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > OK, can you add it or give me wording, or it is already on the TODO > list? It's already there. regards, tom lane