Обсуждение: How to enable partial matching on a GIN index
I recently setup a GIN index on a large database, and it's performance is spectacular. However, I noticed by default it doesn't allow searching by partial matches.
The docs (http://www.postgresql.org/docs/9.5/static/gin-implementation.html) claim partial matching is supported but don't actually explain how to enable it. Are there any examples of creating a GIN index that uses partial matching? I'm not finding anything through Google.Chris Spencer <chrisspen@gmail.com> writes: > I recently setup a GIN index on a large database, and it's performance is > spectacular. However, I noticed by default it doesn't allow searching by > partial matches. What's your grounds for claiming that? We recently fixed a bug whereby the planner overestimated the cost of partial-match index scans, which might discourage it from choosing an index scan versus other plans, but that's a lot different from "doesn't allow". In any case, the bug only manifests if you've never vacuumed the table since creating the index ... regards, tom lane
>What's your grounds for claiming that?
It returns no results whenever I use partial search terms. If I search for, say, "hospital" it returns results containing the exact word "hospital", but if I search for "hosp" it returns nothing. The doc page explaining that PG "can" do partial matches, and not that it "does" do partial matches, led me to believe this is the expected default behaviour. Is this not the case?ALTER TABLE mytable ADD COLUMN search_index tsvector;
CREATE INDEX mytable_search_index_gin ON mytable USING gin(search_index);
INSERT INTO mytable (name, search_index) VALUES ('hospital', plainto_tsquery('pg_catalog.english', 'hospital'));
SELECT * FROM mytable WHERE (search_index) @@ (plainto_tsquery('pg_catalog.english', 'hospital')); -- returns results
SELECT * FROM mytable WHERE (search_index) @@ (plainto_tsquery('pg_catalog.english', 'hosp')); -- returns nothing
On Mon, Jan 11, 2016 at 12:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Spencer <chrisspen@gmail.com> writes:
> I recently setup a GIN index on a large database, and it's performance is
> spectacular. However, I noticed by default it doesn't allow searching by
> partial matches.
What's your grounds for claiming that?
We recently fixed a bug whereby the planner overestimated the cost of
partial-match index scans, which might discourage it from choosing an
index scan versus other plans, but that's a lot different from "doesn't
allow". In any case, the bug only manifests if you've never vacuumed the
table since creating the index ...
regards, tom lane
Chris Spencer <chrisspen@gmail.com> writes: >> What's your grounds for claiming that? > It returns no results whenever I use partial search terms. Oh; that's got nothing to do with whether an index is used or not. The index just makes it faster. > If I search for, > say, "hospital" it returns results containing the exact word "hospital", > but if I search for "hosp" it returns nothing. The doc page explaining that > PG "can" do partial matches, and not that it "does" do partial matches, led > me to believe this is the expected default behaviour. Is this not the case? There are a few problems with your example ... > ALTER TABLE mytable ADD COLUMN search_index tsvector; > CREATE INDEX mytable_search_index_gin ON mytable USING gin(search_index); > INSERT INTO mytable (name, search_index) VALUES ('hospital', > plainto_tsquery('pg_catalog.english', 'hospital')); That couldn't have worked as written, because tsvector != tsquery. I assume you meant to_tsvector() in the INSERT. > SELECT * FROM mytable WHERE (search_index) @@ > (plainto_tsquery('pg_catalog.english', 'hospital')); -- returns results Right ... > SELECT * FROM mytable WHERE (search_index) @@ > (plainto_tsquery('pg_catalog.english', 'hosp')); -- returns nothing This isn't right, you have to use to_tsquery('pg_catalog.english', 'hosp:*') to specify a partial match. You didn't give a partial match indicator (":*"), and if you had, plainto_tsquery() would have stripped it off, because it throws away all punctuation. Another pitfall to keep in mind is that stemming may result in searches not matching that look like they should. For example, what's really stored in your tsvector column for this example is # select to_tsvector('pg_catalog.english', 'hospital'); to_tsvector ------------- 'hospit':1 (1 row) and a prefix match has to match or be a prefix of that string exactly. So you might expect to_tsquery('pg_catalog.english', 'hospita:*') to match that entry, but it won't. (In some cases, stemming of the query word will hide this effect, which is why a search for 'hospital' works; but in this example 'hospita' doesn't look enough like an English word to trigger removal of 'a'.) If you're confused, it always pays to look at the actual outputs of the to_tsvector and to_tsquery functions. There's no particular magic after that point, it's just string matching; but the word-break and stemmer functions can do surprising things. If you expect to be doing mostly prefix-type searches, you might end up deciding you want to use the "simple" text search configuration, which I'm pretty sure does no stemming at all. regards, tom lane
>This isn't right, you have to use
>
> to_tsquery('pg_catalog.english', 'hosp:*')
Thanks, that was my problem.>
> to_tsquery('pg_catalog.english', 'hosp:*')