Обсуждение: How to enable partial matching on a GIN index

Поиск
Список
Период
Сортировка

How to enable partial matching on a GIN index

От
Chris Spencer
Дата:
I recently setup a GIN index on a large database, and it's performance is spectacular. However, I noticed by default it doesn't allow searching by partial matches.

The docs (http://www.postgresql.org/docs/9.5/static/gin-implementation.html) claim partial matching is supported but don't actually explain how to enable it. Are there any examples of creating a GIN index that uses partial matching? I'm not finding anything through Google.

Re: How to enable partial matching on a GIN index

От
Tom Lane
Дата:
Chris Spencer <chrisspen@gmail.com> writes:
> I recently setup a GIN index on a large database, and it's performance is
> spectacular. However, I noticed by default it doesn't allow searching by
> partial matches.

What's your grounds for claiming that?

We recently fixed a bug whereby the planner overestimated the cost of
partial-match index scans, which might discourage it from choosing an
index scan versus other plans, but that's a lot different from "doesn't
allow".  In any case, the bug only manifests if you've never vacuumed the
table since creating the index ...

            regards, tom lane


Re: How to enable partial matching on a GIN index

От
Chris Spencer
Дата:
>What's your grounds for claiming that?

It returns no results whenever I use partial search terms. If I search for, say, "hospital" it returns results containing the exact word "hospital", but if I search for "hosp" it returns nothing. The doc page explaining that PG "can" do partial matches, and not that it "does" do partial matches, led me to believe this is the expected default behaviour. Is this not the case?

Here's code to reproduce the problem:

ALTER TABLE mytable ADD COLUMN search_index tsvector;
CREATE INDEX mytable_search_index_gin ON mytable USING gin(search_index);

INSERT INTO mytable (name, search_index) VALUES ('hospital', plainto_tsquery('pg_catalog.english', 'hospital'));

SELECT * FROM mytable WHERE (search_index) @@ (plainto_tsquery('pg_catalog.english', 'hospital')); -- returns results
SELECT * FROM mytable WHERE (search_index) @@ (plainto_tsquery('pg_catalog.english', 'hosp')); -- returns nothing

Am I using the index correctly?


On Mon, Jan 11, 2016 at 12:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Spencer <chrisspen@gmail.com> writes:
> I recently setup a GIN index on a large database, and it's performance is
> spectacular. However, I noticed by default it doesn't allow searching by
> partial matches.

What's your grounds for claiming that?

We recently fixed a bug whereby the planner overestimated the cost of
partial-match index scans, which might discourage it from choosing an
index scan versus other plans, but that's a lot different from "doesn't
allow".  In any case, the bug only manifests if you've never vacuumed the
table since creating the index ...

                        regards, tom lane

Re: How to enable partial matching on a GIN index

От
Tom Lane
Дата:
Chris Spencer <chrisspen@gmail.com> writes:
>> What's your grounds for claiming that?

> It returns no results whenever I use partial search terms.

Oh; that's got nothing to do with whether an index is used or not.
The index just makes it faster.

> If I search for,
> say, "hospital" it returns results containing the exact word "hospital",
> but if I search for "hosp" it returns nothing. The doc page explaining that
> PG "can" do partial matches, and not that it "does" do partial matches, led
> me to believe this is the expected default behaviour. Is this not the case?

There are a few problems with your example ...

> ALTER TABLE mytable ADD COLUMN search_index tsvector;
> CREATE INDEX mytable_search_index_gin ON mytable USING gin(search_index);
> INSERT INTO mytable (name, search_index) VALUES ('hospital',
> plainto_tsquery('pg_catalog.english', 'hospital'));

That couldn't have worked as written, because tsvector != tsquery.
I assume you meant to_tsvector() in the INSERT.

> SELECT * FROM mytable WHERE (search_index) @@
> (plainto_tsquery('pg_catalog.english', 'hospital')); -- returns results

Right ...

> SELECT * FROM mytable WHERE (search_index) @@
> (plainto_tsquery('pg_catalog.english', 'hosp')); -- returns nothing

This isn't right, you have to use

    to_tsquery('pg_catalog.english', 'hosp:*')

to specify a partial match.  You didn't give a partial match indicator
(":*"), and if you had, plainto_tsquery() would have stripped it off,
because it throws away all punctuation.

Another pitfall to keep in mind is that stemming may result in searches
not matching that look like they should.  For example, what's really
stored in your tsvector column for this example is

# select to_tsvector('pg_catalog.english', 'hospital');
 to_tsvector
-------------
 'hospit':1
(1 row)

and a prefix match has to match or be a prefix of that string exactly.
So you might expect to_tsquery('pg_catalog.english', 'hospita:*')
to match that entry, but it won't.  (In some cases, stemming of the query
word will hide this effect, which is why a search for 'hospital' works;
but in this example 'hospita' doesn't look enough like an English word
to trigger removal of 'a'.)

If you're confused, it always pays to look at the actual outputs of
the to_tsvector and to_tsquery functions.  There's no particular magic
after that point, it's just string matching; but the word-break and
stemmer functions can do surprising things.

If you expect to be doing mostly prefix-type searches, you might end
up deciding you want to use the "simple" text search configuration,
which I'm pretty sure does no stemming at all.

            regards, tom lane


Re: How to enable partial matching on a GIN index

От
Chris Spencer
Дата:
>This isn't right, you have to use
>
>        to_tsquery('pg_catalog.english', 'hosp:*')

Thanks, that was my problem.