Обсуждение: Question about trigram GIST index

Поиск
Список
Период
Сортировка

Question about trigram GIST index

От
Robert DiFalco
Дата:
So, for my use case I simply need to search for a case insensitive substring. It need not be super exact. It seems like there are two ways I can do this:

CREATE INDEX idx_users_name ON users USING GIST(lower(name) gist_trgm_ops);
SELECT * FROM users WHERE lower(name) LIKE '%john%';

Or I can do it like this:

CREATE INDEX idx_users_name ON users USING GIST(name gist_trgm_ops);
SELECT * FROM users WHERE name % 'john';

Unfortunately I cannot find any documentation on the trade-offs between these two approaches. For my test dataset of 75K records the query speed seems pretty damn similar. 

So, I guess my question is, what is the difference for querying and insert for the two approaches?

Thanks!

Re: Question about trigram GIST index

От
Tom Lane
Дата:
Robert DiFalco <robert.difalco@gmail.com> writes:
> So, for my use case I simply need to search for a case insensitive
> substring. It need not be super exact. It seems like there are two ways I
> can do this:

> CREATE INDEX idx_users_name ON users USING GIST(lower(name) gist_trgm_ops);
> SELECT * FROM users WHERE lower(name) LIKE '%john%';

> Or I can do it like this:

> CREATE INDEX idx_users_name ON users USING GIST(name gist_trgm_ops);
> SELECT * FROM users WHERE name % 'john';

Hm, I don't see anything in the pg_trgm docs suggesting that % is
case-insensitive.  But in any case, I'd go with the former as being
more understandable to someone who knows standard SQL.

            regards, tom lane


Re: Question about trigram GIST index

От
Robert DiFalco
Дата:
I know! I was surprised that % 'John' or % 'JOHN' or even % 'jOhn' all returned the same result.

Besides readability would there be any technical differences between a GIST index that is lower or not and using LIKE vs. %?

Thanks!


On Thu, Dec 18, 2014 at 9:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert DiFalco <robert.difalco@gmail.com> writes:
> So, for my use case I simply need to search for a case insensitive
> substring. It need not be super exact. It seems like there are two ways I
> can do this:

> CREATE INDEX idx_users_name ON users USING GIST(lower(name) gist_trgm_ops);
> SELECT * FROM users WHERE lower(name) LIKE '%john%';

> Or I can do it like this:

> CREATE INDEX idx_users_name ON users USING GIST(name gist_trgm_ops);
> SELECT * FROM users WHERE name % 'john';

Hm, I don't see anything in the pg_trgm docs suggesting that % is
case-insensitive.  But in any case, I'd go with the former as being
more understandable to someone who knows standard SQL.

                        regards, tom lane

Re: Question about trigram GIST index

От
Giuseppe Broccolo
Дата:
I'm not sure about the '%' operator, but I'm sure that the GIST index will never be used in the

  SELECT * FROM users WHERE lower(name) LIKE '%john%';

query; it is used for left or right anchored search, such as 'john%' or '%john'.

Giuseppe.
--
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
giuseppe.broccolo@2ndQuadrant.it | www.2ndQuadrant.it

Re: Question about trigram GIST index

От
Robert DiFalco
Дата:
I'm pretty sure '%John%' uses the index.

explain analyze verbose SELECT name FROM wai_users WHERE lower(name) LIKE '%john%';

                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.wai_users  (cost=53.45..1345.46 rows=900 width=14) (actual time=18.474..32.093 rows=1596 loops=1)
   Output: name
   Recheck Cond: (lower((wai_users.name)::text) ~~ '%john%'::text)
   ->  Bitmap Index Scan on idx_user_name  (cost=0.00..53.41 rows=900 width=0) (actual time=18.227..18.227 rows=1596 loops=1)
         Index Cond: (lower((wai_users.name)::text) ~~ '%john%'::text)
 Total runtime: 33.662 ms
(6 rows)


On Thu, Dec 18, 2014 at 10:00 AM, Giuseppe Broccolo <giuseppe.broccolo@2ndquadrant.it> wrote:
I'm not sure about the '%' operator, but I'm sure that the GIST index will never be used in the

  SELECT * FROM users WHERE lower(name) LIKE '%john%';

query; it is used for left or right anchored search, such as 'john%' or '%john'.

Giuseppe.
--
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
giuseppe.broccolo@2ndQuadrant.it | www.2ndQuadrant.it

Re: Question about trigram GIST index

От
Jeff Janes
Дата:
On Thu, Dec 18, 2014 at 10:00 AM, Giuseppe Broccolo <giuseppe.broccolo@2ndquadrant.it> wrote:
I'm not sure about the '%' operator, but I'm sure that the GIST index will never be used in the

  SELECT * FROM users WHERE lower(name) LIKE '%john%';

query; it is used for left or right anchored search, such as 'john%' or '%john'.

The point of the gist_trgm_ops operator is specifically to overcome that limitation.

It is pretty awesome.

Cheers,

Jeff

Re: Question about trigram GIST index

От
Robert DiFalco
Дата:
Jeff, I'm not seeing that limitation.

On Thu, Dec 18, 2014 at 10:33 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Thu, Dec 18, 2014 at 10:00 AM, Giuseppe Broccolo <giuseppe.broccolo@2ndquadrant.it> wrote:
I'm not sure about the '%' operator, but I'm sure that the GIST index will never be used in the

  SELECT * FROM users WHERE lower(name) LIKE '%john%';

query; it is used for left or right anchored search, such as 'john%' or '%john'.

The point of the gist_trgm_ops operator is specifically to overcome that limitation.

It is pretty awesome.

Cheers,

Jeff

Re: Question about trigram GIST index

От
Kevin Grittner
Дата:
Giuseppe Broccolo wrote:

> I'm not sure about the '%' operator, but I'm sure that the GIST
> index will never be used in the
>
>     SELECT * FROM users WHERE lower(name) LIKE '%john%';
>
> query; it is used for left or right anchored search, such as
> 'john%' or '%john'.
It *will* use a *trigram* index for a non-anchored search.

test=# create table words (word text not null);
CREATE TABLE
test=# copy words from '/usr/share/dict/words';
COPY 99171
test=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION
test=# CREATE INDEX words_trgm ON words USING gist (word gist_trgm_ops);
CREATE INDEX
test=# vacuum analyze words;
VACUUM
test=# explain analyze select * from words where word like '%john%';
                                                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words  (cost=4.36..40.24 rows=10 width=9) (actual time=17.758..17.772 rows=8 loops=1)
   Recheck Cond: (word ~~ '%john%'::text)
   Rows Removed by Index Recheck: 16
   Heap Blocks: exact=4
   ->   Bitmap Index Scan on words_trgm  (cost=0.00..4.36 rows=10 width=0) (actual time=17.708..17.708 rows=24 loops=1)
         Index Cond: (word ~~ '%john%'::text)
Planning time: 0.227 ms
Execution time: 17.862 ms
(8 rows)

test=# explain analyze select * from words where word ilike '%john%';
                                                                                                             QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words  (cost=44.05..556.57 rows=1002 width=9) (actual time=12.151..12.197 rows=24 loops=1)
   Recheck Cond: (word ~~* '%john%'::text)
   Heap Blocks: exact=4
   ->  Bitmap Index Scan on words_trgm  (cost=0.00..43.80 rows=1002 width=0) (actual time=12.124..12.124 rows=24
loops=1)
        Index Cond: (word ~~* '%john%'::text)
Planning time: 0.392 ms
Execution time: 12.252 ms
(7 rows)

Note that a trigram index is case-insensitive; doing a
case-sensitive search requires an extra Recheck node to eliminate
the rows that match in the case-insensitive index scan but have
different capitalization.  Because of that case-sensitive is
slower.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company