Hackers,
Here is first version of patch, which enable index support of wildcard search in pg_trgm contrib module. The idea of the patch is to extract from wildcard trigrams which should occurs in wildcard matching string. For example, for '%sector%' wildcard such trigrams would be: 'sec', 'ect', 'tor'.
create table words (word text);
copy words from '/usr/share/dict/american-english';
test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN
------------------------------------------------------------------------------------------------------
Seq Scan on words (cost=0.00..1703.11 rows=10 width=9) (actual time=18.818..174.146 rows=7 loops=1)
Filter: (word ~~* '%independ%'::text)
Total runtime: 174.200 ms
(3 rows)
CREATE INDEX trgm_idx ON words USING gist (word gist_trgm_ops);
test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words (cost=4.36..40.11 rows=10 width=9) (actual time=2.445..2.529 rows=7 loops=1)
Recheck Cond: (word ~~* '%independ%'::text)
-> Bitmap Index Scan on trgm_idx (cost=0.00..4.35 rows=10 width=0) (actual time=2.406..2.406 rows=7 loops=1)
Index Cond: (word ~~* '%independ%'::text)
Total runtime: 2.612 ms
(5 rows)
CREATE INDEX trgm_idx ON words USING gin (word gin_trgm_ops);
test=# explain analyze select * from words where word ilike '%independ%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words (cost=76.08..111.83 rows=10 width=9) (actual time=2.675..2.755 rows=7 loops=1)
Recheck Cond: (word ~~* '%independ%'::text)
-> Bitmap Index Scan on trgm_idx (cost=0.00..76.07 rows=10 width=0) (actual time=2.642..2.642 rows=7 loops=1)
Index Cond: (word ~~* '%independ%'::text)
Total runtime: 2.839 ms
(5 rows)
I've encountered with following problems:
1) Indexing support for ilike is possible only with case-insensetive wildcards, e.g. when IGNORECASE macro is enabled. But I can't use this macro in
pg_trgm.sql.in, where list of operators is defined. Probably, is it enuogh to put comment near IGNORECASE, which tells that if one disable this macro he should also remove oparators from
pg_trgm.sql.in?
2) I found gist index not very useful with default SIGLENINT = 3. I've changed this value to 15 and I found gist index performs very good on dictionary. But on longer strings greater values of SIGLENINT may be required (probably even SIGLENINT > 122 will give benefit in some cases in spite of TOAST).
----
With best regards,
Alexander Korotkov.