Wildcard search support for pg_trgm

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Wildcard search support for pg_trgm
Дата
Msg-id AANLkTi=2iL7_ZdQgnm2d1qiNC2rJM-faCVtFu2K5A8Lz@mail.gmail.com
обсуждение исходный текст
Ответы Re: Wildcard search support for pg_trgm  (Alexander Korotkov <aekorotkov@gmail.com>)
Re: Wildcard search support for pg_trgm  (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
Список pgsql-hackers
Hackers,

Here is first version of patch, which enable index support of wildcard search in pg_trgm contrib module. The idea of the patch is to extract from wildcard trigrams which should occurs in wildcard matching string. For example, for '%sector%' wildcard such trigrams would be: 'sec', 'ect', 'tor'.

create table words (word text);
copy words from '/usr/share/dict/american-english';

test=# explain analyze select * from words where word ilike '%independ%';
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Seq Scan on words  (cost=0.00..1703.11 rows=10 width=9) (actual time=18.818..174.146 rows=7 loops=1)
   Filter: (word ~~* '%independ%'::text)
 Total runtime: 174.200 ms
(3 rows)

CREATE INDEX trgm_idx ON words USING gist (word gist_trgm_ops);

test=# explain analyze select * from words where word ilike '%independ%';
                                                    QUERY PLAN                                                
    
------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on words  (cost=4.36..40.11 rows=10 width=9) (actual time=2.445..2.529 rows=7 loops=1)
   Recheck Cond: (word ~~* '%independ%'::text)
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..4.35 rows=10 width=0) (actual time=2.406..2.406 rows=7 loops=1)
         Index Cond: (word ~~* '%independ%'::text)
 Total runtime: 2.612 ms
(5 rows)

CREATE INDEX trgm_idx ON words USING gin (word gin_trgm_ops);

test=# explain analyze select * from words where word ilike '%independ%';
                                                    QUERY PLAN                                                
     
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on words  (cost=76.08..111.83 rows=10 width=9) (actual time=2.675..2.755 rows=7 loops=1)
   Recheck Cond: (word ~~* '%independ%'::text)
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..76.07 rows=10 width=0) (actual time=2.642..2.642 rows=7 loops=1)
         Index Cond: (word ~~* '%independ%'::text)
 Total runtime: 2.839 ms
(5 rows)

I've encountered with following problems:
1) Indexing support for ilike is possible only with case-insensetive wildcards, e.g. when IGNORECASE macro is enabled. But I can't use this macro in pg_trgm.sql.in, where list of operators is defined. Probably, is it enuogh to put comment near IGNORECASE, which tells that if one disable this macro he should also remove oparators from pg_trgm.sql.in?
2) I found gist index not very useful with default SIGLENINT = 3. I've changed this value to 15 and I found gist index performs very good on dictionary. But on longer strings greater values of SIGLENINT may be required (probably even SIGLENINT > 122 will give benefit in some cases in spite of TOAST).

----
With best regards,
Alexander Korotkov.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Следующее
От: Dimitri Fontaine
Дата:
Сообщение: pg_execute_from_file, patch v10