Re: Trigram (pg_trgm) GIN index not used

Поиск
Список
Период
Сортировка
От Merlin Moncure
Тема Re: Trigram (pg_trgm) GIN index not used
Дата
Msg-id CAHyXU0xZWHhHQjPjXKKsbS2PSX+O=+Wa-p-pw49varaOoWCo_A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Trigram (pg_trgm) GIN index not used  (Ivan Voras <ivoras@freebsd.org>)
Список pgsql-general
On Thu, Feb 21, 2013 at 6:06 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> On 21/02/2013 12:52, Ivan Voras wrote:
>
>> I'd like to use pg_trgm for matching substrings case-insensitively, but
>> it doesn't seem to use the index:
>
>
> As a sort-of followup, the '%' operator kind of works but takes
> incredibly long time, and the selectivity estimates are completely wrong:
>
> nn=> vacuum analyze documents;
> VACUUM
> nn=> explain select id,title from documents where raw_data % 'zagreb';
>                                        QUERY PLAN
>
> -----------------------------------------------------------------------------------------
>  Bitmap Heap Scan on documents  (cost=128.42..330.87 rows=54 width=108)
>    Recheck Cond: ((raw_data)::text % 'zagreb'::text)
>    ->  Bitmap Index Scan on documents_raw_data_trgm  (cost=0.00..128.40
> rows=54 width=0)
>          Index Cond: ((raw_data)::text % 'zagreb'::text)
> (4 rows)
>
> nn=> explain analyze select id,title from documents where raw_data %
> 'zagreb';
>                                                                QUERY
> PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on documents  (cost=128.42..330.87 rows=54 width=108)
> (actual time=98750.283..98750.283 rows=0 loops=1)
>    Recheck Cond: ((raw_data)::text % 'zagreb'::text)
>    ->  Bitmap Index Scan on documents_raw_data_trgm  (cost=0.00..128.40
> rows=54 width=0) (actual time=26.748..26.748 rows=51874 loops=1)
>          Index Cond: ((raw_data)::text % 'zagreb'::text)
>  Total runtime: 98750.623 ms
> (5 rows)
>
>
> There is no IO load during this query.

pg_trgm is not really designed for indexing large documents, but for
fuzzy simple string (company name, address, etc) matching.   probably
better off with full text search.

merlin

В списке pgsql-general по дате отправления:

Предыдущее
От: Ioana Danes
Дата:
Сообщение: Ioana Danes
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: subselects vs WITH in views