Re: pg_trgm performance

Поиск
Список
Период
Сортировка
От Guillaume Smet
Тема Re: pg_trgm performance
Дата
Msg-id 1d4e0c10702260442m1fba9dc4r30ada1044f869a3b@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg_trgm performance  ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
Ответы Re: pg_trgm performance  (Oleg Bartunov <oleg@sai.msu.su>)
Список pgsql-performance
On 2/24/07, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:

Thanks for your time.

> GiN version, short:
>    ->  Bitmap Heap Scan on tags  (cost=8.64..151.79 rows=41 width=0) (actual time=5.555..30.157 rows=7 loops=1)
>          Filter: (title % 'foo'::text)
>          ->  Bitmap Index Scan on trgm_idx  (cost=0.00..8.63 rows=41 width=0) (actual time=2.857..2.857 rows=5555
loops=1)
>                Index Cond: (title % 'foo'::text)

This is currently the worst case in the gist - gin comparison because
in the index scan, gin version doesn't have the length of the indexed
string. So it returns a lot of rows which have every trigram of your
search string but has in fact a low similarity due to the length of
the indexed string (5555 rows -> 7 rows).
It cannot be fixed at the moment due to the way GIN indexes work.

> So, the GiN version seems to be a bit faster for long queries, but it's still
> too slow -- in fact, _unindexed_ versions give 141ms, 342ms, 725ms for these
> three queries, so for the longer queries, the gain is only about a factor
> two. (By the way, I would like to stress that this is not my personal music
> collection! :-P)

The fact is that pg_trgm is designed to index words and not to index
long sentences. I'm not that surprised it's slow in your case.

It's also my case but following the instructions in README.pg_trgm I
created a dictionary of words using tsearch2 (stat function) and I use
pg_trgm on this dictionary to find similar words in my dictionary.

For example, I rewrite the search:
auberge cevenes
as:
(auberge | auberges | aubberge | auberg) & (ceven | cene | cevenol | cevennes)
using pg_trgm and my query can find Auberge des Cévennes (currently
it's limited to the 4th most similar words but I can change it
easily).

--
Guillaume

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Steinar H. Gunderson"
Дата:
Сообщение: Re: Query Planner
Следующее
От: Oleg Bartunov
Дата:
Сообщение: Re: pg_trgm performance