Re: pg_trgm performance

Поиск
Список
Период
Сортировка
От Oleg Bartunov
Тема Re: pg_trgm performance
Дата
Msg-id Pine.LNX.4.64.0702261552120.400@sn.sai.msu.ru
обсуждение исходный текст
Ответ на Re: pg_trgm performance  ("Guillaume Smet" <guillaume.smet@gmail.com>)
Ответы Re: pg_trgm performance  ("Guillaume Smet" <guillaume.smet@gmail.com>)
Список pgsql-performance
On Mon, 26 Feb 2007, Guillaume Smet wrote:

> On 2/24/07, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
>
> Thanks for your time.
>
>> GiN version, short:
>>    ->  Bitmap Heap Scan on tags  (cost=8.64..151.79 rows=41 width=0)
>> (actual time=5.555..30.157 rows=7 loops=1)
>>          Filter: (title % 'foo'::text)
>>          ->  Bitmap Index Scan on trgm_idx  (cost=0.00..8.63 rows=41
>> width=0) (actual time=2.857..2.857 rows=5555 loops=1)
>>                Index Cond: (title % 'foo'::text)
>
> This is currently the worst case in the gist - gin comparison because
> in the index scan, gin version doesn't have the length of the indexed
> string. So it returns a lot of rows which have every trigram of your
> search string but has in fact a low similarity due to the length of
> the indexed string (5555 rows -> 7 rows).
> It cannot be fixed at the moment due to the way GIN indexes work.
>
>> So, the GiN version seems to be a bit faster for long queries, but it's
>> still
>> too slow -- in fact, _unindexed_ versions give 141ms, 342ms, 725ms for
>> these
>> three queries, so for the longer queries, the gain is only about a factor
>> two. (By the way, I would like to stress that this is not my personal music
>> collection! :-P)
>
> The fact is that pg_trgm is designed to index words and not to index
> long sentences. I'm not that surprised it's slow in your case.
>
> It's also my case but following the instructions in README.pg_trgm I
> created a dictionary of words using tsearch2 (stat function) and I use
> pg_trgm on this dictionary to find similar words in my dictionary.
>
> For example, I rewrite the search:
> auberge cevenes
> as:
> (auberge | auberges | aubberge | auberg) & (ceven | cene | cevenol |
> cevennes)
> using pg_trgm and my query can find Auberge des C?vennes (currently
> it's limited to the 4th most similar words but I can change it
> easily).

Did you rewrite query manually or use rewrite feature of tsearch2 ?

>
> --
> Guillaume
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Guillaume Smet"
Дата:
Сообщение: Re: pg_trgm performance
Следующее
От: "Guillaume Smet"
Дата:
Сообщение: Re: pg_trgm performance