Re: Very bad FTS performance with the Polish config

From Sushant Sinha
Subject Re: Very bad FTS performance with the Polish config
Msg-id 9fb559330911182029p67e5d282r1941d929ceb66246@mail.gmail.com
In reply to Re: Very bad FTS performance with the Polish config  (Wojciech Knapik <webmaster@wolniartysci.pl>)
Responses Re: Very bad FTS performance with the Polish config
List pgsql-hackers
ts_headline calls the equivalent of ts_lexize to break up the text. Of course, there is an algorithm that processes the tokens and generates the headline. I would be really surprised if that algorithm somehow depended on the language, as it only processes the tokens. So Oleg is right when he says ts_lexize is something to be checked.

I will try to replicate what you are doing, but in the meantime, can you run the same ts_headline call under psql multiple times and paste the results?
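
A minimal psql sketch of such a repeated run. It assumes a text search configuration named polish is already installed in the database (it is not built in), and it uses a placeholder sentence in place of the actual lorem ipsum text from the thread:

```sql
-- Show the elapsed time of every statement (psql meta-command).
\timing on

-- Placeholder document: one short sentence repeated 10 times, loosely
-- mirroring the "10 repetitions" case. The real test text is an assumption.
-- Run each statement several times so that caching effects even out.
SELECT ts_headline('english',
                   repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 10),
                   to_tsquery('english', 'ipsum'));

-- Same call with the (assumed) 'polish' configuration.
SELECT ts_headline('polish',
                   repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 10),
                   to_tsquery('polish', 'ipsum'));

-- For comparison: to_tsvector uses the same configuration but builds
-- no headline, so it helps separate dictionary cost from headline cost.
SELECT length(to_tsvector('english',
                          repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 10)));
SELECT length(to_tsvector('polish',
                          repeat('Lorem ipsum dolor sit amet, consectetur adipiscing elit. ', 10)));
```

If the to_tsvector timings show a similar gap between the two configurations, that would point at the dictionaries; if only ts_headline diverges, the headline code itself would be the more likely suspect.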

-Sushant.

2009/11/19 Wojciech Knapik <webmaster@wolniartysci.pl>

Oleg Bartunov wrote:

Yes, for 4-word texts the results are similar.
Try that with a longer text and the difference becomes more and more significant. For the lorem ipsum text, 'polish' is about 4 times slower than 'english'. For 5 repetitions of the text, it's 6 times, for 10 repetitions - 7.5 times...

Again, I see nothing unclear here, since dictionaries (as specified in the configuration) apply to ALL words in the document. The more words in the document, the more overhead.

You're missing the point. I'm not surprised that the function takes more time for larger input texts - that's obvious. The thing is, the computation times rise more steeply when the Polish config is used. Steeply enough that the difference between the Polish and English configs becomes enormous in practical cases.
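
As a rough illustration of "more steeply": if both configs cost a fixed amount per word, the polish/english ratio would stay constant as the text grows. Under the assumption (not established anywhere in this thread) that both timings follow a power law in the text length, the reported ratios of 4, 6 and 7.5 at 1, 5 and 10 repetitions give an estimate of how much larger the Polish exponent is:

```sql
-- Assumption: time_english ~ n^a and time_polish ~ c * n^b, so the
-- ratio grows like n^(b - a). Estimate b - a from the reported ratios.
SELECT ln(6.0 / 4.0) / ln(5.0)  AS exponent_gap_1_to_5,   -- roughly 0.25
       ln(7.5 / 4.0) / ln(10.0) AS exponent_gap_1_to_10;  -- roughly 0.27
```

Both estimates land near 0.25, which fits a ratio that keeps growing with input size rather than a constant per-word overhead. Three data points are far too few to say more than that.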

Now this may be expected behaviour, but since I don't know if it is, I posted to the mailing lists to find out. If you're saying this is ok and there's nothing to fix here, then there's nothing more to discuss and we may consider the thread closed.
If not, ts_headline deserves a closer look.

cheers,
Wojciech Knapik


