Re: Very bad FTS performance with the Polish config

From: Tom Lane
Subject: Re: Very bad FTS performance with the Polish config
Msg-id: 15251.1258645873@sss.pgh.pa.us
In reply to: Re: Very bad FTS performance with the Polish config  (Wojciech Knapik <webmaster@wolniartysci.pl>)
Responses: Re: Very bad FTS performance with the Polish config
List: pgsql-hackers
Wojciech Knapik <webmaster@wolniartysci.pl> writes:
> Tom Lane wrote:
>> I tried to duplicate this test, but got no further than here:
>> ERROR:  syntax error
>> CONTEXT:  line 174 of configuration file "/home/tgl/testversion/share/postgresql/tsearch_data/polish.affix": "  L E C                  >       -C,GŁEM         #zalec (15a)
 

> Here are the files I used (polish.affix, polish.dict already generated):
> http://wolniartysci.pl/pl.tar.gz

Your files were the same as mine.  I eventually figured out the problem
was I was using C locale, in which some of those letters aren't letters.
(I wonder whether the tsearch config file parsers could be made less
sensitive to this by avoiding t_isalpha tests.)  In pl_PL.utf8 locale
I could see that the example is indeed much slower.  Oleg is right that
the fundamental difference is that this Polish configuration is using
an ispell dictionary where the simple English configuration is not.
But, just for the record, here's what an oprofile profile looks like:

samples  %        image name               symbol name
7480     20.9477  postgres                 RS_execute
5370     15.0386  postgres                 pg_utf_mblen
4138     11.5884  postgres                 pg_mblen
3756     10.5187  postgres                 mb_strchr
2880      8.0654  postgres                 FindWord
2754      7.7126  postgres                 CheckAffix
1576      4.4136  postgres                 NormalizeSubWord
966       2.7053  postgres                 FindAffixes
896       2.5092  postgres                 TParserGet
742       2.0780  postgres                 AllocSetAlloc
420       1.1762  postgres                 AllocSetFree
396       1.1090  postgres                 addHLParsedLex
384       1.0754  postgres                 LexizeExec

So about 55% of the time is going into affix pattern matching.
I wonder whether that couldn't be made faster.  A lot of the cycles
are spent on coping with variable-length characters --- perhaps the
ispell code should convert to wchar representation before doing this?
        regards, tom lane
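
A minimal C sketch of the "convert to a wide representation first" idea
raised above, not the ispell code itself: decode the UTF-8 word once into
fixed-width code points, then match an affix by plain array indexing, so
the inner loop never has to ask for a character length.  The names
utf8_len, utf8_to_wide and wide_has_suffix are hypothetical and written
only for this illustration; they are not PostgreSQL functions, and the
decoder assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte length of the UTF-8 sequence that starts at s,
 * judged from the lead byte only (no validation of continuation bytes). */
static size_t utf8_len(const unsigned char *s)
{
    if (*s < 0x80) return 1;
    if ((*s & 0xE0) == 0xC0) return 2;
    if ((*s & 0xF0) == 0xE0) return 3;
    if ((*s & 0xF8) == 0xF0) return 4;
    return 1;               /* invalid lead byte: consume a single byte */
}

/* Decode a NUL-terminated UTF-8 string into fixed-width code points.
 * Writes at most cap entries to dst and returns the number written.
 * This is the one place where variable-length characters are handled. */
static size_t utf8_to_wide(const char *src, uint32_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *) src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s != 0 && n < cap) {
        size_t len = utf8_len(s);
        size_t i;
        uint32_t cp;

        if (len == 1)      cp = *s;
        else if (len == 2) cp = *s & 0x1F;
        else if (len == 3) cp = *s & 0x0F;
        else               cp = *s & 0x07;

        /* Fold in continuation bytes; stop early at a premature NUL. */
        for (i = 1; i < len && s[i] != 0; i++)
            cp = (cp << 6) | (uint32_t) (s[i] & 0x3F);

        s += i;
        dst[n++] = cp;
    }
    return n;
}

/* Does word[0..wlen) end with suffix[0..slen)?  A plain indexed compare:
 * every character is one array slot, so no length lookups are needed. */
static int wide_has_suffix(const uint32_t *word, size_t wlen,
                           const uint32_t *suffix, size_t slen)
{
    size_t i;

    if (slen > wlen)
        return 0;
    for (i = 0; i < slen; i++)
        if (word[wlen - slen + i] != suffix[i])
            return 0;
    return 1;
}
```

For example, decoding the UTF-8 bytes of "zaległem" with utf8_to_wide
yields 8 code points (the sixth being U+0142), and wide_has_suffix then
reports that it ends with the 4 code points of "głem".  The conversion
cost is paid once per word rather than once per character comparison.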

