Re: tsearch2: enable non ascii stop words with C locale

From: Teodor Sigaev
Subject: Re: tsearch2: enable non ascii stop words with C locale
Date:
Msg-id: 45D07FCF.7020407@sigaev.ru
In reply to: tsearch2: enable non ascii stop words with C locale  (Tatsuo Ishii <ishii@postgresql.org>)
Responses: Re: tsearch2: enable non ascii stop words with C locale
List: pgsql-hackers
> Currently tsearch2 does not accept non ascii stop words if locale is
> C. Included patches should fix the problem. Patches against PostgreSQL
> 8.2.3.

I'm not sure that the patch's description is correct.

First, the p_islatin() function is used only in the word/lexeme parser, not in
the stop-word code. Second, p_islatin() is used to catch lexemes such as URLs or
HTML entities, so it is important that it recognizes only real Latin characters.
And it works correctly: it calls p_isalpha (already patched for your case), then
it calls p_isascii, which should be correct for any encoding with the C locale.
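To make the second point concrete, here is a minimal standalone sketch of that combination. It is not the tsearch2 source: the sketch_* names, the (bytes, length) calling convention, and the rule that every multibyte character counts as alphabetic are all assumptions made for illustration.

```c
#include <ctype.h>

/*
 * Illustrative sketch only, not the tsearch2 source. A "character" is
 * passed as its raw bytes plus its byte length in the server encoding.
 */

/* ASCII test: exactly one byte, with the high bit clear. */
int sketch_isascii(const unsigned char *ch, int len)
{
    return len == 1 && ch[0] < 0x80;
}

/*
 * Alphabetic test. Assumption for this sketch: single-byte characters are
 * classified with isalpha() (C locale by default), and every multibyte
 * character is simply treated as alphabetic. The real p_isalpha may differ.
 */
int sketch_isalpha(const unsigned char *ch, int len)
{
    if (len == 1)
        return isalpha(ch[0]) != 0;
    return 1;
}

/* "Latin" means alphabetic AND ASCII, so multibyte letters never match. */
int sketch_islatin(const unsigned char *ch, int len)
{
    return sketch_isalpha(ch, len) && sketch_isascii(ch, len);
}
```

Under these assumptions, the one-byte letter "a" is Latin, while a two-byte Cyrillic letter is alphabetic but not Latin, which is what a parser needs when it looks for URL-like or entity-like lexemes.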
Third (and last):
contrib_regression=# show server_encoding;
 server_encoding
-----------------
 UTF8

contrib_regression=# show lc_ctype;
 lc_ctype
----------
 C

contrib_regression=# select lexize('ru_stem_utf8', RUSSIAN_STOP_WORD);
 lexize
--------
 {}

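Russian characters in UTF8 take two bytes each, so the stop word above is
non-ASCII and multibyte, and lexize still returns an empty array for it, i.e.
it is recognized as a stop word with the C locale.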
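The two-byte claim can be checked with a small sketch that derives the sequence length from the lead byte of a UTF-8 character. The helper name sketch_utf8_len and the sample constant are hypothetical and are not PostgreSQL functions. The helper does not validate continuation bytes or reject overlong encodings.

```c
#include <string.h>

/*
 * Length in bytes of a UTF-8 sequence, derived from its lead byte.
 * Returns 0 for a continuation byte (10xxxxxx) or a byte of 0xF8 and above,
 * neither of which can start a sequence. Illustrative helper only.
 */
int sketch_utf8_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx: includes Cyrillic */
    if ((lead & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;
}

/* Cyrillic small letter "i" (U+0438) encoded in UTF-8: bytes 0xD0 0xB8. */
const char sketch_cyrillic_i[] = "\xD0\xB8";
```

Here sketch_utf8_len(0xD0) gives 2 and strlen(sketch_cyrillic_i) is also 2, whereas an ASCII letter such as 'a' gives 1.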



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 

