Re: TSearch2 / German compound words / UTF-8

From: Teodor Sigaev
Subject: Re: TSearch2 / German compound words / UTF-8
Msg-id: 43DA55A7.9010803@sigaev.ru
In response to: Re: TSearch2 / German compound words / UTF-8  (Alexander Presber <aljoscha@weisshuhn.de>)
Responses: Re: TSearch2 / German compound words / UTF-8  (Harald Armin Massa <haraldarminmassa@gmail.com>)
           Re: TSearch2 / German compound words / UTF-8  (Alexander Presber <aljoscha@weisshuhn.de>)
List: pgsql-general

contrib_regression=# insert into pg_ts_dict values (
          'norwegian_ispell',
           (select dict_init from pg_ts_dict where dict_name='ispell_template'),
           'DictFile="/usr/local/share/ispell/norsk.dict" ,'
           'AffFile ="/usr/local/share/ispell/norsk.aff"',
          (select dict_lexize from pg_ts_dict where dict_name='ispell_template'),
          'Norwegian ISpell dictionary'
    );
INSERT 16681 1
contrib_regression=# select lexize('norwegian_ispell','politimester');
                   lexize
------------------------------------------
  {politimester,politi,mester,politi,mest}
(1 row)

contrib_regression=# select lexize('norwegian_ispell','sjokoladefabrikk');
                 lexize
--------------------------------------
  {sjokoladefabrikk,sjokolade,fabrikk}
(1 row)

contrib_regression=# select lexize('norwegian_ispell','overtrekksgrilldresser');
          lexize
-------------------------
  {overtrekk,grill,dress}
(1 row)

% psql -l
            List of databases
         Name        | Owner  | Encoding
--------------------+--------+----------
  contrib_regression | teodor | KOI8
  postgres           | pgsql  | KOI8
  template0          | pgsql  | KOI8
  template1          | pgsql  | KOI8
(4 rows)


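I'm afraid that is a UTF-8 problem: the compound-word examples above work for
me, but my databases use KOI8, a single-byte encoding. We have just committed
multibyte support for tsearch2 to CVS HEAD, so you can try it.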

Please note that the dict, aff, and stopword files must be in the server
encoding. Snowball sources for German (and other languages) in UTF-8 can be
found at
http://snowball.tartarus.org/dist/libstemmer_c.tgz
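
To illustrate the encoding requirement, here is a minimal Python sketch (not
part of tsearch2) that re-encodes a dictionary, affix, or stopword file into
the server encoding. The function name, the file names in the usage comment,
and the ISO-8859-1 source encoding are assumptions for illustration only;
check what encoding your files are really in before converting.

```python
def reencode_file(src_path, dst_path, src_encoding, dst_encoding):
    """Rewrite a text file from src_encoding into dst_encoding.

    Raises UnicodeDecodeError / UnicodeEncodeError instead of silently
    replacing characters, so a wrong guess about the source encoding
    is noticed rather than baked into the dictionary.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


# Hypothetical usage: the file names and the ISO-8859-1 source encoding
# are assumptions, not taken from this thread.
# reencode_file("german.aff", "german.utf8.aff", "iso-8859-1", "utf-8")
# reencode_file("german.dict", "german.utf8.dict", "iso-8859-1", "utf-8")
```

The converted files would then be the ones named in DictFile and AffFile when
the dictionary is registered in a UTF-8 database.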

To all: maybe we should put all of Snowball's stemmers (for all available
languages and encodings) into the tsearch2 directory?

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
