Thread: Packaging problem: using myspell dictionaries for tsearch2

Packaging problem: using myspell dictionaries for tsearch2

From:
Martin Pitt
Date:
Hello all,

To provide proper English stemming support in searches (tsearch2 on
PostgreSQL 8.3), tsearch needs the British/American myspell
dictionaries. However, the current setup seems very inconvenient
for packagers like me (I'm responsible for the Debian and Ubuntu
packages of PostgreSQL) who would like to provide a good
out-of-the-box experience.

Ideally, installing the myspell-en-gb package would automatically make
tsearch2 aware of it and use the dictionary and affix rules. However,
we found several problems that prevent this:

 - tsearch2 looks for these files in ${configure_datadir}/tsearch_data, i.e.
   /usr/share/postgresql/8.3/tsearch_data/ in Debian, whereas myspell
   dictionaries are shipped in /usr/share/myspell/dicts/.

   This by itself is probably fixable easily, by adding another search
   path to tsearch2. This would probably end up as a configure option.

 - Reportedly PostgreSQL expects those myspell files to be encoded in
   the server encoding. However, the server encoding can be changed at
   runtime, whereas the myspell files are shipped statically.

   Reencoding the .dic from latin1 to UTF-8 during build is possible,
   but it is inconvenient and, more importantly, it is either a
   package maintenance nightmare (when shipping static files) or
   involves some dirty tricks (rebuilding the postgresql myspell files
   whenever one of the original myspell files changes).

   Also, even a reencoding doesn't change the fact that as soon as you
   change the server locale, you end up with broken tsearch again.
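
The build-time reencoding mentioned in the second point could look
roughly like the sketch below. It assumes the myspell files are latin1,
as described above, and writes UTF-8 copies under the .dict/.affix file
names that the PostgreSQL 8.3 ispell template looks for in tsearch_data.
The function name, the directory arguments and the dictionary name are
illustrative only; this is not an existing packaging tool.

```python
from pathlib import Path


def reencode_dictionary(src_dir, dest_dir, name, src_encoding="latin-1"):
    """Write UTF-8 copies of name.dic/name.aff as name.dict/name.affix.

    src_encoding is an assumption; check what the myspell package
    actually ships before relying on the default.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src_ext, dest_ext in ((".dic", ".dict"), (".aff", ".affix")):
        # Work on raw bytes so that line endings are preserved exactly.
        text = (src_dir / (name + src_ext)).read_bytes().decode(src_encoding)
        target = dest_dir / (name + dest_ext)
        target.write_bytes(text.encode("utf-8"))
        written.append(target)
    return written
```

A package build or trigger script might call it as
reencode_dictionary("/usr/share/myspell/dicts",
"/usr/share/postgresql/8.3/tsearch_data", "en_GB"), where the dictionary
name is again only an example. This does not remove the maintenance
concern: the copies still have to be regenerated whenever the original
myspell files change.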

Is there any better approach to this? In my dream world, tsearch2
would look in /usr/share/myspell/dicts/, use the dictionary/affix
rules there, reencode them on the fly to the server encoding, and
otherwise use them as they are.

Thanks in advance for any insight!

Martin

--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

Re: Packaging problem: using myspell dictionaries for tsearch2

From:
Tom Lane
Date:
Martin Pitt <mpitt@debian.org> writes:
>  - Reportedly PostgreSQL expects those myspell files to be encoded in
>    the server encoding.

This is incorrect, at least as of 8.3 --- they are supposed to be utf-8 always.

            regards, tom lane
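
If the files are expected to be UTF-8 regardless of the server encoding,
as stated in the reply above, a packager could check this at build time.
The following is a minimal sketch of such a check; the function name and
the default extension list are assumptions made for illustration, not
part of PostgreSQL.

```python
from pathlib import Path


def non_utf8_files(directory, extensions=(".dict", ".affix")):
    """Return the files in directory whose contents are not valid UTF-8."""
    bad = []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix not in extensions:
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            # The file contains byte sequences that are not valid UTF-8.
            bad.append(path)
    return bad
```

An empty result only shows that the files decode as UTF-8; it does not
prove that they were converted from the correct source encoding.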