Re: [tsvector] to_tsvector called multiple times

From Sven R. Kunze
Subject Re: [tsvector] to_tsvector called multiple times
Msg-id 55643D11.1040604@tbz-pariv.de
In reply to Re: [tsvector] to_tsvector called multiple times  (Oleg Bartunov <obartunov@gmail.com>)
List pgsql-general
Thanks, Oleg. Unfortunately, that does not work very well, because German has many compound nouns.

In fact, I discovered the anomaly while searching through a domain-specific word table. Take Waferhandlingsystem, for example: there are many '*system' words, but PostgreSQL does not let me match on a suffix; it only supports prefix matching, and only in to_tsquery (http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html#TEXTSEARCH-SYNONYM-DICTIONARY).
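
To make the limitation concrete, here is a minimal sketch. The 'simple' configuration is an assumption chosen for illustration, so that no stemming interferes with the comparison; the :* label is the prefix-match syntax that to_tsquery accepts.

```sql
-- Prefix match: the query lexeme 'wafer' is compared against the start of
-- each indexed lexeme, so it finds 'waferhandlingsystem'.
SELECT to_tsvector('simple', 'Waferhandlingsystem')
       @@ to_tsquery('simple', 'wafer:*') AS prefix_match;

-- There is no corresponding suffix label. Without :*, 'system' has to equal
-- a whole lexeme, so it does not match the compound.
SELECT to_tsvector('simple', 'Waferhandlingsystem')
       @@ to_tsquery('simple', 'system') AS whole_word_match;
```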

Is there another possibility?


On 26.05.2015 11:05, Oleg Bartunov wrote:
You can ask http://snowball.tartarus.org/ about the stemmer. Meanwhile,
you can put a small personal dictionary (before the stemmer) with such exceptions; for example, use the synonym template:

system system

Oleg
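
A setup along the lines Oleg describes might look as follows. This is only a sketch: the dictionary name german_exceptions, the configuration name german_fixed, and the file name german_exceptions.syn are hypothetical. The file is assumed to sit in the tsearch_data directory under the PostgreSQL share directory and to contain the single line "system system".

```sql
-- Synonym dictionary that maps 'system' to itself, so the word is
-- recognized here and never reaches the stemmer.
-- SYNONYMS names the file german_exceptions.syn (hypothetical).
CREATE TEXT SEARCH DICTIONARY german_exceptions (
    TEMPLATE = synonym,
    SYNONYMS = german_exceptions
);

-- Work on a copy so the built-in 'german' configuration stays untouched.
CREATE TEXT SEARCH CONFIGURATION german_fixed (COPY = german);

-- Consult the exceptions first; tokens they do not recognize fall through
-- to the Snowball stemmer. Only plain word tokens are remapped here.
ALTER TEXT SEARCH CONFIGURATION german_fixed
    ALTER MAPPING FOR asciiword, word
    WITH german_exceptions, german_stem;

-- Compare the result with the built-in configuration.
SELECT to_tsvector('german_fixed', 'system') AS with_exceptions,
       to_tsvector('german', 'system')       AS built_in;
```

Each exception has to be listed explicitly, so this approach does not by itself help with compounds such as Waferhandlingsystem.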


On Tue, May 26, 2015 at 11:18 AM, Sven R. Kunze <srkunze@tbz-pariv.de> wrote:
Hi everybody,

the following stemming results made me curious:

select to_tsvector('german', 'systeme'); > 'system':1
select to_tsvector('german', 'systemes'); > 'system':1
select to_tsvector('german', 'systems'); > 'system':1
select to_tsvector('german', 'systemen'); > 'system':1
select to_tsvector('german', 'system'); > 'syst':1


First of all, this seems to be a bug in the German stemmer. Where can I fix it?
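
One way to see which dictionary produced a given lexeme is ts_debug. A small sketch, using the built-in 'german' configuration from the queries above:

```sql
-- Show, per token, the token type, the dictionary that recognized it,
-- and the lexemes that dictionary returned.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('german', 'system');
```

In a default installation the dictionary column would be expected to show german_stem, the Snowball-based stemmer, which suggests the behavior originates in the Snowball German algorithm rather than in PostgreSQL's own code.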

Second, and more importantly, as I understand it, the stemmed version of a word should be considered normalized. That is, all other versions of that stem should be mapped to it as well. The interesting problem here is that PostgreSQL maps the stem itself ('system') to a completely different stem ('syst').

Should a stem not remain stable even when to_tsvector is called on it multiple times?
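
The stability question can be checked directly with ts_lexize, which runs a single dictionary on a single token. The sketch below assumes the built-in german_stem dictionary and applies it twice to each of the words from the queries above; the column names are illustrative.

```sql
-- Stem each word once, then stem the resulting stem again.
-- ts_lexize returns text[], so [1] picks the first (here: only) lexeme.
SELECT word,
       (ts_lexize('german_stem', word))[1] AS stem_once,
       (ts_lexize('german_stem',
                  (ts_lexize('german_stem', word))[1]))[1] AS stem_twice
FROM (VALUES ('systeme'), ('systemes'), ('systems'),
             ('systemen'), ('system')) AS t(word);
```

Rows where stem_once and stem_twice differ are the ones for which the stemmer is not idempotent. Going by the results quoted above, 'systeme' would be such a row, since it stems to 'system', which in turn stems to 'syst'.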

--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srkunze@tbz-pariv.de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543



