Re: snowball ASCII stemmer configuration

From Tom Lane
Subject Re: snowball ASCII stemmer configuration
Date
Msg-id 1301915.1592318237@sss.pgh.pa.us
In response to Re: snowball ASCII stemmer configuration  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: snowball ASCII stemmer configuration  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: snowball ASCII stemmer configuration  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
I wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> Moreover, AFAIK, the following other languages do not use Latin-based 
>> alphabets:

>> arabic      arabic      \
>> greek       greek       \
>> nepali      nepali      \
>> tamil       tamil       \

> Hmm.  I think all of those entries are ones that got added by me while
> absorbing post-2007 Snowball updates, and I confess that I did not think
> about this point.  Maybe these should be changed.

After further reflection, I think these are indeed mistakes and we should
change them all.  The argument for the Russian/English case, AIUI, is
"if we come across an all-ASCII word, it is most certainly not Russian,
and the most likely Latin-based language is English".  Given the world
as it is, I think the same argument works for all non-Latin-alphabet
languages.  Obviously specific applications might have a different idea
of the best fallback language, but that's why we let users make their
own text search configurations.  For general-purpose use, falling back
to English seems reasonable.  And we can be dead certain that applying
a Greek stemmer to an ASCII word will do nothing useful, so the
configuration choice shown above is unhelpful.
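
To make the fallback rule above concrete, here is a minimal sketch in Python.
It is only an illustration of the argument, not how PostgreSQL implements it:
the function name pick_stemmer and its parameters are hypothetical, and the
stemmer names are plain strings rather than real dictionaries.

```python
def pick_stemmer(word: str, native_stemmer: str, fallback: str = "english") -> str:
    """Return the name of the stemmer to apply to `word` (hypothetical helper)."""
    # An all-ASCII word cannot be written in a non-Latin script such as
    # Greek or Cyrillic, so the native stemmer would do nothing useful.
    # Route it to the fallback stemmer instead.
    if word.isascii():
        return fallback
    # Anything containing non-ASCII characters goes to the native stemmer.
    return native_stemmer


if __name__ == "__main__":
    for w in ["running", "τρέχοντας"]:
        print(w, "->", pick_stemmer(w, native_stemmer="greek"))
```

In PostgreSQL itself this choice is not made in application code; it is
expressed through the per-token-type dictionary mappings of a text search
configuration (the asciiword token type versus word), which is why users
with a different preferred fallback can build their own configuration.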

            regards, tom lane


