Re: snowball ASCII stemmer configuration

From Tom Lane
Subject Re: snowball ASCII stemmer configuration
Date
Msg-id 1300297.1592315626@sss.pgh.pa.us
In response to snowball ASCII stemmer configuration  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: snowball ASCII stemmer configuration  (Oleg Bartunov <obartunov@postgrespro.ru>)
Re: snowball ASCII stemmer configuration  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> There are two cases where these two columns are not the same:

>      hindi       english     \
>      russian     english     \

> The second one is old; the first one I added using the second one as 
> example.  But I wonder what the rationale for this is.  Maybe for hindi 
> one could make some kind of cultural argument, but for russian this 
> seems entirely arbitrary.

Perhaps it is, but we have actual Russians who think it's a good idea.
I recall questioning that point some years ago, and Oleg replied that
they'd done that intentionally because (a) technical Russian uses a lot
of English words, and (b) it's easy to tell which is which thanks to
the disjoint letter sets.
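
(Editorial illustration of point (b), not part of the original message: a
minimal Python sketch of choosing a stemmer by letter set. It is not
PostgreSQL's actual code; the function names and the "russian"/"english"
labels are assumptions made for the example.)

```python
def is_ascii_word(word: str) -> bool:
    """Return True if every character of the word is plain ASCII."""
    return all(ord(ch) < 128 for ch in word)


def pick_stemmer(word: str, native: str = "russian",
                 ascii_fallback: str = "english") -> str:
    """Pick a stemmer name (hypothetical labels) for a single word.

    Cyrillic and Latin letters do not overlap, so an all-ASCII word in
    Russian text is assumed to be English; anything else is treated as
    a word in the native language.
    """
    return ascii_fallback if is_ascii_word(word) else native


if __name__ == "__main__":
    for w in ["database", "база"]:
        print(w, "->", pick_stemmer(w))
```

For a language written in a Latin-based alphabet the same test cannot
separate native words from English ones, which is why the two columns
normally name the same stemmer.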

Whether the same is true for Hindi, I have no idea.

> Moreover, AFAIK, the following other languages do not use Latin-based 
> alphabets:

>      arabic      arabic      \
>      greek       greek       \
>      nepali      nepali      \
>      tamil       tamil       \

Hmm.  I think all of those entries are ones that got added by me while
absorbing post-2007 Snowball updates, and I confess that I did not think
about this point.  Maybe these should be changed.

            regards, tom lane


