Re: snowball ASCII stemmer configuration

From Peter Eisentraut
Subject Re: snowball ASCII stemmer configuration
Msg-id 5d69019d-35e8-1adb-c110-b456b9a93dbd@2ndquadrant.com
In response to Re: snowball ASCII stemmer configuration  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: snowball ASCII stemmer configuration  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2020-06-16 16:37, Tom Lane wrote:
> After further reflection, I think these are indeed mistakes and we should
> change them all.  The argument for the Russian/English case, AIUI, is
> "if we come across an all-ASCII word, it is most certainly not Russian,
> and the most likely Latin-based language is English".  Given the world
> as it is, I think the same argument works for all non-Latin-alphabet
> languages.  Obviously specific applications might have a different idea
> of the best fallback language, but that's why we let users make their
> own text search configurations.  For general-purpose use, falling back
> to English seems reasonable.  And we can be dead certain that applying
> a Greek stemmer to an ASCII word will do nothing useful, so the
> configuration choice shown above is unhelpful.

Do we *have* to have an ASCII stemmer that corresponds to an actual 
language?  Couldn't we use the simple stemmer or no stemmer at all?

In my experience, ASCII text in, say, Russian or Greek will typically be 
acronyms or brand names or the like, and there doesn't seem to be a 
great need to stem that kind of thing.  Just doing nothing seems at 
least as good.
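
To make the suggestion concrete, here is a minimal sketch of what "no stemming for ASCII words" could look like as a user-defined configuration. The name russian_simple_ascii is hypothetical, and this only illustrates the idea; it is not a proposal for how the built-in configurations should be changed.

```sql
-- Start from the stock Russian configuration (hypothetical new name).
CREATE TEXT SEARCH CONFIGURATION russian_simple_ascii (
    COPY = pg_catalog.russian
);

-- Route the all-ASCII token types to the built-in "simple" dictionary,
-- which lowercases the token and does no stemming.
ALTER TEXT SEARCH CONFIGURATION russian_simple_ascii
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH simple;

-- Inspect which dictionary each token is handed to.
SELECT alias, token, dictionaries
FROM ts_debug('russian_simple_ascii', 'PostgreSQL индексы');
```

Note that dropping the mapping for these token types entirely would not amount to "no stemmer": tokens of a type with no dictionaries are discarded rather than indexed unchanged, so mapping them to simple is the closer equivalent.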

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
