Re: BUG #13440: unaccent does not remove all diacritics

From Peter Eisentraut
Subject Re: BUG #13440: unaccent does not remove all diacritics
Date
Msg-id 5589642C.3000201@gmx.net
In reply to Re: BUG #13440: unaccent does not remove all diacritics  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
On 6/18/15 5:17 PM, Alvaro Herrera wrote:
> To me, conceptually what unaccent does is turn whatever junk you have
> into a very basic common alphabet (ascii); then it's very easy to do
> full text searches without having to worry about what accents the people
> did or did not use in their searches.  If we say "okay, but that funny
> char is not an accent so let's leave it alone" then the charter doesn't
> sound so useful to me.

I think unaccent is one of those contrib things that are useful but not
really fully thought out and therefore won't ever become an official
core feature.  It is what it is, and we can tweak it slightly, but
thinking too hard about what it "should" do won't lead anywhere.

If we wanted to do this "properly", we could do something like: perform
Unicode canonical decomposition, then strip out all combining
characters.  I don't know how useful that is in practice, though.  And
it won't "solve" issues such as German ß, which probably doesn't have a
one-size-fits-all solution.
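
For illustration only, the "decompose, then strip combining characters" idea might look roughly like the following in Python. This sketch is not part of unaccent or of the original message; the function name strip_combining is hypothetical, and the final recomposition step is an assumption added so that unrelated text (for example Hangul) is not left decomposed.

```python
import unicodedata


def strip_combining(text: str) -> str:
    """Hypothetical helper: drop combining marks after canonical decomposition."""
    # NFD splits precomposed characters into base character + combining marks,
    # e.g. U+00E9 (é) becomes "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns the canonical combining class;
    # 0 means the character is not a combining mark, so it is kept.
    stripped = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    # Assumption: recompose whatever is left so unrelated text stays in NFC.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ("café", "Ångström", "Straße", "øl"):
        print(word, "->", strip_combining(word))
```

Characters with no canonical decomposition, such as ß or ø, pass through unchanged, which is the limitation noted above: handling them would still need an explicit mapping table.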
