Re: BUG #13440: unaccent does not remove all diacritics

From Alvaro Herrera
Subject Re: BUG #13440: unaccent does not remove all diacritics
Date
Msg-id 20150618211722.GJ133018@postgresql.org
In response to Re: BUG #13440: unaccent does not remove all diacritics  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #13440: unaccent does not remove all diacritics
Re: BUG #13440: unaccent does not remove all diacritics
List pgsql-bugs
Tom Lane wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > On Wed, Jun 17, 2015 at 10:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm really dubious that we should be translating those ligatures at
> >> all (since the standard file is only advertised to do "unaccenting"),
> >> and if we do translate them, shouldn't they convert to AE, ae, etc?
>
> > Perhaps these conversions are intended only for comparisons, full text
> > indexing etc but not showing the converted text to a user, in which
> > case it doesn't matter too much if the conversions are a bit weird
> > (œuf and oeuf are interchangeable in French, but euf is nonsense).
> > But can we actually change them?  That could cause difficulty for
> > users with existing unaccented data stored/indexed...  But I suppose
> > even adding new mappings could cause problems.
>
> Yeah, if we do anything other than adding new mappings, I suspect that
> part could not be back-patched.  Maybe adding new mappings shouldn't
> be back-patched either, though it seems relatively safe to me.

To me, conceptually what unaccent does is turn whatever junk you have
into a very basic common alphabet (ASCII); then it's very easy to do
full text searches without having to worry about what accents people
did or did not use in their searches.  If we say "okay, but that funny
char is not an accent so let's leave it alone" then the charter doesn't
sound so useful to me.
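
To make the distinction concrete, here is a minimal Python sketch (not the
unaccent module itself, just an illustration of the idea).  It assumes a tiny
made-up rule table, RULES, containing only the ligatures mentioned upthread;
the real unaccent.rules file is much larger and its contents may differ.  The
point it shows: stripping combining marks handles accents, but ligatures have
no Unicode decomposition, so they only get folded if a rule says so.

```python
import unicodedata

# Hypothetical rule table, for illustration only; this is NOT a copy of
# the unaccent.rules file shipped with PostgreSQL.
RULES = {
    "œ": "oe",
    "Œ": "OE",
    "æ": "ae",
    "Æ": "AE",
}


def strip_marks(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text, rules=RULES):
    """Apply the rule table first, then strip any remaining accents."""
    replaced = "".join(rules.get(ch, ch) for ch in text)
    return strip_marks(replaced)


if __name__ == "__main__":
    for word in ["Álvaro", "niño", "œuf"]:
        print(word, "->", strip_marks(word), "/", fold(word))
```

With accent stripping alone, "œuf" comes back unchanged, so a search for
"oeuf" would not match it; with the rule table it folds to "oeuf".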

The cases I care about are okay anyway, because all the funny chars in
Spanish are already covered; and maybe German people always enter their
queries using the funny ss thing I can't even write, and then this is
not a problem for them.


Regarding back-patching unaccent.rules changes as discussed downthread,
I think it's okay to simply document that any indexes using the module
should be reindexed immediately after upgrading to that minor version.
The consequence of not doing so is not *that* serious anyway.  But then,
since I'm not actually affected in any way, I'm not strongly holding
this position either.
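
As a rough illustration of why a reindex would be needed, here is a small
Python sketch.  It is a toy model, not PostgreSQL: OLD_RULES and NEW_RULES are
made-up before/after mappings, and the "index" is just a dictionary keyed by
the folded word.  Keys stored under the old rules no longer match queries
folded with the new rules until the index is rebuilt.

```python
# Hypothetical "before" and "after" rule sets; not the real unaccent.rules.
OLD_RULES = {"œ": "e"}
NEW_RULES = {"œ": "oe"}


def fold(text, rules):
    """Replace each character that has a rule; leave the others unchanged."""
    return "".join(rules.get(ch, ch) for ch in text)


def build_index(words, rules):
    """Map each folded key to the set of original words that produce it."""
    index = {}
    for word in words:
        index.setdefault(fold(word, rules), set()).add(word)
    return index


def search(index, query, rules):
    """Fold the query with the current rules and look it up in the index."""
    return index.get(fold(query, rules), set())


words = ["œuf", "bœuf"]
stale_index = build_index(words, OLD_RULES)  # built before the rules changed
fresh_index = build_index(words, NEW_RULES)  # rebuilt after the rules changed

missed = search(stale_index, "œuf", NEW_RULES)  # stale keys: lookup misses
found = search(fresh_index, "œuf", NEW_RULES)   # rebuilt keys: lookup hits
```

In this toy model the stale index silently returns no match rather than
failing outright, which matches the point that the consequence is not
catastrophic but is still worth documenting.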

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
