Re: BUG #15548: Unaccent does not remove combining diacritical characters

From Michael Paquier
Subject Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date
Msg-id 20181218045708.GI1532@paquier.xyz
In response to Re: BUG #15548: Unaccent does not remove combining diacritical characters (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: BUG #15548: Unaccent does not remove combining diacritical characters (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Tue, Dec 18, 2018 at 03:05:00PM +1100, Thomas Munro wrote:
> I don't think this is quite right.  Those don't seem to be the
> combining codepoints[1], and in any case they are being replaced with
> ASCII characters, whereas I thought we wanted to replace them with
> nothing at all.  Here is my attempt to come up with a test case using
> combining characters:
>
>   select unaccent('un café crème s''il vous plaît');
>
> It's not stripping the accents.  I've attached that in a file for
> reference so you can run it with psql -f x.sql, and you can see that
> it's using combining code points (code points 0301, 0300, 0302 which
> come out as cc81, cc80, cc82 in UTF-8) like so:

Could you also add some tests in contrib/unaccent/sql/unaccent.sql at
the same time?  That would make it easy to check the extent of the
patches proposed on this thread.
--
Michael
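
For readers following along, here is a minimal sketch of the code points
quoted above.  It is only an illustration in Python, not how unaccent
itself works and not part of any patch on this thread: it assumes the
test phrase is decomposed with NFD, prints the UTF-8 bytes of U+0301,
U+0300 and U+0302, and then drops the combining marks, which is roughly
what "replace them with nothing at all" would mean for this input.  The
variable names are made up for the example.

```python
import unicodedata

# Precomposed (NFC) spelling of the phrase from the quoted test case,
# written with escapes so the source file itself stays ASCII.
composed = "un caf\u00e9 cr\u00e8me s'il vous pla\u00eet"

# NFD splits each accented letter into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", composed)

# The three combining marks mentioned in the thread and their UTF-8 bytes.
for cp in (0x0301, 0x0300, 0x0302):
    mark = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(mark)}: {mark.encode('utf-8').hex()}")

# Drop every character with a nonzero canonical combining class.
stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
print(stripped)
```

The loop should report cc81, cc80 and cc82, matching the bytes Thomas
describes, and the last line prints the phrase with the marks removed.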
