Re: BUG #13440: unaccent does not remove all diacritics

From Thomas Munro
Subject Re: BUG #13440: unaccent does not remove all diacritics
Msg-id CAEepm=2b1df83h68tTiuk_xGC-PVmru02+rE2xp6_Hs5q_zHSg@mail.gmail.com
In reply to Re: BUG #13440: unaccent does not remove all diacritics  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Mon, Jun 15, 2015 at 5:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> mike@busbud.com writes:
>> Sorry, I couldn't install the most recent minor release, but I did try this
>> on several different versions. I used Heroku to try a 9.4.3 build, and got
>> the same results
>
>> select 'ț' as input, unaccent('ț') as observed, 't' as expected;
>>  input | observed | expected
>> -------+----------+----------
>>  ț     | ț        | t
>> (1 row)
>
> Hm, I do see
>
> ţ       t
>
> in unaccent.rules, so the transformation ought to happen.  I suspect
> an encoding issue, eg your terminal window is not transmitting characters
> in the encoding Postgres thinks you're using.  You did not provide any
> info about server encoding, client encoding, or client LC_xxx environment,
> so it's hard to debug from here.

The character in unaccent.rules is apparently t-cedilla (ţ, U+0163), from
Gagauz and Romanian:

https://en.wiktionary.org/wiki/%C5%A3

The character in the report above is apparently t-comma (ț, U+021B), from
Livonian and Romanian, but "[o]ften replaced by Ţ / ţ (t with cedilla),
especially in computing":

https://en.wiktionary.org/wiki/%C8%9B
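
To make the distinction concrete, here is a small Python sketch. It is not
part of the unaccent module, and the helper names describe and strip_marks
are made up for illustration. It reports the code point, Unicode name, and
UTF-8 bytes of the two characters, and shows that canonical decomposition
splits each into a plain "t" plus a combining mark. Dropping combining marks
is one possible way to strip either diacritic; it is an assumption here, not
how unaccent itself works (unaccent uses its rules file).

```python
import unicodedata

T_CEDILLA = "\u0163"  # the character listed in unaccent.rules
T_COMMA = "\u021b"    # the character from the bug report


def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        ch.encode("utf-8").hex().upper(),
    )


def strip_marks(s):
    """Decompose to NFD and drop combining marks (illustrative helper only)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    for ch in (T_CEDILLA, T_COMMA):
        print(describe(ch), "->", strip_marks(ch))
```

The UTF-8 bytes it reports, C5 A3 and C8 9B, correspond to the
percent-encoded forms in the two Wiktionary URLs above, which confirms these
are two distinct characters and that a rule for one does not cover the other.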

-- 
Thomas Munro
http://www.enterprisedb.com
