Re: BUG #13440: unaccent does not remove all diacritics

From: Michael Gradek
Subject: Re: BUG #13440: unaccent does not remove all diacritics
Msg-id: CAEP8ZNVKxwBNyQx-CxcTL0hiNax3AScy208fs=8_Qp2cHt8y1A@mail.gmail.com
In reply to: Re: BUG #13440: unaccent does not remove all diacritics  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-bugs
Thanks, everyone. I've been comparing the behavior to that of
https://github.com/andrewrk/node-diacritics/blob/master/index.js in case
that is of any help.

On Monday, June 15, 2015, Thomas Munro <thomas.munro@enterprisedb.com>
wrote:

> On Tue, Jun 16, 2015 at 12:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> >> My terminal shows these characters to be different.  One is
> >> http://graphemica.com/%C8%9B
> >>       latin small letter t with comma below (U+021B)
> >
> >> The other is
> >> http://graphemica.com/%C5%A3
> >>       latin small letter t with cedilla (U+0163)
> >
> > Ah-hah -- I did not look closely enough.  So the immediate answer for
> > Michael is to add another entry to his unaccent.rules file.
> >
> > Should we add the missing character to the standard unaccent.rules file?
>
> It looks like Romanian also has s with comma.  Perhaps we should have
> all these characters:
>
> $ curl -s http://unicode.org/Public/7.0.0/ucd/UnicodeData.txt | egrep
> ';LATIN (SMALL|CAPITAL) LETTER [A-Z] WITH ' | wc -l
>      702
>
> That's quite a lot more than the 187 we currently have.  Of those, I
> think only the following ligature characters don't fit the above
> pattern: Æ, æ, Ĳ, ĳ, Œ, œ, ß.  Incidentally, I don't believe that the
> way we "unaccent" ligatures is correct anyway.  Maybe they should be
> expanded to AE, ae, IJ, ij, OE, oe, ss, respectively, not A, a, I, i,
> O, o, S as we have it, but I guess it depends what the purpose of
> unaccent is...
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>
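
For what it's worth, here is a rough Python sketch of how rule lines for the
characters discussed above could be generated. It is only an illustration, not
how the shipped unaccent.rules file is produced. Assumptions: it reads
character names from Python's bundled unicodedata module rather than the
7.0.0 UnicodeData.txt file, so the number of matches may differ from the 702
quoted above; it assumes unaccent.rules takes one whitespace-separated
"source target" pair per line; the ligature expansions follow Thomas's
suggestion, not current behaviour; and the names rule_line and generate_rules
are made up for this example.

```python
import re
import unicodedata

# Same shape as the egrep pattern quoted above, applied to Unicode names.
PATTERN = re.compile(r"^LATIN (SMALL|CAPITAL) LETTER ([A-Z]) WITH ")

# Ligature expansions as suggested in the quoted message (an assumption,
# not what unaccent currently does).
LIGATURES = {
    "\u00c6": "AE",  # Æ
    "\u00e6": "ae",  # æ
    "\u0132": "IJ",  # Ĳ
    "\u0133": "ij",  # ĳ
    "\u0152": "OE",  # Œ
    "\u0153": "oe",  # œ
    "\u00df": "ss",  # ß
}


def base_letter(ch):
    """Return the plain letter named in ch's Unicode name, or None."""
    m = PATTERN.match(unicodedata.name(ch, ""))
    if m is None:
        return None
    letter = m.group(2)
    return letter if m.group(1) == "CAPITAL" else letter.lower()


def rule_line(ch):
    """Return a 'source<TAB>target' rule line for ch, or None if no rule applies."""
    target = LIGATURES.get(ch) or base_letter(ch)
    return None if target is None else f"{ch}\t{target}"


def generate_rules(limit=0x3000):
    """Yield rule lines for all matching non-ASCII code points below limit."""
    for cp in range(0x80, limit):
        line = rule_line(chr(cp))
        if line is not None:
            yield line


if __name__ == "__main__":
    for line in generate_rules():
        print(line)
```

With this, both t with comma below (U+021B) and t with cedilla (U+0163) map to
a plain t, and the Romanian s with comma gets the same treatment.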


-- 
Cheers,
Mike
-- 
Mike Gradek
Co-founder and CTO, Busbud
Busbud.com <http://busbud.com/> | mike@busbud.com
*We're hiring!: Jobs at Busbud <http://www.busbud.com/en/about/jobs>*
