Re: PostgreSQL fails to convert decomposed utf-8 to other encodings
From | Craig Ringer |
---|---|
Subject | Re: PostgreSQL fails to convert decomposed utf-8 to other encodings |
Date | |
Msg-id | 53E1AB15.8050702@2ndquadrant.com |
In response to | Re: PostgreSQL fails to convert decomposed utf-8 to other encodings (Craig Ringer <craig@2ndquadrant.com>) |
List | pgsql-bugs |
On 08/06/2014 11:54 AM, Craig Ringer wrote:
> On 08/06/2014 09:14 AM, Tom Lane wrote:
>> We don't actually support "decomposed" utf8; if there is any bug here,
>> it's that the input you show isn't rejected. But I think there was
>> some intentional choice to not check \u escapes fully.
>
> Combining characters (i.e. decomposed utf-8 form, for chars where there
> is a combined equivalent) are part of utf-8. They're not an optional add-on.

... though we can advertise partial Unicode support, saying that we
support UTF-8 for UCS (ISO 10646-1:2000 Annex D / RFC 3629)
implementation level 1 only, requiring Normalization Form C (NFC) input.

Given that Pg doesn't seem to understand \xf8 or \xfc utf-8 chars, so it
doesn't cover the full utf-8 range, it doesn't look like it meets Level
1 either. So it supports "mostly-utf8".

With level 1 we should really _reject_ combining chars, but can't do
that w/o breaking BC.

I guess I should turn this:

http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

into a regression test. Possibly also parts of this:

http://www.columbia.edu/~fdc/utf8/

though it's more oriented toward rendering.

It's worth noting that Konsole and Thunderbird had no issues with
combining chars when I was testing this.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
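
For readers following the thread, the composed/decomposed distinction can be sketched outside the server. This is only an illustration in Python using the standard `unicodedata` module, not PostgreSQL code; the helper names `to_nfc` and `can_encode` are made up for the example, and Latin-1 is assumed as a stand-in for "some other encoding".

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (combining sequences composed)."""
    return unicodedata.normalize("NFC", text)


def can_encode(text: str, encoding: str) -> bool:
    """Report whether every code point in text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Decomposed form: 'a' followed by U+0308 COMBINING DIAERESIS (two code points).
decomposed = "a\u0308"
# Composed (NFC) form: the single code point U+00E4.
composed = to_nfc(decomposed)

# Both are valid UTF-8, but the byte sequences differ.
decomposed_bytes = decomposed.encode("utf-8")
composed_bytes = composed.encode("utf-8")

# Latin-1 has a slot for U+00E4 but none for the bare combining mark,
# so only the composed form converts without prior normalization.
print(can_encode(composed, "latin-1"))
print(can_encode(decomposed, "latin-1"))
```

Under these assumptions, normalizing to NFC before conversion is what lets the decomposed input reach a single-byte encoding at all; whether a server should normalize, reject, or pass such input through is the open question in the thread.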