Re: Unicode string literals versus the world

From: Tom Lane
Subject: Re: Unicode string literals versus the world
Date:
Msg-id: 17658.1239893656@sss.pgh.pa.us
In reply to: Re: Unicode string literals versus the world  (Sam Mason <sam@samason.me.uk>)
Responses: Re: Unicode string literals versus the world  (Sam Mason <sam@samason.me.uk>)
           Re: Unicode string literals versus the world  (Andrew Dunstan <andrew@dunslane.net>)
           Re: Unicode string literals versus the world  (Marko Kreen <markokr@gmail.com>)
List: pgsql-hackers
Sam Mason <sam@samason.me.uk> writes:
> I'd never heard of UTF-16 surrogate pairs before this discussion and
> hence didn't realise that it's valid to have a surrogate pair in place
> of a single code point.  The docs say that <D800 DF02> corresponds to
> U+10302, Python would appear to follow my intuitions in that:

>   ord(u'\uD800\uDF02')

> results in an error instead of giving back 66306, as I'd expect.  Is
> this a bug in Python, my understanding, or something else?

I might be wrong, but I think surrogate pairs are expressly forbidden in
all representations other than UTF16/UCS2.  We definitely forbid them
when validating UTF-8 strings --- that's per an RFC recommendation.
It sounds like Python is doing the same.
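The UTF-8 rule can be illustrated with Python's strict UTF-8 codec standing in for a validator. This is a sketch under that assumption, not PostgreSQL's own validation code; `decodes_as_utf8` and `starts_encoded_surrogate` are hypothetical helpers. U+10302 is accepted in its proper 4-byte form, while the same character spelled as the surrogate pair D800 DF02, with each half encoded as three bytes (CESU-8 style), is rejected.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts `data`."""
    try:
        data.decode("utf-8")  # default errors="strict"
        return True
    except UnicodeDecodeError:
        return False


def starts_encoded_surrogate(seq: bytes) -> bool:
    """True if `seq` begins with a 3-byte encoding of U+D800..U+DFFF.

    RFC 3629 only allows 0x80..0x9F after a 0xED lead byte; 0xA0..0xBF
    in that position would encode a surrogate code point.
    """
    return len(seq) >= 3 and seq[0] == 0xED and 0xA0 <= seq[1] <= 0xBF


# U+10302 written correctly as one 4-byte sequence...
utf8_form = bytes.fromhex("f0908c82")
# ...and as surrogates D800 DF02, each squeezed into 3 bytes (CESU-8 style).
surrogate_form = bytes.fromhex("eda080edbc82")

print(decodes_as_utf8(utf8_form))                # True
print(decodes_as_utf8(surrogate_form))           # False
print(starts_encoded_surrogate(surrogate_form))  # True
```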
        regards, tom lane

