Re: JSON and unicode surrogate pairs

From	Andrew Dunstan
Subject	Re: JSON and unicode surrogate pairs
Date	12 June 2013, 03:42:59
Msg-id	51B7C3F2.8010001@dunslane.net
In response to	Re: JSON and unicode surrogate pairs (Noah Misch <noah@leadboat.com>)
Responses	Re: JSON and unicode surrogate pairs (Noah Misch <noah@leadboat.com>); Re: JSON and unicode surrogate pairs (Craig Ringer <craig@2ndquadrant.com>)
List	pgsql-hackers

On 06/11/2013 08:18 PM, Noah Misch wrote:
> On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote:
>> On 06/11/2013 06:26 PM, Noah Misch wrote:
>>>> As a final counter example, let me note that Postgres itself handles
>>>> Unicode escapes differently in UTF8 databases - in other databases it
>>>> only accepts Unicode escapes up to U+007f, i.e. ASCII characters.
>>> I don't see a counterexample there; every database that accepts without error
>>> a given Unicode escape produces from it the same text value.  The proposal to
>>> which I objected was akin to having non-UTF8 databases silently translate
>>> E'\u0220' to E'\\u0220'.
>> What?
>>
>> There will be no silent translation. The only debate here is about how
>> these databases turn string values inside a json datum into PostgreSQL
>> text values via the documented operation of certain functions and
>> operators. If the JSON datum doesn't already contain a unicode escape
>> then nothing of what's been discussed would apply. Nothing whatever
>> that's been proposed would cause a unicode escape sequence to be emitted
>> that wasn't already there in the first place, and no patch that I have
>> submitted has contained any escape sequence generation at all.
> Under your proposal to which I was referring, this statement would return true
> in UTF8 databases and false in databases of other encodings:
>
>      SELECT '["\u0220"]'::json ->> 0 = E'\u0220'
>
> Contrast the next statement, which would return false in UTF8 databases and
> true in databases of other encodings:
>
>      SELECT '["\u0220"]'::json ->> 0 = E'\\u0220'
>
> Defining ->>(json,int) and ->>(json,text) in this way would be *akin to*
> having "SELECT E'\u0220' = E'\\u0220'" return true in non-UTF8 databases.  I
> refer to user-visible semantics, not matters of implementation.  Does that
> help to clarify my earlier statement?

Well, I think that's drawing a bit of a long bow, but never mind.

If we work by analogy to Postgres' own handling of Unicode escapes, 
we'll raise an error on any Unicode escape beyond ASCII (not on input 
for legacy reasons, but on trying to process such datums). I gather that 
would meet your objection.

cheers

andrew
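
Editorial illustration of the rule proposed above. This is only a rough model of the user-visible semantics, not PostgreSQL code: the function name json_string_to_text, the server_encoding parameter, and the UnicodeEscapeError class are all hypothetical. It assumes the escape is accepted on input and checked only when the string value is extracted, and it ignores surrogate pairs and the other JSON escapes (\n, \", and so on), which are left untouched.

```python
import re


class UnicodeEscapeError(ValueError):
    """Hypothetical error: escape not representable under the modeled rule."""


# Match any backslash escape so that an escaped backslash ("\\") is consumed
# as a unit and a following "u0220" is not mistaken for a Unicode escape.
_ESCAPE = re.compile(r"\\(?:u([0-9a-fA-F]{4})|.)")


def json_string_to_text(raw: str, server_encoding: str = "UTF8") -> str:
    """De-escape \\uXXXX sequences in the body of a JSON string literal.

    Models the proposal: in a non-UTF8 database, any escape above U+007F
    raises an error at processing time instead of being passed through.
    """

    def replace(match: re.Match) -> str:
        hex_digits = match.group(1)
        if hex_digits is None:
            # Some other escape such as \\ or \n: out of scope, keep as-is.
            return match.group(0)
        code_point = int(hex_digits, 16)
        if server_encoding.upper() != "UTF8" and code_point > 0x7F:
            raise UnicodeEscapeError(
                f"\\u{hex_digits} is beyond ASCII and the server encoding "
                f"is {server_encoding}, not UTF8"
            )
        return chr(code_point)

    return _ESCAPE.sub(replace, raw)
```

Under this model, extracting "\u0220" succeeds only with server_encoding="UTF8" and raises everywhere else, so the two comparisons quoted above can no longer give opposite answers depending on the database encoding: each one either returns the same result or fails.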
