On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote:
>
> On 06/11/2013 06:26 PM, Noah Misch wrote:
>>
>>> As a final counter example, let me note that Postgres itself handles
>>> Unicode escapes differently in UTF8 databases - in other databases it
>>> only accepts Unicode escapes up to U+007f, i.e. ASCII characters.
>> I don't see a counterexample there; every database that accepts without error
>> a given Unicode escape produces from it the same text value. The proposal to
>> which I objected was akin to having non-UTF8 databases silently translate
>> E'\u0220' to E'\\u0220'.
>
> What?
>
> There will be no silent translation. The only debate here is about how
> these databases turn string values inside a json datum into PostgreSQL
> text values via the documented operation of certain functions and
> operators. If the JSON datum doesn't already contain a Unicode escape
> then nothing of what's been discussed would apply. Nothing whatever
> that's been proposed would cause a Unicode escape sequence to be emitted
> that wasn't already there in the first place, and no patch that I have
> submitted has contained any escape sequence generation at all.
Under the proposal to which I was referring, this statement would return true
in UTF8 databases and false in databases of other encodings:

  SELECT '["\u0220"]'::json ->> 0 = E'\u0220'

Contrast the next statement, which would return false in UTF8 databases and
true in databases of other encodings:

  SELECT '["\u0220"]'::json ->> 0 = E'\\u0220'
Defining ->>(json,int) and ->>(json,text) in this way would be *akin to*
having "SELECT E'\u0220' = E'\\u0220'" return true in non-UTF8 databases. I
refer to user-visible semantics, not matters of implementation. Does that
help to clarify my earlier statement?
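
For illustration only, here is a minimal Python sketch of the disputed
semantics as I understand them: a hypothetical json_unescape() that decodes
\uXXXX escapes fully in UTF8 databases but, under the proposal, leaves
non-ASCII escapes as literal text in other encodings. The function name and
structure are mine, not anything in the patch; it exists only to show how the
same escape would yield two different text values depending on encoding:

```python
def json_unescape(s: str, encoding: str) -> str:
    """Illustrative sketch of the proposed ->> behavior (not the patch's code).

    Decodes JSON \\uXXXX escapes.  In a UTF8 database every escape is
    decoded; in other encodings only escapes up to U+007F are decoded,
    and the rest are left as the literal six-character sequence.
    """
    out = []
    i = 0
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s) and s[i + 1] == 'u':
            cp = int(s[i + 2:i + 6], 16)
            if encoding == 'UTF8' or cp <= 0x7F:
                out.append(chr(cp))       # decode to the actual character
            else:
                out.append(s[i:i + 6])    # keep the literal "\uXXXX" text
            i += 6
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)

# Same input escape, different resulting text value by database encoding:
print(json_unescape(r'\u0220', 'UTF8'))    # the character U+0220
print(json_unescape(r'\u0220', 'LATIN1'))  # the literal six characters \u0220
```

This is exactly the encoding-dependent divergence the two SELECT statements
above exhibit, which is the user-visible semantics I am objecting to.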
--
Noah Misch
EnterpriseDB http://www.enterprisedb.com