Re: JSON and unicode surrogate pairs

From: Noah Misch
Subject: Re: JSON and unicode surrogate pairs
Msg-id: 20130611032208.GA569740@tornado.leadboat.com
In reply to: Re: JSON and unicode surrogate pairs  (Andrew Dunstan <andrew@dunslane.net>)
Replies: Re: JSON and unicode surrogate pairs  (Andrew Dunstan <andrew@dunslane.net>)
List: pgsql-hackers
On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote:
>
> On 06/10/2013 10:18 AM, Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> After thinking about this some more I have come to the conclusion that
>>> we should only do any de-escaping of \uxxxx sequences, whether or not
>>> they are for BMP characters, when the server encoding is utf8. For any
>>> other encoding, which is already a violation of the JSON standard
>>> anyway, and should be avoided if you're dealing with JSON, we should
>>> just pass them through even in text output. This will be a simple and
>>> very localized fix.
>> Hmm.  I'm not sure that users will like this definition --- it will seem
>> pretty arbitrary to them that conversion of \u sequences happens in some
>> databases and not others.

Yep.  Suppose you have a LATIN1 database.  Changing it to a UTF8 database
where everyone uses client_encoding = LATIN1 should not change the semantics
of successful SQL statements.  Some statements that fail with one database
encoding will succeed in the other, but a user should not witness a changed
non-error result.  (Except functions like decode() that explicitly expose byte
representations.)  Having "SELECT '["\u00e4"]'::json ->> 0" emit 'ä' in the
UTF8 database and '\u00e4' in the LATIN1 database would move PostgreSQL in the
wrong direction relative to that ideal.
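For illustration: the de-escaping step never needs the database encoding. `\u00e4` names code point U+00E4 (ä) everywhere, and a surrogate pair such as `\ud83d\ude00` combines into one code point above U+FFFF. Below is a minimal Python sketch under that assumption. The function names are hypothetical and do not reflect PostgreSQL internals, and escapes other than `\uXXXX` are copied through unchanged.

```python
import re

# One JSON \uXXXX escape: a backslash, 'u', then exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def combine_surrogates(hi: int, lo: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def unescape_unicode_escapes(body: str) -> str:
    """De-escape \\uXXXX sequences in a JSON string body (quotes removed).

    Hypothetical helper: other escapes such as \\\\ or \\n are copied as-is.
    """
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        m = _ESCAPE.match(body, i)
        if m is None:
            # Some other two-character escape: pass it through untouched.
            out.append(body[i:i + 2])
            i += 2
            continue
        cp = int(m.group(1), 16)
        i = m.end()
        if 0xD800 <= cp <= 0xDBFF:
            # A high surrogate must be followed by a \u escape for a low one.
            m2 = _ESCAPE.match(body, i)
            if m2 is None:
                raise ValueError("high surrogate without a following low surrogate")
            cp = combine_surrogates(cp, int(m2.group(1), 16))
            i = m2.end()
        elif 0xDC00 <= cp <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        out.append(chr(cp))
    return "".join(out)
```

The output is a sequence of code points; only a later step decides whether those code points fit the database encoding.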

> Then what should we do when there is no matching codepoint in the  
> database encoding? First we'll have to delay the evaluation so it's not  
> done over-eagerly, and then we'll have to try the conversion and throw  
> an error if it doesn't work. The second part is what's happening now,  
> but the delayed evaluation is not.

+1 for doing it that way.
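The following is a rough Python model of the approach agreed on here. It assumes `json.loads` stands in for PostgreSQL's JSON parser and Python codec names stand in for server encodings. Input only checks syntax and keeps the original text. Conversion to the database encoding is delayed until a text value is extracted, and it raises an error if the character has no equivalent. `validate_json` and `extract_text` are hypothetical names, not PostgreSQL functions.

```python
import json


def validate_json(raw: str) -> str:
    """Input step (hypothetical): check syntax only, keep the text unchanged."""
    json.loads(raw)  # raises json.JSONDecodeError on malformed input
    return raw


def extract_text(raw: str, index: int, db_encoding: str) -> str:
    """Rough analogue of `raw::json ->> index` for an array of strings.

    De-escaping happens only here; json.loads combines surrogate pairs.
    """
    value = json.loads(raw)[index]
    if not isinstance(value, str):
        raise TypeError("this sketch only handles string elements")
    try:
        # Try the conversion; fail if the database encoding lacks the character.
        value.encode(db_encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(f"no equivalent of {value!r} in {db_encoding}") from exc
    return value
```

With this model, `["\u00e4"]` yields ä in both a UTF8 and a LATIN1 database. An element written as a surrogate pair can still be stored in either, and it fails only when extracted as text in LATIN1.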

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
