Re: JSON and unicode surrogate pairs

From	Andrew Dunstan
Subject	Re: JSON and unicode surrogate pairs
Date	11 June 2013, 21:11:17
Msg-id	51B76825.2020803@dunslane.net
In response to	Re: JSON and unicode surrogate pairs (Noah Misch <noah@leadboat.com>)
Responses	Re: JSON and unicode surrogate pairs (Tom Lane <tgl@sss.pgh.pa.us>)
	Re: JSON and unicode surrogate pairs (Noah Misch <noah@leadboat.com>)
List	pgsql-hackers

On 06/10/2013 11:22 PM, Noah Misch wrote:
> On Mon, Jun 10, 2013 at 11:20:13AM -0400, Andrew Dunstan wrote:
>> On 06/10/2013 10:18 AM, Tom Lane wrote:
>>> Andrew Dunstan <andrew@dunslane.net> writes:
>>>> After thinking about this some more I have come to the conclusion that
>>>> we should only do any de-escaping of \uxxxx sequences, whether or not
>>>> they are for BMP characters, when the server encoding is utf8. For any
>>>> other encoding, which is already a violation of the JSON standard
>>>> anyway, and should be avoided if you're dealing with JSON, we should
>>>> just pass them through even in text output. This will be a simple and
>>>> very localized fix.
>>> Hmm.  I'm not sure that users will like this definition --- it will seem
>>> pretty arbitrary to them that conversion of \u sequences happens in some
>>> databases and not others.
> Yep.  Suppose you have a LATIN1 database.  Changing it to a UTF8 database
> where everyone uses client_encoding = LATIN1 should not change the semantics
> of successful SQL statements.  Some statements that fail with one database
> encoding will succeed in the other, but a user should not witness a changed
> non-error result.  (Except functions like decode() that explicitly expose byte
> representations.)  Having "SELECT '["\u00e4"]'::json ->> 0" emit 'ä' in the
> UTF8 database and '\u00e4' in the LATIN1 database would move PostgreSQL in the
> wrong direction relative to that ideal.
>
>> Then what should we do when there is no matching codepoint in the
>> database encoding? First we'll have to delay the evaluation so it's not
>> done over-eagerly, and then we'll have to try the conversion and throw
>> an error if it doesn't work. The second part is what's happening now,
>> but the delayed evaluation is not.
> +1 for doing it that way.
>
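
The approach agreed on above (leave the escapes alone in the stored JSON, decode them only when a text value is extracted, and raise an error if the character has no equivalent in the database encoding) can be illustrated with a rough Python sketch. This is not PostgreSQL code: extract_text and its server_encoding parameter are hypothetical names, Python codec names stand in for database encodings, and the sketch assumes the array element is a string.

```python
import json


def extract_text(raw_json: str, index: int, server_encoding: str) -> bytes:
    """Hypothetical model of `raw_json::json ->> index`.

    The stored JSON text is left untouched; \\uXXXX escapes are only
    decoded here, at the point where a text value is requested.
    """
    # json.loads decodes \uXXXX escapes and joins valid surrogate pairs
    # into a single code point. Assumes the element is a JSON string.
    value = json.loads(raw_json)[index]
    try:
        # Converting to the (assumed) server encoding is the step that can fail.
        return value.encode(server_encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"no equivalent for {value!r} in encoding {server_encoding}"
        ) from exc


# The same escape denotes the same character in both databases;
# only the byte representation differs.
latin1_bytes = extract_text(r'["\u00e4"]', 0, "latin-1")
utf8_bytes = extract_text(r'["\u00e4"]', 0, "utf-8")
```

With this behaviour a value such as '["\u20ac"]' fails in the LATIN1 case, since the euro sign has no LATIN1 equivalent, but only when it is extracted as text. Statements that succeed give the same result under either encoding, which is the property Noah asks for.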



As a final counterexample, let me note that Postgres itself handles
Unicode escapes differently in UTF8 databases: in other databases it
only accepts Unicode escapes up to U+007F, i.e. ASCII characters.
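
To make that rule concrete, here is a small Python model of it as stated in this message. It is only a sketch: accept_unicode_escape is a hypothetical name, not a PostgreSQL function, and the encoding names are plain strings used for illustration.

```python
def accept_unicode_escape(code_point: int, server_encoding: str) -> str:
    """Toy model: is a Unicode escape accepted under this server encoding?"""
    if server_encoding.upper() == "UTF8":
        # UTF8 databases accept the escape and convert it to the character.
        return chr(code_point)
    if code_point <= 0x7F:
        # ASCII escapes are accepted under any server encoding.
        return chr(code_point)
    # Anything above U+007F is rejected when the encoding is not UTF8.
    raise ValueError(
        f"Unicode escape U+{code_point:04X} not accepted "
        f"when the server encoding is {server_encoding}"
    )


ascii_char = accept_unicode_escape(0x41, "LATIN1")    # 'A', accepted everywhere
utf8_char = accept_unicode_escape(0xE4, "UTF8")       # accepted in a UTF8 database
```

Calling accept_unicode_escape(0xE4, "LATIN1") raises an error in this model even though LATIN1 can represent that character, which is the existing encoding-dependent behaviour being pointed out.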

cheers

andrew
