Re: JSON for PG 9.2

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: JSON for PG 9.2
Дата
Msg-id CA+TgmoaJbketfb1yiZwWoiiMa92bgBNkDfkyLfFhzcdV=MVqGg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: JSON for PG 9.2  (Abhijit Menon-Sen <ams@toroid.org>)
Ответы Re: JSON for PG 9.2  (Abhijit Menon-Sen <ams@toroid.org>)
Список pgsql-hackers
On Thu, Feb 2, 2012 at 4:54 AM, Abhijit Menon-Sen <ams@toroid.org> wrote:
> At 2012-02-01 11:28:50 -0500, robertmhaas@gmail.com wrote:
>> It's also pretty clear that JSON
>> string -> PG text data type is going to admit of a number of error
>> conditions (transcoding errors and perhaps invalid surrogate pairs) so
>> throwing one more on the pile doesn't cost much.
>
> I'm sorry for being slow, but I don't understand what you're proposing
> to do here (if anything). Could I ask you to explain, please?
>
> Are you talking about allowing the six literal bytes "\u0000" to be
> present in the JSON? If so, I agree, there seems to be no reason to
> disallow it.
>
> Are you also saying we should allow any "\uNNNN" sequence, without
> checking for errors (e.g. invalid surrogate pairs or parts thereof)?
>
> And what transcoding errors are you referring to?

Consider the following JSON object:

"abc"

This is a JSON string.  Someone is eventually going to propose a
function with a  name like json_to_string() which, when given this
JSON object, returns a three-character string with the PostgreSQL text
type.  That's useful and I support it.  But now suppose we pass this
JSON object to that same function:

"a\u0062c"

The user will quite rightly expect that since this JSON object
represents the same value as the first JSON object, they're going to
get the same answer back from json_to_string(), namely "abc".  So far,
so good.  But now suppose we pass this JSON object to that same
function:

"a\u0000c"

This is going to have to be an error condition, because PostgreSQL
does not allow values of type text to contain embedded NUL characters.Now consider this:

"a\uABCDc"

Suppose that \uABCD represents a character that exists in Unicode, but
the server-encoding is SHIFT-JIS or EUC-JP or some other system which
has no equivalent for the character represented by \uABCD.  Again,
when json_to_string() is applied to a value of this type, it must
fail.

In other words, we're knowingly allowing JSON strings to contain
characters which might not be representable as PostgreSQL strings,
because JSON allows any Unicode character, and the server encoding
might not be Unicode, and the server doesn't allow embedded NULs in
any encoding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: spgist text_ops and LIKE
Следующее
От: "ktm@rice.edu"
Дата:
Сообщение: Re: Progress on fast path sorting, btree index creation time