Discussion: JSON validation behavior


JSON validation behavior

From:
Sergei Kornilov
Date:
Hi

We have some json regression tests in src/test/regress/expected/json_encoding_1.out with the \u0000 symbol:

select json '{ "a":  "null \u0000 escape" }' as not_unescaped;
         not_unescaped          
--------------------------------
 { "a":  "null \u0000 escape" }
(1 row)

select json '{ "a":  "null \u0000 escape" }' ->> 'a' as fails;
ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.
CONTEXT:  JSON data, line 1: { "a":...

Well, the requested text type cannot contain a \u0000 byte. But it seems strange: we test the json type with this value, yet the same error is raised for the -> operator:

melkij=> select json '{ "a": "null \u0000 escape"}' -> 'a' as fails;
ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.
CONTEXT:  JSON data, line 1: { "a":...

The result was requested as the json datatype, just like SELECT '"\u0000"'::json; from the earlier tests.
A similar error occurs when accessing a different key:

melkij=> select json '{ "a": "null \u0000 escape", "b":1 }' ->>'b' as fails;
ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.
CONTEXT:  JSON data, line 1: { "a":...

We allow writing such json to a table, and we allow reading the whole json, but we cannot use the native operators. Is this behavior expected?

regards, Sergei


Re: JSON validation behavior

From:
"David G. Johnston"
Date:
On Wed, Oct 24, 2018 at 7:25 AM Sergei Kornilov <sk@zsrv.org> wrote:

DETAIL:  \u0000 cannot be converted to text.

Well, the requested text type cannot contain a \u0000 byte. But it seems strange: we test the json type with this value, yet the same error is raised for the -> operator:

We allow writing such json to a table, and we allow reading the whole json, but we cannot use the native operators. Is this behavior expected?

It isn't that different from saying:

'123bcd'::integer -- error, invalid input for type integer

While text can hold just about anything, it cannot contain an actual ASCII NUL character, and so a JSON value with a Unicode-escaped NUL cannot be converted to text.  Text doesn't have a stored concept of escaped values; using an escape is only valid during entry.
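The distinction David describes can be seen outside PostgreSQL too. As a minimal illustration (using Python's json module, not anything from the PostgreSQL source), decoding the escape produces a literal NUL character in the value, and that character is exactly what PostgreSQL's text type cannot store:

```python
import json

# The JSON document itself is valid; \u0000 is a legal escape sequence.
doc = '{ "a": "null \\u0000 escape" }'
parsed = json.loads(doc)

# Decoding turns the escape into an actual NUL character in the string.
# A text value in PostgreSQL cannot contain this byte, which is why
# ->> fails when it tries to hand the value back as text.
value = parsed["a"]
print(repr(value))      # 'null \x00 escape'
print("\x00" in value)  # True
```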

The following does seem buggy though:

select json '{ "a": "null \u0000 escape"}' -> 'a' as fails; 

The final result should be json, yet somewhere an intermediate text value seems to be constructed, and that implementation detail is causing an error in an otherwise valid situation.

David J.


Re: JSON validation behavior

From:
Sergei Kornilov
Date:
Hi

24.10.2018, 17:40, "David G. Johnston" <david.g.johnston@gmail.com>:
> On Wed, Oct 24, 2018 at 7:25 AM Sergei Kornilov <sk@zsrv.org> wrote:
>
>> DETAIL:  \u0000 cannot be converted to text.
>>
>> Well, the requested text type cannot contain a \u0000 byte. But it seems strange: we test the json type with this value, yet the same error is raised for the -> operator:
>>
>> We allow writing such json to a table, and we allow reading the whole json, but we cannot use the native operators. Is this behavior expected?
>
> It isn't that different than saying:
>
> '123bcd'::integer -- error, invalid input for type integer
>
> While text can hold just about anything, it cannot contain an actual ASCII NUL character, and so a JSON value with a Unicode-escaped NUL cannot be converted to text.  Text doesn't have a stored concept of escaped values; using an escape is only valid during entry.

Yes, that is reasonable for operators that return text, such as ->> (and I have no question about those).
I was surprised by the operators whose result type is json.

regards, Sergei


Re: JSON validation behavior

From:
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> The following does seem buggy though:
> select json '{ "a": "null \u0000 escape"}' -> 'a' as fails;

Perhaps, but AFAICS it's entirely accidental that any variant of this
doesn't fail.  Although this manages not to fail:

regression=# select json '{ "a": "null \u0000 escape"}' ;
             json             
------------------------------
 { "a": "null \u0000 escape"}
(1 row)

this does:

regression=# select jsonb '{ "a": "null \u0000 escape"}' ;
ERROR:  unsupported Unicode escape sequence

The error message is actually being thrown in the JSON parser, and it
only doesn't get thrown if the parser knows that it's just error-checking
the data and not producing any converted output (cf the
"if (lex->strval != NULL)" block starting at json.c:832).  It actually
seems to me that this behavior is a bug, in that there are a bunch of
error checks there (not only this one) that are skipped in the allegedly
"error checking only" path.
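The pattern Tom describes can be sketched in a few lines. This is a much-simplified illustration in Python, not the actual json.c code: the \u0000 check sits inside the branch that builds a converted value, so a validation-only pass (where no output buffer is supplied) never reaches it.

```python
# Simplified sketch (NOT the real json.c logic): the escape check lives
# inside the value-building branch, so a validation-only pass skips it.

def handle_unicode_escape(code_point, strval):
    """Mimic the shape of the string lexer's escape handling."""
    if strval is not None:
        # Producing converted output: the escape must be representable.
        if code_point == 0:
            raise ValueError("unsupported Unicode escape sequence: "
                             "\\u0000 cannot be converted to text")
        strval.append(chr(code_point))
    # strval is None: error-checking only, so the check never fires.

# Validation-only path (like `SELECT json '...'`): no error raised.
handle_unicode_escape(0x0000, None)

# Conversion path (like `->` / `->>`): the error fires.
try:
    handle_unicode_escape(0x0000, [])
except ValueError as e:
    print("ERROR:", e)
```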

But anyway, making it work as suggested here would take a substantial
amount of refactoring, and it would not (I think) work anyway for jsonb,
so it doesn't quite seem worth a lot of work.  I could get behind fixing
it to always throw the error, but that's not what Sergei was hoping for.

            regards, tom lane


Re: JSON validation behavior

From:
Sergei Kornilov
Date:
Hi

> I could get behind fixing
> it to always throw the error, but that's not what Sergei was hoping for.
On the contrary, I think that is the reasonable way. It is much better to get an error on the input value than to discover a wrong value during table processing. We always reject this value for jsonb, and I expected the same behavior for json. I'm not sure about JSON specification compliance; I didn't find anything about \u0000 in RFC 8259.

regards, Sergei


Re: JSON validation behavior

From:
Andrew Dunstan
Date:

On 10/24/2018 11:54 AM, Tom Lane wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>> The following does seem buggy though:
>> select json '{ "a": "null \u0000 escape"}' -> 'a' as fails;
> Perhaps, but AFAICS it's entirely accidental that any variant of this
> doesn't fail.  Although this manages not to fail:
>
> regression=# select json '{ "a": "null \u0000 escape"}' ;
>               json
> ------------------------------
>   { "a": "null \u0000 escape"}
> (1 row)
>
> this does:
>
> regression=# select jsonb '{ "a": "null \u0000 escape"}' ;
> ERROR:  unsupported Unicode escape sequence
>
> The error message is actually being thrown in the JSON parser, and it
> only doesn't get thrown if the parser knows that it's just error-checking
> the data and not producing any converted output (cf the
> "if (lex->strval != NULL)" block starting at json.c:832).  It actually
> seems to me that this behavior is a bug, in that there are a bunch of
> error checks there (not only this one) that are skipped in the allegedly
> "error checking only" path.
>
> But anyway, making it work as suggested here would take a substantial
> amount of refactoring, and it would not (I think) work anyway for jsonb,
> so it doesn't quite seem worth a lot of work.  I could get behind fixing
> it to always throw the error, but that's not what Sergei was hoping for.
>


I think we might be able to do something that doesn't have too high an 
impact. I'll take a look.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services