Обсуждение: inconsistent behaviour of json_to_record and friends with embeddedjson

Поиск
Список
Период
Сортировка

inconsistent behaviour of json_to_record and friends with embeddedjson

От
Robert Vollmert
Дата:
First off, I’m not sure what the bug is here; it might even be that there’s
just missing documentation. That said, I see internally inconsistent behaviour
in Postgres 10 and above, and a change of behaviour between 9.6 and 10 that’s
not really documented as intentional in the changelog.

In short, the differing statements are:

Postgres 9.6:

each of

1. select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
2. select * from json_to_record('{"out": {"key": 1}}')     as (out json);
3. select * from jsonb_to_record('{"out": "{\"key\": 1}"}') as (out json);
4. select * from jsonb_to_record('{"out": {"key": 1}}')     as (out json);

gives

    out
------------
 {"key": 1}

Postgres 10/11:

1. gives

ERROR:  invalid input syntax for type json
DETAIL:  Token "key" is invalid.
CONTEXT:  JSON data, line 1: "{“key…

2. and 4. give

    out
------------
 {"key": 1}

3. gives

      out
----------------
 "{\"key\": 1}”


I get similar results whith `… as (out jsonb)` or with json_populate_record in the form

create type j as (out json);
select out from json_populate_record(null::j, '{"out": "{\"key\": 1}"}’);

These tests were run on macos with postgresql versions 9.6, 10 and 11 installed
via nix, but I’ve confirmed that the issue also exists in other environments.

Apologies if this has been discussed before; the closest bug report I found is

BUG #10728: json_to_recordset with nested json objects NULLs columns

https://www.postgresql.org/message-id/flat/CAB7nPqTP1m5H%3DkNsm6mmv-5f7A99O7AP2X6E9ubb4ShZWq-COQ%40mail.gmail.com#1978f3e80110804859829f0f5cb9392e


The potentially buggy things:

1. Why do cases 1 and 3 behave differently in Postgres 10 and later? Should they?

2. Could the change in behaviour from 9.6 to 10 be documented more clearly if this part
is intentional? The release notes list:

> Make json_populate_record() and related functions process JSON arrays
> and objects recursively (Nikita Glukhov)

> With this change, array-type fields in the destination SQL type are
> properly converted from JSON arrays, and composite-type fields are
> properly converted from JSON objects. Previously, such cases would
> fail because the text representation of the JSON value would be fed
> to array_in() or record_in(), and its syntax would not match what
> those input functions expect.

It seems likely these changes are involved, but it’s not clear this concrete effect was
intended.

3. Could the documentation of the family of functions be extended to describe how
embedded quoted and unquoted json fields are treated?




Re: inconsistent behaviour of json_to_record and friends withembedded json

От
Robert Vollmert
Дата:

> On 30. May 2019, at 16:24, Robert Vollmert <rob@vllmrt.net> wrote:
>
> 1. select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);

> Postgres 10/11:
>
> 1. gives
>
> ERROR:  invalid input syntax for type json
> DETAIL:  Token "key" is invalid.
> CONTEXT:  JSON data, line 1: "{“key…

By now, I’m inclined to believe that this part is a bug in Postgres >= 10.

Compare:

# select ('{"a":"{\"b\":1}"}'::json)->'a' as a;
      a
-------------
 "{\"b\":1}"

# select * from jsonb_to_record('{"a":"{\"b\":1}"}') as (a json);
      a
-------------
 "{\"b\":1}"


# select * from json_to_record('{"a":"{\"b\":1}"}') as (a json);
ERROR:  invalid input syntax for type json
DETAIL:  Token "b" is invalid.
CONTEXT:  JSON data, line 1: "{"b...

# select ('{"a":"{"b":1}"}'::json)->'a' as a;
ERROR:  invalid input syntax for type json
LINE 1: select ('{"a":"{"b":1}"}'::json)->'a' as a;
                ^
DETAIL:  Token "b" is invalid.
CONTEXT:  JSON data, line 1: {"a":"{"b…

It seems that json_to_record is messing up the escaping of quotes, and that
json_to_record should behave like jsonb_to_record here (interpreting the
quoted field as a JSON string, and not parsing it as a JSON object as it did
in version 9.6). Thoughts?





Re: inconsistent behaviour of json_to_record and friends withembedded json

От
Jeff Janes
Дата:
On Thu, May 30, 2019 at 10:41 AM Robert Vollmert <rob@vllmrt.net> wrote:

1. select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
...
 
Postgres 10/11:

1. gives

ERROR:  invalid input syntax for type json
DETAIL:  Token "key" is invalid.
CONTEXT:  JSON data, line 1: "{“key…

This is caused by commit:

commit cf35346e813e5a1373f308d397bb0a8f3f21d530
Author: Andrew Dunstan <andrew@dunslane.net>
Date:   Thu Apr 6 22:11:21 2017 -0400

    Make json_populate_record and friends operate recursively
 

As far as I can tell, this was not an intended change, so I agree it is probably a bug.

Cheers,

Jeff

Re: inconsistent behaviour of json_to_record and friends withembedded json

От
Michael Paquier
Дата:
On Sat, Jun 01, 2019 at 03:06:27PM -0400, Jeff Janes wrote:
> This is caused by commit:
>
> commit cf35346e813e5a1373f308d397bb0a8f3f21d530
> Author: Andrew Dunstan <andrew@dunslane.net>
> Date:   Thu Apr 6 22:11:21 2017 -0400
>
>     Make json_populate_record and friends operate recursively
>
>
> As far as I can tell, this was not an intended change, so I agree it is
> probably a bug.

Thanks for digging into this one.  I have read through quickly at the
conference, and indeed it smells like a bug.  I'll try to look at
that asap, which is not before a couple of days unfortunately.
--
Michael

Вложения

Re: inconsistent behaviour of json_to_record and friends with embedded json

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Thu, May 30, 2019 at 10:41 AM Robert Vollmert <rob@vllmrt.net> wrote:
>> 1. select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
>> Postgres 10/11:
>> 1. gives
>> ERROR:  invalid input syntax for type json
>> DETAIL:  Token "key" is invalid.
>> CONTEXT:  JSON data, line 1: "{"key...

> This is caused by commit:

> commit cf35346e813e5a1373f308d397bb0a8f3f21d530
> Author: Andrew Dunstan <andrew@dunslane.net>
> Date:   Thu Apr 6 22:11:21 2017 -0400
>
>     Make json_populate_record and friends operate recursively

> As far as I can tell, this was not an intended change, so I agree it is
> probably a bug.

I agree.  It looks to me like the problem is this over-optimistic
assumption:

            /*
             * Add quotes around string value (should be already escaped) if
             * converting to json/jsonb.
             */

No, it's *not* already escaped.  Fixing the code to use escape_json()
is a bit tedious, because for some reason that function wasn't designed
to support non-null-terminated input, but with the attached patch we get
what seems to me like sane behavior:

regression=# select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
      out
----------------
 "{\"key\": 1}"
(1 row)

regression=# select * from json_to_record('{"out": {"key": 1}}') as (out json);
    out
------------
 {"key": 1}
(1 row)

i.e. what you get is either a string or an object, the same as it was in
the input (same for all combinations of json/jsonb in/out).

Not terribly surprisingly, this patch changes no existing regression test
cases.  We should add some, but I've not done so here.

Also, modulo this bug for the json-input case, this *is* an intentional
behavioral change from 9.6, but it seems fairly poorly documented to
me.  Should we try to improve that?

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 28bdc72..9e7035c 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -2803,26 +2803,7 @@ populate_scalar(ScalarIOData *io, Oid typid, int32 typmod, JsValue *jsv)

         json = jsv->val.json.str;
         Assert(json);
-
-        /* already done the hard work in the json case */
-        if ((typid == JSONOID || typid == JSONBOID) &&
-            jsv->val.json.type == JSON_TOKEN_STRING)
-        {
-            /*
-             * Add quotes around string value (should be already escaped) if
-             * converting to json/jsonb.
-             */
-
-            if (len < 0)
-                len = strlen(json);
-
-            str = palloc(len + sizeof(char) * 3);
-            str[0] = '"';
-            memcpy(&str[1], json, len);
-            str[len + 1] = '"';
-            str[len + 2] = '\0';
-        }
-        else if (len >= 0)
+        if (len >= 0)
         {
             /* Need to copy non-null-terminated string */
             str = palloc(len + 1 * sizeof(char));
@@ -2830,7 +2811,21 @@ populate_scalar(ScalarIOData *io, Oid typid, int32 typmod, JsValue *jsv)
             str[len] = '\0';
         }
         else
-            str = json;            /* null-terminated string */
+            str = json;            /* string is already null-terminated */
+
+        /* If converting to json/jsonb, make string into valid JSON literal */
+        if ((typid == JSONOID || typid == JSONBOID) &&
+            jsv->val.json.type == JSON_TOKEN_STRING)
+        {
+            StringInfoData buf;
+
+            initStringInfo(&buf);
+            escape_json(&buf, str);
+            /* free temporary buffer */
+            if (str != json)
+                pfree(str);
+            str = buf.data;
+        }
     }
     else
     {

Re: inconsistent behaviour of json_to_record and friends with embedded json

От
Tom Lane
Дата:
I wrote:
> I agree.  It looks to me like the problem is this over-optimistic
> assumption:
>             /*
>              * Add quotes around string value (should be already escaped) if
>              * converting to json/jsonb.
>              */
> No, it's *not* already escaped.  Fixing the code to use escape_json()
> is a bit tedious, because for some reason that function wasn't designed
> to support non-null-terminated input, but with the attached patch we get
> what seems to me like sane behavior:

Hearing no comments, I've pushed this patch.  I also rewrote the
documentation to provide something approaching a specification for
what json_to_record() and friends do.

            regards, tom lane



Re: inconsistent behaviour of json_to_record and friends withembedded json

От
Andrew Dunstan
Дата:
On 6/11/19 1:35 PM, Tom Lane wrote:
> I wrote:
>> I agree.  It looks to me like the problem is this over-optimistic
>> assumption:
>>             /*
>>              * Add quotes around string value (should be already escaped) if
>>              * converting to json/jsonb.
>>              */
>> No, it's *not* already escaped.  Fixing the code to use escape_json()
>> is a bit tedious, because for some reason that function wasn't designed
>> to support non-null-terminated input, but with the attached patch we get
>> what seems to me like sane behavior:
> Hearing no comments, I've pushed this patch.  I also rewrote the
> documentation to provide something approaching a specification for
> what json_to_record() and friends do.


Sorry, I meant to tell you your fix looked good to me, it slipped
through the net.


Thanks for fixing this.


cheers


andrew