Discussion: inconsistent behaviour of json_to_record and friends with embedded json
First off, I'm not sure what the bug is here; it might even be that
there's just missing documentation. That said, I see internally
inconsistent behaviour in Postgres 10 and above, and a change of
behaviour between 9.6 and 10 that's not really documented as
intentional in the changelog.

In short, the differing statements are:

1. select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
2. select * from json_to_record('{"out": {"key": 1}}') as (out json);
3. select * from jsonb_to_record('{"out": "{\"key\": 1}"}') as (out json);
4. select * from jsonb_to_record('{"out": {"key": 1}}') as (out json);

Postgres 9.6: each of 1. through 4. gives

     out
------------
 {"key": 1}

Postgres 10/11:

1. gives

ERROR:  invalid input syntax for type json
DETAIL:  Token "key" is invalid.
CONTEXT:  JSON data, line 1: "{"key...

2. and 4. give

     out
------------
 {"key": 1}

3. gives

       out
----------------
 "{\"key\": 1}"

I get similar results with `... as (out jsonb)` or with
json_populate_record in the form

create type j as (out json);
select out from json_populate_record(null::j, '{"out": "{\"key\": 1}"}');

These tests were run on macOS with PostgreSQL versions 9.6, 10 and 11
installed via nix, but I've confirmed that the issue also exists in
other environments.

Apologies if this has been discussed before; the closest bug report I
found is

BUG #10728: json_to_recordset with nested json objects NULLs columns
https://www.postgresql.org/message-id/flat/CAB7nPqTP1m5H%3DkNsm6mmv-5f7A99O7AP2X6E9ubb4ShZWq-COQ%40mail.gmail.com#1978f3e80110804859829f0f5cb9392e

The potentially buggy things:

1. Why do cases 1 and 3 behave differently in Postgres 10 and later?
   Should they?
2. Could the change in behaviour from 9.6 to 10 be documented more
   clearly if this part is intentional?
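To make the distinction in the four cases concrete, here is a small Python sketch (an illustration, not part of the original report): in cases 1 and 3 the value of "out" is a JSON *string* whose contents merely look like JSON, while in cases 2 and 4 it is a nested JSON *object*. The 9.6 behaviour effectively parsed the string's contents as JSON a second time:

```python
import json

# Cases 1/3: the value of "out" is a JSON string containing JSON text.
doc_string = json.loads('{"out": "{\\"key\\": 1}"}')
print(type(doc_string["out"]).__name__)  # str -- the field is a quoted string

# Cases 2/4: the value of "out" is a nested JSON object.
doc_object = json.loads('{"out": {"key": 1}}')
print(type(doc_object["out"]).__name__)  # dict -- the field is an object

# Postgres 9.6 effectively re-parsed the string's contents as JSON:
print(json.loads(doc_string["out"]))  # {'key': 1}
```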
The release notes list:

> Make json_populate_record() and related functions process JSON arrays
> and objects recursively (Nikita Glukhov)
>
> With this change, array-type fields in the destination SQL type are
> properly converted from JSON arrays, and composite-type fields are
> properly converted from JSON objects. Previously, such cases would
> fail because the text representation of the JSON value would be fed
> to array_in() or record_in(), and its syntax would not match what
> those input functions expect.

It seems likely these changes are involved, but it's not clear this
concrete effect was intended.

3. Could the documentation of this family of functions be extended to
   describe how embedded quoted and unquoted json fields are treated?
> On 30. May 2019, at 16:24, Robert Vollmert <rob@vllmrt.net> wrote:
>
> 1. select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
>
> Postgres 10/11:
>
> 1. gives
>
> ERROR:  invalid input syntax for type json
> DETAIL:  Token "key" is invalid.
> CONTEXT:  JSON data, line 1: "{"key...

By now, I'm inclined to believe that this part is a bug in
Postgres >= 10. Compare:

# select ('{"a":"{\"b\":1}"}'::json)->'a' as a;
      a
-------------
 "{\"b\":1}"

# select * from jsonb_to_record('{"a":"{\"b\":1}"}') as (a json);
      a
-------------
 "{\"b\":1}"

# select * from json_to_record('{"a":"{\"b\":1}"}') as (a json);
ERROR:  invalid input syntax for type json
DETAIL:  Token "b" is invalid.
CONTEXT:  JSON data, line 1: "{"b...

# select ('{"a":"{"b":1}"}'::json)->'a' as a;
ERROR:  invalid input syntax for type json
LINE 1: select ('{"a":"{"b":1}"}'::json)->'a' as a;
                ^
DETAIL:  Token "b" is invalid.
CONTEXT:  JSON data, line 1: {"a":"{"b...

It seems that json_to_record is messing up the escaping of quotes, and
that json_to_record should behave like jsonb_to_record here
(interpreting the quoted field as a JSON string, and not parsing it as
a JSON object as it did in version 9.6).

Thoughts?
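One way to see why the escaping goes wrong (a Python sketch of the suspected mechanism, not the actual backend code): once the outer JSON parser has decoded the string value, the backslash escapes are gone, so wrapping the decoded text in bare quotes and re-parsing it no longer yields valid JSON. Re-escaping first is what makes it round-trip:

```python
import json

outer = json.loads('{"a": "{\\"b\\":1}"}')
decoded = outer["a"]          # the parser has already removed the escapes
print(decoded)                # {"b":1}  -- raw, unescaped text

# Naively wrapping the decoded text in quotes yields invalid JSON,
# because the inner quotes are no longer escaped:
naive = '"' + decoded + '"'   # "{"b":1}"
try:
    json.loads(naive)
except json.JSONDecodeError:
    print("naive re-quoting is not valid JSON")

# Re-escaping the string when serializing round-trips correctly:
proper = json.dumps(decoded)  # "{\"b\":1}"
print(json.loads(proper) == decoded)  # True
```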
On Thu, May 30, 2019 at 10:41 AM Robert Vollmert <rob@vllmrt.net> wrote:
> 1. select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
> ...
> Postgres 10/11:
>
> 1. gives
>
> ERROR:  invalid input syntax for type json
> DETAIL:  Token "key" is invalid.
> CONTEXT:  JSON data, line 1: "{"key...
This is caused by commit:

commit cf35346e813e5a1373f308d397bb0a8f3f21d530
Author: Andrew Dunstan <andrew@dunslane.net>
Date:   Thu Apr 6 22:11:21 2017 -0400

    Make json_populate_record and friends operate recursively
As far as I can tell, this was not an intended change, so I agree it is probably a bug.
Cheers,
Jeff
On Sat, Jun 01, 2019 at 03:06:27PM -0400, Jeff Janes wrote:
> This is caused by commit:
>
> commit cf35346e813e5a1373f308d397bb0a8f3f21d530
> Author: Andrew Dunstan <andrew@dunslane.net>
> Date:   Thu Apr 6 22:11:21 2017 -0400
>
>     Make json_populate_record and friends operate recursively
>
> As far as I can tell, this was not an intended change, so I agree it is
> probably a bug.

Thanks for digging into this one. I have read through quickly at the
conference, and indeed it smells like a bug. I'll try to look at that
asap, which is not before a couple of days unfortunately.
--
Michael
Jeff Janes <jeff.janes@gmail.com> writes:
> On Thu, May 30, 2019 at 10:41 AM Robert Vollmert <rob@vllmrt.net> wrote:
>> 1. select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
>> Postgres 10/11:
>> 1. gives
>> ERROR:  invalid input syntax for type json
>> DETAIL:  Token "key" is invalid.
>> CONTEXT:  JSON data, line 1: "{"key...

> This is caused by commit:
> commit cf35346e813e5a1373f308d397bb0a8f3f21d530
> Author: Andrew Dunstan <andrew@dunslane.net>
> Date:   Thu Apr 6 22:11:21 2017 -0400
>
>     Make json_populate_record and friends operate recursively

> As far as I can tell, this was not an intended change, so I agree it is
> probably a bug.

I agree.  It looks to me like the problem is this over-optimistic
assumption:

    /*
     * Add quotes around string value (should be already escaped) if
     * converting to json/jsonb.
     */

No, it's *not* already escaped.  Fixing the code to use escape_json()
is a bit tedious, because for some reason that function wasn't designed
to support non-null-terminated input, but with the attached patch we get
what seems to me like sane behavior:

regression=# select * from json_to_record('{"out": "{\"key\": 1}"}') as (out json);
      out
----------------
 "{\"key\": 1}"
(1 row)

regression=# select * from json_to_record('{"out": {"key": 1}}') as (out json);
    out
------------
 {"key": 1}
(1 row)

i.e. what you get is either a string or an object, the same as it was
in the input (same for all combinations of json/jsonb in/out).

Not terribly surprisingly, this patch changes no existing regression
test cases.  We should add some, but I've not done so here.

Also, modulo this bug for the json-input case, this *is* an intentional
behavioral change from 9.6, but it seems fairly poorly documented to me.
Should we try to improve that?
			regards, tom lane

diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 28bdc72..9e7035c 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -2803,26 +2803,7 @@ populate_scalar(ScalarIOData *io, Oid typid, int32 typmod, JsValue *jsv)
 			json = jsv->val.json.str;
 			Assert(json);
-
-			/* already done the hard work in the json case */
-			if ((typid == JSONOID || typid == JSONBOID) &&
-				jsv->val.json.type == JSON_TOKEN_STRING)
-			{
-				/*
-				 * Add quotes around string value (should be already escaped) if
-				 * converting to json/jsonb.
-				 */
-
-				if (len < 0)
-					len = strlen(json);
-
-				str = palloc(len + sizeof(char) * 3);
-				str[0] = '"';
-				memcpy(&str[1], json, len);
-				str[len + 1] = '"';
-				str[len + 2] = '\0';
-			}
-			else if (len >= 0)
+			if (len >= 0)
 			{
 				/* Need to copy non-null-terminated string */
 				str = palloc(len + 1 * sizeof(char));
@@ -2830,7 +2811,21 @@ populate_scalar(ScalarIOData *io, Oid typid, int32 typmod, JsValue *jsv)
 				str[len] = '\0';
 			}
 			else
-				str = json;		/* null-terminated string */
+				str = json;		/* string is already null-terminated */
+
+			/* If converting to json/jsonb, make string into valid JSON literal */
+			if ((typid == JSONOID || typid == JSONBOID) &&
+				jsv->val.json.type == JSON_TOKEN_STRING)
+			{
+				StringInfoData buf;
+
+				initStringInfo(&buf);
+				escape_json(&buf, str);
+				/* free temporary buffer */
+				if (str != json)
+					pfree(str);
+				str = buf.data;
+			}
 		}
 		else
 		{
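For readers following along without the backend sources at hand, the job escape_json() does in the patch is the ordinary serialization of a string into a JSON literal. A minimal Python sketch of that escaping (illustration only; the real C function also handles control characters and follows the full JSON grammar):

```python
import json

# A minimal sketch of turning a raw string into a valid JSON string
# literal: wrap in quotes and escape the characters that would otherwise
# terminate or corrupt the literal.
def escape_json_literal(s: str) -> str:
    out = ['"']
    for ch in s:
        if ch == '"':
            out.append('\\"')
        elif ch == '\\':
            out.append('\\\\')
        elif ch == '\n':
            out.append('\\n')
        elif ch == '\t':
            out.append('\\t')
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)

s = '{"key": 1}'
literal = escape_json_literal(s)
print(literal)                   # "{\"key\": 1}"
print(json.loads(literal) == s)  # True -- the literal round-trips
```

This is exactly the output the patched json_to_record produces for case 1 in the original report: a JSON string literal, not a re-parsed object.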
I wrote:
> I agree.  It looks to me like the problem is this over-optimistic
> assumption:
> 	/*
> 	 * Add quotes around string value (should be already escaped) if
> 	 * converting to json/jsonb.
> 	 */
> No, it's *not* already escaped.  Fixing the code to use escape_json()
> is a bit tedious, because for some reason that function wasn't designed
> to support non-null-terminated input, but with the attached patch we get
> what seems to me like sane behavior:

Hearing no comments, I've pushed this patch.  I also rewrote the
documentation to provide something approaching a specification for
what json_to_record() and friends do.

			regards, tom lane
On 6/11/19 1:35 PM, Tom Lane wrote:
> I wrote:
>> I agree.  It looks to me like the problem is this over-optimistic
>> assumption:
>> 	/*
>> 	 * Add quotes around string value (should be already escaped) if
>> 	 * converting to json/jsonb.
>> 	 */
>> No, it's *not* already escaped.  Fixing the code to use escape_json()
>> is a bit tedious, because for some reason that function wasn't designed
>> to support non-null-terminated input, but with the attached patch we get
>> what seems to me like sane behavior:
> Hearing no comments, I've pushed this patch.  I also rewrote the
> documentation to provide something approaching a specification for
> what json_to_record() and friends do.

Sorry, I meant to tell you your fix looked good to me, it slipped
through the net.  Thanks for fixing this.

cheers

andrew