Обсуждение: Weird special case in jsonb_concat()

Поиск
Список
Период
Сортировка

Weird special case in jsonb_concat()

От
Tom Lane
Дата:
The discussion in [1] pointed out that the existing documentation
for the "jsonb || jsonb" concatenation operator is far short of
reality: it fails to acknowledge that the operator will accept
any cases other than two jsonb array inputs or two jsonb object
inputs.

I'd about concluded that other cases were handled as if by
wrapping non-array inputs in one-element arrays and then
proceeding as for two arrays.  That works for most scenarios, eg

regression=# select '[3]'::jsonb || '{}'::jsonb;
 ?column?
----------
 [3, {}]
(1 row)

regression=# select '3'::jsonb || '[]'::jsonb;
 ?column?
----------
 [3]
(1 row)

regression=# select '3'::jsonb || '4'::jsonb;
 ?column?
----------
 [3, 4]
(1 row)

However, further experimentation found a case that fails:

regression=# select '3'::jsonb || '{}'::jsonb;
ERROR:  invalid concatenation of jsonb objects

I wonder what is the point of this weird exception, and whether
whoever devised it can provide a concise explanation of what
they think the full behavior of "jsonb || jsonb" is.  Why isn't
'[3, {}]' a reasonable result here, if the cases above are OK?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/0d72b76d-ca2b-4263-8888-d6dfca861c51%40www.fastmail.com



Re: Weird special case in jsonb_concat()

От
Tom Lane
Дата:
I wrote:
> However, further experimentation found a case that fails:
> regression=# select '3'::jsonb || '{}'::jsonb;
> ERROR:  invalid concatenation of jsonb objects
> I wonder what is the point of this weird exception, and whether
> whoever devised it can provide a concise explanation of what
> they think the full behavior of "jsonb || jsonb" is.  Why isn't
> '[3, {}]' a reasonable result here, if the cases above are OK?

Here is a proposed patch for that.  It turns out that the third
else-branch in IteratorConcat() already does the right thing, if
we just remove its restrictive else-condition and let it handle
everything except the two-objects and two-arrays cases.  But it
seemed to me that trying to handle both the object || array
and array || object cases in that one else-branch was poorly
thought out: only one line of code can actually be shared, and it
took several extra lines of infrastructure to support the sharing.
So I split those cases into separate else-branches.

This also addresses the inadequate documentation that was the
original complaint.

Thoughts?  Should we back-patch this?  The existing behavior
seems to me to be inconsistent enough to be arguably a bug,
but we've not had field complaints saying "this should work".

            regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df29af6371..5f3c393a25 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14715,8 +14715,12 @@ table2-mapping
        </para>
        <para>
         Concatenates two <type>jsonb</type> values.
-        Concatenating two objects generates an object with the union of their
+        Concatenating two arrays generates an array containing all the
+        elements of each input.  Concatenating two objects generates an
+        object containing the union of their
         keys, taking the second object's value when there are duplicate keys.
+        All other cases are treated by converting a non-array input into a
+        single-element array, and then proceeding as for two arrays.
         Does not operate recursively: only the top-level array or object
         structure is merged.
        </para>
@@ -14727,6 +14731,14 @@ table2-mapping
        <para>
         <literal>'{"a": "b"}'::jsonb || '{"c": "d"}'::jsonb</literal>
         <returnvalue>{"a": "b", "c": "d"}</returnvalue>
+       </para>
+       <para>
+        <literal>'[1, 2]'::jsonb || '3'::jsonb</literal>
+        <returnvalue>[1, 2, 3]</returnvalue>
+       </para>
+       <para>
+        <literal>'{"a": "b"}'::jsonb || '42'::jsonb</literal>
+        <returnvalue>[{"a": "b"}, 42]</returnvalue>
        </para></entry>
       </row>

diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 7a25415078..5beade00f4 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -4690,11 +4690,14 @@ IteratorConcat(JsonbIterator **it1, JsonbIterator **it2,
     rk2 = JsonbIteratorNext(it2, &v2, false);

     /*
-     * Both elements are objects.
+     * JsonbIteratorNext reports raw scalars as if they were single-element
+     * arrays; hence we only need consider "object" and "array" cases here.
      */
     if (rk1 == WJB_BEGIN_OBJECT && rk2 == WJB_BEGIN_OBJECT)
     {
         /*
+         * Both elements are objects.
+         *
          * Append all the tokens from v1 to res, except last WJB_END_OBJECT
          * (because res will not be finished yet).
          */
@@ -4703,18 +4706,18 @@ IteratorConcat(JsonbIterator **it1, JsonbIterator **it2,
             pushJsonbValue(state, r1, &v1);

         /*
-         * Append all the tokens from v2 to res, include last WJB_END_OBJECT
-         * (the concatenation will be completed).
+         * Append all the tokens from v2 to res, including last WJB_END_OBJECT
+         * (the concatenation will be completed).  Any duplicate keys will
+         * automatically override the value from the first object.
          */
         while ((r2 = JsonbIteratorNext(it2, &v2, true)) != WJB_DONE)
             res = pushJsonbValue(state, r2, r2 != WJB_END_OBJECT ? &v2 : NULL);
     }
-
-    /*
-     * Both elements are arrays (either can be scalar).
-     */
     else if (rk1 == WJB_BEGIN_ARRAY && rk2 == WJB_BEGIN_ARRAY)
     {
+        /*
+         * Both elements are arrays.
+         */
         pushJsonbValue(state, rk1, NULL);

         while ((r1 = JsonbIteratorNext(it1, &v1, true)) != WJB_END_ARRAY)
@@ -4731,46 +4734,40 @@ IteratorConcat(JsonbIterator **it1, JsonbIterator **it2,

         res = pushJsonbValue(state, WJB_END_ARRAY, NULL /* signal to sort */ );
     }
-    /* have we got array || object or object || array? */
-    else if (((rk1 == WJB_BEGIN_ARRAY && !(*it1)->isScalar) && rk2 == WJB_BEGIN_OBJECT) ||
-             (rk1 == WJB_BEGIN_OBJECT && (rk2 == WJB_BEGIN_ARRAY && !(*it2)->isScalar)))
+    else if (rk1 == WJB_BEGIN_OBJECT)
     {
-        JsonbIterator **it_array = rk1 == WJB_BEGIN_ARRAY ? it1 : it2;
-        JsonbIterator **it_object = rk1 == WJB_BEGIN_OBJECT ? it1 : it2;
-        bool        prepend = (rk1 == WJB_BEGIN_OBJECT);
+        /*
+         * We have object || array.
+         */
+        Assert(rk2 == WJB_BEGIN_ARRAY);

         pushJsonbValue(state, WJB_BEGIN_ARRAY, NULL);

-        if (prepend)
-        {
-            pushJsonbValue(state, WJB_BEGIN_OBJECT, NULL);
-            while ((r1 = JsonbIteratorNext(it_object, &v1, true)) != WJB_DONE)
-                pushJsonbValue(state, r1, r1 != WJB_END_OBJECT ? &v1 : NULL);
-
-            while ((r2 = JsonbIteratorNext(it_array, &v2, true)) != WJB_DONE)
-                res = pushJsonbValue(state, r2, r2 != WJB_END_ARRAY ? &v2 : NULL);
-        }
-        else
-        {
-            while ((r1 = JsonbIteratorNext(it_array, &v1, true)) != WJB_END_ARRAY)
-                pushJsonbValue(state, r1, &v1);
-
-            pushJsonbValue(state, WJB_BEGIN_OBJECT, NULL);
-            while ((r2 = JsonbIteratorNext(it_object, &v2, true)) != WJB_DONE)
-                pushJsonbValue(state, r2, r2 != WJB_END_OBJECT ? &v2 : NULL);
+        pushJsonbValue(state, WJB_BEGIN_OBJECT, NULL);
+        while ((r1 = JsonbIteratorNext(it1, &v1, true)) != WJB_DONE)
+            pushJsonbValue(state, r1, r1 != WJB_END_OBJECT ? &v1 : NULL);

-            res = pushJsonbValue(state, WJB_END_ARRAY, NULL);
-        }
+        while ((r2 = JsonbIteratorNext(it2, &v2, true)) != WJB_DONE)
+            res = pushJsonbValue(state, r2, r2 != WJB_END_ARRAY ? &v2 : NULL);
     }
     else
     {
         /*
-         * This must be scalar || object or object || scalar, as that's all
-         * that's left. Both of these make no sense, so error out.
+         * We have array || object.
          */
-        ereport(ERROR,
-                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("invalid concatenation of jsonb objects")));
+        Assert(rk1 == WJB_BEGIN_ARRAY);
+        Assert(rk2 == WJB_BEGIN_OBJECT);
+
+        pushJsonbValue(state, WJB_BEGIN_ARRAY, NULL);
+
+        while ((r1 = JsonbIteratorNext(it1, &v1, true)) != WJB_END_ARRAY)
+            pushJsonbValue(state, r1, &v1);
+
+        pushJsonbValue(state, WJB_BEGIN_OBJECT, NULL);
+        while ((r2 = JsonbIteratorNext(it2, &v2, true)) != WJB_DONE)
+            pushJsonbValue(state, r2, r2 != WJB_END_OBJECT ? &v2 : NULL);
+
+        res = pushJsonbValue(state, WJB_END_ARRAY, NULL);
     }

     return res;
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index a70cd0b7c1..1e6c6ef200 100644
--- a/src/test/regress/expected/jsonb.out
+++ b/src/test/regress/expected/jsonb.out
@@ -4111,9 +4111,41 @@ select '{"a":"b"}'::jsonb || '[]'::jsonb;
 (1 row)

 select '"a"'::jsonb || '{"a":1}';
-ERROR:  invalid concatenation of jsonb objects
+    ?column?
+-----------------
+ ["a", {"a": 1}]
+(1 row)
+
 select '{"a":1}' || '"a"'::jsonb;
-ERROR:  invalid concatenation of jsonb objects
+    ?column?
+-----------------
+ [{"a": 1}, "a"]
+(1 row)
+
+select '[3]'::jsonb || '{}'::jsonb;
+ ?column?
+----------
+ [3, {}]
+(1 row)
+
+select '3'::jsonb || '[]'::jsonb;
+ ?column?
+----------
+ [3]
+(1 row)
+
+select '3'::jsonb || '4'::jsonb;
+ ?column?
+----------
+ [3, 4]
+(1 row)
+
+select '3'::jsonb || '{}'::jsonb;
+ ?column?
+----------
+ [3, {}]
+(1 row)
+
 select '["a", "b"]'::jsonb || '{"c":1}';
        ?column?
 ----------------------
diff --git a/src/test/regress/sql/jsonb.sql b/src/test/regress/sql/jsonb.sql
index 3e2b8f66df..b6409767f6 100644
--- a/src/test/regress/sql/jsonb.sql
+++ b/src/test/regress/sql/jsonb.sql
@@ -1056,6 +1056,11 @@ select '{"a":"b"}'::jsonb || '[]'::jsonb;
 select '"a"'::jsonb || '{"a":1}';
 select '{"a":1}' || '"a"'::jsonb;

+select '[3]'::jsonb || '{}'::jsonb;
+select '3'::jsonb || '[]'::jsonb;
+select '3'::jsonb || '4'::jsonb;
+select '3'::jsonb || '{}'::jsonb;
+
 select '["a", "b"]'::jsonb || '{"c":1}';
 select '{"c": 1}'::jsonb || '["a", "b"]';


Re: Weird special case in jsonb_concat()

От
Pavel Stehule
Дата:


so 19. 12. 2020 v 21:35 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
I wrote:
> However, further experimentation found a case that fails:
> regression=# select '3'::jsonb || '{}'::jsonb;
> ERROR:  invalid concatenation of jsonb objects
> I wonder what is the point of this weird exception, and whether
> whoever devised it can provide a concise explanation of what
> they think the full behavior of "jsonb || jsonb" is.  Why isn't
> '[3, {}]' a reasonable result here, if the cases above are OK?

Here is a proposed patch for that.  It turns out that the third
else-branch in IteratorConcat() already does the right thing, if
we just remove its restrictive else-condition and let it handle
everything except the two-objects and two-arrays cases.  But it
seemed to me that trying to handle both the object || array
and array || object cases in that one else-branch was poorly
thought out: only one line of code can actually be shared, and it
took several extra lines of infrastructure to support the sharing.
So I split those cases into separate else-branches.

This also addresses the inadequate documentation that was the
original complaint.

Thoughts?  Should we back-patch this?  The existing behavior
seems to me to be inconsistent enough to be arguably a bug,
but we've not had field complaints saying "this should work".

+1

Pavel


                        regards, tom lane

Re: Weird special case in jsonb_concat()

От
"Joel Jacobson"
Дата:
On Sat, Dec 19, 2020, at 21:35, Tom Lane wrote:
>Here is a proposed patch for that.

I've tested the patch and "All 202 tests passed".

In addition, I've tested it on a json intensive project,
which passes all its own tests.

I haven't studied the jsonfuncs.c code in detail,
but the new code looks much cleaner, nice.

>This also addresses the inadequate documentation that was the
>original complaint.

Looks good.

In addition, to the user wondering how to append a json array-value "as is",
I think it would be useful to provide an example on how to do this
in the documentation.

I think there is a risk users will attempt much more fragile
hacks to achieve this, if we don't provide guidance
in the documentation.

Suggestion:

         <literal>'["a", "b"]'::jsonb || '["a", "d"]'::jsonb</literal>
         <returnvalue>["a", "b", "a", "d"]</returnvalue>
        </para>
+       <para>
+        <literal>'["a", "b"]'::jsonb || jsonb_build_array('["a", "d"]'::jsonb)</literal>
+        <returnvalue>["a", "b", ["a", "d"]]</returnvalue>
+       </para>
        <para>
         <literal>'{"a": "b"}'::jsonb || '{"c": "d"}'::jsonb</literal>
         <returnvalue>{"a": "b", "c": "d"}</returnvalue>

> Thoughts?  Should we back-patch this?  The existing behavior
> seems to me to be inconsistent enough to be arguably a bug,
> but we've not had field complaints saying "this should work".

+1 back-patch, I think it's a bug.

Best regards,

Joel

Re: Weird special case in jsonb_concat()

От
Zhihong Yu
Дата:
Hi,
w.r.t. the patch,

+select '[3]'::jsonb || '{}'::jsonb;
+ ?column?
+----------
+ [3, {}]
+(1 row)
+
+select '3'::jsonb || '[]'::jsonb;

Should cases where the empty array precedes non-empty jsonb be added ?

select '[]'::jsonb || '3'::jsonb;
select '{}'::jsonb || '[3]'::jsonb;

Cheers

Re: Weird special case in jsonb_concat()

От
Tom Lane
Дата:
"Joel Jacobson" <joel@compiler.org> writes:
> On Sat, Dec 19, 2020, at 21:35, Tom Lane wrote:
>> Here is a proposed patch for that.

> In addition, to the user wondering how to append a json array-value "as is",
> I think it would be useful to provide an example on how to do this
> in the documentation.

Done in v13 and HEAD; the older table format doesn't really have room
for more examples.

> +1 back-patch, I think it's a bug.

I'm not quite sure it's a bug, but it does seem like fairly unhelpful
behavior to throw an error instead of doing something useful, so
back-patched.

            regards, tom lane