Обсуждение: BUG #16828: duplicate results when using ** recursive expression in JSON path
BUG #16828: duplicate results when using ** recursive expression in JSON path
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 16828 Logged by: Thomas Kellerer Email address: shammat@gmx.net PostgreSQL version: 13.1 Operating system: Windows 10, CentOS 8 Description: The following query select jsonb_path_query_array(col, '$.**.itemName') from ( values ('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb) ) as t(col) returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value only appears once in the JSON value. This also happens on Postgres 12.x
PG Bug reporting form <noreply@postgresql.org> writes: > The following query > select jsonb_path_query_array(col, '$.**.itemName') > from ( > values ('{"items": [{"itemName": "a", "items": [{"itemName": > "b"}]}]}'::jsonb) > ) as t(col) > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value > only appears once in the JSON value. jsonpath_exec.c is a muddle of undocumented code, but I think I found where the duplication is occurring: in executeAnyItem(), the first occurrence comes out while recursing down from the executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then the duplicate comes out while recursing down from the executeAnyItem call at line 1439. So I'd say that that logic for "ignoreStructuralErrors" needs a rethink. (BTW, why is half of this code relying on cxt->ignoreStructuralErrors while the other half receives a passed-down flag? That seems like a recipe for bugs, especially with the lack of commentary about the reason for it.) Alexander, git blame says all this code is your fault ... regards, tom lane
Re: BUG #16828: duplicate results when using ** recursive expression in JSON path
От
Alexander Korotkov
Дата:
On Sun, Jan 17, 2021 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following query
>
> > select jsonb_path_query_array(col, '$.**.itemName')
> > from (
> > values ('{"items": [{"itemName": "a", "items": [{"itemName":
> > "b"}]}]}'::jsonb)
> > ) as t(col)
>
> > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value
> > only appears once in the JSON value.
>
> jsonpath_exec.c is a muddle of undocumented code, but I think
> I found where the duplication is occurring: in executeAnyItem(),
> the first occurrence comes out while recursing down from the
> executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
> the duplicate comes out while recursing down from the executeAnyItem
> call at line 1439. So I'd say that that logic for
> "ignoreStructuralErrors" needs a rethink. (BTW, why is half of this
> code relying on cxt->ignoreStructuralErrors while the other half
> receives a passed-down flag? That seems like a recipe for bugs,
> especially with the lack of commentary about the reason for it.)
I think that's not just a bug in the code, but a high-level problem.
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following query
>
> > select jsonb_path_query_array(col, '$.**.itemName')
> > from (
> > values ('{"items": [{"itemName": "a", "items": [{"itemName":
> > "b"}]}]}'::jsonb)
> > ) as t(col)
>
> > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value
> > only appears once in the JSON value.
>
> jsonpath_exec.c is a muddle of undocumented code, but I think
> I found where the duplication is occurring: in executeAnyItem(),
> the first occurrence comes out while recursing down from the
> executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
> the duplicate comes out while recursing down from the executeAnyItem
> call at line 1439. So I'd say that that logic for
> "ignoreStructuralErrors" needs a rethink. (BTW, why is half of this
> code relying on cxt->ignoreStructuralErrors while the other half
> receives a passed-down flag? That seems like a recipe for bugs,
> especially with the lack of commentary about the reason for it.)
I think that's not just a bug in the code, but a high-level problem.
** operator enumerates all the subdocuments.
# select * from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**');
jsonb_path_query
--------------------------------------------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]}
[{"items": [{"itemName": "b"}], "itemName": "a"}]
{"items": [{"itemName": "b"}], "itemName": "a"}
[{"itemName": "b"}]
{"itemName": "b"}
"b"
"a"
(7 rows)
After that, .itemName accessor is applied. But in the lax mode this accessor automatically unwraps the arrays.
# select subres, jsonb_path_query_array(subres, '$.itemName') from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**') subres;
subres | jsonb_path_query_array
--------------------------------------------------------------+------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]} | []
[{"items": [{"itemName": "b"}], "itemName": "a"}] | ["a"]
{"items": [{"itemName": "b"}], "itemName": "a"} | ["a"]
[{"itemName": "b"}] | ["b"]
{"itemName": "b"} | ["b"]
"b" | []
"a" | []
(7 rows)
So, everything works as designed, but the design is probably wrong. The issue here is that ** is our extension of the standard, and it works strangely in lax mode. But in strict mode everything looks OK.
# select * from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**');
jsonb_path_query
--------------------------------------------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]}
[{"items": [{"itemName": "b"}], "itemName": "a"}]
{"items": [{"itemName": "b"}], "itemName": "a"}
[{"itemName": "b"}]
{"itemName": "b"}
"b"
"a"
(7 rows)
After that, .itemName accessor is applied. But in the lax mode this accessor automatically unwraps the arrays.
# select subres, jsonb_path_query_array(subres, '$.itemName') from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**') subres;
subres | jsonb_path_query_array
--------------------------------------------------------------+------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]} | []
[{"items": [{"itemName": "b"}], "itemName": "a"}] | ["a"]
{"items": [{"itemName": "b"}], "itemName": "a"} | ["a"]
[{"itemName": "b"}] | ["b"]
{"itemName": "b"} | ["b"]
"b" | []
"a" | []
(7 rows)
So, everything works as designed, but the design is probably wrong. The issue here is that ** is our extension of the standard, and it works strangely in lax mode. But in strict mode everything looks OK.
# select jsonb_path_query_array('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, 'strict $.**.itemName');
jsonb_path_query_array
------------------------
["a", "b"]
(1 row)
Should we consider everything that comes after ** always in strict mode?
> Alexander, git blame says all this code is your fault ...
I do agree all faults in this code are mine. I do agree there are faults in this code. But I don't think *all* this code is fault :)
jsonb_path_query_array
------------------------
["a", "b"]
(1 row)
Should we consider everything that comes after ** always in strict mode?
> Alexander, git blame says all this code is your fault ...
I do agree all faults in this code are mine. I do agree there are faults in this code. But I don't think *all* this code is fault :)
I'll plan my time to improve the clarity of this code.
------
Regards,
Alexander Korotkov
------
Regards,
Alexander Korotkov