Re: BUG #16828: duplicate results when using ** recursive expression in JSON path
От | Alexander Korotkov |
---|---|
Тема | Re: BUG #16828: duplicate results when using ** recursive expression in JSON path |
Дата | |
Msg-id | CAPpHfdtGcnt8p7pRi6s6qEszqQbQkDjq5pWQGRh_EDExuTW0QA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #16828: duplicate results when using ** recursive expression in JSON path (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-bugs |
On Sun, Jan 17, 2021 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following query
>
> > select jsonb_path_query_array(col, '$.**.itemName')
> > from (
> > values ('{"items": [{"itemName": "a", "items": [{"itemName":
> > "b"}]}]}'::jsonb)
> > ) as t(col)
>
> > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value
> > only appears once in the JSON value.
>
> jsonpath_exec.c is a muddle of undocumented code, but I think
> I found where the duplication is occurring: in executeAnyItem(),
> the first occurrence comes out while recursing down from the
> executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
> the duplicate comes out while recursing down from the executeAnyItem
> call at line 1439. So I'd say that that logic for
> "ignoreStructuralErrors" needs a rethink. (BTW, why is half of this
> code relying on cxt->ignoreStructuralErrors while the other half
> receives a passed-down flag? That seems like a recipe for bugs,
> especially with the lack of commentary about the reason for it.)
I think that's not just a bug in the code, but a high-level problem.
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following query
>
> > select jsonb_path_query_array(col, '$.**.itemName')
> > from (
> > values ('{"items": [{"itemName": "a", "items": [{"itemName":
> > "b"}]}]}'::jsonb)
> > ) as t(col)
>
> > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value
> > only appears once in the JSON value.
>
> jsonpath_exec.c is a muddle of undocumented code, but I think
> I found where the duplication is occurring: in executeAnyItem(),
> the first occurrence comes out while recursing down from the
> executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
> the duplicate comes out while recursing down from the executeAnyItem
> call at line 1439. So I'd say that that logic for
> "ignoreStructuralErrors" needs a rethink. (BTW, why is half of this
> code relying on cxt->ignoreStructuralErrors while the other half
> receives a passed-down flag? That seems like a recipe for bugs,
> especially with the lack of commentary about the reason for it.)
I think that's not just a bug in the code, but a high-level problem.
** operator enumerates all the subdocuments.
# select * from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**');
jsonb_path_query
--------------------------------------------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]}
[{"items": [{"itemName": "b"}], "itemName": "a"}]
{"items": [{"itemName": "b"}], "itemName": "a"}
[{"itemName": "b"}]
{"itemName": "b"}
"b"
"a"
(7 rows)
After that, .itemName accessor is applied. But in the lax mode this accessor automatically unwraps the arrays.
# select subres, jsonb_path_query_array(subres, '$.itemName') from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**') subres;
subres | jsonb_path_query_array
--------------------------------------------------------------+------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]} | []
[{"items": [{"itemName": "b"}], "itemName": "a"}] | ["a"]
{"items": [{"itemName": "b"}], "itemName": "a"} | ["a"]
[{"itemName": "b"}] | ["b"]
{"itemName": "b"} | ["b"]
"b" | []
"a" | []
(7 rows)
So, everything works as designed, but the design is probably wrong. The issue here is that ** is our extension of the standard, and it works strangely in lax mode. But in strict mode everything looks OK.
# select * from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**');
jsonb_path_query
--------------------------------------------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]}
[{"items": [{"itemName": "b"}], "itemName": "a"}]
{"items": [{"itemName": "b"}], "itemName": "a"}
[{"itemName": "b"}]
{"itemName": "b"}
"b"
"a"
(7 rows)
After that, .itemName accessor is applied. But in the lax mode this accessor automatically unwraps the arrays.
# select subres, jsonb_path_query_array(subres, '$.itemName') from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**') subres;
subres | jsonb_path_query_array
--------------------------------------------------------------+------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]} | []
[{"items": [{"itemName": "b"}], "itemName": "a"}] | ["a"]
{"items": [{"itemName": "b"}], "itemName": "a"} | ["a"]
[{"itemName": "b"}] | ["b"]
{"itemName": "b"} | ["b"]
"b" | []
"a" | []
(7 rows)
So, everything works as designed, but the design is probably wrong. The issue here is that ** is our extension of the standard, and it works strangely in lax mode. But in strict mode everything looks OK.
# select jsonb_path_query_array('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, 'strict $.**.itemName');
jsonb_path_query_array
------------------------
["a", "b"]
(1 row)
Should we consider everything that comes after ** always in strict mode?
> Alexander, git blame says all this code is your fault ...
I do agree all faults in this code are mine. I do agree there are faults in this code. But I don't think *all* this code is fault :)
jsonb_path_query_array
------------------------
["a", "b"]
(1 row)
Should we consider everything that comes after ** always in strict mode?
> Alexander, git blame says all this code is your fault ...
I do agree all faults in this code are mine. I do agree there are faults in this code. But I don't think *all* this code is fault :)
I'll plan my time to improve the clarity of this code.
------
Regards,
Alexander Korotkov
------
Regards,
Alexander Korotkov
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: BUG #16828: duplicate results when using ** recursive expression in JSON path