Re: BUG #16828: duplicate results when using ** recursive expression in JSON path

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: BUG #16828: duplicate results when using ** recursive expression in JSON path
Дата
Msg-id CAPpHfdtGcnt8p7pRi6s6qEszqQbQkDjq5pWQGRh_EDExuTW0QA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #16828: duplicate results when using ** recursive expression in JSON path  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
On Sun, Jan 17, 2021 at 10:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following query
>
> >   select jsonb_path_query_array(col, '$.**.itemName')
> >   from (
> >     values ('{"items": [{"itemName": "a", "items": [{"itemName":
> > "b"}]}]}'::jsonb)
> >   ) as t(col)
>
> > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each value
> > only appears once in the JSON value.
>
> jsonpath_exec.c is a muddle of undocumented code, but I think
> I found where the duplication is occurring: in executeAnyItem(),
> the first occurrence comes out while recursing down from the
> executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
> the duplicate comes out while recursing down from the executeAnyItem
> call at line 1439.  So I'd say that that logic for
> "ignoreStructuralErrors" needs a rethink.  (BTW, why is half of this
> code relying on cxt->ignoreStructuralErrors while the other half
> receives a passed-down flag?  That seems like a recipe for bugs,
> especially with the lack of commentary about the reason for it.)

I think that's not just a bug in the code, but a high-level problem.

** operator enumerates all the subdocuments.

# select * from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**');
                       jsonb_path_query
--------------------------------------------------------------
 {"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]}
 [{"items": [{"itemName": "b"}], "itemName": "a"}]
 {"items": [{"itemName": "b"}], "itemName": "a"}
 [{"itemName": "b"}]
 {"itemName": "b"}
 "b"
 "a"
(7 rows)

After that, .itemName accessor is applied.  But in the lax mode this accessor automatically unwraps the arrays.

# select subres, jsonb_path_query_array(subres, '$.itemName') from jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, '$.**') subres;
                            subres                            | jsonb_path_query_array
--------------------------------------------------------------+------------------------
 {"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]} | []
 [{"items": [{"itemName": "b"}], "itemName": "a"}]            | ["a"]
 {"items": [{"itemName": "b"}], "itemName": "a"}              | ["a"]
 [{"itemName": "b"}]                                          | ["b"]
 {"itemName": "b"}                                            | ["b"]
 "b"                                                          | []
 "a"                                                          | []
(7 rows)

So, everything works as designed, but the design is probably wrong.  The issue here is that ** is our extension of the standard, and it works strangely in lax mode.  But in strict mode everything looks OK.

#  select jsonb_path_query_array('{"items": [{"itemName": "a", "items": [{"itemName": "b"}]}]}'::jsonb, 'strict $.**.itemName');
 jsonb_path_query_array
------------------------
 ["a", "b"]
(1 row)

Should we consider everything that comes after ** always in strict mode?

> Alexander, git blame says all this code is your fault ...

I do agree all faults in this code are mine.  I do agree there are faults in this code.  But I don't think *all* this code is fault :)

I'll plan my time to improve the clarity of this code.

------
Regards,
Alexander Korotkov

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #16828: duplicate results when using ** recursive expression in JSON path
Следующее
От: Rupert Gallagher
Дата:
Сообщение: Re: BUG #16823: Unreachable code