Re: jsonpath
От | Jonathan S. Katz |
---|---|
Тема | Re: jsonpath |
Дата | |
Msg-id | 04981bad-2156-a927-b04c-50ff2c9709ae@postgresql.org обсуждение исходный текст |
Ответ на | Re: jsonpath (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
Ответы |
Re: jsonpath
|
Список | pgsql-hackers |
On 3/17/19 3:13 AM, Alexander Korotkov wrote: > On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: >> so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal: >>> >>> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov >>> <a.korotkov@postgrespro.ru> wrote: >>>> On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov >>>> <a.korotkov@postgrespro.ru> wrote: >>>>> On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: >>>>>> Attached 36th version of the patches. >>>>> >>>>> Thank yo for the revision! >>>>> >>>>> In the attached revision following changes are made: >>>>> >>>>>> "unknown" refers here to ordinary three-valued logical Unknown, which is >>>>>> represented in SQL by NULL. >>>>>> >>>>>> JSON path expressions return sequences of SQL/JSON items, which are defined by >>>>>> SQL/JSON data model. But JSON path predicates (logical expressions), which are >>>>>> used in filters, return three-valued logical values: False, True, or Unknown. >>>>> >>>>> * I've added short explanation of this to the documentation. >>>>> * Removed no longer present data structures from typedefs.list of the >>>>> first patch. >>>>> * Moved GIN support patch to number 3. Seems to be well-isolated and >>>>> not very complex patch. I propose to consider this to 12 too. I >>>>> added high-level comment there, commit message and made some code >>>>> beautification. >>>> >>>> I think patches 1 and 2 are in committable shape (I reached Tomas >>>> off-list, he doesn't have more notes regarding them). While patch 3 >>>> requires more review. >>>> >>>> I'm going to push 1 and 2 if no objections. >>> >>> So, pushed. Many thanks to reviewers and authors! >>> >>> Remaining part I'm proposing for 12 is attached. I appreciate review of it. >> >> >> I tested this patch and I didn't find any issue - just I tested basic functionality and regress tests. >> >> looks well > > Thank you for your feedback! Like Pavel, I did some basic testing of the patch (on current HEAD 4178d8b91c) trying out various JSON path expressions, and yes, it all worked. I had a brief scare while testing on 4178d8b91c where initdb was failing on the bootstrapping step, but after doing a thorough wipe of build files and my output directory, it seems to be initializing okay. I also did some testing of the GIN patch upthread, as the quickness of retrieval of the data using JSON path is of course important as well. Using a schema roughly like this: CREATE TABLE news_feed ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, data jsonb NOT NULL ); CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data); I loaded in a data set of roughly 420,000 rows. Each row had all the same keys but differing values (e.g. "length" and "content" as keys) I tested a few different JSON path scenarios. Some of the index scans performed way better than the equivalent sequential scans, for instance: SELECT count(*) FROM news_feed WHERE data @? '$.length ? (@ == 200)'; SELECT * FROM news_feed WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")'; Using the index outperformed the sequential scan (and parallel seq scan) by ~10-100x based on my config + laptop hardware! However, when I did something a little more complex, like the below: SELECT count(*) FROM news_feed WHERE data @? '$.length ? (@ < 150)'; SELECT count(*) FROM news_feed WHERE data @? '$.content ? (@ like_regex "^Start")'; SELECT id, jsonb_path_query(data, '$.content') FROM news_feed WHERE data @? '$.content ? (@ like_regex "risk" flag "i")'; I would find that the index scan performed as well as the sequential scan. Additionally, on my laptop, the parallel sequential scan would beat the index scan by ~2.5x in some cases. Reading up on what the GIN patch does, this all makes sense: it's optimized for equality, I understand there are challenges to be able to handle inequality, regex exps, etc. And the cases where it really does work well, it's _incredibly_ fast. My suggestion would be adding some additional guidance in the user documentation around how GIN works with the @@ and @? operators so they can understand where GIN will work very well with JSON path + their data and not be surprised when other types of JSON path queries are performing on par with a sequential scan (or worse than a parallel seq scan). Thanks, Jonathan
Вложения
В списке pgsql-hackers по дате отправления: