Re: jsonpath

Поиск
Список
Период
Сортировка
От Jonathan S. Katz
Тема Re: jsonpath
Дата
Msg-id 04981bad-2156-a927-b04c-50ff2c9709ae@postgresql.org
обсуждение исходный текст
Ответ на Re: jsonpath  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Ответы Re: jsonpath
Список pgsql-hackers
On 3/17/19 3:13 AM, Alexander Korotkov wrote:
> On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov <a.korotkov@postgrespro.ru> napsal:
>>>
>>> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov
>>> <a.korotkov@postgrespro.ru> wrote:
>>>> On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
>>>> <a.korotkov@postgrespro.ru> wrote:
>>>>> On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>>>>>> Attached 36th version of the patches.
>>>>>
>>>>> Thank yo for the revision!
>>>>>
>>>>> In the attached revision following changes are made:
>>>>>
>>>>>> "unknown" refers here to ordinary three-valued logical Unknown, which is
>>>>>> represented in SQL by NULL.
>>>>>>
>>>>>> JSON path expressions return sequences of SQL/JSON items, which are defined by
>>>>>> SQL/JSON data model.  But JSON path predicates (logical expressions), which are
>>>>>> used in filters, return three-valued logical values: False, True, or Unknown.
>>>>>
>>>>>  * I've added short explanation of this to the documentation.
>>>>>  * Removed no longer present data structures from typedefs.list of the
>>>>> first patch.
>>>>>  * Moved GIN support patch to number 3.  Seems to be well-isolated and
>>>>> not very complex patch.  I propose to consider this to 12 too.  I
>>>>> added high-level comment there, commit message and made some code
>>>>> beautification.
>>>>
>>>> I think patches 1 and 2 are in committable shape (I reached Tomas
>>>> off-list, he doesn't have more notes regarding them).  While patch 3
>>>> requires more review.
>>>>
>>>> I'm going to push 1 and 2 if no objections.
>>>
>>> So, pushed.  Many thanks to reviewers and authors!
>>>
>>> Remaining part I'm proposing for 12 is attached.  I appreciate review of it.
>>
>>
>> I tested this patch and I didn't  find any issue - just I tested basic functionality and regress tests.
>>
>> looks well
>
> Thank you for your feedback!

Like Pavel, I did some basic testing of the patch (on current HEAD
4178d8b91c) trying out various JSON path expressions, and yes, it all
worked. I had a brief scare while testing on 4178d8b91c where initdb was
failing on the bootstrapping step, but after doing a thorough wipe of
build files and my output directory, it seems to be initializing okay.

I also did some testing of the GIN patch upthread, as the quickness of
retrieval of the data using JSON path is of course important as well.
Using a schema roughly like this:

CREATE TABLE news_feed (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    data jsonb NOT NULL
);
CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data);

I loaded in a data set of roughly 420,000 rows. Each row had all the
same keys but differing values (e.g. "length" and "content" as keys)

I tested a few different JSON path scenarios. Some of the index scans
performed way better than the equivalent sequential scans, for instance:

SELECT count(*)
FROM news_feed
WHERE data @? '$.length ? (@ == 200)';

SELECT *
FROM news_feed
WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")';

Using the index outperformed the sequential scan (and parallel seq scan)
by ~10-100x based on my config + laptop hardware!

However, when I did something a little more complex, like the below:

SELECT count(*)
FROM news_feed
WHERE data @? '$.length ? (@ < 150)';

SELECT count(*)
FROM news_feed
WHERE data @? '$.content ? (@ like_regex "^Start")';

SELECT id, jsonb_path_query(data, '$.content')
FROM news_feed
WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';

I would find that the index scan performed as well as the sequential
scan. Additionally, on my laptop, the parallel sequential scan would
beat the index scan by ~2.5x in some cases.

Reading up on what the GIN patch does, this all makes sense: it's
optimized for equality, I understand there are challenges to be able to
handle inequality, regex exps, etc. And the cases where it really does
work well, it's _incredibly_ fast.

My suggestion would be adding some additional guidance in the user
documentation around how GIN works with the @@ and @? operators so they
can understand where GIN will work very well with JSON path + their data
and not be surprised when other types of JSON path queries are
performing on par with a sequential scan (or worse than a parallel seq
scan).

Thanks,

Jonathan


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: jsonpath
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: jsonpath