Re: jsonpath

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: jsonpath
Дата
Msg-id CAPpHfdtYvL+h5W7NYPVf-CWfVbMyqytNu-jV1xUhbPM6rnN8Xg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: jsonpath  ("Jonathan S. Katz" <jkatz@postgresql.org>)
Ответы Re: jsonpath
Список pgsql-hackers
Hi!

On Sun, Mar 17, 2019 at 7:46 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> Like Pavel, I did some basic testing of the patch (on current HEAD
> 4178d8b91c) trying out various JSON path expressions, and yes, it all
> worked. I had a brief scare while testing on 4178d8b91c where initdb was
> failing on the bootstrapping step, but after doing a thorough wipe of
> build files and my output directory, it seems to be initializing okay.
>
> I also did some testing of the GIN patch upthread, as the quickness of
> retrieval of the data using JSON path is of course important as well.

Thank you very much for testing!

> Using a schema roughly like this:
>
> CREATE TABLE news_feed (
>     id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
>     data jsonb NOT NULL
> );
> CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data);
>
> I loaded in a data set of roughly 420,000 rows. Each row had all the
> same keys but differing values (e.g. "length" and "content" as keys)
>
> I tested a few different JSON path scenarios. Some of the index scans
> performed way better than the equivalent sequential scans, for instance:
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.length ? (@ == 200)';
>
> SELECT *
> FROM news_feed
> WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")';
>
> Using the index outperformed the sequential scan (and parallel seq scan)
> by ~10-100x based on my config + laptop hardware!

Great!

> However, when I did something a little more complex, like the below:
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.length ? (@ < 150)';
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.content ? (@ like_regex "^Start")';
>
> SELECT id, jsonb_path_query(data, '$.content')
> FROM news_feed
> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
>
> I would find that the index scan performed as well as the sequential
> scan. Additionally, on my laptop, the parallel sequential scan would
> beat the index scan by ~2.5x in some cases.

Yeah, this cases are not supported.  Did optimizer automatically
select sequential scan in this case (if not touching enable_*
variables)?  It should, because optimizer understands that GIN scan
will be bad if extract_query method failed to extract anything.

> Reading up on what the GIN patch does, this all makes sense: it's
> optimized for equality, I understand there are challenges to be able to
> handle inequality, regex exps, etc. And the cases where it really does
> work well, it's _incredibly_ fast.

Yes, for more complex cases, we need different opclasses.  For
instance, we can consider porting jsquery opclasses to PG 13.  And it
become even more important to get parametrized opclasses, because we
don't necessary want to index all the json fields in this same way.
That's another challenge for future releases.  But what we have now is
just support for some of jsonpathes for existing opclasses.

> My suggestion would be adding some additional guidance in the user
> documentation around how GIN works with the @@ and @? operators so they
> can understand where GIN will work very well with JSON path + their data
> and not be surprised when other types of JSON path queries are
> performing on par with a sequential scan (or worse than a parallel seq
> scan).

Good point.  Will do.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Jonathan S. Katz"
Дата:
Сообщение: Re: jsonpath
Следующее
От: "Jonathan S. Katz"
Дата:
Сообщение: Re: jsonpath