JSON Path and GIN Questions

Поиск
Список
Период
Сортировка
От David E. Wheeler
Тема JSON Path and GIN Questions
Дата
Msg-id 15DD78A5-B5C4-4332-ACFE-55723259C07F@justatheory.com
обсуждение исходный текст
Ответы Re: JSON Path and GIN Questions  (Erik Wienhold <ewie@ewie.name>)
Список pgsql-hackers
Greetings Hackers,

Been a while! I’m working on some experiments with JSONB columns and GIN indexes, and have operated on the assumption
thatJSON Path operations would take advantage of GIN indexes, with json_path_ops as a nice optimization. But I’ve run
intowhat appear to be some inconsistencies and oddities I’m hoping to figure out with your help. 

For the examples in this email, I’m using this simple table:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json| jq -c ".[]" | sed
"s|\\\\|\\\\\\\\|g"';
create index on movies using gin (movie);
analyze movies;

That gives me a simple table with around 3600 rows. Not a lot of data, but hopefully enough to demonstrate the issues.

Issue 1: @@ vs @?
-----------------

I have been confused as to the difference between @@ vs @?: Why do these return different results?

david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")';
 id
----
(0 rows)

david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")';
 id
----
 10
(1 row)

I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), and from the suggestion I got
there,it seems that @@ expects a boolean to be returned by the path query, while @? wraps it in an implicit exists().
Isthat right? 

If so, I’d like to submit a patch to the docs talking about this, and suggesting the use of jsonb_path_query() to test
pathsto see if they return a boolean or not. 


Issue 2: @? Index Use
---------------------

From Oleg’s (happy belated birthday!) notes
(https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators):


> Operators @? and @@ are interchangeable:
>
>     js @? '$.a' <=> js @@ 'exists($.a)’
>     js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’

For the purposes of the above example, this appears to hold true: if I wrap the path query in exists(), @@ returns a
result:

david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life Rescue"))';
 id
----
 10
(1 row)

Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN index while @? does not.

Or, no, fiddling with it again just now, I think I have still been confusing these operators! @@ was using the index
withan an explicit exists(), but @? was not…because I was still using an explicit exists. 

In other words:

* @@ 'exists($ ?($.year == 1944))'  Uses the index
* @? '$ ?(@.year == 1944)'          Uses the index
* @? 'exists($ ?($.year == 1944))'  Does not use the index

That last one presumably doesn’t work, because there is an implicit exists() around the exists(), making it
`exists(exists($?($.year == 1944)))`, which returns true for every row  (true and false both exists)! 🤦🏻‍♂️. 

Anyway, if I have this right, I’d like to flesh out the docs a bit.

Issue 3: Index Use for Comparison
---------------------------------

From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path
querywould be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage
ofthe GIN index, apparently the >= operator cannot: 

david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1)
   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
   Rows Removed by Filter: 36081
 Planning Time: 1.864 ms
 Execution Time: 36.338 ms
(5 rows)

Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes of
paths,which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is
thereperhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on
`movie-> 'year'`? 

Thanks your your patience with my questions!

Best,

David


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?
Следующее
От: Gurjeet Singh
Дата:
Сообщение: Re: Document that server will start even if it's unable to open some TCP/IP ports