Re: jsonpath

Поиск
Список
Период
Сортировка
От Nikita Glukhov
Тема Re: jsonpath
Дата
Msg-id aae2a342-28ff-1801-1e8e-c80203001078@postgrespro.ru
обсуждение исходный текст
Ответ на Re: jsonpath  ("Jonathan S. Katz" <jkatz@postgresql.org>)
Ответы Re: jsonpath
Список pgsql-hackers


On 17.03.2019 21:29, Jonathan S. Katz wrote:
On 3/17/19 1:14 PM, Alexander Korotkov wrote:
On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
On 3/17/19 1:02 PM, Alexander Korotkov wrote:
Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?
I just used "USING gin(col)" so jsonb_ops.
I see.  So, jsonb_ops extracts from this query only existence of
.length key.  And I can bet it exists in all (or almost all) the
documents.  Thus, optimizer thinks index might be useful, while it's
useless.  There is not much can be done while we don't have statistics
for jsonb (and access to it from GIN extract_query).  So, for now we
can just refuse from extracting only keys from jsonpath in jsonb_ops.
But I think it would be better to just document this issue.  In future
we should improve that with statistics.
That seems to make sense, especially given how I've typically stored
JSON documents in PostgreSQL. It sounds like this particular problem
would be solved appropriately with JSONB statistics.
GIN jsonb_ops extracts from query
 data @? '$.length ? (@ < 150)'

the same GIN entries as from queries
 data @? '$.length' data ? 'length'


If you don't want to extract entries from unsupported expressions, you can try 
to use another jsonpath operator @@. Queries will also look like a bit simpler:
 data @@ '$.length < 150' data @@ '$.content like_regex "^Start"' data @@ '$.content like_regex "risk" flag "i"'

All this queries emit no GIN entries. But note that
 data @@ '$ ? (@.content == "foo").length < 150'

emits the same entries as
 data @@ '$.content == "foo"'



We already have a POC implementation of jsonb statistics that was written
2 years ago.  I rebased it onto the current master yesterday.  If it is
interesting, you can find it on my GitHub [1].  But note, that there is 
no support for  jsonpath operators yet, only boolean EXISTS ?, ?|, ?&, and
CONTAINS @> operators are supported.  Also there is no docs, and it works
slowly  (a more effective storage method for statistics of individual JSON
paths is needed).

Also there is ability to calculate derived statistics of expressions like
 js -> 'x' -> 0 -> 'y' js #> '{x,0,y}'

using jsonb statistics for columns "js". So the selectivity of expressions
 js -> 'x' -> 0 -> 'y' = '123' js #> '{x,0,y}' >= '123'

also can be estimated (but these expressions can't be used by index on "js").


This topic deserves a separate discussion.  I hope, we will start the 
corresponding thread for PG13 after we find a more effective way of jsonb 
statistics storing.


[1] https://github.com/glukhovn/postgres/tree/jsonb_stats
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Rare SSL failures on eelpout
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: insensitive collations