Re: Unable to make use of "deep" JSONB index

Поиск
Список
Период
Сортировка
От Shaheed Haque
Тема Re: Unable to make use of "deep" JSONB index
Дата
Msg-id CAHAc2jfQvVMUKQ4rZ16G3Nsn+Rn-5_NFabzXnJ+q41OXmH+pOg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Unable to make use of "deep" JSONB index  (Erik Rijkers <er@xs4all.nl>)
Ответы Re: Unable to make use of "deep" JSONB index  (Jeff Janes <jeff.janes@gmail.com>)
Re: Unable to make use of "deep" JSONB index  (Erik Rijkers <er@xs4all.nl>)
Список pgsql-bugs
(Resend, wrong version was sent before)

Thanks Erik. Is the point that the index has to be on the JSON field
as a whole (i.e. "snapshot") rather than deep inside it (e.g.
"snapshot.something.further[down]")?

In my case, the snapshot is several MB in size (perhaps 10MB or even
20MB), dominated by the snapshot.employee (cardinality 10k, each sized
as a dict 1-2kB as text). My expectation/guess is that an index of
"snapshot" will itself be of a size of similar order. However the
design as-is works very well except for this one case where to speed
it up, in principle, the index need contain no more than one boolean
per employee. So that's what I'd like to achieve, if possible.

I've seen no hint in the documentation that creating the index on
"snapshot.something.further[down]" should not work, and PG certainly
allows it to be created. Also, Tom has suggested that I should not
look to some magical ability to infer the use of the index from a
differently structured query, and I've taken that on board with the
updated query + index.

AFAIK, there are 3 possibilities:

- I've not done things right, in which case I'd love to know my mistake.
- It is not supposed to work, in which case it would be good to have
that stated, and maybe have PG not allow useless indices to be
created.
- It is a bug.

All input much appreciated,

Thanks, Shaheed


On Sun, 12 Jun 2022 at 11:53, Erik Rijkers <er@xs4all.nl> wrote:
>
> Op 12-06-2022 om 11:34 schreef Shaheed Haque:
> > OK, I have corrected and simplified the test case (including switching
> > to a btree index). The WHERE clause and the inex now look like this:
> >
> >      ...WHERE         ((snapshot -> 'employee' -> '999' ->>
> > 'pay_graph')::integer != 0);
> >      ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
> > 'pay_graph')::integer != 0));
> >
> > But the index is still not being used (test case below). I have
> > confirmed that the equality operator is listed for "search" (I assume
> > inequality is the same as equality, but I tried both):
> >
> > # \dAo btree jsonb*
> >              List of operators of operator families
> >   AM   | Operator family |    Operator     | Strategy | Purpose
> > -------+-----------------+-----------------+----------+---------
> > btree | jsonb_ops       | <(jsonb,jsonb)  |        1 | search
> > btree | jsonb_ops       | <=(jsonb,jsonb) |        2 | search
> > btree | jsonb_ops       | =(jsonb,jsonb)  |        3 | search
> > btree | jsonb_ops       | >=(jsonb,jsonb) |        4 | search
> > btree | jsonb_ops       | >(jsonb,jsonb)  |        5 | search
> > (5 rows)
> >
> > If this is not a bug, then how should the query or the index be
> > changed to make this work?
> >
> >
> > === begin test case ===
> >
> > CREATE TABLE payrun (
> >      id serial primary key,
> >      snapshot JSONB
> > );
> >
> > INSERT INTO payrun(snapshot)
> > VALUES
> >    ('{"employee": {"999": {"id": 999, "state": {"employment":
> > [["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment":
> > false}, "111": {"id": 111, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
> >    ('{"employee": {"999": {"id": 999, "state": {"employment":
> > [["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment":
> > true}, "222": {"id": 222, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
> >    ('{"employee": {"998": {"id": 998, "state": {"employment":
> > [["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment":
> > false}, "333": {"id": 333, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
> > ;
> >
> > SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
> > WHERE ((snapshot -> 'employee' -> '999' ->> 'pay_graph')::integer !=
> > 0);
> >
> > --
> > -- Create index designed to match the query.
> > --
> > CREATE INDEX idx1 ON payrun USING btree (((snapshot -> 'employee' ->
> > '$.*' ->> 'pay_graph')::integer != 0));
> >
> > set enable_seqscan = OFF;
> >
>
> How is this?
>
> I took the triple condition from your earlier email.
> I did not use your index.
> I added one index using gin jsonb_path_ops.
>
> create index payrun_jspathop_idx ON payrun using gin (snapshot
> jsonb_path_ops);
> set enable_seqscan = OFF;
> \timing on
>      select id, snapshot #>'{employee,999,state,employment}' from payrun
> where snapshot @? '$."employee"."999" ?
>     ( @.pay_graph <> 0
>    || @.last_run_of_employment == true
>    || @.state.employment[last][2] == 0
> )';
>   id |         ?column?
> ----+---------------------------
>    2 | [["1970-01-01", null, 3]]
> (1 row)
>
> Time: 0.897 ms
> explain analyze  select id, snapshot #>'{employee,999,state,employment}'
> from payrun
> where snapshot @? '$."employee"."999" ?
>     ( @.pay_graph <> 0
>    || @.last_run_of_employment == true
>    || @.state.employment[last][2] == 0
> )';
>
>         QUERY PLAN
>
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Bitmap Heap Scan on payrun  (cost=136.00..140.02 rows=1 width=36)
> (actual time=0.018..0.019 rows=1 loops=1)
>     Recheck Cond: (snapshot @? '$."employee"."999"?((@."pay_graph" != 0
> || @."last_run_of_employment" == true) ||
> @."state"."employment"[last][2] == 0)'::jsonpath)
>     Rows Removed by Index Recheck: 2
>     Heap Blocks: exact=1
>     ->  Bitmap Index Scan on payrun_jspathop_idx  (cost=0.00..136.00
> rows=1 width=0) (actual time=0.007..0.007 rows=3 loops=1)
>           Index Cond: (snapshot @? '$."employee"."999"?((@."pay_graph"
> != 0 || @."last_run_of_employment" == true) ||
> @."state"."employment"[last][2] == 0)'::jsonpath)
>   Planning Time: 0.034 ms
>   Execution Time: 0.033 ms
> (8 rows)
>
> Time: 0.284 ms
>
>
> hope that helps.
>
> Erik Rijkers



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Shaheed Haque
Дата:
Сообщение: Re: Unable to make use of "deep" JSONB index
Следующее
От: Jeff Janes
Дата:
Сообщение: Re: Unable to make use of "deep" JSONB index