Обсуждение: Unable to make use of "deep" JSONB index

Поиск
Список
Период
Сортировка

Unable to make use of "deep" JSONB index

От
Shaheed Haque
Дата:
Hi,

I'm having trouble using an index on the "deep" innards of a JSONB field in that the matching query it is trying to accelerate never uses it. I did get some advice on a simplified version of the problem at [1], but the actual problem remains the same in that "EXPLAIN ANALYZE" never refers to the index. Here are the details including a test case below...

1. The JSONB can be several MB in size. This works fine for all but one access pattern.
2. The JSON in the problem use case looks like this:

{
  "...stuff...": ...
  "employee": {
    "999": {"id": 999, "integer attribute": 0, "boolean-may-be-missing": true, "state": {
      "nested-list": [[], [], ...]
    }
  }
}

3. As per the discussion at [2], using a SELECT with a WHERE on the 3 attributes of interest ("integer attribute", the "boolean-may-be-missing" and "nested-list") incurs a significant overheard which suggests that the JSONB storage is being accessed 3 times. In order to optimise for this case, I constructed a query using the jsonpath support which seems to successfully avoid the triple-fetch by keeping the logic inside the jsonpath query like this:

WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 || @.last_run_of_employment
== true || @.state.employment[last][2] == 0)')

4. Then I created an index "matching" this query.
5. According to EXPLAIN ANALYSE, the index is never used.


=== version and platform ===

Version: PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
Platform: Ubuntu 22.04 (Jammy), using OS-supplied build

=== test case ===
CREATE TABLE payrun (
    id serial primary key,
    snapshot JSONB
);

INSERT INTO payrun(snapshot)
VALUES
  ('{"employee": {"999": {"id": 999, "state": {"employment": [["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment": false}, "111": {"id": 111, "state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
  ('{"employee": {"999": {"id": 999, "state": {"employment": [["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment": true}, "222": {"id": 222, "state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
  ('{"employee": {"998": {"id": 998, "state": {"employment": [["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment": false}, "333": {"id": 333, "state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
;

SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun" WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0)');

--
-- Create index designed to match the query.
--
create index idx1 on payrun using gin ((snapshot->'$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0)'));

set enable_seqscan = OFF;

--
-- EXPLAIN ANALYZE ...query above...
--
explain analyse SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun" WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 || @.last_run_of_employment
== true || @.state.employment[last][2] == 0)');
                                                                        QUERY PLAN                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on payrun  (cost=10000000000.00..10000000001.04 rows=1 width=36) (actual time=0.040..0.042 rows=1 loops=1)
  Filter: (snapshot @? '$."employee"."999"?((@."pay_graph" != 0 || @."last_run_of_employment" == true) || @."state"."employment"[last][2] == 0)'::jsonpath)
  Rows Removed by Filter: 2
Planning Time: 0.883 ms
Execution Time: 0.078 ms
(5 rows)



=== end test case ===

The expected result is that with enable_seqscan = OFF, the index should be used, but instead a sequential scan is reported as above. The same happens without enable_seqscan = OFF on
a moderately large test set of over 2000 rows (with extended run times circa 10+ seconds on my hardware).

I have tried the same with similar results on PG13.

Have I constructed the index incorrectly, or is there some other way to convince the query to use it?

Re: Unable to make use of "deep" JSONB index

От
Tom Lane
Дата:
Shaheed Haque <shaheedhaque@gmail.com> writes:
> -- Create index designed to match the query.
> --
> create index idx1 on payrun using gin ((snapshot->'$.employee.* ?
> (@.pay_graph <> 0 || @.last_run_of_employment == true ||
> @.state.employment[last][2] == 0)'));

But that doesn't match the query; it's not even the same topmost
operator:

> explain analyse SELECT id,snapshot #>'{employee,999,state,employment}' FROM
> "payrun" WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 ||
> @.last_run_of_employment
> == true || @.state.employment[last][2] == 0)');

In general you seem to have much too high an opinion of what PG's
index machinery can cope with.  The general pattern is that it can
use a query WHERE clause with an index if the clause is of the form
"indexed-column indexable-operator constant".  There's a small number
of special cases where it can transform things that don't initially
look like that into the right form, but AFAIR we don't have any
such special cases for any json-related operators.

The one saving grace is that "indexed-column" can be an expression
appearing in an index, so in some cases you can finesse things
that way.  But you won't find any deep knowledge of jsonpath
expressions in there.

Having said that, @? is reported as an indexable operator in v14:

regression=# \dAo gin jsonb*
             List of operators of operator families
 AM  | Operator family |      Operator      | Strategy | Purpose 
-----+-----------------+--------------------+----------+---------
 gin | jsonb_ops       | @>(jsonb,jsonb)    |        7 | search
 gin | jsonb_ops       | @?(jsonb,jsonpath) |       15 | search
 gin | jsonb_ops       | @@(jsonb,jsonpath) |       16 | search
 gin | jsonb_ops       | ?(jsonb,text)      |        9 | search
 gin | jsonb_ops       | ?|(jsonb,text[])   |       10 | search
 gin | jsonb_ops       | ?&(jsonb,text[])   |       11 | search
 gin | jsonb_path_ops  | @>(jsonb,jsonb)    |        7 | search
 gin | jsonb_path_ops  | @?(jsonb,jsonpath) |       15 | search
 gin | jsonb_path_ops  | @@(jsonb,jsonpath) |       16 | search
(9 rows)

so it seems like you ought to get some benefit for this query
from just a plain GIN index on "snapshot".

            regards, tom lane



Re: Unable to make use of "deep" JSONB index

От
Shaheed Haque
Дата:
On Thu, 2 Jun 2022 at 15:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Shaheed Haque <shaheedhaque@gmail.com> writes:
> > -- Create index designed to match the query.
> > --
> > create index idx1 on payrun using gin ((snapshot->'$.employee.* ?
> > (@.pay_graph <> 0 || @.last_run_of_employment == true ||
> > @.state.employment[last][2] == 0)'));
>
> But that doesn't match the query; it's not even the same topmost
> operator:
>
> > explain analyse SELECT id,snapshot #>'{employee,999,state,employment}' FROM
> > "payrun" WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 ||
> > @.last_run_of_employment
> > == true || @.state.employment[last][2] == 0)');

I assume you are referring to the difference between "snapshot @?" and
"snapshot ->"? If so, apologies: too much cutting and pasting from too
many experiments. I did in fact also try the "using btree ((snapshot
@?" form but it gave the same results.

> In general you seem to have much too high an opinion of what PG's
> index machinery can cope with.  The general pattern is that it can
> use a query WHERE clause with an index if the clause is of the form
> "indexed-column indexable-operator constant".  There's a small number
> of special cases where it can transform things that don't initially
> look like that into the right form, but AFAIR we don't have any
> such special cases for any json-related operators.

LOL. I'm pretty much a noob here, so that's very possible.

> The one saving grace is that "indexed-column" can be an expression
> appearing in an index, so in some cases you can finesse things
> that way.  But you won't find any deep knowledge of jsonpath
> expressions in there.

I was basing my efforts on this statement in the docs
https://www.postgresql.org/docs/14/datatype-json.html#JSON-INDEXING:

    GIN index extracts statements of following form out of jsonpath:
accessors_chain = const. Accessors chain may consist of .key, [*], and
[index] accessors. jsonb_ops additionally supports .* and .**
accessors.

Did I mis-implement, misunderstand or read too much into this?

> Having said that, @? is reported as an indexable operator in v14:
>
> regression=# \dAo gin jsonb*
>              List of operators of operator families
>  AM  | Operator family |      Operator      | Strategy | Purpose
> -----+-----------------+--------------------+----------+---------
>  gin | jsonb_ops       | @>(jsonb,jsonb)    |        7 | search
>  gin | jsonb_ops       | @?(jsonb,jsonpath) |       15 | search
>  gin | jsonb_ops       | @@(jsonb,jsonpath) |       16 | search
>  gin | jsonb_ops       | ?(jsonb,text)      |        9 | search
>  gin | jsonb_ops       | ?|(jsonb,text[])   |       10 | search
>  gin | jsonb_ops       | ?&(jsonb,text[])   |       11 | search
>  gin | jsonb_path_ops  | @>(jsonb,jsonb)    |        7 | search
>  gin | jsonb_path_ops  | @?(jsonb,jsonpath) |       15 | search
>  gin | jsonb_path_ops  | @@(jsonb,jsonpath) |       16 | search
> (9 rows)
>
> so it seems like you ought to get some benefit for this query
> from just a plain GIN index on "snapshot".

Interesting. I'm pretty sure I started there a few days ago without
any luck but I'll give it another spin (having learnt quite a bit
since then).

>
>                         regards, tom lane



Re: Unable to make use of "deep" JSONB index

От
Shaheed Haque
Дата:
OK, I have corrected and simplified the test case (including switching
to a btree index). The WHERE clause and the inex now look like this:

    ...WHERE         ((snapshot -> 'employee' -> '999' ->>
'pay_graph')::integer != 0);
    ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
'pay_graph')::integer != 0));

But the index is still not being used (test case below). I have
confirmed that the equality operator is listed for "search" (I assume
inequality is the same as equality, but I tried both):

# \dAo btree jsonb*
            List of operators of operator families
 AM   | Operator family |    Operator     | Strategy | Purpose
-------+-----------------+-----------------+----------+---------
btree | jsonb_ops       | <(jsonb,jsonb)  |        1 | search
btree | jsonb_ops       | <=(jsonb,jsonb) |        2 | search
btree | jsonb_ops       | =(jsonb,jsonb)  |        3 | search
btree | jsonb_ops       | >=(jsonb,jsonb) |        4 | search
btree | jsonb_ops       | >(jsonb,jsonb)  |        5 | search
(5 rows)

If this is not a bug, then how should the query or the index be
changed to make this work?


=== begin test case ===

CREATE TABLE payrun (
    id serial primary key,
    snapshot JSONB
);

INSERT INTO payrun(snapshot)
VALUES
  ('{"employee": {"999": {"id": 999, "state": {"employment":
[["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment":
false}, "111": {"id": 111, "state": {"employment": [["1920-01-01",
null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
  ('{"employee": {"999": {"id": 999, "state": {"employment":
[["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment":
true}, "222": {"id": 222, "state": {"employment": [["1920-01-01",
null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
  ('{"employee": {"998": {"id": 998, "state": {"employment":
[["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment":
false}, "333": {"id": 333, "state": {"employment": [["1920-01-01",
null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
;

SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
WHERE ((snapshot -> 'employee' -> '999' ->> 'pay_graph')::integer !=
0);

--
-- Create index designed to match the query.
--
CREATE INDEX idx1 ON payrun USING btree (((snapshot -> 'employee' ->
'$.*' ->> 'pay_graph')::integer != 0));

set enable_seqscan = OFF;

-- 
-- EXPLAIN ANALYZE ...query above...
--
explain analyze SELECT id,snapshot #>'{employee,999,state,employment}'
FROM "payrun" WHERE ((snapshot -> 'employee' -> '999' ->>
'pay_graph')::integer != 0);
                                                     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Seq Scan on payrun  (cost=10000000000.00..10000000001.08 rows=2
width=36) (actual time=70.051..70.052 rows=1 loops=1)
  Filter: (((((snapshot -> 'employee'::text) -> '999'::text) ->>
'pay_graph'::text))::integer <> 0)
  Rows Removed by Filter: 2
Planning Time: 0.147 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.354 ms, Inlining 8.305 ms, Optimization 49.237
ms, Emission 12.499 ms, Total 70.395 ms
Execution Time: 70.428 ms
(9 rows)

=== end test case ===

Thanks, Shaheed

On Thu, 2 Jun 2022 at 16:51, Shaheed Haque <shaheedhaque@gmail.com> wrote:
>
> On Thu, 2 Jun 2022 at 15:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Shaheed Haque <shaheedhaque@gmail.com> writes:
> > > -- Create index designed to match the query.
> > > --
> > > create index idx1 on payrun using gin ((snapshot->'$.employee.* ?
> > > (@.pay_graph <> 0 || @.last_run_of_employment == true ||
> > > @.state.employment[last][2] == 0)'));
> >
> > But that doesn't match the query; it's not even the same topmost
> > operator:
> >
> > > explain analyse SELECT id,snapshot #>'{employee,999,state,employment}' FROM
> > > "payrun" WHERE (snapshot @? '$.employee."999" ? (@.pay_graph <> 0 ||
> > > @.last_run_of_employment
> > > == true || @.state.employment[last][2] == 0)');
>
> I assume you are referring to the difference between "snapshot @?" and
> "snapshot ->"? If so, apologies: too much cutting and pasting from too
> many experiments. I did in fact also try the "using btree ((snapshot
> @?" form but it gave the same results.
>
> > In general you seem to have much too high an opinion of what PG's
> > index machinery can cope with.  The general pattern is that it can
> > use a query WHERE clause with an index if the clause is of the form
> > "indexed-column indexable-operator constant".  There's a small number
> > of special cases where it can transform things that don't initially
> > look like that into the right form, but AFAIR we don't have any
> > such special cases for any json-related operators.
>
> LOL. I'm pretty much a noob here, so that's very possible.
>
> > The one saving grace is that "indexed-column" can be an expression
> > appearing in an index, so in some cases you can finesse things
> > that way.  But you won't find any deep knowledge of jsonpath
> > expressions in there.
>
> I was basing my efforts on this statement in the docs
> https://www.postgresql.org/docs/14/datatype-json.html#JSON-INDEXING:
>
>     GIN index extracts statements of following form out of jsonpath:
> accessors_chain = const. Accessors chain may consist of .key, [*], and
> [index] accessors. jsonb_ops additionally supports .* and .**
> accessors.
>
> Did I mis-implement, misunderstand or read too much into this?
>
> > Having said that, @? is reported as an indexable operator in v14:
> >
> > regression=# \dAo gin jsonb*
> >              List of operators of operator families
> >  AM  | Operator family |      Operator      | Strategy | Purpose
> > -----+-----------------+--------------------+----------+---------
> >  gin | jsonb_ops       | @>(jsonb,jsonb)    |        7 | search
> >  gin | jsonb_ops       | @?(jsonb,jsonpath) |       15 | search
> >  gin | jsonb_ops       | @@(jsonb,jsonpath) |       16 | search
> >  gin | jsonb_ops       | ?(jsonb,text)      |        9 | search
> >  gin | jsonb_ops       | ?|(jsonb,text[])   |       10 | search
> >  gin | jsonb_ops       | ?&(jsonb,text[])   |       11 | search
> >  gin | jsonb_path_ops  | @>(jsonb,jsonb)    |        7 | search
> >  gin | jsonb_path_ops  | @?(jsonb,jsonpath) |       15 | search
> >  gin | jsonb_path_ops  | @@(jsonb,jsonpath) |       16 | search
> > (9 rows)
> >
> > so it seems like you ought to get some benefit for this query
> > from just a plain GIN index on "snapshot".
>
> Interesting. I'm pretty sure I started there a few days ago without
> any luck but I'll give it another spin (having learnt quite a bit
> since then).
>
> >
> >                         regards, tom lane



Re: Unable to make use of "deep" JSONB index

От
Erik Rijkers
Дата:
Op 12-06-2022 om 11:34 schreef Shaheed Haque:
> OK, I have corrected and simplified the test case (including switching
> to a btree index). The WHERE clause and the inex now look like this:
> 
>      ...WHERE         ((snapshot -> 'employee' -> '999' ->>
> 'pay_graph')::integer != 0);
>      ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
> 'pay_graph')::integer != 0));
> 
> But the index is still not being used (test case below). I have
> confirmed that the equality operator is listed for "search" (I assume
> inequality is the same as equality, but I tried both):
> 
> # \dAo btree jsonb*
>              List of operators of operator families
>   AM   | Operator family |    Operator     | Strategy | Purpose
> -------+-----------------+-----------------+----------+---------
> btree | jsonb_ops       | <(jsonb,jsonb)  |        1 | search
> btree | jsonb_ops       | <=(jsonb,jsonb) |        2 | search
> btree | jsonb_ops       | =(jsonb,jsonb)  |        3 | search
> btree | jsonb_ops       | >=(jsonb,jsonb) |        4 | search
> btree | jsonb_ops       | >(jsonb,jsonb)  |        5 | search
> (5 rows)
> 
> If this is not a bug, then how should the query or the index be
> changed to make this work?
> 
> 
> === begin test case ===
> 
> CREATE TABLE payrun (
>      id serial primary key,
>      snapshot JSONB
> );
> 
> INSERT INTO payrun(snapshot)
> VALUES
>    ('{"employee": {"999": {"id": 999, "state": {"employment":
> [["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment":
> false}, "111": {"id": 111, "state": {"employment": [["1920-01-01",
> null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
>    ('{"employee": {"999": {"id": 999, "state": {"employment":
> [["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment":
> true}, "222": {"id": 222, "state": {"employment": [["1920-01-01",
> null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
>    ('{"employee": {"998": {"id": 998, "state": {"employment":
> [["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment":
> false}, "333": {"id": 333, "state": {"employment": [["1920-01-01",
> null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
> ;
> 
> SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
> WHERE ((snapshot -> 'employee' -> '999' ->> 'pay_graph')::integer !=
> 0);
> 
> --
> -- Create index designed to match the query.
> --
> CREATE INDEX idx1 ON payrun USING btree (((snapshot -> 'employee' ->
> '$.*' ->> 'pay_graph')::integer != 0));
> 
> set enable_seqscan = OFF;
> 

How is this?

I took the triple condition from your earlier email.
I did not use your index.
I added one index using gin jsonb_path_ops.

create index payrun_jspathop_idx ON payrun using gin (snapshot 
jsonb_path_ops);
set enable_seqscan = OFF;
\timing on
     select id, snapshot #>'{employee,999,state,employment}' from payrun
where snapshot @? '$."employee"."999" ?
    ( @.pay_graph <> 0
   || @.last_run_of_employment == true
   || @.state.employment[last][2] == 0
)';
  id |         ?column?
----+---------------------------
   2 | [["1970-01-01", null, 3]]
(1 row)

Time: 0.897 ms
explain analyze  select id, snapshot #>'{employee,999,state,employment}' 
from payrun
where snapshot @? '$."employee"."999" ?
    ( @.pay_graph <> 0
   || @.last_run_of_employment == true
   || @.state.employment[last][2] == 0
)';
 
        QUERY PLAN 


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on payrun  (cost=136.00..140.02 rows=1 width=36) 
(actual time=0.018..0.019 rows=1 loops=1)
    Recheck Cond: (snapshot @? '$."employee"."999"?((@."pay_graph" != 0 
|| @."last_run_of_employment" == true) || 
@."state"."employment"[last][2] == 0)'::jsonpath)
    Rows Removed by Index Recheck: 2
    Heap Blocks: exact=1
    ->  Bitmap Index Scan on payrun_jspathop_idx  (cost=0.00..136.00 
rows=1 width=0) (actual time=0.007..0.007 rows=3 loops=1)
          Index Cond: (snapshot @? '$."employee"."999"?((@."pay_graph" 
!= 0 || @."last_run_of_employment" == true) || 
@."state"."employment"[last][2] == 0)'::jsonpath)
  Planning Time: 0.034 ms
  Execution Time: 0.033 ms
(8 rows)

Time: 0.284 ms


hope that helps.

Erik Rijkers



Re: Unable to make use of "deep" JSONB index

От
Shaheed Haque
Дата:
Thanks Erik. Is the point that the index has to be on the JSON field
as a whole (i.e. "snapshot") rather than deep inside it (e.g.
"snapshot.something.further[down]")?

On Sun, 12 Jun 2022 at 11:53, Erik Rijkers <er@xs4all.nl> wrote:
>
> Op 12-06-2022 om 11:34 schreef Shaheed Haque:
> > OK, I have corrected and simplified the test case (including switching
> > to a btree index). The WHERE clause and the inex now look like this:
> >
> >      ...WHERE         ((snapshot -> 'employee' -> '999' ->>
> > 'pay_graph')::integer != 0);
> >      ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
> > 'pay_graph')::integer != 0));
> >
> > But the index is still not being used (test case below). I have
> > confirmed that the equality operator is listed for "search" (I assume
> > inequality is the same as equality, but I tried both):
> >
> > # \dAo btree jsonb*
> >              List of operators of operator families
> >   AM   | Operator family |    Operator     | Strategy | Purpose
> > -------+-----------------+-----------------+----------+---------
> > btree | jsonb_ops       | <(jsonb,jsonb)  |        1 | search
> > btree | jsonb_ops       | <=(jsonb,jsonb) |        2 | search
> > btree | jsonb_ops       | =(jsonb,jsonb)  |        3 | search
> > btree | jsonb_ops       | >=(jsonb,jsonb) |        4 | search
> > btree | jsonb_ops       | >(jsonb,jsonb)  |        5 | search
> > (5 rows)
> >
> > If this is not a bug, then how should the query or the index be
> > changed to make this work?
> >
> >
> > === begin test case ===
> >
> > CREATE TABLE payrun (
> >      id serial primary key,
> >      snapshot JSONB
> > );
> >
> > INSERT INTO payrun(snapshot)
> > VALUES
> >    ('{"employee": {"999": {"id": 999, "state": {"employment":
> > [["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment":
> > false}, "111": {"id": 111, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
> >    ('{"employee": {"999": {"id": 999, "state": {"employment":
> > [["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment":
> > true}, "222": {"id": 222, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
> >    ('{"employee": {"998": {"id": 998, "state": {"employment":
> > [["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment":
> > false}, "333": {"id": 333, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
> > ;
> >
> > SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
> > WHERE ((snapshot -> 'employee' -> '999' ->> 'pay_graph')::integer !=
> > 0);
> >
> > --
> > -- Create index designed to match the query.
> > --
> > CREATE INDEX idx1 ON payrun USING btree (((snapshot -> 'employee' ->
> > '$.*' ->> 'pay_graph')::integer != 0));
> >
> > set enable_seqscan = OFF;
> >
>
> How is this?
>
> I took the triple condition from your earlier email.
> I did not use your index.
> I added one index using gin jsonb_path_ops.
>
> create index payrun_jspathop_idx ON payrun using gin (snapshot
> jsonb_path_ops);
> set enable_seqscan = OFF;
> \timing on
>      select id, snapshot #>'{employee,999,state,employment}' from payrun
> where snapshot @? '$."employee"."999" ?
>     ( @.pay_graph <> 0
>    || @.last_run_of_employment == true
>    || @.state.employment[last][2] == 0
> )';
>   id |         ?column?
> ----+---------------------------
>    2 | [["1970-01-01", null, 3]]
> (1 row)
>
> Time: 0.897 ms
> explain analyze  select id, snapshot #>'{employee,999,state,employment}'
> from payrun
> where snapshot @? '$."employee"."999" ?
>     ( @.pay_graph <> 0
>    || @.last_run_of_employment == true
>    || @.state.employment[last][2] == 0
> )';
>
>         QUERY PLAN
>
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Bitmap Heap Scan on payrun  (cost=136.00..140.02 rows=1 width=36)
> (actual time=0.018..0.019 rows=1 loops=1)
>     Recheck Cond: (snapshot @? '$."employee"."999"?((@."pay_graph" != 0
> || @."last_run_of_employment" == true) ||
> @."state"."employment"[last][2] == 0)'::jsonpath)
>     Rows Removed by Index Recheck: 2
>     Heap Blocks: exact=1
>     ->  Bitmap Index Scan on payrun_jspathop_idx  (cost=0.00..136.00
> rows=1 width=0) (actual time=0.007..0.007 rows=3 loops=1)
>           Index Cond: (snapshot @? '$."employee"."999"?((@."pay_graph"
> != 0 || @."last_run_of_employment" == true) ||
> @."state"."employment"[last][2] == 0)'::jsonpath)
>   Planning Time: 0.034 ms
>   Execution Time: 0.033 ms
> (8 rows)
>
> Time: 0.284 ms
>
>
> hope that helps.
>
> Erik Rijkers



Re: Unable to make use of "deep" JSONB index

От
Shaheed Haque
Дата:
(Resend, wrong version was sent before)

Thanks Erik. Is the point that the index has to be on the JSON field
as a whole (i.e. "snapshot") rather than deep inside it (e.g.
"snapshot.something.further[down]")?

In my case, the snapshot is several MB in size (perhaps 10MB or even
20MB), dominated by the snapshot.employee (cardinality 10k, each sized
as a dict 1-2kB as text). My expectation/guess is that an index of
"snapshot" will itself be of a size of similar order. However the
design as-is works very well except for this one case where to speed
it up, in principle, the index need contain no more than one boolean
per employee. So that's what I'd like to achieve, if possible.

I've seen no hint in the documentation that creating the index on
"snapshot.something.further[down]" should not work, and PG certainly
allows it to be created. Also, Tom has suggested that I should not
look to some magical ability to infer the use of the index from a
differently structured query, and I've taken that on board with the
updated query + index.

AFAIK, there are 3 possibilities:

- I've not done things right, in which case I'd love to know my mistake.
- It is not supposed to work, in which case it would be good to have
that stated, and maybe have PG not allow useless indices to be
created.
- It is a bug.

All input much appreciated,

Thanks, Shaheed


On Sun, 12 Jun 2022 at 11:53, Erik Rijkers <er@xs4all.nl> wrote:
>
> Op 12-06-2022 om 11:34 schreef Shaheed Haque:
> > OK, I have corrected and simplified the test case (including switching
> > to a btree index). The WHERE clause and the inex now look like this:
> >
> >      ...WHERE         ((snapshot -> 'employee' -> '999' ->>
> > 'pay_graph')::integer != 0);
> >      ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
> > 'pay_graph')::integer != 0));
> >
> > But the index is still not being used (test case below). I have
> > confirmed that the equality operator is listed for "search" (I assume
> > inequality is the same as equality, but I tried both):
> >
> > # \dAo btree jsonb*
> >              List of operators of operator families
> >   AM   | Operator family |    Operator     | Strategy | Purpose
> > -------+-----------------+-----------------+----------+---------
> > btree | jsonb_ops       | <(jsonb,jsonb)  |        1 | search
> > btree | jsonb_ops       | <=(jsonb,jsonb) |        2 | search
> > btree | jsonb_ops       | =(jsonb,jsonb)  |        3 | search
> > btree | jsonb_ops       | >=(jsonb,jsonb) |        4 | search
> > btree | jsonb_ops       | >(jsonb,jsonb)  |        5 | search
> > (5 rows)
> >
> > If this is not a bug, then how should the query or the index be
> > changed to make this work?
> >
> >
> > === begin test case ===
> >
> > CREATE TABLE payrun (
> >      id serial primary key,
> >      snapshot JSONB
> > );
> >
> > INSERT INTO payrun(snapshot)
> > VALUES
> >    ('{"employee": {"999": {"id": 999, "state": {"employment":
> > [["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment":
> > false}, "111": {"id": 111, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
> >    ('{"employee": {"999": {"id": 999, "state": {"employment":
> > [["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment":
> > true}, "222": {"id": 222, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
> >    ('{"employee": {"998": {"id": 998, "state": {"employment":
> > [["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment":
> > false}, "333": {"id": 333, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
> > ;
> >
> > SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
> > WHERE ((snapshot -> 'employee' -> '999' ->> 'pay_graph')::integer !=
> > 0);
> >
> > --
> > -- Create index designed to match the query.
> > --
> > CREATE INDEX idx1 ON payrun USING btree (((snapshot -> 'employee' ->
> > '$.*' ->> 'pay_graph')::integer != 0));
> >
> > set enable_seqscan = OFF;
> >
>
> How is this?
>
> I took the triple condition from your earlier email.
> I did not use your index.
> I added one index using gin jsonb_path_ops.
>
> create index payrun_jspathop_idx ON payrun using gin (snapshot
> jsonb_path_ops);
> set enable_seqscan = OFF;
> \timing on
>      select id, snapshot #>'{employee,999,state,employment}' from payrun
> where snapshot @? '$."employee"."999" ?
>     ( @.pay_graph <> 0
>    || @.last_run_of_employment == true
>    || @.state.employment[last][2] == 0
> )';
>   id |         ?column?
> ----+---------------------------
>    2 | [["1970-01-01", null, 3]]
> (1 row)
>
> Time: 0.897 ms
> explain analyze  select id, snapshot #>'{employee,999,state,employment}'
> from payrun
> where snapshot @? '$."employee"."999" ?
>     ( @.pay_graph <> 0
>    || @.last_run_of_employment == true
>    || @.state.employment[last][2] == 0
> )';
>
>         QUERY PLAN
>
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Bitmap Heap Scan on payrun  (cost=136.00..140.02 rows=1 width=36)
> (actual time=0.018..0.019 rows=1 loops=1)
>     Recheck Cond: (snapshot @? '$."employee"."999"?((@."pay_graph" != 0
> || @."last_run_of_employment" == true) ||
> @."state"."employment"[last][2] == 0)'::jsonpath)
>     Rows Removed by Index Recheck: 2
>     Heap Blocks: exact=1
>     ->  Bitmap Index Scan on payrun_jspathop_idx  (cost=0.00..136.00
> rows=1 width=0) (actual time=0.007..0.007 rows=3 loops=1)
>           Index Cond: (snapshot @? '$."employee"."999"?((@."pay_graph"
> != 0 || @."last_run_of_employment" == true) ||
> @."state"."employment"[last][2] == 0)'::jsonpath)
>   Planning Time: 0.034 ms
>   Execution Time: 0.033 ms
> (8 rows)
>
> Time: 0.284 ms
>
>
> hope that helps.
>
> Erik Rijkers



Re: Unable to make use of "deep" JSONB index

От
Jeff Janes
Дата:


On Sun, Jun 12, 2022 at 5:34 AM Shaheed Haque <shaheedhaque@gmail.com> wrote:
OK, I have corrected and simplified the test case (including switching
to a btree index). The WHERE clause and the inex now look like this:

    ...WHERE         ((snapshot -> 'employee' -> '999' ->>
'pay_graph')::integer != 0);
    ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
'pay_graph')::integer != 0));

But, this is not a correction.  You are still trying to use -> as if it were @?, and that is still not going to work.

You are indexing the part of snapshot which has the employee number of '$.*', which is a weird employee number for anyone to have.  You might want to represent a wildcard but that is not what -> does.

Cheer,

Jeff

Re: Unable to make use of "deep" JSONB index

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Sun, Jun 12, 2022 at 5:34 AM Shaheed Haque <shaheedhaque@gmail.com>
> wrote:
>> OK, I have corrected and simplified the test case (including switching
>> to a btree index). The WHERE clause and the inex now look like this:
>> 
>> ...WHERE         ((snapshot -> 'employee' -> '999' ->>
>> 'pay_graph')::integer != 0);
>> ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
>> 'pay_graph')::integer != 0));

> But, this is not a correction.  You are still trying to use -> as if it
> were @?, and that is still not going to work.

In hopes of clarifying some more: all that index does is to record
the boolean result of
    (snapshot -> 'employee' -> '$.*' ->> 'pay_graph')::integer != 0
at each row.  We could use it for a query that contains *exactly*
that condition as a WHERE clause.  We cannot use it for a query that
contains some other condition, even if that other condition looks
related to you.

> You are indexing the part of snapshot which has the employee number of
> '$.*', which is a weird employee number for anyone to have.  You might want
> to represent a wildcard but that is not what -> does.

Yeah, there's also the problem that the semantics of this particular
expression aren't really useful.  But even if they were, PG's index
machinery is not smart enough to pick apart the contents of an index
expression.  If the index expression *exactly* matches some sub-expression
of a WHERE clause, and what's above that sub-expression is an operator
that's indexable according to the index opclass, then we have a chance
of using it.  This example is not that.

            regards, tom lane



Re: Unable to make use of "deep" JSONB index

От
Shaheed Haque
Дата:
On Sun, 12 Jun 2022 at 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jeff Janes <jeff.janes@gmail.com> writes:
> > On Sun, Jun 12, 2022 at 5:34 AM Shaheed Haque <shaheedhaque@gmail.com>
> > wrote:
> >> OK, I have corrected and simplified the test case (including switching
> >> to a btree index). The WHERE clause and the inex now look like this:
> >>
> >> ...WHERE         ((snapshot -> 'employee' -> '999' ->>
> >> 'pay_graph')::integer != 0);
> >> ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
> >> 'pay_graph')::integer != 0));
>
> > But, this is not a correction.  You are still trying to use -> as if it
> > were @?, and that is still not going to work.
>
> In hopes of clarifying some more: all that index does is to record
> the boolean result of
>         (snapshot -> 'employee' -> '$.*' ->> 'pay_graph')::integer != 0
> at each row.  We could use it for a query that contains *exactly*
> that condition as a WHERE clause.  We cannot use it for a query that
> contains some other condition, even if that other condition looks
> related to you.

OK, I see that I got myself all confused and the @? form is needed. So:

    ...USING btree ((snapshot -> 'employee' @? '$.* ? (@.pay_graph != 0)'));
    ...WHERE        ((snapshot -> 'employee' @? '$."999" ? (@.pay_graph != 0)'))

> > You are indexing the part of snapshot which has the employee number of
> > '$.*', which is a weird employee number for anyone to have.  You might want
> > to represent a wildcard but that is not what -> does.
>
> Yeah, there's also the problem that the semantics of this particular
> expression aren't really useful.

Does the switch back to @? address this point? If not, please clarify.

>  But even if they were, PG's index
> machinery is not smart enough to pick apart the contents of an index
> expression.  If the index expression *exactly* matches some sub-expression

To my inexpert eye, given the presence of the wildcard, the above look
like an exact match. What have I missed?

> of a WHERE clause, and what's above that sub-expression is an operator
> that's indexable according to the index opclass, then we have a chance
> of using it.  This example is not that.

Because I switched to btree, and btree cannot search on "@?"? So, what
should the index+query look like using gin? (I am trying to address a
niche case, and can easily arrange for them to match if I knew what
was needed).

Thanks, Shaheed

>
>                         regards, tom lane



Re: Unable to make use of "deep" JSONB index

От
"David G. Johnston"
Дата:
On Sun, Jun 12, 2022 at 4:52 PM Shaheed Haque <shaheedhaque@gmail.com> wrote:
    ...USING btree ((snapshot -> 'employee' @? '$.* ? (@.pay_graph != 0)'));
    ...WHERE        ((snapshot -> 'employee' @? '$."999" ? (@.pay_graph != 0)'))

To my inexpert eye, given the presence of the wildcard, the above look
like an exact match. What have I missed?


The system doesn't see a wildcard.  Exact match means identical characters.  The fact that you had to write "given the presence of the wildcard" is why this is not an exact match.

David J.

Re: Unable to make use of "deep" JSONB index

От
Jeff Janes
Дата:
On Sun, Jun 12, 2022 at 3:31 PM Shaheed Haque <shaheedhaque@gmail.com> wrote:
(Resend, wrong version was sent before)

Thanks Erik. Is the point that the index has to be on the JSON field
as a whole (i.e. "snapshot") rather than deep inside it (e.g.
"snapshot.something.further[down]")?

In my case, the snapshot is several MB in size (perhaps 10MB or even
20MB), dominated by the snapshot.employee (cardinality 10k, each sized
as a dict 1-2kB as text). My expectation/guess is that an index of
"snapshot" will itself be of a size of similar order. However the
design as-is works very well except for this one case where to speed
it up, in principle, the index need contain no more than one boolean
per employee. So that's what I'd like to achieve, if possible.

It sounds like what you really want here is to extract just the list of the ids which meet one of your three further criteria, and index that list.

You can do that with jsonpath, but you have to apply it with a function, not one of the boolean-returning operators.

This almost works to do that:

create index on payrun using gin (jsonb_path_query_array(snapshot, '$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0).id'));

Then query it like:

select * from payrun where jsonb_path_query_array(snapshot, '$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0).id') ? '999';

The problem is that ? only tests for top-level text values, while in your example the value for the 'id' keys are ints, not text.  So they are not findable with the ? operator. If I edit your example data to wrap the ids' values in double quotes, turning them into json strings rather than json ints, then this does work for me.

Maybe there is a way to modify the jsonpath so that it converts the ints to text for you.  But if there is such a way, I don't know what it is.
 
If I were doing this for my own system, I would probably create an immutable user-defined-function which took a jsonb and returned an int[] of the filtered id values.  Then you would have to query it with @> rather than ?.


- I've not done things right, in which case I'd love to know my mistake.
- It is not supposed to work, in which case it would be good to have
that stated, and maybe have PG not allow useless indices to be
created.
- It is a bug.

I guess it is number one mixed with number two.  The index you created is useless for your intended purpose, but is not useless for every conceivable purpose.  It is not realistic to expect PostgreSQL to reject things just because it is not obvious (to a computer) what you are getting up to.

Cheers,

Jeff

Re: Unable to make use of "deep" JSONB index

От
Erik Rijkers
Дата:
Op 12-06-2022 om 21:31 schreef Shaheed Haque:

> Thanks Erik. Is the point that the index has to be on the JSON field
> as a whole (i.e. "snapshot") rather than deep inside it (e.g.
> "snapshot.something.further[down]")?
> 
> In my case, the snapshot is several MB in size (perhaps 10MB or even
> 20MB), dominated by the snapshot.employee (cardinality 10k, each sized
> as a dict 1-2kB as text). My expectation/guess is that an index of
> "snapshot" will itself be of a size of similar order. However the
> design as-is works very well except for this one case where to speed
> it up, in principle, the index need contain no more than one boolean
> per employee. So that's what I'd like to achieve, if possible.
> 
> I've seen no hint in the documentation that creating the index on
> "snapshot.something.further[down]" should not work, and PG certainly
> allows it to be created. Also, Tom has suggested that I should not
> look to some magical ability to infer the use of the index from a
> differently structured query, and I've taken that on board with the
> updated query + index.

If you insist on a btree/integer on pay_graph, the below stuff seems to 
work, no?

Perhaps you've missed the difference
between
   x != 0  on the one hand,
and
   x > 0  on the other,

Here are both queries to show that difference.


CREATE INDEX payrun_btree_paygr_idx ON payrun using btree 
(((snapshot->'employee'->'999'->>'pay_graph')::integer));

set enable_seqscan = 0;

SELECT id, snapshot #>'{employee,999,state,employment}'
FROM payrun
WHERE (snapshot->'employee'->'999'->>'pay_graph')::integer > 0 ;
  id |         ?column?
----+---------------------------
   2 | [["1970-01-01", null, 3]]
(1 row)

Time: 1.384 ms

explain analyze  SELECT id, snapshot #>'{employee,999,state,employment}'
FROM payrun WHERE (snapshot->'employee'->'999'->>'pay_graph')::integer > 
0 ;
                                                            QUERY PLAN 


---------------------------------------------------------------------------------------------------------------------------------
  Index Scan using payrun_btree_paygr_idx on payrun  (cost=0.14..8.98 
rows=42 width=36) (actual time=0.018..0.020 rows=1 loops=1)
    Index Cond: (((((snapshot -> 'employee'::text) -> '999'::text) ->> 
'pay_graph'::text))::integer > 0)
  Planning Time: 0.047 ms
  Execution Time: 0.048 ms
(4 rows)

Time: 0.386 ms
explain analyze  SELECT id, snapshot #>'{employee,999,state,employment}'
FROM payrun WHERE (snapshot->'employee'->'999'->>'pay_graph')::integer 
!= 0 ;
                                                         QUERY PLAN 


---------------------------------------------------------------------------------------------------------------------------
  Seq Scan on payrun  (cost=10000000000.00..10000000004.49 rows=126 
width=36) (actual time=112.182..112.186 rows=1 loops=1)
    Filter: (((((snapshot -> 'employee'::text) -> '999'::text) ->> 
'pay_graph'::text))::integer <> 0)
    Rows Removed by Filter: 3
  Planning Time: 0.050 ms
  JIT:
    Functions: 4
    Options: Inlining true, Optimization true, Expressions true, 
Deforming true
    Timing: Generation 0.636 ms, Inlining 41.761 ms, Optimization 52.033 
ms, Emission 18.228 ms, Total 112.658 ms
  Execution Time: 153.486 ms
(9 rows)

Time: 153.835 ms

So:
     0.386 ms  for x > 0   uses payrun_btree_paygr_idx
vs:
   153.835 ms  for x != 0  uses seq scan

It's hard to see for me where you encounter problems; including the fast 
searches in my earier mail, it looks to me like you have many good options.

Erik Rijkers



Re: Unable to make use of "deep" JSONB index

От
Shaheed Haque
Дата:
Jeff, David, Erik, Tom,

I now see that I misunderstood several aspects of the problem. With
regard to Jeff's last note, I note (and like!!!!):

- the use of the final ".id" to collapse the employee "dict" into an
array of simple values.
- the use of "?" to test elements in the array as "keys".

As Jeff noted, the "?" requires a text value. Luckily I have a
".username" in my real data that should work...so I think I have a
solution without recourse to a custom function! I have appended an
updated test case showing the indexing working. Finally, I do wonder
if a "jsonb_path_query_array_text" function might be worth considering
as an enhancement for when there is not a usable text value available?

Thanks again for all your kind help.

Shaheed

== test case showing how to index "deep" into a JSONB field ==

DROP TABLE payrun;
DROP INDEX idx1;

CREATE TABLE payrun (
    id serial primary key,
    snapshot JSONB
);

INSERT INTO payrun(snapshot)
VALUES
  ('{"employee": {"999": {"id": "user999@foo.com", "state":
{"employment": [["1920-01-01", null, 3]]}, "pay_graph": 0,
"last_run_of_employment": false}, "111": {"id": "user111@foo.com",
"state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 3,
"last_run_of_employment": true}}}'),
  ('{"employee": {"999": {"id": "user999@foo.com", "state":
{"employment": [["1970-01-01", null, 3]]}, "pay_graph": 6,
"last_run_of_employment": true}, "222": {"id": "user222@foo.com",
"state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 5,
"last_run_of_employment": true}}}'),
  ('{"employee": {"998": {"id": "user998@foo.com", "state":
{"employment": [["1980-01-01", null, 3]]}, "pay_graph": 7,
"last_run_of_employment": false}, "333": {"id": "user333@foo.com",
"state": {"employment": [["1920-01-01", null, 5]]}, "pay_graph": 3,
"last_run_of_employment": true}}}')
;

CREATE INDEX idx1 ON payrun USING gin
(jsonb_path_query_array(snapshot, '$.employee.* ? (@.pay_graph <> 0 ||
@.last_run_of_employment == true || @.state.employment[last][2] ==
0).id'));

SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
WHERE jsonb_path_query_array(snapshot, '$.employee.* ? (@.pay_graph <>
0 || @.last_run_of_employment == true || @.state.employment[last][2]
== 0).id') ? 'user999@foo.com';

--
-- Test
--
set enable_seqscan = OFF;

EXPLAIN ANALYSE
SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
WHERE jsonb_path_query_array(snapshot, '$.employee.* ? (@.pay_graph <>
0 || @.last_run_of_employment == true || @.state.employment[last][2]
== 0).id') ? 'user999@foo.com';



-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------
Bitmap Heap Scan on payrun  (cost=8.00..12.02 rows=1 width=36) (actual
time=0.022..0.023 rows=1 loops=1)
  Recheck Cond: (jsonb_path_query_array(snapshot,
'$."employee".*?((@."pay_graph" != 0 || @."last_run_of_employment" ==
true) || @."state"."employment"[last][2] == 0)."id"'::jsonpath,
'{}'::jsonb, false) ? 'user999@foo.com
'::text)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on idx1  (cost=0.00..8.00 rows=1 width=0)
(actual time=0.007..0.007 rows=1 loops=1)
        Index Cond: (jsonb_path_query_array(snapshot,
'$."employee".*?((@."pay_graph" != 0 || @."last_run_of_employment" ==
true) || @."state"."employment"[last][2] == 0)."id"'::jsonpath,
'{}'::jsonb, false) ? 'user999@foo
.com'::text)
Planning Time: 0.062 ms
Execution Time: 0.041 ms
(7 rows)





On Mon, 13 Jun 2022 at 03:23, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> On Sun, Jun 12, 2022 at 3:31 PM Shaheed Haque <shaheedhaque@gmail.com> wrote:
>>
>> (Resend, wrong version was sent before)
>>
>> Thanks Erik. Is the point that the index has to be on the JSON field
>> as a whole (i.e. "snapshot") rather than deep inside it (e.g.
>> "snapshot.something.further[down]")?
>>
>> In my case, the snapshot is several MB in size (perhaps 10MB or even
>> 20MB), dominated by the snapshot.employee (cardinality 10k, each sized
>> as a dict 1-2kB as text). My expectation/guess is that an index of
>> "snapshot" will itself be of a size of similar order. However the
>> design as-is works very well except for this one case where to speed
>> it up, in principle, the index need contain no more than one boolean
>> per employee. So that's what I'd like to achieve, if possible.
>
>
> It sounds like what you really want here is to extract just the list of the ids which meet one of your three further
criteria,and index that list. 
>
> You can do that with jsonpath, but you have to apply it with a function, not one of the boolean-returning operators.
>
> This almost works to do that:
>
> create index on payrun using gin (jsonb_path_query_array(snapshot, '$.employee.* ? (@.pay_graph <> 0 ||
@.last_run_of_employment== true || @.state.employment[last][2] == 0).id')); 
>
> Then query it like:
>
> select * from payrun where jsonb_path_query_array(snapshot, '$.employee.* ? (@.pay_graph <> 0 ||
@.last_run_of_employment== true || @.state.employment[last][2] == 0).id') ? '999'; 
>
> The problem is that ? only tests for top-level text values, while in your example the value for the 'id' keys are
ints,not text.  So they are not findable with the ? operator. If I edit your example data to wrap the ids' values in
doublequotes, turning them into json strings rather than json ints, then this does work for me. 
>
> Maybe there is a way to modify the jsonpath so that it converts the ints to text for you.  But if there is such a
way,I don't know what it is. 
>
> If I were doing this for my own system, I would probably create an immutable user-defined-function which took a jsonb
andreturned an int[] of the filtered id values.  Then you would have to query it with @> rather than ?. 
>
>>
>> - I've not done things right, in which case I'd love to know my mistake.
>> - It is not supposed to work, in which case it would be good to have
>> that stated, and maybe have PG not allow useless indices to be
>> created.
>> - It is a bug.
>
>
> I guess it is number one mixed with number two.  The index you created is useless for your intended purpose, but is
notuseless for every conceivable purpose.  It is not realistic to expect PostgreSQL to reject things just because it is
notobvious (to a computer) what you are getting up to. 
>
> Cheers,
>
> Jeff