Обсуждение: Computed index on transformation of jsonb key set

Поиск
Список
Период
Сортировка

Computed index on transformation of jsonb key set

От
Steven Schlansker
Дата:
Hi Postgres fans,

We store a Java Map<UUID, Integer> in Postgres as a jsonb column.
As json does not have a UUID type, it is of course stored as text.

A simple value might be:
{"04e623c0-6940-542f-a0de-4c999c626dfe": 50000000, "6d3e24b6-9e8c-5eb1-9e4c-f32cc40864c9": 208250000}

I am trying to implement an efficient set-intersection test.
"Find all rows where the map has a key in (uuid1, uuid2, uuid3)"

The most straightforward approach is to create a GIN over the whole column.
This has the documented disadvantage that all keys and values are stored in the GIN repeatedly, with the kicker that
theUUID will be stored as text. 

I figured I'd end up with significantly better storage and performance characteristics if I first compute a uuid[]
valueand build the GIN over that, and use the array operator class instead.  Additionally, this eliminates possible
confusionabout uuid casing (text is case sensitive, uuid is not) and this has already caused at least one bug in our
application.

I attempted to optimize a query like:
select * from tbl where array(select jsonb_object_keys(mapData)::uuid) &&
array['320982a7-cfaa-572a-b5ea-2074d7f3b014'::uuid];

with:
create index my_idx on tbl using gin((array(select jsonb_object_keys(mapData)::uuid)));

Of course, this fails, ERROR: cannot use subquery in index expression
Clever me says, "well, I'll just use array_agg instead!"

create index my_idx on tbl using gin((array_agg(jsonb_object_keys(mapData)::uuid)));

Postgres 9.6 says:
ERROR:  aggregate functions are not allowed in index expressions

Just to see what happened, knowing that GIN supports multi-valued data, I tried:

create index my_idx on tbl using gin((jsonb_object_keys(mapData)::uuid));
ERROR:  index expression cannot return a set

How can I efficiently implement the feature I've described?  It seems difficult to use computed indexing with GIN.

Thank you for any advice and have a great weekend,
Steven




Re: Computed index on transformation of jsonb key set

От
"David G. Johnston"
Дата:
On Fri, Apr 26, 2019 at 2:25 PM Steven Schlansker <stevenschlansker@gmail.com> wrote:
How can I efficiently implement the feature I've described?  It seems difficult to use computed indexing with GIN.

Don't use a computed index?  Add a trigger to populate a physical column and index that.  Modify queries to make simple tests against the stored array.

David J.

Re: Computed index on transformation of jsonb key set

От
Rob Sargent
Дата:
On 4/26/19 3:25 PM, Steven Schlansker wrote:
> Hi Postgres fans,
>
> We store a Java Map<UUID, Integer> in Postgres as a jsonb column.
> As json does not have a UUID type, it is of course stored as text.
>
> A simple value might be:
> {"04e623c0-6940-542f-a0de-4c999c626dfe": 50000000, "6d3e24b6-9e8c-5eb1-9e4c-f32cc40864c9": 208250000}
>
> I am trying to implement an efficient set-intersection test.
> "Find all rows where the map has a key in (uuid1, uuid2, uuid3)"
>
> The most straightforward approach is to create a GIN over the whole column.
> This has the documented disadvantage that all keys and values are stored in the GIN repeatedly, with the kicker that
theUUID will be stored as text.
 
>
> I figured I'd end up with significantly better storage and performance characteristics if I first compute a uuid[]
valueand build the GIN over that, and use the array operator class instead.  Additionally, this eliminates possible
confusionabout uuid casing (text is case sensitive, uuid is not) and this has already caused at least one bug in our
application.
>
> I attempted to optimize a query like:
> select * from tbl where array(select jsonb_object_keys(mapData)::uuid) &&
array['320982a7-cfaa-572a-b5ea-2074d7f3b014'::uuid];
>
> with:
> create index my_idx on tbl using gin((array(select jsonb_object_keys(mapData)::uuid)));
>
> Of course, this fails, ERROR: cannot use subquery in index expression
> Clever me says, "well, I'll just use array_agg instead!"
>
> create index my_idx on tbl using gin((array_agg(jsonb_object_keys(mapData)::uuid)));
>
> Postgres 9.6 says:
> ERROR:  aggregate functions are not allowed in index expressions
>
> Just to see what happened, knowing that GIN supports multi-valued data, I tried:
>
> create index my_idx on tbl using gin((jsonb_object_keys(mapData)::uuid));
> ERROR:  index expression cannot return a set
>
> How can I efficiently implement the feature I've described?  It seems difficult to use computed indexing with GIN.
>
> Thank you for any advice and have a great weekend,
> Steven
>
>
>
Storing the map in a child table (parentId, UUID, int) is out of the 
question?



Re: Computed index on transformation of jsonb key set

От
Steven Schlansker
Дата:

> On Apr 26, 2019, at 2:33 PM, Rob Sargent <robjsargent@gmail.com> wrote:
>
>
> On 4/26/19 3:25 PM, Steven Schlansker wrote:
>>
>>
>> How can I efficiently implement the feature I've described?  It seems difficult to use computed indexing with GIN.
>>
> Storing the map in a child table (parentId, UUID, int) is out of the question?

Yes, thanks Rob and David, that will be my approach if I can't figure out anything better.

I was hoping to not have to denormalize the data given that Postgres generally has great support for computing indexes,
both to avoid storage costs and eliminate the need to maintain triggers or application code to handle it.




Re: Computed index on transformation of jsonb key set

От
Rob Sargent
Дата:
On 4/26/19 3:53 PM, Steven Schlansker wrote:
>
>> On Apr 26, 2019, at 2:33 PM, Rob Sargent <robjsargent@gmail.com> wrote:
>>
>>
>> On 4/26/19 3:25 PM, Steven Schlansker wrote:
>>>
>>> How can I efficiently implement the feature I've described?  It seems difficult to use computed indexing with GIN.
>>>
>> Storing the map in a child table (parentId, UUID, int) is out of the question?
> Yes, thanks Rob and David, that will be my approach if I can't figure out anything better.
>
> I was hoping to not have to denormalize the data given that Postgres generally has great support for computing
indexes,
> both to avoid storage costs and eliminate the need to maintain triggers or application code to handle it.
>
>
>
Is that json structure really more compact than the matching table (even 
with the additional parent id)?



Re: Computed index on transformation of jsonb key set

От
Andrew Gierth
Дата:
>>>>> "Steven" == Steven Schlansker <stevenschlansker@gmail.com> writes:

 Steven> I figured I'd end up with significantly better storage and
 Steven> performance characteristics if I first compute a uuid[] value
 Steven> and build the GIN over that, and use the array operator class
 Steven> instead. Additionally, this eliminates possible confusion about
 Steven> uuid casing (text is case sensitive, uuid is not) and this has
 Steven> already caused at least one bug in our application.

 Steven> I attempted to optimize a query like:
 Steven> select * from tbl where array(select jsonb_object_keys(mapData)::uuid) &&
array['320982a7-cfaa-572a-b5ea-2074d7f3b014'::uuid];

Obvious solution:

create function uuid_keys(mapData jsonb) returns uuid[]
  language plpgsql immutable strict
  as $$
    begin
      return array(select jsonb_object_keys(mapData)::uuid);
    end;
  $$;

create index on tbl using gin (uuid_keys(mapData));

select * from tbl where uuid_keys(mapData) && array[...];

-- 
Andrew (irc:RhodiumToad)



Re: Computed index on transformation of jsonb key set

От
Steven Schlansker
Дата:
> On Apr 27, 2019, at 12:55 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>
> Obvious solution:
>
> create function uuid_keys(mapData jsonb) returns uuid[]
>  language plpgsql immutable strict
>  as $$
>    begin
>      return array(select jsonb_object_keys(mapData)::uuid);
>    end;
>  $$;
>
> create index on tbl using gin (uuid_keys(mapData));
>
> select * from tbl where uuid_keys(mapData) && array[...];

Thank you Andrew, I modified it slightly to use `language sql` but otherwise
it gets me going forward.