Обсуждение: Hstore index for full text search

Поиск
Список
Период
Сортировка

Hstore index for full text search

От
Burhan Akbulut
Дата:
Hi all,

I need help in full text search optimization for hstore type. I added my query to explain.depesz, you can check the query and also i added explain analyze result in this link: https://explain.depesz.com/s/QT1e

table_ord.ops column type is hstore. I couldn't find the effective index that would reduce the run time of the query.

When I tried to add an gin index to the hstore column, I got the following error:

create index on table_ord USING gin (ops);
ERROR:  index row size 2728 exceeds maximum 2712 for index " table_ord_ ops_idx"

How can I fix this query ?

Best Regards,

Burhan Akbulut
DBA - Cooksoft

Re: Hstore index for full text search

От
Michael Lewis
Дата:
Hash Cond: (o.courier_id = cc.id)
Filter: (((o.tracker_code)::text ~~* '%1654323%'::text) OR ((table_cus.name)::text ~~* '%1654323%'::text) OR ((au.username)::text ~~ '%1654323%'::text) OR ((o.source)::text ~~* '%1654323%'::text) OR ((o.ops -> 'shop'::text) ~~* '%1654323%'::text) OR ((o.ops -> 'camp_code'::text) ~~* '%1654323%'::text) OR ((city.name)::text ~~* '%1654323%'::text) OR ((co.name)::text ~~* '%1654323%'::text) OR ((o.tr_code)::text ~~* '%1654323%'::text) OR ((o.ops ? 'shipping_company'::text) AND ((o.ops -> 'shipping_company'::text) ~~* '%1654323%'::text)) OR ((cc.name)::text ~~* '%1654323%'::text))


All those OR conditions on different tables and fields seems like it will be unlikely that the planner will do anything with the index you are trying to create (for this query).

On the error, I came across discussions on dba.stackexchange.com referencing a limit of about 1/3 of the page size (8192) for every key because of it being a btree underneath. It could be one or more of your keys in ops (like shop, camp_code, and shipping_company) is much longer than those examples shown in the query.

Re: Hstore index for full text search

От
Tom Lane
Дата:
Michael Lewis <mlewis@entrata.com> writes:
> Hash Cond: (o.courier_id = cc.id)
> Filter: (((o.tracker_code)::text ~~* '%1654323%'::text) OR
> ((table_cus.name)::text
> ~~* '%1654323%'::text) OR ((au.username)::text ~~ '%1654323%'::text) OR
> ((o.source)::text ~~* '%1654323%'::text) OR ((o.ops -> 'shop'::text) ~~*
> '%1654323%'::text) OR ((o.ops -> 'camp_code'::text) ~~* '%1654323%'::text)
> OR ((city.name)::text ~~* '%1654323%'::text) OR ((co.name)::text ~~*
> '%1654323%'::text) OR ((o.tr_code)::text ~~* '%1654323%'::text) OR ((o.ops
> ? 'shipping_company'::text) AND ((o.ops -> 'shipping_company'::text) ~~*
> '%1654323%'::text)) OR ((cc.name)::text ~~* '%1654323%'::text))

> All those OR conditions on different tables and fields seems like it will
> be unlikely that the planner will do anything with the index you are trying
> to create (for this query).

A GIN index on an hstore column only provides the ability to search for
exact matches to hstore key strings.  There are a few bells and whistles,
like the ability to AND or OR such conditions.  But basically it's just an
exact-match engine, and it doesn't index the hstore's data values at all
(which is why the implementors weren't too concerned about having a length
limit on the index entries).  There is 0 chance of this index type being
useful for what the OP wants to do.

Given these examples, I'd think about setting up a collection of pg_trgm
indexes on the specific hstore keys you care about, ie something like

CREATE INDEX ON mytable USING GIST ((ops -> 'camp_code') gist_trgm_ops);
CREATE INDEX ON mytable USING GIST ((ops -> 'shipping_company') gist_trgm_ops);
...

which'd allow indexing queries like

... WHERE (ops -> 'camp_code') LIKE '%1654323%'
       OR (ops -> 'shipping_company') LIKE '%1654323%'

I'm not sure how far this will get you, though; if there's a whole lot
of different keys of interest, maintaining a separate index for each
one is probably too much overhead.  Another point is that you will only
get an indexscan if *every* OR'd clause matches some index.  The example
query looks sufficiently unstructured that that might be hard to ensure.

I kind of wonder whether this data design is actually a good idea.
It doesn't seem to match your querying style terribly well.

            regards, tom lane



Re: Hstore index for full text search

От
Michael Lewis
Дата:
On Tue, Aug 11, 2020 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
A GIN index on an hstore column only provides the ability to search for
exact matches to hstore key strings.  There are a few bells and whistles,
like the ability to AND or OR such conditions.  But basically it's just an
exact-match engine, and it doesn't index the hstore's data values at all
(which is why the implementors weren't too concerned about having a length
limit on the index entries).  There is 0 chance of this index type being
useful for what the OP wants to do.

Thanks for sharing. More like json path ops and not the full key and value. Interesting.
 
Another point is that you will only
get an indexscan if *every* OR'd clause matches some index.  The example
query looks sufficiently unstructured that that might be hard to ensure.

Does this still apply when the where clauses are on several tables and not just one?

Re: Hstore index for full text search

От
Tom Lane
Дата:
Michael Lewis <mlewis@entrata.com> writes:
> On Tue, Aug 11, 2020 at 4:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another point is that you will only
>> get an indexscan if *every* OR'd clause matches some index.  The example
>> query looks sufficiently unstructured that that might be hard to ensure.

> Does this still apply when the where clauses are on several tables and not
> just one?

Yeah.  In that case there's no hope of an indexscan at all, since for all
the planner knows, the query might match some table rows that don't meet
any of the conditions mentioned for that table's columns.  If you can
write

WHERE (condition-on-t1.a OR condition-on-t1.b OR ...)
  AND (condition-on-t2.x OR condition-on-t2.y OR ...)

then you've got a chance of making the OR'd conditions into index
qualifications on t1 or t2 respectively.  But if it's

WHERE condition-on-t1.a OR condition-on-t1.b OR ...
   OR condition-on-t2.x OR condition-on-t2.y OR ...

then you're in for full-table scans.  (This is another thing that
was bothering me about the data design, though I failed to think
it through clearly enough to state before.)

            regards, tom lane