Обсуждение: GIN vs BTREE - query planner picking the wrong one some times

Поиск
Список
Период
Сортировка

GIN vs BTREE - query planner picking the wrong one some times

От
Max Kremer
Дата:
Using postgres 9.5 and I'm trying to improve performance of searches using the LIKE operator on column containing a URL.

Consider a table like this with about 50 million rows

    CREATE TABLE page_hit (
        timestamp_ timestamp without time zone NOT NULL,
        location_url character varying(2048)
    )

and a query like this

    SELECT count(*) 
    FROM page_hit 
    WHERE location_url LIKE 'http://mysite.foo.com/path/morepath%'

The above query makes use of the following index

    CREATE INDEX location_idx
      ON page_hit
      USING btree
      (location_url varchar_pattern_ops);

The works great..BUT... to support LIKE wild cards in other parts of the string (for example a leading %) I created a GIN index as follows

    CREATE INDEX location_idx_gin
      ON page_hit
      USING gin
      (location COLLATE pg_catalog."default" gin_trgm_ops);


The problem is when I run EXPLAIN ANALYZE on the above query now its only ever using location_idx_gin and ignoring location_idx which kinda sucks cause the location_idx index is faster at trailing % queries. The query planner seems to ignore my BTREE index in all cases and uses the GIN index.  

The BTREE index outperforms the GIN index (in the case of a trailing %) but I want the planner to pick the GIN index in the case of leading %. Is there a way to do this? 

Some metrics (queries trailing %):

 - BTREE : <1 second explain: https://explain.depesz.com/s/7wgx 
 - GIN   :   3.8 seconds explain: https://explain.depesz.com/s/wYhk





Re: GIN vs BTREE - query planner picking the wrong one some times

От
Tom Lane
Дата:
Max Kremer <mkremer@trialfire.com> writes:
> The problem is when I run EXPLAIN ANALYZE on the above query now its only
> ever using location_idx_gin and ignoring location_idx which kinda sucks
> cause the location_idx index is faster at trailing % queries. The query
> planner seems to ignore my BTREE index in all cases and uses the GIN index.

I tried to reproduce that, but in my hands the planner just about always
prefers the btree if it has a choice.  Especially for longer strings such
as in your example.  (It looks to me like the cost estimate for GIN is
mostly proportional to the number of trigrams in the query string; I find
that GIN is estimated to cost about the same for LIKE strings as short as
'the%', but loses badly for anything longer.)

Can you show a concrete example (with sample data) where this doesn't
hold?

> Some metrics (queries trailing %):
>  - BTREE : <1 second explain: https://explain.depesz.com/s/7wgx
>  - GIN   :   3.8 seconds explain: https://explain.depesz.com/s/wYhk

The obfuscation of the index conditions makes these just about useless
for telling what's going on :-(

            regards, tom lane