Обсуждение: Use of multi-column gin index

Поиск
Список
Период
Сортировка

Use of multi-column gin index

От
Jared Rulison
Дата:
Hello, we have some confusion over the planner's use of an index.

Suppose we have a table "parades" with columns:

    "city_id" of type integer
    "description" of type text
    "start_time" of type timestamp without time zone

Suppose also we have indexes:

    "parades_city_id_description_tsv_index" gin (city_id, to_tsvector('simple'::regconfig, description)) WHERE description IS NOT NULL
    "parades_city_id_start_time_index" btree (city_id, start_time)

When we EXPLAIN the query

    SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND (to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND ("city_id" IN (<roughly 50 ids>)));

We get

    Bitmap Heap Scan on parades  (cost=12691.97..18831.21 rows=2559 width=886)
      Recheck Cond: ((to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery) AND (description IS NOT NULL) AND (city_id = ANY ('{<roughly 50 ids>}'::integer[])))
      ->  BitmapAnd  (cost=12691.97..12691.97 rows=2559 width=0)
            ->  Bitmap Index Scan on parades_city_id_description_tsv_index  (cost=0.00..2902.97 rows=229463 width=0)
                  Index Cond: (to_tsvector('simple'::regconfig, title) @@ '''fun'''::tsquery)
            ->  Bitmap Index Scan on parades_city_id_start_time_index  (cost=0.00..9787.47 rows=565483 width=0)
                  Index Cond: (city_id = ANY ('{<roughly 50 ids>}'::integer[]))

When we EXPLAIN the same query but with one city_id

    SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND (to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND ("city_id" IN (1)));

We get

     Bitmap Heap Scan on parades  (cost=36.20..81.45 rows=20 width=886)
       Recheck Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery) AND (description IS NOT NULL))
       ->  Bitmap Index Scan on 
parades_city_id_description_tsv_index  (cost=0.00..36.20 rows=20 width=0)
             Index Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery))

This leaves us with two questions:

1. How is postgres able to use parades_city_id_description_tsv_index in the first explain result without any filter on "city_id"?
2. Why does the planner in the first query decide not to simply use parades_city_id_description_tsv_index (as in the second explain result) when the cardinality of the set of "city_id"s is high?

Thanks,
Jared

Re: Use of multi-column gin index

От
Tom Lane
Дата:
Jared Rulison <jared@affinity.co> writes:
> Hello, we have some confusion over the planner's use of an index.
> ...
> 1. How is postgres able to use parades_city_id_description_tsv_index in the
> first explain result without any filter on "city_id"?

GIN indexes don't have any particular bias towards earlier or later
columns (unlike btrees).  So this isn't any harder than if you'd
put the index columns in the other order.

> 2. Why does the planner in the first query decide not to simply use
> parades_city_id_description_tsv_index (as in the second explain result)
> when the cardinality of the set of "city_id"s is high?

[ shrug... ]  It thinks it's cheaper.  Whether it's correct is impossible
to say from the given data, but there is a moderately complex cost model
in there.  The comments for gincost_scalararrayopexpr note

 * A ScalarArrayOpExpr will give rise to N separate indexscans at runtime,
 * each of which involves one value from the RHS array, plus all the
 * non-array quals (if any).

I haven't checked the actual execution code, but this seems to be saying
that the GIN indexscan executor always does ANDs before ORs.  That means
that doing everything in the same GIN indexscan would require executing
the to_tsvector part 50 times, so I can definitely believe that shoving
the IN part to a different index and AND'ing afterwards is a better idea.
(Whether the GIN executor should be made smarter to avoid that is a
separate question ;-))

            regards, tom lane