Обсуждение: Planner won't use composite index if there is an order by ????

Поиск
Список
Период
Сортировка

Planner won't use composite index if there is an order by ????

От
Dave Cramer
Дата:
Any idea why it wouldn't choose the right index ?

This is 8.3

> # \d battles
>                                          Table "public.battles"
>       Column        |            Type             |
> Modifiers
> ---------------------+-----------------------------
> +------------------------------------------------------
> id                  | integer                     | not null default
> nextval('battles_id_seq'::regclass)
> user_id             | integer                     | not null
> contest_id          | integer                     | not null
> entry_1_id          | integer                     | not null
> entry_2_id          | integer                     | not null
> new_entry_1_score   | integer                     |
> new_entry_2_score   | integer                     |
> score               | integer                     |
> scored_at           | timestamp without time zone |
> created_at          | timestamp without time zone | not null
> function_profile_id | integer                     |
> battle_type         | integer                     | default 0
> Indexes:
>    "battles_pkey" PRIMARY KEY, btree (id)
>    "unique_with_type" UNIQUE, btree (user_id, entry_1_id, entry_2_id,
> battle_type)
>    "battles_by_contest_and_type" btree (contest_id, battle_type)
>    "battles_by_time" btree (scored_at)
> Foreign-key constraints:
>    "fk_battles_contests" FOREIGN KEY (contest_id) REFERENCES
> contests(id)
>    "fk_battles_lefty" FOREIGN KEY (entry_1_id) REFERENCES entries(id)
>    "fk_battles_righty" FOREIGN KEY (entry_2_id) REFERENCES entries(id)
>    "fk_battles_users" FOREIGN KEY (user_id) REFERENCES users(id)
>
>
> Here is the analyze of the query we want but it takes forever because
> its using the index for the sort instead of restricting the number of
> battles by user_id:
>
> ourstage_production=# explain analyze SELECT * FROM battles WHERE
> user_id = 196698 and scored_at is not null and score in (-3,3) ORDER
> BY
> id DESC LIMIT 5;
>
> QUERY PLAN
>
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..8381.61 rows=5 width=56) (actual
> time=124421.499..183659.404 rows=2 loops=1)
>   ->  Index Scan Backward using battles_pkey on battles
> (cost=0.00..670528.67 rows=400 width=56) (actual
> time=124421.495..183659.394 rows=2 loops=1)
>         Filter: ((scored_at IS NOT NULL) AND (score = ANY
> ('{-3,3}'::integer[])) AND (user_id = 196698))
> Total runtime: 183659.446 ms
> (4 rows)
>
>
> If you remove the ORDER BY then it runs in 4 ms:
>
> ourstage_production=# explain analyze SELECT * FROM battles WHERE
> user_id = 196698 and scored_at is not null and score in (-3,3) LIMIT
> 5;
>                                                              QUERY
> PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..126.65 rows=5 width=56) (actual time=4.607..4.621
> rows=2 loops=1)
>   ->  Index Scan using unique_with_type on battles
> (cost=0.00..10131.66 rows=400 width=56) (actual time=4.603..4.611
> rows=2 loops=1)
>         Index Cond: (user_id = 196698)
>         Filter: ((scored_at IS NOT NULL) AND (score = ANY
> ('{-3,3}'::integer[])))
> Total runtime: 4.660 ms
> (5 rows)
>
>
> Here we tried to limit the table scan by time so that it would scan
> far
> fewer records.  But what ended up happening is that it flipped it over
> to using the right index.  The one that is based on user_id is much
> preferred:
>
>
> ourstage_production=# explain analyze SELECT * FROM battles WHERE
> user_id = 196698 and scored_at is not null and score in (-3,3) and
> scored_at > now() - INTERVAL '6 month' ORDER BY id DESC LIMIT 5;
>                                                                 QUERY
> PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------
> Limit  (cost=10158.16..10158.18 rows=5 width=56) (actual
> time=0.097..0.106 rows=2 loops=1)
>   ->  Sort  (cost=10158.16..10158.92 rows=302 width=56) (actual
> time=0.094..0.096 rows=2 loops=1)
>         Sort Key: id
>         Sort Method:  quicksort  Memory: 25kB
>         ->  Index Scan using unique_with_type on battles
> (cost=0.00..10153.15 rows=302 width=56) (actual time=0.069..0.078
> rows=2 loops=1)
>               Index Cond: (user_id = 196698)
>               Filter: ((scored_at IS NOT NULL) AND (score = ANY
> ('{-3,3}'::integer[])) AND (scored_at > (now() - '6 mons'::interval)))
> Total runtime: 0.152 ms
> (8 rows)
>
>
> Notice that we added time restriction and it now chooses to not use
> the
> time index and goes after the index based on user_id.  Why?  We
> don't know.


Re: Planner won't use composite index if there is an order by ????

От
Tom Lane
Дата:
Dave Cramer <pg@fastcrypt.com> writes:
> Any idea why it wouldn't choose the right index ?

It thinks that way is cheaper.  It's estimating its cost at 8381,
whereas selecting all the rows then sorting must take 10131 plus
some time to sort.

The reason why those estimates are so far off from reality is directly
tied to the 400-estimated-vs-2-actual rowcount estimation error.
I'm not sure how much of that could be fixed by raising the stats target
for the table, but that's certainly something to try.

Another thing you should look at is eliminating dependences between
columns.  I'll bet that the "scored_at is not null" condition is either
redundant with the "score in ..." condition, or could be made so (ie
force score to null when scored_at is null).  If you could get rid of
the separate test on scored_at, it'd help to avoid the estimation weak
spot of correlated restrictions.

            regards, tom lane