Обсуждение: WHERE with ORDER not using the best index

Поиск
Список
Период
Сортировка

WHERE with ORDER not using the best index

От
jugnooken
Дата:
Hello everyone,

I've a query that runs on a table with a matching index to its WHERE and
ORDER clause. However the planner never uses that index. Is there any reason
why it doesn't?

Here's the table:

db=> \d social_feed_feed_items;
                                         Table
"public.social_feed_feed_items"
      Column       |            Type             |
Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
 id                | integer                     | not null default
nextval('social_feed_feed_items_id_seq'::regclass)
 social_feed_id    | integer                     |
 social_message_id | integer                     |
 posted_at         | timestamp without time zone |
Indexes:
    "social_message_feed_feed_items_pkey" PRIMARY KEY, btree (id)
    "index_social_feed_feed_items_on_social_feed_id" btree (social_feed_id)
    "index_social_feed_feed_items_on_social_feed_id_and_posted_at" btree
(social_feed_id, posted_at DESC NULLS LAST)
    "index_social_feed_feed_items_on_social_message_id" btree
(social_message_id)
    "social_feed_item_feed_message_index" btree (social_feed_id,
social_message_id)

Here's the query:

db=> EXPLAIN ANALYSE SELECT social_feed_feed_items.social_message_id FROM
social_feed_feed_items WHERE social_feed_feed_items.social_feed_id = 480
ORDER BY posted_at DESC NULLS LAST LIMIT 1200;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=126.83..127.43 rows=1200 width=12) (actual time=10.321..13.694
rows=1200 loops=1)
   ->  Sort  (cost=126.83..129.08 rows=4498 width=12) (actual
time=10.318..11.485 rows=1200 loops=1)
         Sort Key: posted_at
         Sort Method: top-N heapsort  Memory: 153kB
         ->  Index Scan using index_social_feed_feed_items_on_social_feed_id
on social_feed_feed_items  (cost=0.09..76.33 rows=4498 width=12) (actual
time=0.037..5.317 rows=4249 loops=1)
               Index Cond: (social_feed_id = 480)
 Total runtime: 14.913 ms
(7 rows)

I was hoping that they planner would use
index_social_feed_feed_items_on_social_feed_id_and_posted_at, but it never
does. If I manually remove the index that it currently uses then magic
happens:

db=> DROP INDEX index_social_feed_feed_items_on_social_feed_id;
DROP INDEX
db=> EXPLAIN ANALYSE SELECT social_feed_feed_items.social_message_id FROM
social_feed_feed_items WHERE social_feed_feed_items.social_feed_id = 480
ORDER BY posted_at DESC NULLS LAST LIMIT 1200;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.09..998.63 rows=1200 width=12) (actual time=0.027..3.792
rows=1200 loops=1)
   ->  Index Scan using
index_social_feed_feed_items_on_social_feed_id_and_posted_at on
social_feed_feed_items  (cost=0.09..3742.95 rows=4498 width=12) (actual
time=0.023..1.536 rows=1200 loops=1)
         Index Cond: (social_feed_id = 480)
 Total runtime: 4.966 ms
(4 rows)

So my question is, without dropping
index_social_feed_feed_items_on_social_feed_id since it's needed by other
queries, how do I make the planner use
index_social_feed_feed_items_on_social_feed_id_and_posted_at for a much
faster performance? Why didn't the query look at the matching WHERE and
ORDER clause and only chose the WHERE to begin its plan?

db=> show SERVER_VERSION;
 server_version
----------------
 9.3.2
(1 row)

Thank you very much for your response(s).

Regards,
Ken



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/WHERE-with-ORDER-not-using-the-best-index-tp5789581.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: WHERE with ORDER not using the best index

От
Tom Lane
Дата:
jugnooken <ken@jugnoo.com> writes:
> Here's the query:

> db=> EXPLAIN ANALYSE SELECT social_feed_feed_items.social_message_id FROM
> social_feed_feed_items WHERE social_feed_feed_items.social_feed_id = 480
> ORDER BY posted_at DESC NULLS LAST LIMIT 1200;

> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=126.83..127.43 rows=1200 width=12) (actual time=10.321..13.694
> rows=1200 loops=1)
>    ->  Sort  (cost=126.83..129.08 rows=4498 width=12) (actual
> time=10.318..11.485 rows=1200 loops=1)
>          Sort Key: posted_at
>          Sort Method: top-N heapsort  Memory: 153kB
>          ->  Index Scan using index_social_feed_feed_items_on_social_feed_id
> on social_feed_feed_items  (cost=0.09..76.33 rows=4498 width=12) (actual
> time=0.037..5.317 rows=4249 loops=1)
>                Index Cond: (social_feed_id = 480)
>  Total runtime: 14.913 ms
> (7 rows)

> I was hoping that they planner would use
> index_social_feed_feed_items_on_social_feed_id_and_posted_at, but it never
> does. If I manually remove the index that it currently uses then magic
> happens:

> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.09..998.63 rows=1200 width=12) (actual time=0.027..3.792
> rows=1200 loops=1)
>    ->  Index Scan using
> index_social_feed_feed_items_on_social_feed_id_and_posted_at on
> social_feed_feed_items  (cost=0.09..3742.95 rows=4498 width=12) (actual
> time=0.023..1.536 rows=1200 loops=1)
>          Index Cond: (social_feed_id = 480)
>  Total runtime: 4.966 ms
> (4 rows)

Well, it likes the first plan because it's estimating that one as cheaper
;-).  The question is why the indexscan cost is estimated so remarkably
high for the second index --- nearly two orders of magnitude more to
retrieve the same number of index entries.  The most obvious explanation
is that that index is horribly bloated for some reason.  Have you checked
the physical index sizes?  If the second index is many times bigger,
REINDEX ought to help, though it's unclear whether the bloat will recur.

            regards, tom lane


Re: WHERE with ORDER not using the best index

От
jugnooken
Дата:
Thank you so much for the prompt reply, Tom. The index is actually fairly new
- but to be safe I issued REINDEX TABLE so that they are all clean. Here are
the sizes of each index right after REINDEX.

db=> select
pg_size_pretty(pg_relation_size('index_social_feed_feed_items_on_social_feed_id_and_posted_at'));
 pg_size_pretty
----------------
 149 MB
(1 row)

db=> select
pg_size_pretty(pg_relation_size('index_social_feed_feed_items_on_social_feed_id'));
 pg_size_pretty
----------------
 106 MB
(1 row)

Unfortunately, pg still thinks using
index_social_feed_feed_items_on_social_feed_id is faster although they are
about the same size :(. Any idea?

Regards,
Ken



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/WHERE-with-ORDER-not-using-the-best-index-tp5789581p5789624.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: WHERE with ORDER not using the best index

От
Tom Lane
Дата:
jugnooken <ken@jugnoo.com> writes:
> Unfortunately, pg still thinks using
> index_social_feed_feed_items_on_social_feed_id is faster although they are
> about the same size :(. Any idea?

On further reflection, the cost estimate that is weird for this number of
rows is not the large one for your preferred index, but the small estimate
for the one the planner likes.  My guess is that that must be happening
because the latter index is nearly perfectly correlated with the table's
physical order, whereas yours is more or less random relative to table
order.

The fact that the former index is actually faster in use means that in
your environment, random access into the table is pretty cheap, which
means you should consider decreasing random_page_cost.  But first it'd
be a good idea to confirm that your test case is actually representative
of production behavior --- it's very easy to get fooled by all-in-cache
measurements, which are not reliable guides unless your database does in
fact fit in RAM.

            regards, tom lane