Обсуждение: WHERE with ORDER not using the best index
Hello everyone, I've a query that runs on a table with a matching index to its WHERE and ORDER clause. However the planner never uses that index. Is there any reason why it doesn't? Here's the table: db=> \d social_feed_feed_items; Table "public.social_feed_feed_items" Column | Type | Modifiers -------------------+-----------------------------+--------------------------------------------------------------------- id | integer | not null default nextval('social_feed_feed_items_id_seq'::regclass) social_feed_id | integer | social_message_id | integer | posted_at | timestamp without time zone | Indexes: "social_message_feed_feed_items_pkey" PRIMARY KEY, btree (id) "index_social_feed_feed_items_on_social_feed_id" btree (social_feed_id) "index_social_feed_feed_items_on_social_feed_id_and_posted_at" btree (social_feed_id, posted_at DESC NULLS LAST) "index_social_feed_feed_items_on_social_message_id" btree (social_message_id) "social_feed_item_feed_message_index" btree (social_feed_id, social_message_id) Here's the query: db=> EXPLAIN ANALYSE SELECT social_feed_feed_items.social_message_id FROM social_feed_feed_items WHERE social_feed_feed_items.social_feed_id = 480 ORDER BY posted_at DESC NULLS LAST LIMIT 1200; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=126.83..127.43 rows=1200 width=12) (actual time=10.321..13.694 rows=1200 loops=1) -> Sort (cost=126.83..129.08 rows=4498 width=12) (actual time=10.318..11.485 rows=1200 loops=1) Sort Key: posted_at Sort Method: top-N heapsort Memory: 153kB -> Index Scan using index_social_feed_feed_items_on_social_feed_id on social_feed_feed_items (cost=0.09..76.33 rows=4498 width=12) (actual time=0.037..5.317 rows=4249 loops=1) Index Cond: (social_feed_id = 480) Total runtime: 14.913 ms (7 rows) I was hoping that they planner would use index_social_feed_feed_items_on_social_feed_id_and_posted_at, but it never does. If I manually remove the index that it currently uses then magic happens: db=> DROP INDEX index_social_feed_feed_items_on_social_feed_id; DROP INDEX db=> EXPLAIN ANALYSE SELECT social_feed_feed_items.social_message_id FROM social_feed_feed_items WHERE social_feed_feed_items.social_feed_id = 480 ORDER BY posted_at DESC NULLS LAST LIMIT 1200; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.09..998.63 rows=1200 width=12) (actual time=0.027..3.792 rows=1200 loops=1) -> Index Scan using index_social_feed_feed_items_on_social_feed_id_and_posted_at on social_feed_feed_items (cost=0.09..3742.95 rows=4498 width=12) (actual time=0.023..1.536 rows=1200 loops=1) Index Cond: (social_feed_id = 480) Total runtime: 4.966 ms (4 rows) So my question is, without dropping index_social_feed_feed_items_on_social_feed_id since it's needed by other queries, how do I make the planner use index_social_feed_feed_items_on_social_feed_id_and_posted_at for a much faster performance? Why didn't the query look at the matching WHERE and ORDER clause and only chose the WHERE to begin its plan? db=> show SERVER_VERSION; server_version ---------------- 9.3.2 (1 row) Thank you very much for your response(s). Regards, Ken -- View this message in context: http://postgresql.1045698.n5.nabble.com/WHERE-with-ORDER-not-using-the-best-index-tp5789581.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
jugnooken <ken@jugnoo.com> writes: > Here's the query: > db=> EXPLAIN ANALYSE SELECT social_feed_feed_items.social_message_id FROM > social_feed_feed_items WHERE social_feed_feed_items.social_feed_id = 480 > ORDER BY posted_at DESC NULLS LAST LIMIT 1200; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=126.83..127.43 rows=1200 width=12) (actual time=10.321..13.694 > rows=1200 loops=1) > -> Sort (cost=126.83..129.08 rows=4498 width=12) (actual > time=10.318..11.485 rows=1200 loops=1) > Sort Key: posted_at > Sort Method: top-N heapsort Memory: 153kB > -> Index Scan using index_social_feed_feed_items_on_social_feed_id > on social_feed_feed_items (cost=0.09..76.33 rows=4498 width=12) (actual > time=0.037..5.317 rows=4249 loops=1) > Index Cond: (social_feed_id = 480) > Total runtime: 14.913 ms > (7 rows) > I was hoping that they planner would use > index_social_feed_feed_items_on_social_feed_id_and_posted_at, but it never > does. If I manually remove the index that it currently uses then magic > happens: > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.09..998.63 rows=1200 width=12) (actual time=0.027..3.792 > rows=1200 loops=1) > -> Index Scan using > index_social_feed_feed_items_on_social_feed_id_and_posted_at on > social_feed_feed_items (cost=0.09..3742.95 rows=4498 width=12) (actual > time=0.023..1.536 rows=1200 loops=1) > Index Cond: (social_feed_id = 480) > Total runtime: 4.966 ms > (4 rows) Well, it likes the first plan because it's estimating that one as cheaper ;-). The question is why the indexscan cost is estimated so remarkably high for the second index --- nearly two orders of magnitude more to retrieve the same number of index entries. The most obvious explanation is that that index is horribly bloated for some reason. Have you checked the physical index sizes? If the second index is many times bigger, REINDEX ought to help, though it's unclear whether the bloat will recur. regards, tom lane
Thank you so much for the prompt reply, Tom. The index is actually fairly new - but to be safe I issued REINDEX TABLE so that they are all clean. Here are the sizes of each index right after REINDEX. db=> select pg_size_pretty(pg_relation_size('index_social_feed_feed_items_on_social_feed_id_and_posted_at')); pg_size_pretty ---------------- 149 MB (1 row) db=> select pg_size_pretty(pg_relation_size('index_social_feed_feed_items_on_social_feed_id')); pg_size_pretty ---------------- 106 MB (1 row) Unfortunately, pg still thinks using index_social_feed_feed_items_on_social_feed_id is faster although they are about the same size :(. Any idea? Regards, Ken -- View this message in context: http://postgresql.1045698.n5.nabble.com/WHERE-with-ORDER-not-using-the-best-index-tp5789581p5789624.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
jugnooken <ken@jugnoo.com> writes: > Unfortunately, pg still thinks using > index_social_feed_feed_items_on_social_feed_id is faster although they are > about the same size :(. Any idea? On further reflection, the cost estimate that is weird for this number of rows is not the large one for your preferred index, but the small estimate for the one the planner likes. My guess is that that must be happening because the latter index is nearly perfectly correlated with the table's physical order, whereas yours is more or less random relative to table order. The fact that the former index is actually faster in use means that in your environment, random access into the table is pretty cheap, which means you should consider decreasing random_page_cost. But first it'd be a good idea to confirm that your test case is actually representative of production behavior --- it's very easy to get fooled by all-in-cache measurements, which are not reliable guides unless your database does in fact fit in RAM. regards, tom lane