Seq scan instead of index scan querying single row from primary key on large table
От | James Coleman |
---|---|
Тема | Seq scan instead of index scan querying single row from primary key on large table |
Дата | |
Msg-id | CAAaqYe8uY_NoG-SmDzCyzkcggkDs5cN2stiJxvdZXVThZM+XQA@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Seq scan instead of index scan querying single row from primary key on large table
|
Список | pgsql-hackers |
Hi all, I've been optimizing queries for a long time, and I don't think I've ever seen something more surprising to me than this -- sufficiently so that I wanted to ask if others thought it implied a bug. It's possible my mental model for the planner is broken in some significant way, or that I'm overlooking something obvious, so please let me know if that's the case too. We have a query like this (all tables renamed to be generic): SELECT DISTINCT "objects"."pk" FROM "objects" LEFT OUTER JOIN "objects" "allowed_objects_objects" ON "allowed_objects_objects"."pk" = "objects"."allowed_object_fk" AND "objects"."allowed_object_fk" IS NOT NULL LEFT OUTER JOIN "facilitated_object_metadata" ON "facilitated_object_metadata"."object_fk" = "objects"."pk" LEFT OUTER JOIN "object_audits" ON "object_audits"."object_fk" = "objects"."pk" LEFT OUTER JOIN "objects" "action_objects_objects" ON "action_objects_objects"."allowed_object_fk" IS NOT NULL AND "action_objects_objects"."allowed_object_fk" = "objects"."pk" LEFT OUTER JOIN "objects" "return_objects_objects" ON "return_objects_objects"."returned_object_fk" IS NOT NULL AND "return_objects_objects"."returned_object_fk" = "objects"."pk" LEFT OUTER JOIN "object_taxs" ON "object_taxs"."object_fk" = "objects"."pk" LEFT OUTER JOIN "object_configs" ON "object_configs"."object_fk" = "objects"."pk" LEFT OUTER JOIN "object_statuses" ON "object_statuses"."object_fk" = "objects"."pk" LEFT OUTER JOIN "object_edits" ON "object_edits"."object_fk" = "objects"."pk" LEFT OUTER JOIN "audit_answers" ON "audit_answers"."requester_type" = 'Transaction' AND "audit_answers"."requester_fk" = "objects"."pk" WHERE "objects"."pk" = 131690144 LIMIT 1; This is ORM generated, so granted there's some slight oddity with the "DISTINCT pk" and then the "LIMIT 1". The plan generated by the planner changed suddenly one morning this week, and in a very surprising way: the innermost scan (of "objects") started choosing a seq scan, despite the cost from that node being very high and an index scan being possible -- it's the primary key and we're restricting on a single value, so intuitively we know, and indeed the planner knows, that there will only be a single row returned. Here's the surprising plan: Limit (cost=3.42..4.00 rows=1 width=8) -> Limit (cost=3.42..4.00 rows=1 width=8) -> Nested Loop Left Join (cost=3.42..7939067.28 rows=13777920 width=8) -> Nested Loop Left Join (cost=2.86..7766839.95 rows=1059840 width=8) -> Nested Loop Left Join (cost=2.44..7753589.13 rows=211968 width=8) -> Nested Loop Left Join (cost=1.86..7750902.24 rows=138 width=8) -> Nested Loop Left Join (cost=1.29..7750895.86 rows=1 width=8) -> Nested Loop Left Join (cost=1.29..7750895.85 rows=1 width=8) -> Nested Loop Left Join (cost=0.86..7750893.39 rows=1 width=8) -> Nested Loop Left Join (cost=0.43..7750890.93 rows=1 width=8) -> Seq Scan on objects (cost=0.00..7750888.47 rows=1 width=16) Filter: (pk = 131690144) -> Index Only Scan using index_facilitated_object_metadata_on_object_fk on facilitated_object_metadata (cost=0.43..2.45 rows=1 width=8) Index Cond: (object_fk = 131690144) -> Index Only Scan using index_objects_on_allowed_object_fk_not_null on objects action_objects_objects (cost=0.43..2.45 rows=1 width=8) Index Cond: (allowed_object_fk = 131690144) -> Index Only Scan using index_objects_on_returned_object_fk_not_null on objects return_objects_objects (cost=0.43..2.45 rows=1 width=8) Index Cond: (returned_object_fk = 131690144) -> Seq Scan on object_taxs (cost=0.00..0.00 rows=1 width=8) Filter: (object_fk = 131690144) -> Index Only Scan using index_object_audits_on_object_fk on object_audits (cost=0.57..5.00 rows=138 width=8) Index Cond: (object_fk = 131690144) -> Materialize (cost=0.57..41.13 rows=1536 width=8) -> Index Only Scan using index_object_configs_on_object_id on object_configs (cost=0.57..33.45 rows=1536 width=8) Index Cond: (object_fk = 131690144) -> Materialize (cost=0.42..2.84 rows=5 width=8) -> Index Only Scan using index_adjustment_responses_on_response_fk on object_edits (cost=0.42..2.81 rows=5 width=8) Index Cond: (object_fk = 131690144) -> Materialize (cost=0.56..3.36 rows=13 width=8) -> Index Only Scan using index_audit_answers_on_requester_type_and_fk on audit_answers (cost=0.56..3.30 rows=13 width=8) Index Cond: ((requester_type = 'Object'::bt_object_or_alternate_object) AND (requester_fk = 131690144)) Note the innermost table scan: Seq Scan on objects (cost=0.00..7750888.47 rows=1 width=16) Filter: (pk = 131690144) If I set enable_seqscan = off, then I get the old plan: Limit (cost=4.12..4.13 rows=1 width=8) -> Limit (cost=4.12..4.13 rows=1 width=8) -> Nested Loop Left Join (cost=4.12..188183.54 rows=13777920 width=8) -> Nested Loop Left Join (cost=3.55..15956.20 rows=1059840 width=8) -> Nested Loop Left Join (cost=3.13..2705.38 rows=211968 width=8) -> Nested Loop Left Join (cost=2.55..18.49 rows=138 width=8) -> Nested Loop Left Join (cost=1.98..12.11 rows=1 width=8) -> Nested Loop Left Join (cost=1.86..9.96 rows=1 width=8) -> Nested Loop Left Join (cost=1.43..7.50 rows=1 width=8) -> Nested Loop Left Join (cost=0.99..5.04 rows=1 width=8) -> Index Scan using objects_pkey on objects (cost=0.57..2.58 rows=1 width=16) Index Cond: (pk = 131690144) -> Index Only Scan using index_facilitated_object_metadata_on_object_fk on facilitated_object_metadata (cost=0.43..2.45 rows=1 width=8) Index Cond: (object_fk = 131690144) -> Index Only Scan using index_objects_on_allowed_object_fk_not_null on objects action_objects_objects (cost=0.43..2.45 rows=1 width=8) Index Cond: (allowed_object_fk = 131690144) -> Index Only Scan using index_objects_on_returned_object_fk_not_null on objects return_objects_objects (cost=0.43..2.45 rows=1 width=8) Index Cond: (returned_object_fk = 131690144) -> Index Only Scan using index_object_taxs_on_object_fk on object_taxs (cost=0.12..2.14 rows=1 width=8) Index Cond: (object_fk = 131690144) -> Index Only Scan using index_object_audits_on_object_fk on object_audits (cost=0.57..5.00 rows=138 width=8) Index Cond: (object_fk = 131690144) -> Materialize (cost=0.57..41.13 rows=1536 width=8) -> Index Only Scan using index_object_configs_on_object_id on object_configs (cost=0.57..33.45 rows=1536 width=8) Index Cond: (object_fk = 131690144) -> Materialize (cost=0.42..2.84 rows=5 width=8) -> Index Only Scan using index_adjustment_responses_on_response_fk on object_edits (cost=0.42..2.81 rows=5 width=8) Index Cond: (object_fk = 131690144) -> Materialize (cost=0.56..3.36 rows=13 width=8) -> Index Only Scan using index_audit_answers_on_requester_type_and_fk on audit_answers (cost=0.56..3.30 rows=13 width=8) Index Cond: ((requester_type = 'Object'::bt_object_or_alternate_object) AND (requester_fk = 131690144)) Notice the innermost table scan is the expect index scan: -> Index Scan using objects_pkey on objects (cost=0.57..2.58 rows=1 width=16) Index Cond: (pk = 131690144) The join order stays the same between the two plans. The scan on object_taxs changes from a seq scan to an index scan (the table is empty so neither would be a problem) obviously because of disabled sequence scans. Analyzing object_audits during the incident did *not* change the plan, but a later auto-analyze of that same table seemed to change the plan back. That being said, we're now back at the bad plan (without setting enable_seqscan = off). I don't understand why that analyze should really affect anything at all given that join order stays the same between good/bad plans and given that I'd expect the planner to strongly prefer the index scan given its far lower cost for the number of estimated tuples. Analyzing objects doesn't help. Removing the DISTINCT does not fix the plan. Removing the LIMIT 1 does fix the plan and removes the double LIMIT node in the plan. The second LIMIT node is implicit from the DISTINCT, and it doesn't seem to me that a LIMIT node should change the cost/path chosen of an identical limit node inside it, and indeed the cost seems to be the same at both. The only explanation I can come up with is that the startup cost for the seq scan is 0 while for the index scan is 0.57, and the LIMIT at the top level is causing the planner to care about startup cost. Assuming that's true I think the early return cost multiplication of the LIMIT is being applied very naively on the seq scan node. Or perhaps the issue is that the startup cost for a single tuple on a seq scan like this shouldn't really have a startup cost of 0 -- that cost is presumably for tuples being returned _without_ having applied the filter. That seems slightly odd to me, because the cost of getting the first row out of that node -- in my naive view thinking about it for all of 5 seconds -- should be calculated based on applying the filter (and thus the likelihood that that filter matches right away). If we did that then this path would never win. But that 0.00 startup cost for the seq scan with a filter shows up in PG14 and PG11 also, not just PG16, so that's not something that's changed. To recap: the estimation of rows is correct, the estimated high (total) cost of the seq scan is correct, but the seq scan is chosen over the index scan anyway for a plan that returns a single "random" row based on the primary key. Am I right to be surprised here? James Coleman
В списке pgsql-hackers по дате отправления: