Обсуждение: Why is the optimiser choosing the slower query, or, understanding explain analyze output
Why is the optimiser choosing the slower query, or, understanding explain analyze output
От
Alistair Bayley
Дата:
I have postgresql 8.4.15 on Ubuntu 10.04 and this query: SELECT MAX(probeTable.PROBE_ALARM_EVENT_ID) AS MAX_EVENT_ID FROM ALARM_EVENT eventTable INNER JOIN ALARM_EVENT_PROBE probeTable ON eventTable.ALARM_EVENT_ID = probeTable.ALARM_EVENT_ID WHERE probeTable.PROBE_ID = 2 which is running slower than it could. Table definitions and explain analyze output below. The first explain is the current plan (uses sequential scans). The second is after I have disabled sequential scans, and is the plan I would prefer. I have vacuum analyzed both tables. In terms of relevant changes to the default postgresql.conf, we have these: shared_buffers = 28MB constraint_exclusion = on I want to understand why the optimiser is choosing the plan with sequential table scans, rather than the plan with index scans. I am not sure how to interpret the predicted vs actual times/costs, and want to understand why the predicted cost for the index scan plan seems to be way off. I have read: https://wiki.postgresql.org/images/4/45/Explaining_EXPLAIN.pdf show server_version; server_version ---------------- 8.4.15 Table "public.alarm_event_probe" Column | Type | Modifiers ----------------------+---------+----------- alarm_event_id | bigint | not null probe_id | integer | not null probe_alarm_event_id | bigint | not null Indexes: "alarm_event_probe_pkey" PRIMARY KEY, btree (alarm_event_id) "alarm_event_probe_unique" UNIQUE, btree (probe_id, probe_alarm_event_id) Foreign-key constraints: "alarm_event_probe_fk" FOREIGN KEY (probe_id) REFERENCES probe(probe_id) Triggers: alarm_event_probe_alarm_event_foreign_key_trigger BEFORE INSERT OR UPDATE ON alarm_event_probe FOR EACH ROW EXECUTE PROCEDURE alarm_event_foreign_key_function() Table "public.alarm_event" Column | Type | Modifiers ------------------------+-----------------------------+---------------------------------------------------------------------- alarm_event_id | bigint | not null default nextval('alarm_event_alarm_event_id_seq'::regclass) timestamp_device_utc | timestamp without time zone | timestamp_manager_utc | timestamp without time zone | timestamp_proxy_utc | timestamp without time zone | not null timestamp_database_utc | timestamp without time zone | not null default systimestamp() device_name | character varying(32) | not null device_location | character varying(4) | not null device_type | character varying(6) | not null device_category | character varying(32) | not null device_instance | integer | not null proxy_name | character varying(128) | not null proxy_instance | character varying(32) | not null proxy_source | character varying(256) | not null native_alarm_id | character varying(32) | alarm_name | character varying(64) | not null alarm_severity | integer | alarm_description | character varying(1024) | not null org_code | integer | not null default 348 domain_id | integer | service_state | integer | not null default 0 proactive | boolean | not null default true alarm_id | bigint | Indexes: "alarm_event_pkey" PRIMARY KEY, btree (alarm_event_id) Check constraints: "alarm_event_alarm_severity_range" CHECK (alarm_severity >= 0 AND alarm_severity <= 5) "alarm_event_service_state_range" CHECK (service_state >= 0 AND service_state <= 2) Foreign-key constraints: "alarm_event_domain_fk" FOREIGN KEY (domain_id) REFERENCES domain(domain_id) "alarm_event_organisation_fk" FOREIGN KEY (org_code) REFERENCES organisation(org_code) Triggers: alarm_event_1_raw_trigger AFTER INSERT ON alarm_event FOR EACH ROW EXECUTE PROCEDURE alarm_event_raw_trigger_function() alarm_event_2_trigger_a AFTER INSERT OR DELETE OR UPDATE ON alarm_event FOR EACH ROW EXECUTE PROCEDURE alarm_event_trigger_function() alarm_event_probe_alarm_event_delete_cascade_trigger BEFORE DELETE ON alarm_event FOR EACH ROW EXECUTE PROCEDURE alarm_event_delete_alarm_event_probe_function() alarm_event_trigger_b BEFORE INSERT OR DELETE OR UPDATE ON alarm_event FOR EACH ROW EXECUTE PROCEDURE alarm_event_trigger_function() alarm_event_zz_insert_trigger AFTER INSERT ON alarm_event FOR EACH ROW EXECUTE PROCEDURE alarm_event_insert() Number of child tables: 33 (Use \d+ to list them.) QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=451984.11..451984.12 rows=1 width=8) (actual time=248693.886..248693.889 rows=1 loops=1) -> Hash Join (cost=107.06..443578.92 rows=3362075 width=8) (actual time=2521.275..248683.459 rows=1934 loops=1) Hash Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Append (cost=0.00..370989.07 rows=7772408 width=8) (actual time=14.364..220430.509 rows=7771865 loops=1) -> Seq Scan on alarm_event eventtable (cost=0.00..3.00 rows=1 width=8) (actual time=0.088..0.088 rows=0 loops=1) -> Seq Scan on alarm_event_y2011m12 eventtable (cost=0.00..12638.54 rows=290254 width=8) (actual time=14.267..11136.290 rows=290254 loops=1) -> Seq Scan on alarm_event_y2011m10 eventtable (cost=0.00..9146.57 rows=214457 width=8) (actual time=7.719..5820.900 rows=214457 loops=1) -> Seq Scan on alarm_event_y2011m09 eventtable (cost=0.00..1183.60 rows=27660 width=8) (actual time=0.046..111.918 rows=27660 loops=1) -> Seq Scan on alarm_event_y2011m11 eventtable (cost=0.00..6209.42 rows=149342 width=8) (actual time=0.055..662.708 rows=149342 loops=1) -> Seq Scan on alarm_event_y2012m01 eventtable (cost=0.00..8661.84 rows=207184 width=8) (actual time=0.075..943.065 rows=207184 loops=1) -> Seq Scan on alarm_event_y2012m02 eventtable (cost=0.00..7824.78 rows=182378 width=8) (actual time=0.051..6620.416 rows=182378 loops=1) -> Seq Scan on alarm_event_y2012m03 eventtable (cost=0.00..16717.50 rows=386350 width=8) (actual time=101.301..14018.406 rows=386350 loops=1) -> Seq Scan on alarm_event_y2012m04 eventtable (cost=0.00..10125.17 rows=238117 width=8) (actual time=25.155..3013.045 rows=238117 loops=1) -> Seq Scan on alarm_event_y2012m05 eventtable (cost=0.00..10747.73 rows=251573 width=8) (actual time=0.058..1605.062 rows=251573 loops=1) -> Seq Scan on alarm_event_y2012m06 eventtable (cost=0.00..16638.79 rows=387879 width=8) (actual time=0.122..4477.169 rows=387879 loops=1) -> Seq Scan on alarm_event_y2012m07 eventtable (cost=0.00..9675.58 rows=222658 width=8) (actual time=85.891..8504.216 rows=222658 loops=1) -> Seq Scan on alarm_event_y2012m08 eventtable (cost=0.00..9570.01 rows=234201 width=8) (actual time=106.049..7978.829 rows=234201 loops=1) -> Seq Scan on alarm_event_y2012m09 eventtable (cost=0.00..12731.91 rows=300791 width=8) (actual time=5.516..5596.174 rows=300791 loops=1) -> Seq Scan on alarm_event_y2012m10 eventtable (cost=0.00..11052.83 rows=266483 width=8) (actual time=0.064..1159.065 rows=266483 loops=1) -> Seq Scan on alarm_event_y2012m11 eventtable (cost=0.00..9540.78 rows=226778 width=8) (actual time=0.045..878.269 rows=226778 loops=1) -> Seq Scan on alarm_event_y2012m12 eventtable (cost=0.00..8110.64 rows=208464 width=8) (actual time=0.061..883.399 rows=208464 loops=1) -> Seq Scan on alarm_event_y2013m01 eventtable (cost=0.00..13391.78 rows=272178 width=8) (actual time=0.028..10847.953 rows=270708 loops=1) -> Seq Scan on alarm_event_y2013m02 eventtable (cost=0.00..15720.65 rows=297365 width=8) (actual time=134.814..8192.080 rows=297204 loops=1) -> Seq Scan on alarm_event_y2013m03 eventtable (cost=0.00..41027.29 rows=810229 width=8) (actual time=11.567..8419.318 rows=805929 loops=1) -> Seq Scan on alarm_event_y2013m04 eventtable (cost=0.00..13382.35 rows=253135 width=8) (actual time=0.072..9731.649 rows=253329 loops=1) -> Seq Scan on alarm_event_y2013m05 eventtable (cost=0.00..9793.55 rows=175455 width=8) (actual time=4.434..9965.956 rows=176525 loops=1) -> Seq Scan on alarm_event_y2013m06 eventtable (cost=0.00..9961.74 rows=184074 width=8) (actual time=123.567..11228.522 rows=184335 loops=1) -> Seq Scan on alarm_event_y2013m07 eventtable (cost=0.00..10330.15 rows=190215 width=8) (actual time=4.728..4839.910 rows=189743 loops=1) -> Seq Scan on alarm_event_y2013m08 eventtable (cost=0.00..8808.29 rows=160329 width=8) (actual time=8.731..2313.534 rows=161477 loops=1) -> Seq Scan on alarm_event_y2013m09 eventtable (cost=0.00..14232.17 rows=273617 width=8) (actual time=0.035..1367.249 rows=273621 loops=1) -> Seq Scan on alarm_event_y2013m10 eventtable (cost=0.00..17202.44 rows=320544 width=8) (actual time=0.105..1853.031 rows=320310 loops=1) -> Seq Scan on alarm_event_y2013m11 eventtable (cost=0.00..15857.97 rows=278997 width=8) (actual time=0.044..7627.316 rows=281997 loops=1) -> Seq Scan on alarm_event_y2013m12 eventtable (cost=0.00..15012.38 rows=278738 width=8) (actual time=121.673..11397.349 rows=280114 loops=1) -> Seq Scan on alarm_event_y2014m01 eventtable (cost=0.00..17534.21 rows=331521 width=8) (actual time=70.202..7098.822 rows=330846 loops=1) -> Seq Scan on alarm_event_y2014m02 eventtable (cost=0.00..8124.21 rows=151321 width=8) (actual time=22.929..1894.643 rows=151158 loops=1) -> Seq Scan on alarm_event_y2014m03 eventtable (cost=0.00..10.40 rows=40 width=8) (actual time=0.003..0.003 rows=0 loops=1) -> Seq Scan on alarm_event_y2014m04 eventtable (cost=0.00..10.40 rows=40 width=8) (actual time=0.004..0.004 rows=0 loops=1) -> Seq Scan on alarm_event_y2014m05 eventtable (cost=0.00..10.40 rows=40 width=8) (actual time=0.003..0.003 rows=0 loops=1) -> Hash (cost=82.89..82.89 rows=1934 width=16) (actual time=16.402..16.402 rows=1934 loops=1) -> Seq Scan on alarm_event_probe probetable (cost=0.00..82.89 rows=1934 width=16) (actual time=0.403..8.985 rows=1934 loops=1) Filter: (probe_id = 2) Total runtime: 248864.890 ms (42 rows) set enable_seqscan = false; SET QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=502051.06..502051.07 rows=1 width=8) (actual time=11597.236..11597.239 rows=1 loops=1) -> Nested Loop (cost=0.00..493645.84 rows=3362085 width=8) (actual time=151.505..11587.339 rows=1934 loops=1) Join Filter: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_probe_pkey on alarm_event_probe probetable (cost=0.00..168.49 rows=1934 width=16) (actual time=143.484..159.889 rows=1934 loops=1) Filter: (probe_id = 2) -> Append (cost=0.00..254.73 rows=34 width=8) (actual time=5.343..5.874 rows=1 loops=1934) -> Index Scan using alarm_event_pkey on alarm_event eventtable (cost=0.00..6.65 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2011m12_alarm_event_id_ix on alarm_event_y2011m12 eventtable (cost=0.00..7.89 rows=1 width=8) (actual time=0.378..0.378 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2011m10_alarm_event_id_ix on alarm_event_y2011m10 eventtable (cost=0.00..7.87 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2011m09_alarm_event_id_ix on alarm_event_y2011m09 eventtable (cost=0.00..7.79 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2011m11_alarm_event_id_ix on alarm_event_y2011m11 eventtable (cost=0.00..7.85 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m01_alarm_event_id_ix on alarm_event_y2012m01 eventtable (cost=0.00..7.87 rows=1 width=8) (actual time=0.479..0.480 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m02_alarm_event_id_ix on alarm_event_y2012m02 eventtable (cost=0.00..7.86 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m03_alarm_event_id_ix on alarm_event_y2012m03 eventtable (cost=0.00..7.91 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m04_alarm_event_id_ix on alarm_event_y2012m04 eventtable (cost=0.00..7.87 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m05_alarm_event_id_ix on alarm_event_y2012m05 eventtable (cost=0.00..7.88 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m06_alarm_event_id_ix on alarm_event_y2012m06 eventtable (cost=0.00..7.91 rows=1 width=8) (actual time=0.071..0.071 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m07_alarm_event_id_ix on alarm_event_y2012m07 eventtable (cost=0.00..7.87 rows=1 width=8) (actual time=0.016..0.016 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m08_alarm_event_id_ix on alarm_event_y2012m08 eventtable (cost=0.00..7.88 rows=1 width=8) (actual time=0.027..0.027 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m09_alarm_event_id_ix on alarm_event_y2012m09 eventtable (cost=0.00..7.89 rows=1 width=8) (actual time=0.664..0.670 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m10_alarm_event_id_ix on alarm_event_y2012m10 eventtable (cost=0.00..7.88 rows=1 width=8) (actual time=0.381..0.381 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m11_alarm_event_id_ix on alarm_event_y2012m11 eventtable (cost=0.00..7.87 rows=1 width=8) (actual time=0.308..0.309 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2012m12_alarm_event_id_ix on alarm_event_y2012m12 eventtable (cost=0.00..7.86 rows=1 width=8) (actual time=0.033..0.033 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m01_alarm_event_id_ix on alarm_event_y2013m01 eventtable (cost=0.00..7.89 rows=1 width=8) (actual time=0.032..0.032 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m02_alarm_event_id_ix on alarm_event_y2013m02 eventtable (cost=0.00..7.90 rows=1 width=8) (actual time=0.254..0.254 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m03_alarm_event_id_ix on alarm_event_y2013m03 eventtable (cost=0.00..8.03 rows=1 width=8) (actual time=0.030..0.030 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m04_alarm_event_id_ix on alarm_event_y2013m04 eventtable (cost=0.00..7.89 rows=1 width=8) (actual time=0.095..0.095 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m05_alarm_event_id_ix on alarm_event_y2013m05 eventtable (cost=0.00..7.86 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m06_alarm_event_id_ix on alarm_event_y2013m06 eventtable (cost=0.00..7.87 rows=1 width=8) (actual time=0.104..0.104 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m07_alarm_event_id_ix on alarm_event_y2013m07 eventtable (cost=0.00..7.87 rows=1 width=8) (actual time=0.179..0.180 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m08_alarm_event_id_ix on alarm_event_y2013m08 eventtable (cost=0.00..7.87 rows=1 width=8) (actual time=0.250..0.250 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m09_alarm_event_id_ix on alarm_event_y2013m09 eventtable (cost=0.00..7.89 rows=1 width=8) (actual time=0.626..0.627 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m10_alarm_event_id_ix on alarm_event_y2013m10 eventtable (cost=0.00..7.90 rows=1 width=8) (actual time=0.300..0.300 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m11_alarm_event_id_ix on alarm_event_y2013m11 eventtable (cost=0.00..7.90 rows=1 width=8) (actual time=0.201..0.201 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2013m12_alarm_event_id_ix on alarm_event_y2013m12 eventtable (cost=0.00..7.90 rows=1 width=8) (actual time=0.552..0.552 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2014m01_alarm_event_id_ix on alarm_event_y2014m01 eventtable (cost=0.00..7.91 rows=1 width=8) (actual time=0.323..0.324 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2014m02_alarm_event_id_ix on alarm_event_y2014m02 eventtable (cost=0.00..7.86 rows=1 width=8) (actual time=0.302..0.303 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2014m03_alarm_event_id_ix on alarm_event_y2014m03 eventtable (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2014m04_alarm_event_id_ix on alarm_event_y2014m04 eventtable (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) -> Index Scan using alarm_event_y2014m05_alarm_event_id_ix on alarm_event_y2014m05 eventtable (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1934) Index Cond: (eventtable.alarm_event_id = probetable.alarm_event_id) Total runtime: 11599.260 ms (75 rows)
On Mon, Feb 17, 2014 at 1:54 PM, Alistair Bayley <alistair@abayley.org> wrote:
I have postgresql 8.4.15 on Ubuntu 10.04 and this query:
SELECT MAX(probeTable.PROBE_ALARM_EVENT_ID) AS MAX_EVENT_ID
FROM ALARM_EVENT eventTable
INNER JOIN ALARM_EVENT_PROBE probeTable
ON eventTable.ALARM_EVENT_ID = probeTable.ALARM_EVENT_ID
WHERE probeTable.PROBE_ID = 2
which is running slower than it could. Table definitions and explain
analyze output below.
The first explain is the current plan (uses sequential scans).
The second is after I have disabled sequential scans, and is the plan
I would prefer.
I have vacuum analyzed both tables. In terms of relevant changes to
the default postgresql.conf, we have these:
shared_buffers = 28MB
constraint_exclusion = on
I want to understand why the optimiser is choosing the plan with
sequential table scans, rather than the plan with index scans.
I am not sure how to interpret the predicted vs actual times/costs,
and want to understand why the predicted cost for the index scan plan
seems to be way off.
The planner clamps the estimated number of rows from an index scan at 1 row, even if it actually believes the number will be 0. That makes the logical simpler, avoiding needs to test for division by zero all over the place, and probably makes it more robust to mis-estimation in most use cases. But in this case, that means it thinks it will find 34 rows, one from each partition, which is way too high.
Now, there certainly is some cost to test an index and finding that no rows in it can match. But your query is probably probing the same spot in each index for each negative match, which means all the blocks are already in memory. But PostgreSQL doesn't know that, so even if it didn't do the clamp it would probably still not get the right answer.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > On Mon, Feb 17, 2014 at 1:54 PM, Alistair Bayley <alistair@abayley.org>wrote: >> I want to understand why the optimiser is choosing the plan with >> sequential table scans, rather than the plan with index scans. > The planner clamps the estimated number of rows from an index scan at 1 > row, even if it actually believes the number will be 0. That makes the > logical simpler, avoiding needs to test for division by zero all over the > place, and probably makes it more robust to mis-estimation in most use > cases. But in this case, that means it thinks it will find 34 rows, one > from each partition, which is way too high. Even if it believed the zero row estimate it's probably getting internally, the cost estimate wouldn't change much, because as you say it's still got to assume that the index will be traversed to verify that there's no such row(s). I notice though that the cost estimate for the seqscan plan isn't all that much lower than that for the indexscan plan. Probably lowering random_page_cost a bit would change the planner's mind. We have no information about total size of database vs available RAM, but if it's a mostly memory-resident database then such a change would be a good idea. regards, tom lane
Re: Why is the optimiser choosing the slower query, or, understanding explain analyze output
От
Alistair Bayley
Дата:
On 18 February 2014 14:40, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I notice though that the cost estimate for the seqscan plan isn't all that > much lower than that for the indexscan plan. Probably lowering > random_page_cost a bit would change the planner's mind. We have no > information about total size of database vs available RAM, but if it's > a mostly memory-resident database then such a change would be a good idea. select pg_size_pretty(pg_database_size('fms')); pg_size_pretty ---------------- 3066 MB (1 row) The DB sits on a dedicated VM with 2G RAM, of which only about 600M is currently used. So assuming it is mostly memory-resident seems pretty reasonable. I'm particularly interested in the massive different between cost and actual for the index plan. The seq scan plan has 451984/248694 (ratio 1.82) for cost/actual, while the index plan has 502051/11597 (ratio 43.29). At least the seq scan plan is only out by a factor of ~2. The row estimate for the Nested Loop op is 3362085 (vs 1934 actual). The optimiser estimated 1934 rows (accurate!) for the alarm_event_probe scan. As this table is joined to alarm_event on the PK (alarm_event_id), each row in alarm_event_probe can match at most one row from alarm_event, so the most rows you could expect from the join would be 1934. The optimiser does not seem to realise that the join is 1-to-1, or 1-to-0. FWIW set random_page_cost = 3.6 causes it to generate the preferred plan. I was under the impression that the best way to solve these kind of optimiser problems was to ensure that the optimiser had good stats information etc. There don't seem to be too many ways to direct it when it makes poor choices. What's the best way to fix this? 1. set random_page_cost = 3.0 2. set enable_seqscan = false; Or something else? Thanks, Alistair
Alistair Bayley <alistair@abayley.org> writes: > On 18 February 2014 14:40, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I notice though that the cost estimate for the seqscan plan isn't all that >> much lower than that for the indexscan plan. Probably lowering >> random_page_cost a bit would change the planner's mind. We have no >> information about total size of database vs available RAM, but if it's >> a mostly memory-resident database then such a change would be a good idea. > [ database size is 3GB, RAM 2GB ] The usual advice for database-in-RAM scenarios is to set random_page_cost = 1, or even to lower both random_page_cost and seq_page_cost below 1. In this case, since it's not going to be entirely RAM-resident, a compromise setting around 2 might be a good idea. > I'm particularly interested in the massive different between cost and > actual for the index plan. The seq scan plan has 451984/248694 (ratio > 1.82) for cost/actual, while the index plan has 502051/11597 (ratio > 43.29). At least the seq scan plan is only out by a factor of ~2. Most likely this means that the index plan is taking a lot more advantage of locality-of-reference than the planner is giving it credit for. I wouldn't put too much faith in those numbers by themselves though, because that's what nearly always happens if you run the same case through EXPLAIN more than once: all the data it needs is already in cache. It's a good idea to pay attention to what happens when the plan does have to read in some new data. regards, tom lane