Обсуждение: Why is the optimiser choosing the slower query, or, understanding explain analyze output

Поиск
Список
Период
Сортировка

Why is the optimiser choosing the slower query, or, understanding explain analyze output

От
Alistair Bayley
Дата:
I have postgresql 8.4.15 on Ubuntu 10.04 and this query:

    SELECT MAX(probeTable.PROBE_ALARM_EVENT_ID) AS MAX_EVENT_ID
    FROM ALARM_EVENT eventTable
    INNER JOIN ALARM_EVENT_PROBE probeTable
       ON eventTable.ALARM_EVENT_ID = probeTable.ALARM_EVENT_ID
    WHERE probeTable.PROBE_ID = 2

which is running slower than it could. Table definitions and explain
analyze output below.
The first explain is the current plan (uses sequential scans).
The second is after I have disabled sequential scans, and is the plan
I would prefer.

I have vacuum analyzed both tables. In terms of relevant changes to
the default postgresql.conf, we have these:

    shared_buffers = 28MB
    constraint_exclusion = on

I want to understand why the optimiser is choosing the plan with
sequential table scans, rather than the plan with index scans.
I am not sure how to interpret the predicted vs actual times/costs,
and want to understand why the predicted cost for the index scan plan
seems to be way off.

I have read: https://wiki.postgresql.org/images/4/45/Explaining_EXPLAIN.pdf


    show server_version;

     server_version
    ----------------
     8.4.15


          Table "public.alarm_event_probe"
            Column        |  Type   | Modifiers
    ----------------------+---------+-----------
     alarm_event_id       | bigint  | not null
     probe_id             | integer | not null
     probe_alarm_event_id | bigint  | not null
    Indexes:
        "alarm_event_probe_pkey" PRIMARY KEY, btree (alarm_event_id)
        "alarm_event_probe_unique" UNIQUE, btree (probe_id,
probe_alarm_event_id)
    Foreign-key constraints:
        "alarm_event_probe_fk" FOREIGN KEY (probe_id) REFERENCES probe(probe_id)
    Triggers:
        alarm_event_probe_alarm_event_foreign_key_trigger BEFORE
INSERT OR UPDATE ON alarm_event_probe FOR EACH ROW EXECUTE PROCEDURE
alarm_event_foreign_key_function()

                                                     Table "public.alarm_event"
             Column         |            Type             |
                  Modifiers

------------------------+-----------------------------+----------------------------------------------------------------------
     alarm_event_id         | bigint                      | not null
default nextval('alarm_event_alarm_event_id_seq'::regclass)
     timestamp_device_utc   | timestamp without time zone |
     timestamp_manager_utc  | timestamp without time zone |
     timestamp_proxy_utc    | timestamp without time zone | not null
     timestamp_database_utc | timestamp without time zone | not null
default systimestamp()
     device_name            | character varying(32)       | not null
     device_location        | character varying(4)        | not null
     device_type            | character varying(6)        | not null
     device_category        | character varying(32)       | not null
     device_instance        | integer                     | not null
     proxy_name             | character varying(128)      | not null
     proxy_instance         | character varying(32)       | not null
     proxy_source           | character varying(256)      | not null
     native_alarm_id        | character varying(32)       |
     alarm_name             | character varying(64)       | not null
     alarm_severity         | integer                     |
     alarm_description      | character varying(1024)     | not null
     org_code               | integer                     | not null default 348
     domain_id              | integer                     |
     service_state          | integer                     | not null default 0
     proactive              | boolean                     | not null
default true
     alarm_id               | bigint                      |
    Indexes:
        "alarm_event_pkey" PRIMARY KEY, btree (alarm_event_id)
    Check constraints:
        "alarm_event_alarm_severity_range" CHECK (alarm_severity >= 0
AND alarm_severity <= 5)
        "alarm_event_service_state_range" CHECK (service_state >= 0
AND service_state <= 2)
    Foreign-key constraints:
        "alarm_event_domain_fk" FOREIGN KEY (domain_id) REFERENCES
domain(domain_id)
        "alarm_event_organisation_fk" FOREIGN KEY (org_code)
REFERENCES organisation(org_code)
    Triggers:
        alarm_event_1_raw_trigger AFTER INSERT ON alarm_event FOR EACH
ROW EXECUTE PROCEDURE alarm_event_raw_trigger_function()
        alarm_event_2_trigger_a AFTER INSERT OR DELETE OR UPDATE ON
alarm_event FOR EACH ROW EXECUTE PROCEDURE
alarm_event_trigger_function()
        alarm_event_probe_alarm_event_delete_cascade_trigger BEFORE
DELETE ON alarm_event FOR EACH ROW EXECUTE PROCEDURE
alarm_event_delete_alarm_event_probe_function()
        alarm_event_trigger_b BEFORE INSERT OR DELETE OR UPDATE ON
alarm_event FOR EACH ROW EXECUTE PROCEDURE
alarm_event_trigger_function()
        alarm_event_zz_insert_trigger AFTER INSERT ON alarm_event FOR
EACH ROW EXECUTE PROCEDURE alarm_event_insert()
    Number of child tables: 33 (Use \d+ to list them.)


       QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=451984.11..451984.12 rows=1 width=8) (actual
time=248693.886..248693.889 rows=1 loops=1)
       ->  Hash Join  (cost=107.06..443578.92 rows=3362075 width=8)
(actual time=2521.275..248683.459 rows=1934 loops=1)
             Hash Cond: (eventtable.alarm_event_id = probetable.alarm_event_id)
             ->  Append  (cost=0.00..370989.07 rows=7772408 width=8)
(actual time=14.364..220430.509 rows=7771865 loops=1)
                   ->  Seq Scan on alarm_event eventtable
(cost=0.00..3.00 rows=1 width=8) (actual time=0.088..0.088 rows=0
loops=1)
                   ->  Seq Scan on alarm_event_y2011m12 eventtable
(cost=0.00..12638.54 rows=290254 width=8) (actual
time=14.267..11136.290 rows=290254 loops=1)
                   ->  Seq Scan on alarm_event_y2011m10 eventtable
(cost=0.00..9146.57 rows=214457 width=8) (actual time=7.719..5820.900
rows=214457 loops=1)
                   ->  Seq Scan on alarm_event_y2011m09 eventtable
(cost=0.00..1183.60 rows=27660 width=8) (actual time=0.046..111.918
rows=27660 loops=1)
                   ->  Seq Scan on alarm_event_y2011m11 eventtable
(cost=0.00..6209.42 rows=149342 width=8) (actual time=0.055..662.708
rows=149342 loops=1)
                   ->  Seq Scan on alarm_event_y2012m01 eventtable
(cost=0.00..8661.84 rows=207184 width=8) (actual time=0.075..943.065
rows=207184 loops=1)
                   ->  Seq Scan on alarm_event_y2012m02 eventtable
(cost=0.00..7824.78 rows=182378 width=8) (actual time=0.051..6620.416
rows=182378 loops=1)
                   ->  Seq Scan on alarm_event_y2012m03 eventtable
(cost=0.00..16717.50 rows=386350 width=8) (actual
time=101.301..14018.406 rows=386350 loops=1)
                   ->  Seq Scan on alarm_event_y2012m04 eventtable
(cost=0.00..10125.17 rows=238117 width=8) (actual
time=25.155..3013.045 rows=238117 loops=1)
                   ->  Seq Scan on alarm_event_y2012m05 eventtable
(cost=0.00..10747.73 rows=251573 width=8) (actual time=0.058..1605.062
rows=251573 loops=1)
                   ->  Seq Scan on alarm_event_y2012m06 eventtable
(cost=0.00..16638.79 rows=387879 width=8) (actual time=0.122..4477.169
rows=387879 loops=1)
                   ->  Seq Scan on alarm_event_y2012m07 eventtable
(cost=0.00..9675.58 rows=222658 width=8) (actual time=85.891..8504.216
rows=222658 loops=1)
                   ->  Seq Scan on alarm_event_y2012m08 eventtable
(cost=0.00..9570.01 rows=234201 width=8) (actual
time=106.049..7978.829 rows=234201 loops=1)
                   ->  Seq Scan on alarm_event_y2012m09 eventtable
(cost=0.00..12731.91 rows=300791 width=8) (actual time=5.516..5596.174
rows=300791 loops=1)
                   ->  Seq Scan on alarm_event_y2012m10 eventtable
(cost=0.00..11052.83 rows=266483 width=8) (actual time=0.064..1159.065
rows=266483 loops=1)
                   ->  Seq Scan on alarm_event_y2012m11 eventtable
(cost=0.00..9540.78 rows=226778 width=8) (actual time=0.045..878.269
rows=226778 loops=1)
                   ->  Seq Scan on alarm_event_y2012m12 eventtable
(cost=0.00..8110.64 rows=208464 width=8) (actual time=0.061..883.399
rows=208464 loops=1)
                   ->  Seq Scan on alarm_event_y2013m01 eventtable
(cost=0.00..13391.78 rows=272178 width=8) (actual
time=0.028..10847.953 rows=270708 loops=1)
                   ->  Seq Scan on alarm_event_y2013m02 eventtable
(cost=0.00..15720.65 rows=297365 width=8) (actual
time=134.814..8192.080 rows=297204 loops=1)
                   ->  Seq Scan on alarm_event_y2013m03 eventtable
(cost=0.00..41027.29 rows=810229 width=8) (actual
time=11.567..8419.318 rows=805929 loops=1)
                   ->  Seq Scan on alarm_event_y2013m04 eventtable
(cost=0.00..13382.35 rows=253135 width=8) (actual time=0.072..9731.649
rows=253329 loops=1)
                   ->  Seq Scan on alarm_event_y2013m05 eventtable
(cost=0.00..9793.55 rows=175455 width=8) (actual time=4.434..9965.956
rows=176525 loops=1)
                   ->  Seq Scan on alarm_event_y2013m06 eventtable
(cost=0.00..9961.74 rows=184074 width=8) (actual
time=123.567..11228.522 rows=184335 loops=1)
                   ->  Seq Scan on alarm_event_y2013m07 eventtable
(cost=0.00..10330.15 rows=190215 width=8) (actual time=4.728..4839.910
rows=189743 loops=1)
                   ->  Seq Scan on alarm_event_y2013m08 eventtable
(cost=0.00..8808.29 rows=160329 width=8) (actual time=8.731..2313.534
rows=161477 loops=1)
                   ->  Seq Scan on alarm_event_y2013m09 eventtable
(cost=0.00..14232.17 rows=273617 width=8) (actual time=0.035..1367.249
rows=273621 loops=1)
                   ->  Seq Scan on alarm_event_y2013m10 eventtable
(cost=0.00..17202.44 rows=320544 width=8) (actual time=0.105..1853.031
rows=320310 loops=1)
                   ->  Seq Scan on alarm_event_y2013m11 eventtable
(cost=0.00..15857.97 rows=278997 width=8) (actual time=0.044..7627.316
rows=281997 loops=1)
                   ->  Seq Scan on alarm_event_y2013m12 eventtable
(cost=0.00..15012.38 rows=278738 width=8) (actual
time=121.673..11397.349 rows=280114 loops=1)
                   ->  Seq Scan on alarm_event_y2014m01 eventtable
(cost=0.00..17534.21 rows=331521 width=8) (actual
time=70.202..7098.822 rows=330846 loops=1)
                   ->  Seq Scan on alarm_event_y2014m02 eventtable
(cost=0.00..8124.21 rows=151321 width=8) (actual time=22.929..1894.643
rows=151158 loops=1)
                   ->  Seq Scan on alarm_event_y2014m03 eventtable
(cost=0.00..10.40 rows=40 width=8) (actual time=0.003..0.003 rows=0
loops=1)
                   ->  Seq Scan on alarm_event_y2014m04 eventtable
(cost=0.00..10.40 rows=40 width=8) (actual time=0.004..0.004 rows=0
loops=1)
                   ->  Seq Scan on alarm_event_y2014m05 eventtable
(cost=0.00..10.40 rows=40 width=8) (actual time=0.003..0.003 rows=0
loops=1)
             ->  Hash  (cost=82.89..82.89 rows=1934 width=16) (actual
time=16.402..16.402 rows=1934 loops=1)
                   ->  Seq Scan on alarm_event_probe probetable
(cost=0.00..82.89 rows=1934 width=16) (actual time=0.403..8.985
rows=1934 loops=1)
                         Filter: (probe_id = 2)
     Total runtime: 248864.890 ms
    (42 rows)

    set enable_seqscan = false;
    SET


                      QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Aggregate  (cost=502051.06..502051.07 rows=1 width=8) (actual
time=11597.236..11597.239 rows=1 loops=1)
       ->  Nested Loop  (cost=0.00..493645.84 rows=3362085 width=8)
(actual time=151.505..11587.339 rows=1934 loops=1)
             Join Filter: (eventtable.alarm_event_id =
probetable.alarm_event_id)
             ->  Index Scan using alarm_event_probe_pkey on
alarm_event_probe probetable  (cost=0.00..168.49 rows=1934 width=16)
(actual time=143.484..159.889 rows=1934 loops=1)
                   Filter: (probe_id = 2)
             ->  Append  (cost=0.00..254.73 rows=34 width=8) (actual
time=5.343..5.874 rows=1 loops=1934)
                   ->  Index Scan using alarm_event_pkey on
alarm_event eventtable  (cost=0.00..6.65 rows=1 width=8) (actual
time=0.008..0.008 rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2011m12_alarm_event_id_ix on alarm_event_y2011m12
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.378..0.378
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2011m10_alarm_event_id_ix on alarm_event_y2011m10
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.008..0.008
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2011m09_alarm_event_id_ix on alarm_event_y2011m09
eventtable  (cost=0.00..7.79 rows=1 width=8) (actual time=0.007..0.007
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2011m11_alarm_event_id_ix on alarm_event_y2011m11
eventtable  (cost=0.00..7.85 rows=1 width=8) (actual time=0.008..0.008
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m01_alarm_event_id_ix on alarm_event_y2012m01
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.479..0.480
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m02_alarm_event_id_ix on alarm_event_y2012m02
eventtable  (cost=0.00..7.86 rows=1 width=8) (actual time=0.015..0.015
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m03_alarm_event_id_ix on alarm_event_y2012m03
eventtable  (cost=0.00..7.91 rows=1 width=8) (actual time=0.016..0.016
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m04_alarm_event_id_ix on alarm_event_y2012m04
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.014..0.014
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m05_alarm_event_id_ix on alarm_event_y2012m05
eventtable  (cost=0.00..7.88 rows=1 width=8) (actual time=0.015..0.015
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m06_alarm_event_id_ix on alarm_event_y2012m06
eventtable  (cost=0.00..7.91 rows=1 width=8) (actual time=0.071..0.071
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m07_alarm_event_id_ix on alarm_event_y2012m07
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.016..0.016
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m08_alarm_event_id_ix on alarm_event_y2012m08
eventtable  (cost=0.00..7.88 rows=1 width=8) (actual time=0.027..0.027
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m09_alarm_event_id_ix on alarm_event_y2012m09
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.664..0.670
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m10_alarm_event_id_ix on alarm_event_y2012m10
eventtable  (cost=0.00..7.88 rows=1 width=8) (actual time=0.381..0.381
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m11_alarm_event_id_ix on alarm_event_y2012m11
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.308..0.309
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2012m12_alarm_event_id_ix on alarm_event_y2012m12
eventtable  (cost=0.00..7.86 rows=1 width=8) (actual time=0.033..0.033
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m01_alarm_event_id_ix on alarm_event_y2013m01
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.032..0.032
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m02_alarm_event_id_ix on alarm_event_y2013m02
eventtable  (cost=0.00..7.90 rows=1 width=8) (actual time=0.254..0.254
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m03_alarm_event_id_ix on alarm_event_y2013m03
eventtable  (cost=0.00..8.03 rows=1 width=8) (actual time=0.030..0.030
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m04_alarm_event_id_ix on alarm_event_y2013m04
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.095..0.095
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m05_alarm_event_id_ix on alarm_event_y2013m05
eventtable  (cost=0.00..7.86 rows=1 width=8) (actual time=0.023..0.023
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m06_alarm_event_id_ix on alarm_event_y2013m06
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.104..0.104
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m07_alarm_event_id_ix on alarm_event_y2013m07
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.179..0.180
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m08_alarm_event_id_ix on alarm_event_y2013m08
eventtable  (cost=0.00..7.87 rows=1 width=8) (actual time=0.250..0.250
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m09_alarm_event_id_ix on alarm_event_y2013m09
eventtable  (cost=0.00..7.89 rows=1 width=8) (actual time=0.626..0.627
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m10_alarm_event_id_ix on alarm_event_y2013m10
eventtable  (cost=0.00..7.90 rows=1 width=8) (actual time=0.300..0.300
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m11_alarm_event_id_ix on alarm_event_y2013m11
eventtable  (cost=0.00..7.90 rows=1 width=8) (actual time=0.201..0.201
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2013m12_alarm_event_id_ix on alarm_event_y2013m12
eventtable  (cost=0.00..7.90 rows=1 width=8) (actual time=0.552..0.552
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m01_alarm_event_id_ix on alarm_event_y2014m01
eventtable  (cost=0.00..7.91 rows=1 width=8) (actual time=0.323..0.324
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m02_alarm_event_id_ix on alarm_event_y2014m02
eventtable  (cost=0.00..7.86 rows=1 width=8) (actual time=0.302..0.303
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m03_alarm_event_id_ix on alarm_event_y2014m03
eventtable  (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m04_alarm_event_id_ix on alarm_event_y2014m04
eventtable  (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
                   ->  Index Scan using
alarm_event_y2014m05_alarm_event_id_ix on alarm_event_y2014m05
eventtable  (cost=0.00..3.87 rows=1 width=8) (actual time=0.005..0.005
rows=0 loops=1934)
                         Index Cond: (eventtable.alarm_event_id =
probetable.alarm_event_id)
     Total runtime: 11599.260 ms
    (75 rows)


Re: Why is the optimiser choosing the slower query, or, understanding explain analyze output

От
Jeff Janes
Дата:
On Mon, Feb 17, 2014 at 1:54 PM, Alistair Bayley <alistair@abayley.org> wrote:
I have postgresql 8.4.15 on Ubuntu 10.04 and this query:

    SELECT MAX(probeTable.PROBE_ALARM_EVENT_ID) AS MAX_EVENT_ID
    FROM ALARM_EVENT eventTable
    INNER JOIN ALARM_EVENT_PROBE probeTable
       ON eventTable.ALARM_EVENT_ID = probeTable.ALARM_EVENT_ID
    WHERE probeTable.PROBE_ID = 2

which is running slower than it could. Table definitions and explain
analyze output below.
The first explain is the current plan (uses sequential scans).
The second is after I have disabled sequential scans, and is the plan
I would prefer.

I have vacuum analyzed both tables. In terms of relevant changes to
the default postgresql.conf, we have these:

    shared_buffers = 28MB
    constraint_exclusion = on

I want to understand why the optimiser is choosing the plan with
sequential table scans, rather than the plan with index scans.
I am not sure how to interpret the predicted vs actual times/costs,
and want to understand why the predicted cost for the index scan plan
seems to be way off.

The planner clamps the estimated number of rows from an index scan at 1 row, even if it actually believes the number will be 0.  That makes the logical simpler, avoiding needs to test for division by zero all over the place, and probably makes it more robust to mis-estimation in most use cases.  But in this case, that means it thinks it will find 34 rows, one from each partition, which is way too high.  

Now, there certainly is some cost to test an index and finding that no rows in it can match.  But your query is probably probing the same spot in each index for each negative match, which means all the blocks are already in memory.  But PostgreSQL doesn't know that, so even if it didn't do the clamp it would probably still not get the right answer.

Cheers,

Jeff

Re: Why is the optimiser choosing the slower query, or, understanding explain analyze output

От
Tom Lane
Дата:
Jeff Janes <jeff.janes@gmail.com> writes:
> On Mon, Feb 17, 2014 at 1:54 PM, Alistair Bayley <alistair@abayley.org>wrote:
>> I want to understand why the optimiser is choosing the plan with
>> sequential table scans, rather than the plan with index scans.

> The planner clamps the estimated number of rows from an index scan at 1
> row, even if it actually believes the number will be 0.  That makes the
> logical simpler, avoiding needs to test for division by zero all over the
> place, and probably makes it more robust to mis-estimation in most use
> cases.  But in this case, that means it thinks it will find 34 rows, one
> from each partition, which is way too high.

Even if it believed the zero row estimate it's probably getting
internally, the cost estimate wouldn't change much, because as you say
it's still got to assume that the index will be traversed to verify that
there's no such row(s).

I notice though that the cost estimate for the seqscan plan isn't all that
much lower than that for the indexscan plan.  Probably lowering
random_page_cost a bit would change the planner's mind.  We have no
information about total size of database vs available RAM, but if it's
a mostly memory-resident database then such a change would be a good idea.

            regards, tom lane


Re: Why is the optimiser choosing the slower query, or, understanding explain analyze output

От
Alistair Bayley
Дата:
On 18 February 2014 14:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I notice though that the cost estimate for the seqscan plan isn't all that
> much lower than that for the indexscan plan.  Probably lowering
> random_page_cost a bit would change the planner's mind.  We have no
> information about total size of database vs available RAM, but if it's
> a mostly memory-resident database then such a change would be a good idea.

select pg_size_pretty(pg_database_size('fms'));

 pg_size_pretty
----------------
 3066 MB
(1 row)

The DB sits on a dedicated VM with 2G RAM, of which only about 600M is
currently used. So assuming it is mostly memory-resident seems pretty
reasonable.

I'm particularly interested in the massive different between cost and
actual for the index plan. The seq scan plan has 451984/248694 (ratio
1.82) for cost/actual, while the index plan has 502051/11597 (ratio
43.29). At least the seq scan plan is only out by a factor of ~2.

The row estimate for the Nested Loop op is 3362085 (vs 1934 actual).
The optimiser estimated 1934 rows (accurate!) for the
alarm_event_probe scan. As this table is joined to alarm_event on the
PK (alarm_event_id), each row in alarm_event_probe can match at most
one row from alarm_event, so the most rows you could expect from the
join would be 1934. The optimiser does not seem to realise that the
join is 1-to-1, or 1-to-0.

FWIW set random_page_cost = 3.6 causes it to generate the preferred plan.

I was under the impression that the best way to solve these kind of
optimiser problems was to ensure that the optimiser had good stats
information etc. There don't seem to be too many ways to direct it
when it makes poor choices.

What's the best way to fix this?
 1. set random_page_cost = 3.0
 2. set enable_seqscan = false;

Or something else?

Thanks,
Alistair


Re: Why is the optimiser choosing the slower query, or, understanding explain analyze output

От
Tom Lane
Дата:
Alistair Bayley <alistair@abayley.org> writes:
> On 18 February 2014 14:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I notice though that the cost estimate for the seqscan plan isn't all that
>> much lower than that for the indexscan plan.  Probably lowering
>> random_page_cost a bit would change the planner's mind.  We have no
>> information about total size of database vs available RAM, but if it's
>> a mostly memory-resident database then such a change would be a good idea.

> [ database size is 3GB, RAM 2GB ]

The usual advice for database-in-RAM scenarios is to set random_page_cost
= 1, or even to lower both random_page_cost and seq_page_cost below 1.
In this case, since it's not going to be entirely RAM-resident, a
compromise setting around 2 might be a good idea.

> I'm particularly interested in the massive different between cost and
> actual for the index plan. The seq scan plan has 451984/248694 (ratio
> 1.82) for cost/actual, while the index plan has 502051/11597 (ratio
> 43.29). At least the seq scan plan is only out by a factor of ~2.

Most likely this means that the index plan is taking a lot more advantage
of locality-of-reference than the planner is giving it credit for.
I wouldn't put too much faith in those numbers by themselves though,
because that's what nearly always happens if you run the same case
through EXPLAIN more than once: all the data it needs is already in
cache.  It's a good idea to pay attention to what happens when the plan
does have to read in some new data.

            regards, tom lane