Обсуждение: postgres 8.2 seems to prefer Seq Scan
One more anomaly between 7.4 and 8.2. DB dumped from 7.4 and loaded onto 8.2, both have locale set to C. 8.2 seems to prefer Seq Scans for the first query while the ordering in the second query seems to perform worse on 8.2. I ran analyze. I've tried with the encoding set to UTF-8 and SQL_ASCII; same numbers and plans. Any ideas how to improve this? Thanks, Alex postgres 7.4 EXPLAIN ANALYZE select pnum, event_pid, code_name, code_description, code_mcam, event_date, effective_date, ref_country, ref_country_legal_code, corresponding_pnum, withdrawal_date, payment_date, extension_date, fee_payment_year, requester, free_form from code inner join event on code_pid = code_pid_fk where pnum ='AB5819188'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..60.87 rows=19 width=231) (actual time=0.065..0.065 rows=0 loops=1) -> Index Scan using pnum_idx on event (cost=0.00..3.37 rows=19 width=172) (actual time=0.063..0.063 rows=0 loops=1) Index Cond: ((pnum)::text = 'AB5819188'::text) -> Index Scan using code_pkey on code (cost=0.00..3.01 rows=1 width=67) (never executed) Index Cond: (code.code_pid = "outer".code_pid_fk) Total runtime: 0.242 ms (6 rows) postgres 8.2 EXPLAIN ANALYZE select pnum, event_pid, code_name, code_description, code_mcam, event_date, effective_date, ref_country, ref_country_legal_code, corresponding_pnum, withdrawal_date, payment_date, extension_date, fee_payment_year, requester, free_form from code inner join event on code_pid = code_pid_fk where pnum ='AB5819188'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=106.91..3283.46 rows=1779 width=230) (actual time=10.383..10.390 rows=1 loops=1) Hash Cond: (event.code_pid_fk = code.code_pid) -> Index Scan using pnum_idx on event (cost=0.00..3147.63 rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1) Index Cond: ((pnum)::text = 'AB5819188'::text) -> Hash (cost=70.85..70.85 rows=2885 width=67) (actual time=10.329..10.329 rows=2885 loops=1) -> Seq Scan on code (cost=0.00..70.85 rows=2885 width=67) (actual time=0.013..4.805 rows=2885 loops=1) Total runtime: 10.490 ms (7 rows) postgres 7.4 EXPLAIN ANALYZE select e.pnum, c.code_description, c.code_mcam, e.event_pid from event e, code c where c.code_name = e.ref_country_legal_code and c.code_country = e.ref_country and e.pnum = 'AB5819188'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=3.47..106.28 rows=1 width=73) (actual time=7.795..7.795 rows=0 loops=1) Hash Cond: ((("outer".code_name)::text = ("inner".ref_country_legal_code)::text) AND (("outer".code_country)::text = ("inner".ref_country)::text)) -> Seq Scan on code c (cost=0.00..63.92 rows=2592 width=69) (actual time=0.010..3.881 rows=2592 loops=1) -> Hash (cost=3.37..3.37 rows=19 width=30) (actual time=0.064..0.064 rows=0 loops=1) -> Index Scan using pnum_idx on event e (cost=0.00..3.37 rows=19 width=30) (actual time=0.062..0.062 rows=0 loops=1) Index Cond: ((pnum)::text = 'AB5819188'::text) Total runtime: 7.947 ms (7 rows) postgres 8.2 EXPLAIN ANALYZE select e.pnum, c.code_description, c.code_mcam, e.event_pid from event e, code c where c.code_name = e.ref_country_legal_code and c.code_country = e.ref_country and e.pnum = 'AB5819188'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=114.12..3368.51 rows=1 width=73) (actual time=10.900..10.900 rows=0 loops=1) Hash Cond: (((e.ref_country_legal_code)::text = (c.code_name)::text) AND ((e.ref_country)::text = (c.code_country)::text)) -> Index Scan using pnum_idx on event e (cost=0.00..3147.63 rows=1779 width=30) (actual time=0.027..0.031 rows=1 loops=1) Index Cond: ((pnum)::text = 'AB5819188'::text) -> Hash (cost=70.85..70.85 rows=2885 width=69) (actual time=10.838..10.838 rows=2885 loops=1) -> Seq Scan on code c (cost=0.00..70.85 rows=2885 width=69) (actual time=0.011..4.863 rows=2885 loops=1) Total runtime: 11.018 ms (7 rows)
On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote: > One more anomaly between 7.4 and 8.2. DB dumped from 7.4 and loaded > onto 8.2, both have locale set to C. 8.2 seems to prefer Seq Scans > for the first query while the ordering in the second query seems to > perform worse on 8.2. I ran analyze. I've tried with the encoding > set to UTF-8 and SQL_ASCII; same numbers and plans. Any ideas how to > improve this? Are you sure the data sets are identical? The 7.4 query returned 0 rows; the 8.2 query returned 1 row. If you're running the same query against the same data in both versions then at least one of them appears to be returning the wrong result. Exactly which versions of 7.4 and 8.2 are you running? Have you analyzed all tables in both versions? The row count estimate in 7.4 is much closer to reality than in 8.2: 7.4 > -> Index Scan using pnum_idx on event (cost=0.00..3.37 rows=19 > width=172) (actual time=0.063..0.063 rows=0 loops=1) > Index Cond: ((pnum)::text = 'AB5819188'::text) 8.2 > -> Index Scan using pnum_idx on event (cost=0.00..3147.63 > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1) > Index Cond: ((pnum)::text = 'AB5819188'::text) If analyzing the event table doesn't improve the row count estimate then try increasing the statistics target for event.pnum and analyzing again. Example: ALTER TABLE event ALTER pnum SET STATISTICS 100; ANALYZE event; You can set the statistics target as high as 1000 to get more accurate results at the cost of longer ANALYZE times. -- Michael Fuhr
On 4/6/07, Michael Fuhr <mike@fuhr.org> wrote: > On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote: > > One more anomaly between 7.4 and 8.2. DB dumped from 7.4 and loaded > > onto 8.2, both have locale set to C. 8.2 seems to prefer Seq Scans > > for the first query while the ordering in the second query seems to > > perform worse on 8.2. I ran analyze. I've tried with the encoding > > set to UTF-8 and SQL_ASCII; same numbers and plans. Any ideas how to > > improve this? > > Are you sure the data sets are identical? The 7.4 query returned > 0 rows; the 8.2 query returned 1 row. If you're running the same > query against the same data in both versions then at least one of > them appears to be returning the wrong result. Exactly which > versions of 7.4 and 8.2 are you running? They should be although it's possible one of my co-workers updated one of the DB's since I last dumped it, but should be a negligible amount of data. Not sure of the exact version of 7.4; psql just says: psql --version psql (PostgreSQL) 7.4 contains support for command-line editing 8.2 is 8.2.3 > > Have you analyzed all tables in both versions? The row count > estimate in 7.4 is much closer to reality than in 8.2: > Yes. > 7.4 > > -> Index Scan using pnum_idx on event (cost=0.00..3.37 rows=19 > > width=172) (actual time=0.063..0.063 rows=0 loops=1) > > Index Cond: ((pnum)::text = 'AB5819188'::text) > > 8.2 > > -> Index Scan using pnum_idx on event (cost=0.00..3147.63 > > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1) > > Index Cond: ((pnum)::text = 'AB5819188'::text) > > If analyzing the event table doesn't improve the row count estimate > then try increasing the statistics target for event.pnum and analyzing > again. Example: > > ALTER TABLE event ALTER pnum SET STATISTICS 100; > ANALYZE event; > > You can set the statistics target as high as 1000 to get more > accurate results at the cost of longer ANALYZE times. > Thanks! I'll give that a try and report back. Alex
On 4/6/07, Michael Fuhr <mike@fuhr.org> wrote: > On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote: > > One more anomaly between 7.4 and 8.2. DB dumped from 7.4 and loaded > > onto 8.2, both have locale set to C. 8.2 seems to prefer Seq Scans > > for the first query while the ordering in the second query seems to > > perform worse on 8.2. I ran analyze. I've tried with the encoding > > set to UTF-8 and SQL_ASCII; same numbers and plans. Any ideas how to > > improve this? > > Are you sure the data sets are identical? The 7.4 query returned > 0 rows; the 8.2 query returned 1 row. If you're running the same > query against the same data in both versions then at least one of > them appears to be returning the wrong result. Exactly which > versions of 7.4 and 8.2 are you running? > > Have you analyzed all tables in both versions? The row count > estimate in 7.4 is much closer to reality than in 8.2: > > 7.4 > > -> Index Scan using pnum_idx on event (cost=0.00..3.37 rows=19 > > width=172) (actual time=0.063..0.063 rows=0 loops=1) > > Index Cond: ((pnum)::text = 'AB5819188'::text) > > 8.2 > > -> Index Scan using pnum_idx on event (cost=0.00..3147.63 > > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1) > > Index Cond: ((pnum)::text = 'AB5819188'::text) > > If analyzing the event table doesn't improve the row count estimate > then try increasing the statistics target for event.pnum and analyzing > again. Example: > > ALTER TABLE event ALTER pnum SET STATISTICS 100; > ANALYZE event; > > You can set the statistics target as high as 1000 to get more > accurate results at the cost of longer ANALYZE times. > Setting statistics to 400 seems to be the sweet spot. Values above that seem to only marginally improve performance. However, I have to disable seqscan in order for the query to be fast. Why does the query planner insist on doing a seq scan? Is there anyway to make it prefer the index scan? Thanks, Alex postgres 8.2 db=# EXPLAIN ANALYZE select pnum, event_pid, code_name, code_description, code_mcam, event_date, effective_date, ref_country, ref_country_legal_code, corresponding_pnum, withdrawal_date, payment_date, extension_date, fee_payment_year, requester, free_form from code inner join event on code_pid = code_pid_fk where pnum ='US5819188'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=106.91..210.37 rows=54 width=229) (actual time=11.245..11.253 rows=1 loops=1) Hash Cond: (event.code_pid_fk = code.code_pid) -> Index Scan using pnum_idx on event (cost=0.00..102.58 rows=54 width=170) (actual time=0.108..0.112 rows=1 loops=1) Index Cond: ((pnum)::text = 'US5819188'::text) -> Hash (cost=70.85..70.85 rows=2885 width=67) (actual time=11.006..11.006 rows=2885 loops=1) -> Seq Scan on code (cost=0.00..70.85 rows=2885 width=67) (actual time=0.025..5.392 rows=2885 loops=1) Total runtime: 11.429 ms (7 rows) db=# set enable_seqscan=0; SET db=# EXPLAIN ANALYZE select pnum, event_pid, code_name, code_description, code_mcam, event_date, effective_date, ref_country, ref_country_legal_code, corresponding_pnum, withdrawal_date, payment_date, extension_date, fee_payment_year, requester, free_form from code inner join event on code_pid = code_pid_fk where pnum ='US5819188'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..289.72 rows=54 width=229) (actual time=0.068..0.076 rows=1 loops=1) -> Index Scan using pnum_idx on event (cost=0.00..102.58 rows=54 width=170) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: ((pnum)::text = 'US5819188'::text) -> Index Scan using code_pkey on code (cost=0.00..3.45 rows=1 width=67) (actual time=0.041..0.043 rows=1 loops=1) Index Cond: (code.code_pid = event.code_pid_fk) Total runtime: 0.126 ms (6 rows)
On 4/9/07, Alex Deucher <alexdeucher@gmail.com> wrote: > On 4/6/07, Michael Fuhr <mike@fuhr.org> wrote: > > On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote: > > > One more anomaly between 7.4 and 8.2. DB dumped from 7.4 and loaded > > > onto 8.2, both have locale set to C. 8.2 seems to prefer Seq Scans > > > for the first query while the ordering in the second query seems to > > > perform worse on 8.2. I ran analyze. I've tried with the encoding > > > set to UTF-8 and SQL_ASCII; same numbers and plans. Any ideas how to > > > improve this? > > > > Are you sure the data sets are identical? The 7.4 query returned > > 0 rows; the 8.2 query returned 1 row. If you're running the same > > query against the same data in both versions then at least one of > > them appears to be returning the wrong result. Exactly which > > versions of 7.4 and 8.2 are you running? > > > > Have you analyzed all tables in both versions? The row count > > estimate in 7.4 is much closer to reality than in 8.2: > > > > 7.4 > > > -> Index Scan using pnum_idx on event (cost=0.00..3.37 rows=19 > > > width=172) (actual time=0.063..0.063 rows=0 loops=1) > > > Index Cond: ((pnum)::text = 'AB5819188'::text) > > > > 8.2 > > > -> Index Scan using pnum_idx on event (cost=0.00..3147.63 > > > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1) > > > Index Cond: ((pnum)::text = 'AB5819188'::text) > > > > If analyzing the event table doesn't improve the row count estimate > > then try increasing the statistics target for event.pnum and analyzing > > again. Example: > > > > ALTER TABLE event ALTER pnum SET STATISTICS 100; > > ANALYZE event; > > > > You can set the statistics target as high as 1000 to get more > > accurate results at the cost of longer ANALYZE times. > > > > Setting statistics to 400 seems to be the sweet spot. Values above > that seem to only marginally improve performance. However, I have to > disable seqscan in order for the query to be fast. Why does the query > planner insist on doing a seq scan? Is there anyway to make it prefer > the index scan? > FWIW, disabling seqscan also makes the second query much faster: EXPLAIN ANALYZE select e.pnum, c.code_description, c.code_mcam, e.event_pid from event e, code c where c.code_name = e.ref_country_legal_code and c.code_country = e.ref_country and e.pnum = 'US5819188'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=104.13..257.65 rows=1 width=73) (actual time=0.038..0.038 rows=0 loops=1) Merge Cond: ((c.code_country)::text = "inner"."?column5?") Join Filter: ((c.code_name)::text = (e.ref_country_legal_code)::text) -> Index Scan using code_country_idx on code c (cost=0.00..134.00 rows=2885 width=69) (actual time=0.012..0.012 rows=1 loops=1) -> Sort (cost=104.13..104.27 rows=54 width=30) (actual time=0.019..0.021 rows=1 loops=1) Sort Key: (e.ref_country)::text -> Index Scan using pnum_idx on event e (cost=0.00..102.58 rows=54 width=30) (actual time=0.010..0.012 rows=1 loops=1) Index Cond: ((pnum)::text = 'US5819188'::text) Total runtime: 0.072 ms (9 rows)
On 4/9/07, Alex Deucher <alexdeucher@gmail.com> wrote: > On 4/9/07, Alex Deucher <alexdeucher@gmail.com> wrote: > > On 4/6/07, Michael Fuhr <mike@fuhr.org> wrote: > > > On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote: > > > > One more anomaly between 7.4 and 8.2. DB dumped from 7.4 and loaded > > > > onto 8.2, both have locale set to C. 8.2 seems to prefer Seq Scans > > > > for the first query while the ordering in the second query seems to > > > > perform worse on 8.2. I ran analyze. I've tried with the encoding > > > > set to UTF-8 and SQL_ASCII; same numbers and plans. Any ideas how to > > > > improve this? > > > > > > Are you sure the data sets are identical? The 7.4 query returned > > > 0 rows; the 8.2 query returned 1 row. If you're running the same > > > query against the same data in both versions then at least one of > > > them appears to be returning the wrong result. Exactly which > > > versions of 7.4 and 8.2 are you running? > > > > > > Have you analyzed all tables in both versions? The row count > > > estimate in 7.4 is much closer to reality than in 8.2: > > > > > > 7.4 > > > > -> Index Scan using pnum_idx on event (cost=0.00..3.37 rows=19 > > > > width=172) (actual time=0.063..0.063 rows=0 loops=1) > > > > Index Cond: ((pnum)::text = 'AB5819188'::text) > > > > > > 8.2 > > > > -> Index Scan using pnum_idx on event (cost=0.00..3147.63 > > > > rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1) > > > > Index Cond: ((pnum)::text = 'AB5819188'::text) > > > > > > If analyzing the event table doesn't improve the row count estimate > > > then try increasing the statistics target for event.pnum and analyzing > > > again. Example: > > > > > > ALTER TABLE event ALTER pnum SET STATISTICS 100; > > > ANALYZE event; > > > > > > You can set the statistics target as high as 1000 to get more > > > accurate results at the cost of longer ANALYZE times. > > > > > > > Setting statistics to 400 seems to be the sweet spot. Values above > > that seem to only marginally improve performance. However, I have to > > disable seqscan in order for the query to be fast. Why does the query > > planner insist on doing a seq scan? Is there anyway to make it prefer > > the index scan? > > Ok, it looks like bumping up the the stats to 400 did the trick. It seems my test sets were not a good representation of the queries. The sets I was using were more of an exception to the rule since they were hitting comparatively fewer rows that most others. Thanks to everyone on the list and IRC for their help. Alex