Обсуждение: Selective usage of index in planner/optimizer (Too conservative?)
Hi: I was testing a database when notice that it does not used the new index I created. So after a couple of VACUUM ANALYZE it tried the following test queries. **** TEST CASE #1 *********** loyalty=# set enable_seqscan=off; SET VARIABLE loyalty=# explain analyze select count(*) from points where branch_cd=1 ; NOTICE: QUERY PLAN: Aggregate (cost=119123.54..119123.54 rows=1 width=0) (actual time=811.08..811.0 8 rows=1 loops=1) -> Index Scan using idx_monthly_branch on points (cost=0.00..1187 65.86 rows=143073 width=0) (actual time=0.19..689.75 rows=136790 loops=1) Total runtime: 811.17 msec ***** TEST CASE #2 ********* loyalty=# set enable_seqscan=on; SET VARIABLE loyalty=# explain analyze select count(*) from points where branch_cd=1 ; NOTICE: QUERY PLAN: Aggregate (cost=62752.34..62752.34 rows=1 width=0) (actual time=3593.93..3593.9 3 rows=1 loops=1) -> Seq Scan on points (cost=0.00..62681.70 rows=28254 width=0) (a ctual time=0.33..3471.54 rows=136790 loops=1) Total runtime: 3594.01 msec *** TEST CASE #3 (Sequential scan turned off) ****** loyalty=# explain select * from points where branch_cd=5; NOTICE: QUERY PLAN: Index Scan using idx_monthly_branch on points (cost=0.00..49765.12 r ows=16142 width=55) I am wondering why in test case #2 it did not use an index scan, where as in case #3 it did. The number of rows in test #2 and #3 are just a small subset of table "points". The following are the number of elements in the table: branch_cd = 1 ---> 136,970 branch_cd = 5 ---> 39,385 count(*) ---> 2,570,173 Its rather strange why "SELECT COUNT(*)...WHERE branch_cd=1" uses sequential scan even though it just comprises 5.3% of whole table... I'ts also strange because of the ff: (Remember test case 1 and 2 are the same query) test 1 --> seq_scan=off --> 811.17 msec test 2 --> seq_scan=on --> 3594.01 msec Test #1 have 400% improvement over Test #2, yet the query plan for test #2 is the default. Are there way to let the planner improve the choice in using an index or not? BTW the "cost" variables are set to the default for the test. Thank you in advance. ludwig. __________________________________________________ Do you Yahoo!? New DSL Internet Access from SBC & Yahoo! http://sbc.yahoo.com
Ludwig Lim <lud_nowhere_man@yahoo.com> writes: > NOTICE: QUERY PLAN: > Aggregate (cost=119123.54..119123.54 rows=1 width=0) > (actual time=811.08..811.0 > 8 rows=1 loops=1) > -> Index Scan using idx_monthly_branch on points > (cost=0.00..1187 > 65.86 rows=143073 width=0) (actual time=0.19..689.75 > rows=136790 loops=1) > Total runtime: 811.17 msec > NOTICE: QUERY PLAN: > Aggregate (cost=62752.34..62752.34 rows=1 width=0) > (actual time=3593.93..3593.9 > 3 rows=1 loops=1) > -> Seq Scan on points (cost=0.00..62681.70 > rows=28254 width=0) (a > ctual time=0.33..3471.54 rows=136790 loops=1) > Total runtime: 3594.01 msec Something fishy about this --- why is the estimated number of rows different in the two cases (143073 vs 28254)? Did you redo VACUUM and/or ANALYZE in between? > I am wondering why in test case #2 it did not use > an index scan, where as in case #3 it did. Probably because it knows "branch_cd=5" is more selective than "branch_cd=1". It would be useful to see the pg_stats entry for branch_cd. > Its rather strange why "SELECT COUNT(*)...WHERE > branch_cd=1" uses sequential scan even though it just > comprises 5.3% of whole table... No, what's strange is that it's faster to use an indexscan for that. The table must be very nearly in order by branch_cd; have you clustered it recently? regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Something fishy about this --- why is the estimated > number of rows > different in the two cases (143073 vs 28254)? Did > you redo VACUUM > and/or ANALYZE in between? I neither VACUUMed nor ANALYZEd between the 2 cases. > > > I am wondering why in test case #2 it did not > use > > an index scan, where as in case #3 it did. > > Probably because it knows "branch_cd=5" is more > selective than > "branch_cd=1". It would be useful to see the > pg_stats entry for > branch_cd. Should I try altering the statistics? I tried ANALYZE points(branch_cd); but it still gave me the same results. > > Its rather strange why "SELECT COUNT(*)...WHERE > > branch_cd=1" uses sequential scan even though it > just > > comprises 5.3% of whole table... What I mean is the table is rather large. (2 million rows) and I thought the planner would automatically used an index to retrieve a small subset (based on the percentage) of the large table. > No, what's strange is that it's faster to use an > indexscan for that. > The table must be very nearly in order by branch_cd; > have you clustered > it recently? I never clustered the table. But prior to testing I dropped an index and create a new one. Does dropping and creating index "confuse" the planner even after a VACUUM ANALYZE? I seem to notice this trend everytime I add a new index to the table. It would slow down and the performance would gradually improve in a day or two. Should I try changing "cost" variables? I'm using Pentium IV, with SCSI [RAID 5]. regards, ludwig. __________________________________________________ Do you Yahoo!? New DSL Internet Access from SBC & Yahoo! http://sbc.yahoo.com