Re: PG7.4.5: query not using index on date column
| From | Dave Steinberg |
|---|---|
| Subject | Re: PG7.4.5: query not using index on date column |
| Date | |
| Msg-id | 726FC50F-4222-11D9-A28C-0030656E7E7A@redterror.net |
| In reply to | Re: PG7.4.5: query not using index on date column (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-sql |
Sorry for the delay in getting back on this thread; painting my apartment meant no internet over this Thanksgiving break... :)

> How many rows in the table altogether? A rough guess is a few million
> based on the estimated seqscan cost. That would mean that this query
> is retrieving about 10% of the table, which is a large enough fraction
> that the planner will probably think a seqscan is best. It may be
> right.
>
> If you do "set enable_seqscan = off", how does the EXPLAIN ANALYZE
> output change?

Right now it's a hair under 800k rows. With enable_seqscan = off, I get this as my EXPLAIN ANALYZE results:

```
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=361676.23..361676.33 rows=38 width=226) (actual time=46076.756..46076.924 rows=69 loops=1)
   Sort Key: count
   ->  Subquery Scan aggs  (cost=361674.10..361675.24 rows=38 width=226) (actual time=46068.621..46076.159 rows=69 loops=1)
         ->  HashAggregate  (cost=361674.10..361674.86 rows=38 width=54) (actual time=46068.596..46075.170 rows=69 loops=1)
               ->  Index Scan using received_date_idx on messages  (cost=0.00..349968.44 rows=585283 width=54) (actual time=20.988..15020.821 rows=589543 loops=1)
                     Index Cond: ((received_date >= '2004-11-01'::date) AND (received_date <= '2004-11-30'::date))
 Total runtime: 46091.315 ms
```

Quite a bit higher with these estimates.
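For anyone following along, the comparison above can be reproduced with a session-local planner toggle. This is a sketch: the report query's full SELECT list never appears in the thread, so the inner query below is a hypothetical reconstruction based on the plan (an aggregate over `spamreport.messages` filtered on `received_date`, wrapped in a subquery named `aggs` and sorted by `count`).

```sql
-- Session-local: discourage sequential scans so the two plans can be compared.
-- This only affects the current session, not other connections.
SET enable_seqscan = off;

-- Hypothetical approximation of the report query from this thread;
-- the real SELECT list is not shown in the message.
EXPLAIN ANALYZE
SELECT *
FROM (
    SELECT received_date, count(*) AS count
    FROM spamreport.messages
    WHERE received_date >= '2004-11-01'
      AND received_date <= '2004-11-30'
    GROUP BY received_date
) AS aggs
ORDER BY count;

-- Restore the default when done experimenting.
SET enable_seqscan = on;
```

Note that `enable_seqscan = off` does not forbid seqscans outright; it just assigns them a prohibitively high cost, which is why it is useful for diagnosis but not recommended as a permanent setting.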
Using your 10%-of-the-table guess above, I did a breakdown by date and got these results:

```
geekadmin@db:geekisp=> select received_date, count(received_date)
                       from spamreport.messages
                       group by received_date
                       order by received_date asc;

 received_date | count
---------------+-------
 2004-10-20    |  7592
 2004-10-21    | 19699
 2004-10-22    | 17311
 2004-10-23    | 16730
 2004-10-24    | 18249
 2004-10-25    | 16718
 2004-10-26    | 16951
 2004-10-27    | 19818
 2004-10-28    | 19580
 2004-10-29    | 17610
 2004-10-30    | 16210
 2004-10-31    | 20468
 2004-11-01    | 12337
 2004-11-02    |  9012
 2004-11-03    | 20871
 2004-11-04    | 20103
 2004-11-05    | 18807
 2004-11-06    | 20131
 2004-11-07    | 22291
 2004-11-08    | 23041
 2004-11-09    | 20254
 2004-11-10    | 17810
 2004-11-11    | 21091
 2004-11-12    | 21976
 2004-11-13    | 18824
 2004-11-14    | 20543
 2004-11-15    | 18829
 2004-11-16    | 24248
 2004-11-17    | 18093
 2004-11-18    | 25675
 2004-11-19    | 27084
 2004-11-20    | 22362
 2004-11-21    | 25187
 2004-11-22    | 26451
 2004-11-23    | 26016
 2004-11-24    | 23147
 2004-11-25    | 25785
 2004-11-26    | 20584
 2004-11-27    | 25615
 2004-11-28    |  6931
 2004-11-29    |  6549
(41 rows)
```

So it looks like an aggregation of two weeks' worth of data is well over 10% of the table, which is aligned with what you were saying. That also jibes with what I see when I re-enable enable_seqscan and run the report for only one or two days' worth of data: it uses the index (previously it did not; perhaps there was not enough data then).

> If it's not right, you may want to try to adjust random_page_cost and/or
> effective_cache_size so that the planner's estimated costs are more in
> line with reality. Beware of making such adjustments on the basis of
> only one test case, though.

I'll look in the manual and try playing with these options; thanks for your suggestions! In the end, I'm guessing that if I need to retrieve a few hundred thousand rows from disk, faster disks might make the biggest impact on performance.

Thanks for your help!

--
Dave Steinberg
http://www.geekisp.com/
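For later readers, the tuning experiment Tom suggests might look like the sketch below. The specific values are illustrative assumptions, not recommendations from the thread; on 7.4, effective_cache_size is expressed in 8 kB disk pages rather than memory units.

```sql
-- Session-level experiment (PostgreSQL 7.4-era syntax).
-- Lowering random_page_cost (default 4.0) makes index scans look
-- cheaper relative to seqscans; raising effective_cache_size tells
-- the planner more of the data is likely already cached.
SET random_page_cost = 2.0;        -- illustrative value, not from the thread
SET effective_cache_size = 50000;  -- 50000 x 8 kB pages, i.e. ~400 MB; illustrative

-- Re-run the report query under the new cost model and compare
-- estimated vs. actual times before changing postgresql.conf.
EXPLAIN ANALYZE
SELECT received_date, count(*)
FROM spamreport.messages
WHERE received_date >= '2004-11-01'
  AND received_date <= '2004-11-30'
GROUP BY received_date;
```

As the quoted advice warns, settings that win on one query can lose on others, so it is worth validating any change against a broader workload before making it permanent.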