Re: PG7.4.5: query not using index on date column

Поиск
Список
Период
Сортировка
От Dave Steinberg
Тема Re: PG7.4.5: query not using index on date column
Дата
Msg-id 726FC50F-4222-11D9-A28C-0030656E7E7A@redterror.net
обсуждение исходный текст
Ответ на Re: PG7.4.5: query not using index on date column  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
Sorry for the delay in getting back on this thread, painting my  
apartment meant no internet over this thanksgiving break... :)

> How many rows in the table altogether?  A rough guess is a few million
> based on the estimated seqscan cost.  That would mean that this query
> is retrieving about 10% of the table, which is a large enough fraction
> that the planner will probably think a seqscan is best.  It may be  
> right.
> If you do "set enable_seqscan = off", how does the EXPLAIN ANALYZE
> output change?

Right now it's a hair under 800k rows.  With enable_seqscan = off, I  
get this as my analyze results:
                                                                             QUERY PLAN
------------------------------------------------------------------------ 
------------------------------------------------------------------------ 
------------- Sort  (cost=361676.23..361676.33 rows=38 width=226) (actual  
time=46076.756..46076.924 rows=69 loops=1)   Sort Key: count   ->  Subquery Scan aggs  (cost=361674.10..361675.24
rows=38 
 
width=226) (actual time=46068.621..46076.159 rows=69 loops=1)         ->  HashAggregate  (cost=361674.10..361674.86
rows=38 
 
width=54) (actual time=46068.596..46075.170 rows=69 loops=1)               ->  Index Scan using received_date_idx on
messages  
 
(cost=0.00..349968.44 rows=585283 width=54) (actual  
time=20.988..15020.821 rows=589543 loops=1)                     Index Cond: ((received_date >= '2004-11-01'::date)  
AND (received_date <= '2004-11-30'::date)) Total runtime: 46091.315 ms

Quite a bit higher with these estimates.  Using your 10% of the table  
guess above, I did a breakdown by date and got these results:

geekadmin@db:geekisp=>select received_date, count(received_date) from  
spamreport.messages group by received_date order by received_date asc; received_date | count
---------------+------- 2004-10-20    |  7592 2004-10-21    | 19699 2004-10-22    | 17311 2004-10-23    | 16730
2004-10-24   | 18249 2004-10-25    | 16718 2004-10-26    | 16951 2004-10-27    | 19818 2004-10-28    | 19580 2004-10-29
  | 17610 2004-10-30    | 16210 2004-10-31    | 20468 2004-11-01    | 12337 2004-11-02    |  9012 2004-11-03    | 20871
2004-11-04   | 20103 2004-11-05    | 18807 2004-11-06    | 20131 2004-11-07    | 22291 2004-11-08    | 23041 2004-11-09
  | 20254 2004-11-10    | 17810 2004-11-11    | 21091 2004-11-12    | 21976 2004-11-13    | 18824 2004-11-14    | 20543
2004-11-15   | 18829 2004-11-16    | 24248 2004-11-17    | 18093 2004-11-18    | 25675 2004-11-19    | 27084 2004-11-20
  | 22362 2004-11-21    | 25187 2004-11-22    | 26451 2004-11-23    | 26016 2004-11-24    | 23147 2004-11-25    | 25785
2004-11-26   | 20584 2004-11-27    | 25615 2004-11-28    |  6931 2004-11-29    |  6549
 
(41 rows)

So it looks like an aggregation of 2 weeks worth of data is more than  
10%, so its aligned with what you were saying.  That also jives, since  
when I re-enable seq_scan and do the report for one or two day's of  
data - it uses the index (previously it did not - perhaps there was not  
enough data).

> If it's not right, you may want to try to adjust random_page_cost  
> and/or
> effective_cache_size so that the planner's estimated costs are more in
> line with reality.  Beware of making such adjustments on the basis of
> only one test case, though.

I'll look in the manual and try playing with these options, thanks for  
your suggestions!  In the end, I'm guessing that if I need to retrieve  
a few hundred thousand rows from disk, faster disks might make the  
biggest impact on performance.

Thanks for your help!
-- 
Dave Steinberg
http://www.geekisp.com/


В списке pgsql-sql по дате отправления:

Предыдущее
От: Michael Fuhr
Дата:
Сообщение: Re: stored procedures in postgresql user plpgsql
Следующее
От: "Nurdin"
Дата:
Сообщение: create stored procedure from temporary table