Hi Folks, I was hoping someone could help me improve the performance of a query I've got that insists on doing a
seq scan on a large table. I'm trying to do some reporting based on my spam logs, which I've partly digested and stored
in a table. Here are the particulars:
The messages table:

       Column       |         Type          |                              Modifiers
--------------------+-----------------------+----------------------------------------------------------------------
 message_id         | integer               | not null default nextval('spamreport.messages_message_id_seq'::text)
 received_date      | date                  | not null
 score              | numeric               | not null
 user_threshold     | numeric               | not null
 raw_email_address  | character varying(64) | not null
 processing_time    | numeric               | not null
 size               | integer               | not null
 fuzzed_address     | character varying(64) | not null
 domain             | character varying(64) | not null
Indexes:
    "messages_pkey" primary key, btree (message_id)
    "domain_idx" btree ("domain")
    "fuzzy_idx" btree (fuzzed_address)
    "received_date_idx" btree (received_date)
And here's the primary query I run, along with explain analyze output:
>> explain analyze SELECT * FROM (
     SELECT domain, count(*) as count, max(score) as max_score,
            avg(score) as average_score, stddev(score) as stddev_score,
            sum(CASE WHEN score > user_threshold THEN 1 ELSE 0 END) as spam_count,
            avg(processing_time) as average_time, avg(size) as average_size
     FROM messages
     WHERE received_date BETWEEN '2004-11-01' AND '2004-11-30'
     GROUP BY domain
   ) AS aggs ORDER BY count DESC;

                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Sort  (cost=30303.51..30303.60 rows=35 width=226) (actual time=29869.716..29869.883 rows=69 loops=1)
   Sort Key: count
   ->  Subquery Scan aggs  (cost=30301.56..30302.61 rows=35 width=226) (actual time=29861.705..29869.240 rows=69 loops=1)
         ->  HashAggregate  (cost=30301.56..30302.26 rows=35 width=54) (actual time=29861.681..29868.261 rows=69 loops=1)
               ->  Seq Scan on messages  (cost=0.00..21573.04 rows=436426 width=54) (actual time=5.523..6304.657 rows=462931 loops=1)
                     Filter: ((received_date >= '2004-11-01'::date) AND (received_date <= '2004-11-30'::date))
 Total runtime: 29870.437 ms
This database gets vacuumed nightly. Also, the query plan stays the same even if I restrict the received_date column
down to a single day.
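
For what it's worth, one test that should show whether received_date_idx would even be used is to disable seq scans
for a single session (not something I'd leave set in production) and re-run the explain, along these lines:

    -- Testing only: force the planner away from sequential scans in this
    -- session, re-check the plan, then restore the default.
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT count(*) FROM messages
    WHERE received_date BETWEEN '2004-11-01' AND '2004-11-30';
    RESET enable_seqscan;

If the forced index scan comes out slower, that would suggest the planner's choice is actually reasonable for a range
this wide.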
Thanks in advance,
--
Dave Steinberg
http://www.geekisp.com/