*very* slow query to summarize data for a month ...

Поиск
Список
Период
Сортировка
Table structure is simple:

CREATE TABLE traffic_logs (
    company_id bigint,
    ip_id bigint,
    port integer,
    bytes bigint,
    runtime timestamp without time zone
);

runtime is 'day of month' ...

I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
    FROM company c, traffic_logs ts
   WHERE c.company_id = ts.company_id
     AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name,ts.company_id;

and the explain looks like:
                                                                              QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198
loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78
rows=352loops=1) 
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198
loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16)
(actualtime=0.29..5562.25 rows=462198 loops=1) 
                                 Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time
zone)
 Total runtime: 47587.82 msec
(14 rows)

the problem is that we're only taking a few months worth of data, so I
don't think there is much of a way of 'improve performance' on this, but
figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more then
doubles:

                                                                       QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198
loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70
rows=352loops=1) 
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16) (actual time=64009.26..66860.71 rows=462198
loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual
time=5.02..-645982.04rows=462198 loops=1) 
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without
timezone) 
 Total runtime: 101277.17 msec
(14 rows)


В списке pgsql-performance по дате отправления:

Предыдущее
От:
Дата:
Сообщение: Re: [NOVICE] error while executing a c program with embedded sql
Следующее
От: "Patrick Hatcher"
Дата:
Сообщение: Re: *very* slow query to summarize data for a month ...