Re: *very* slow query to summarize data for a month ...

Поиск
Список
Период
Сортировка
От Marc G. Fournier
Тема Re: *very* slow query to summarize data for a month ...
Дата
Msg-id 20031111134925.L56037@ganymede.hub.org
обсуждение исходный текст
Ответ на Re: *very* slow query to summarize data for a month ...  (Dennis Bjorklund <db@zigo.dhs.org>)
Ответы Re: *very* slow query to summarize data for a month ...  (Greg Stark <gsstark@mit.edu>)
Список pgsql-performance

On Tue, 11 Nov 2003, Dennis Bjorklund wrote:

> On Mon, 10 Nov 2003, Marc G. Fournier wrote:
>
> >
> > explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
> >     FROM company c, traffic_logs ts
> >    WHERE c.company_id = ts.company_id
> >      AND month_trunc(ts.runtime) = '2003-10-01'
> > GROUP BY company_name,ts.company_id;
>
> What if you do
>
>   ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
>
> and add an index like (runtime, company_name, company_id)?

Good thought, but even simplifying it to the *lowest* query possible, with
no table joins, is painfully slow:

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
    FROM traffic_logs ts
   WHERE month_trunc(ts.runtime) = '2003-10-01'
GROUP BY ts.company_id;


                                                                        QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=31630.84..31693.05 rows=829 width=16) (actual time=14862.71..26552.39 rows=144 loops=1)
   ->  Group  (cost=31630.84..31672.31 rows=8295 width=16) (actual time=9634.28..20967.07 rows=462198 loops=1)
         ->  Sort  (cost=31630.84..31651.57 rows=8295 width=16) (actual time=9634.24..12838.73 rows=462198 loops=1)
               Sort Key: company_id
               ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..31090.93 rows=8295 width=16) (actual
time=0.26..6043.35rows=462198 loops=1) 
                     Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 26659.35 msec
(7 rows)



-OR-

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
    FROM traffic_logs ts
   WHERE ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
GROUP BY ts.company_id;


                                                                            QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=81044.53..84424.21 rows=45062 width=16) (actual time=13307.52..29274.66 rows=144 loops=1)
   ->  Group  (cost=81044.53..83297.65 rows=450625 width=16) (actual time=10809.02..-673265.13 rows=462198 loops=1)
         ->  Sort  (cost=81044.53..82171.09 rows=450625 width=16) (actual time=10808.99..14069.79 rows=462198 loops=1)
               Sort Key: company_id
               ->  Seq Scan on traffic_logs ts  (cost=0.00..38727.35 rows=450625 width=16) (actual time=0.07..6801.92
rows=462198loops=1) 
                     Filter: ((runtime >= '2003-10-01 00:00:00'::timestamp without time zone) AND (runtime <
'2003-11-0100:00:00'::timestamp without time zone)) 
 Total runtime: 29385.97 msec
(7 rows)


Just as a side note, just doing a straight scan for the records, with no
SUM()/GROUP BY involved, with the month_trunc() index is still >8k msec:

                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tl_month on traffic_logs ts  (cost=0.00..31096.36 rows=8297 width=16) (actual time=0.96..5432.93
rows=462198loops=1) 
   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 8092.88 msec
(3 rows)

and without the index, >15k msec:

                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Seq Scan on traffic_logs ts  (cost=0.00..38719.55 rows=8297 width=16) (actual time=0.11..11354.45 rows=462198 loops=1)
   Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 15353.57 msec
(3 rows)

so the GROUP BY is affecting the overall, but even without it, its still
taking a helluva long time ...

I'm going to modify my load script so that it dumps monthly totals to
traffic_logs, and 'details' to a schema.traffic_logs table ... I don't
need the 'per day totals' at the top level at all, only speed ... the 'per
day totals' are only required at the 'per client' level, and by moving the
'per day' into a client schema will shrink the table significantly ...

If it wasn't for trying to pull in that 'whole month' summary, it would be
fine :(

В списке pgsql-performance по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: *very* slow query to summarize data for a month ...
Следующее
От: Greg Stark
Дата:
Сообщение: Re: *very* slow query to summarize data for a month ...