conceptual method to create high performance query involving time

Поиск

Список

Период

Сортировка

От	Don Bowman
Тема	conceptual method to create high performance query involving time
Дата	21 февраля 2004 г. 20:12:39
Msg-id	FE045D4D9F7AED4CBFF1B3B813C85337045D81D5@mail.sandvine.com обсуждение исходный текст
Список	pgsql-performance

Дерево обсуждения

I have a table which has e.g.

CREATE TABLE portstats
(
    id        serial,
    logtime   TIMESTAMP,
    cluster   VARCHAR(40),
    element   VARCHAR(40),
    port      INT,
    rxOctets  BIGINT,
    txOctets  BIGINT
);

which is used for logging statistics from network equipment.
cluster is like the location.
rxOctets, txOctets are numbers which increase over time.

Now, i would like to generate a chart which shows the
bitrate. So i need subtract rxOctets from a previous
value and divide by the time range.

To be efficient, and avoid fetching too many points, i
want the interval between points I select to be a function
of the time range. E.g., when I'm doing a 1-day chart,
i would like to select points that are 15min apart.
When I'm doing a 1-yr query, I would like to select
points that are e.g. 4hours apart. I can make this
determination in the script that generates the statement.

The problem i'm having is that this is
a) a very slow operation
b) selects all data points on t1, and then the interval
   apart one on t2... so i still end up with too many
   points.

points are logged every ~5 minutes, but there is
some small variation on the interval (and some observations
might be missing due to eg communication loss to db).
[ a process goes along later and decimates out points
as they age to prevent the db from becoming very large].

The query I have is below. The question is ... what
is the best strategy for an operation of this nature?

SELECT
    t1.port,
    t1.logtime AS start,
    t2.logtime AS end,
    t1.cluster,
    t1.element,
    (8.0 * (t2.rxoctets - t1.rxoctets) /
     (extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS rxbps,
    (8.0 * (t2.txoctets - t1.txoctets) /
     (extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS txbps
FROM
    portstats t1
 INNER JOIN portstats t2
         ON t2.cluster    = t1.cluster
        AND t2.element    = t1.element
        AND t2.port       = t1.port
        AND t2.logtime    =
 (SELECT logtime
    FROM portstats t3
   WHERE t3.cluster   = t1.cluster
     AND t3.element   = t1.element
     AND t3.port      = t1.port
     AND t3.logtime > t1.logtime + '00:15:00'
ORDER BY cluster ASC,
         element ASC,
         port ASC,
         logtime ASC
         LIMIT 1)
WHERE t1.cluster = 'somecluster'
  AND (t1.element = 'somelement')
  AND (t1.logtime BETWEEN '2004-01-07 00:00' AND '2004-02-08 00:00')
ORDER BY
    t1.cluster ASC,
    t1.element ASC,
    t1.port ASC,
    t1.logtime ASC
;

The query plan for 1 week is below, this takes ~2s to operate. It gets very
slow for 1yr.

Sort  (cost=14055.35..14067.74 rows=4956 width=176) (actual
time=1523.956..1538.354 rows=5943 loops=1)
 Sort Key: t1.svcluster, t1.element, t1.port, t1.logtime
 ->  Merge Join  (cost=2304.49..13751.18 rows=4956 width=176) (actual
time=1008.620..1329.766 rows=5943 loops=1)
       Merge Cond: (("outer"."?column10?" = "inner".logtime) AND
("outer".port = "inner".port))
       ->  Sort  (cost=977.39..992.25 rows=5944 width=136) (actual
time=678.564..692.974 rows=5943 loops=1)
             Sort Key: (subplan), t1.port
             ->  Index Scan using portstats_element_idx on portstats t1
(cost=0.00..604.78 rows=5944 width=136) (actual time=0.191..581.311 ro
ws=5943 loops=1)
                   Index Cond: (element = 'my-element.mydomain.net'::bpchar)
                   Filter: ((svcluster = 'my-cluster'::bpchar) AND (logtime
>= '2004-01-07 00:00:00-05'::timestamp with time zone) AND (logtime
 <= '2004-02-08 00:00:00-05'::timestamp with time zone))
                   SubPlan
                     ->  Limit  (cost=0.00..0.62 rows=1 width=104) (actual
time=0.064..0.066 rows=1 loops=5943)
                           ->  Index Scan using www6 on portstats t3
(cost=0.00..399.28 rows=643 width=104) (actual time=0.054..0.054 rows=1 l
oops=5943)
                                 Index Cond: ((svcluster = $1) AND (element
= $2) AND (port = $3) AND (logtime > ($4 + '00:15:00'::interval)))
       ->  Sort  (cost=1327.10..1356.00 rows=11560 width=136) (actual
time=289.168..321.522 rows=11771 loops=1)
             Sort Key: t2.logtime, t2.port
             ->  Index Scan using portstats_element_idx on portstats t2
(cost=0.00..546.98 rows=11560 width=136) (actual time=0.103..192.027 r
ows=11560 loops=1)
                   Index Cond: ('my-element.mydomain.net'::bpchar = element)
                   Filter: (('my-cluster'::bpchar = svcluster))
Total runtime: 1609.411 ms
(19 rows)

В списке pgsql-performance по дате отправления:

Предыдущее

От: Rod Taylor
Дата: 21 февраля 2004 г., 05:18:36
Сообщение: Re: RAID or manual split?

Следующее

От: John Siracusa
Дата: 21 февраля 2004 г., 23:18:04
Сообщение: Column correlation drifts, index ignored again

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

conceptual method to create high performance query involving time

Предыдущее

Следующее