conceptual method to create high performance query involving time

Поиск
Список
Период
Сортировка
От Don Bowman
Тема conceptual method to create high performance query involving time
Дата
Msg-id FE045D4D9F7AED4CBFF1B3B813C85337045D81D5@mail.sandvine.com
обсуждение исходный текст
Список pgsql-performance
I have a table which has e.g.

CREATE TABLE portstats
(
    id        serial,
    logtime   TIMESTAMP,
    cluster   VARCHAR(40),
    element   VARCHAR(40),
    port      INT,
    rxOctets  BIGINT,
    txOctets  BIGINT
);

which is used for logging statistics from network equipment.
cluster is like the location.
rxOctets, txOctets are numbers which increase over time.

Now, i would like to generate a chart which shows the
bitrate. So i need subtract rxOctets from a previous
value and divide by the time range.

To be efficient, and avoid fetching too many points, i
want the interval between points I select to be a function
of the time range. E.g., when I'm doing a 1-day chart,
i would like to select points that are 15min apart.
When I'm doing a 1-yr query, I would like to select
points that are e.g. 4hours apart. I can make this
determination in the script that generates the statement.

The problem i'm having is that this is
a) a very slow operation
b) selects all data points on t1, and then the interval
   apart one on t2... so i still end up with too many
   points.

points are logged every ~5 minutes, but there is
some small variation on the interval (and some observations
might be missing due to eg communication loss to db).
[ a process goes along later and decimates out points
as they age to prevent the db from becoming very large].

The query I have is below. The question is ... what
is the best strategy for an operation of this nature?

SELECT
    t1.port,
    t1.logtime AS start,
    t2.logtime AS end,
    t1.cluster,
    t1.element,
    (8.0 * (t2.rxoctets - t1.rxoctets) /
     (extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS rxbps,
    (8.0 * (t2.txoctets - t1.txoctets) /
     (extract(EPOCH FROM(t2.logtime - t1.logtime))))::int8 AS txbps
FROM
    portstats t1
 INNER JOIN portstats t2
         ON t2.cluster    = t1.cluster
        AND t2.element    = t1.element
        AND t2.port       = t1.port
        AND t2.logtime    =
 (SELECT logtime
    FROM portstats t3
   WHERE t3.cluster   = t1.cluster
     AND t3.element   = t1.element
     AND t3.port      = t1.port
     AND t3.logtime > t1.logtime + '00:15:00'
ORDER BY cluster ASC,
         element ASC,
         port ASC,
         logtime ASC
         LIMIT 1)
WHERE t1.cluster = 'somecluster'
  AND (t1.element = 'somelement')
  AND (t1.logtime BETWEEN '2004-01-07 00:00' AND '2004-02-08 00:00')
ORDER BY
    t1.cluster ASC,
    t1.element ASC,
    t1.port ASC,
    t1.logtime ASC
;

The query plan for 1 week is below, this takes ~2s to operate. It gets very
slow for 1yr.

Sort  (cost=14055.35..14067.74 rows=4956 width=176) (actual
time=1523.956..1538.354 rows=5943 loops=1)
 Sort Key: t1.svcluster, t1.element, t1.port, t1.logtime
 ->  Merge Join  (cost=2304.49..13751.18 rows=4956 width=176) (actual
time=1008.620..1329.766 rows=5943 loops=1)
       Merge Cond: (("outer"."?column10?" = "inner".logtime) AND
("outer".port = "inner".port))
       ->  Sort  (cost=977.39..992.25 rows=5944 width=136) (actual
time=678.564..692.974 rows=5943 loops=1)
             Sort Key: (subplan), t1.port
             ->  Index Scan using portstats_element_idx on portstats t1
(cost=0.00..604.78 rows=5944 width=136) (actual time=0.191..581.311 ro
ws=5943 loops=1)
                   Index Cond: (element = 'my-element.mydomain.net'::bpchar)
                   Filter: ((svcluster = 'my-cluster'::bpchar) AND (logtime
>= '2004-01-07 00:00:00-05'::timestamp with time zone) AND (logtime
 <= '2004-02-08 00:00:00-05'::timestamp with time zone))
                   SubPlan
                     ->  Limit  (cost=0.00..0.62 rows=1 width=104) (actual
time=0.064..0.066 rows=1 loops=5943)
                           ->  Index Scan using www6 on portstats t3
(cost=0.00..399.28 rows=643 width=104) (actual time=0.054..0.054 rows=1 l
oops=5943)
                                 Index Cond: ((svcluster = $1) AND (element
= $2) AND (port = $3) AND (logtime > ($4 + '00:15:00'::interval)))
       ->  Sort  (cost=1327.10..1356.00 rows=11560 width=136) (actual
time=289.168..321.522 rows=11771 loops=1)
             Sort Key: t2.logtime, t2.port
             ->  Index Scan using portstats_element_idx on portstats t2
(cost=0.00..546.98 rows=11560 width=136) (actual time=0.103..192.027 r
ows=11560 loops=1)
                   Index Cond: ('my-element.mydomain.net'::bpchar = element)
                   Filter: (('my-cluster'::bpchar = svcluster))
Total runtime: 1609.411 ms
(19 rows)


В списке pgsql-performance по дате отправления:

Предыдущее
От: Rod Taylor
Дата:
Сообщение: Re: RAID or manual split?
Следующее
От: John Siracusa
Дата:
Сообщение: Column correlation drifts, index ignored again