On 21/06/10 22:42, Sergio Charpinel Jr. wrote:
> Hi,
>
> I'm getting low performance on SUM and GROUP BY queries.
> How can I improve my database to perform such queries.
> -> Sort (cost=3499863.27..3523695.89 rows=9533049
> width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
> Sort Key: ip_src, port_src, ip_dst, port_dst,
> tcp_flags, ip_proto
> -> Seq Scan on acct_2010_25 (cost=0.00..352648.10
> rows=9533049 width=50) (actual time=0.038..50860.391 rows=9494165 loops=1)
> Filter: ((stamp_inserted >= '2010-06-20
> 10:10:00'::timestamp without time zone) AND (stamp_inserted <
> '2010-06-21 10:10:00'::timestamp without time zone))
Provide an index on at least (ip_src,port_src,ip_dst,port_dst). If you
frequently do other queries that only want some of that information you
could create several individual indexes for those columns instead, as Pg
will combine them for a query, but that is much less efficient than an
index across all four columns.
CREATE INDEX ip_peers_idx ON acct_2010_25(ip_src,port_src,ip_dst_port_dst);
Every index added costs you insert/update/delete speed, so try to find
the smallest/simplest index that gives you acceptable performance.
> Another one just summing bytes (still low):
>
> => explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags,
> ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE
> "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21
> 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
> LIMIT 50 OFFSET 0;
Same deal. You have no suitable index, so Pg has to do a sequential scan
of the table. Since you appear to query on stamp_inserted a lot, you
should index it.
--
Craig Ringer