Low perfomance SUM and Group by large databse

Поиск
Список
Период
Сортировка
От Sergio Charpinel Jr.
Тема Low perfomance SUM and Group by large databse
Дата
Msg-id AANLkTin7cGA3b3P6gPvptI9OaD8jf9O5oFWBfKcUocTT@mail.gmail.com
обсуждение исходный текст
Ответы Re: Low perfomance SUM and Group by large databse  (Craig Ringer <craig@postnewspapers.com.au>)
Re: Low perfomance SUM and Group by large databse  (Russell Smith <mr-russ@pws.com.au>)
Список pgsql-performance
Hi,

I'm getting low performance on SUM and GROUP BY queries.
How can I improve my database to perform such queries.

Here is my table schema:
=> \d acct_2010_25
                                             Tabela "public.acct_2010_25"
     Coluna     |            Tipo             |                             Modificadores                              
----------------+-----------------------------+------------------------------------------------------------------------
 ip_src         | inet                        | not null default '0.0.0.0'::inet
 ip_dst         | inet                        | not null default '0.0.0.0'::inet
 as_src         | bigint                      | not null default 0
 as_dst         | bigint                      | not null default 0
 port_src       | integer                     | not null default 0
 port_dst       | integer                     | not null default 0
 tcp_flags      | smallint                    | not null default 0
 ip_proto       | smallint                    | not null default 0
 packets        | integer                     | not null
 flows          | integer                     | not null default 0
 bytes          | bigint                      | not null
 stamp_inserted | timestamp without time zone | not null default '0001-01-01 00:00:00 BC'::timestamp without time zone
 stamp_updated  | timestamp without time zone | 
Índices:
    "acct_2010_25_pk" PRIMARY KEY, btree (stamp_inserted, ip_src, ip_dst, port_src, port_dst, ip_proto)
    "ibytes_acct_2010_25" btree (bytes)

Here is my one query example (could add pk to flow and packet fields):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,SUM("bytes"),SUM("packets"),SUM("flows") FROM "acct_2010_25" WHERE "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto order by SUM(bytes) desc LIMIT 50 OFFSET 0;
                                                                                      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3998662.81..3998662.94 rows=50 width=50) (actual time=276981.107..276981.133 rows=50 loops=1)
   ->  Sort  (cost=3998662.81..4001046.07 rows=953305 width=50) (actual time=276981.105..276981.107 rows=50 loops=1)
         Sort Key: sum(bytes)
         ->  GroupAggregate  (cost=3499863.27..3754872.33 rows=953305 width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)
               ->  Sort  (cost=3499863.27..3523695.89 rows=9533049 width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)
                     Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
                     ->  Seq Scan on acct_2010_25  (cost=0.00..352648.10 rows=9533049 width=50) (actual time=0.038..50860.391 rows=9494165 loops=1)
                           Filter: ((stamp_inserted >= '2010-06-20 10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21 10:10:00'::timestamp without time zone))
 Total runtime: 278791.661 ms
(9 registros)

Another one just summing bytes (still low):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto LIMIT 50 OFFSET 0;
                                                                                   QUERY PLAN                                                                                   
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3395202.50..3395213.12 rows=50 width=42) (actual time=106261.359..106261.451 rows=50 loops=1)
   ->  GroupAggregate  (cost=3395202.50..3602225.48 rows=974226 width=42) (actual time=106261.357..106261.435 rows=50 loops=1)
         ->  Sort  (cost=3395202.50..3419558.14 rows=9742258 width=42) (actual time=106261.107..106261.169 rows=176 loops=1)
               Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto
               ->  Seq Scan on acct_2010_25  (cost=0.00..367529.72 rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1)
                     Filter: ((stamp_inserted >= '2010-06-20 10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21 10:10:00'::timestamp without time zone))
 Total runtime: 109911.882 ms
(7 registros)


The server has 2 Intel(R) Xeon(R) CPU  E5430 @ 2.66GHz and 16GB RAM.
I'm using PostgreSQL 8.1.18 default config from Centos 5.5 (just increased checkpoint_segments to 50).

What can I change to increase performance?

Thanks in advance.

Cheers.

--
Sergio Roberto Charpinel Jr.

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: HashAggregate slower than sort?
Следующее
От: "Davor J."
Дата:
Сообщение: Re: Slow function in queries SELECT clause.