Low perfomance SUM and Group by large databse

Поиск

Список

Период

Сортировка

От	Sergio Charpinel Jr.
Тема	Low perfomance SUM and Group by large databse
Дата	21 июня 2010 г. 14:42:24
Msg-id	AANLkTin7cGA3b3P6gPvptI9OaD8jf9O5oFWBfKcUocTT@mail.gmail.com обсуждение исходный текст
Ответы	Re: Low perfomance SUM and Group by large databse (Craig Ringer <craig@postnewspapers.com.au>) Re: Low perfomance SUM and Group by large databse (Russell Smith <mr-russ@pws.com.au>)
Список	pgsql-performance

Дерево обсуждения

Hi,

I'm getting low performance on SUM and GROUP BY queries.

How can I improve my database to perform such queries.

Here is my table schema:

=> \d acct_2010_25

Tabela "public.acct_2010_25"

Coluna | Tipo | Modificadores

----------------+-----------------------------+------------------------------------------------------------------------

ip_src | inet | not null default '0.0.0.0'::inet

ip_dst | inet | not null default '0.0.0.0'::inet

as_src | bigint | not null default 0

as_dst | bigint | not null default 0

port_src | integer | not null default 0

port_dst | integer | not null default 0

tcp_flags | smallint | not null default 0

ip_proto | smallint | not null default 0

packets | integer | not null

flows | integer | not null default 0

bytes | bigint | not null

stamp_inserted | timestamp without time zone | not null default '0001-01-01 00:00:00 BC'::timestamp without time zone

stamp_updated | timestamp without time zone |

Índices:

"acct_2010_25_pk" PRIMARY KEY, btree (stamp_inserted, ip_src, ip_dst, port_src, port_dst, ip_proto)

"ibytes_acct_2010_25" btree (bytes)

Here is my one query example (could add pk to flow and packet fields):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,SUM("bytes"),SUM("packets"),SUM("flows") FROM "acct_2010_25" WHERE "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto order by SUM(bytes) desc LIMIT 50 OFFSET 0;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=3998662.81..3998662.94 rows=50 width=50) (actual time=276981.107..276981.133 rows=50 loops=1)

-> Sort (cost=3998662.81..4001046.07 rows=953305 width=50) (actual time=276981.105..276981.107 rows=50 loops=1)

Sort Key: sum(bytes)

-> GroupAggregate (cost=3499863.27..3754872.33 rows=953305 width=50) (actual time=165468.257..182677.580 rows=8182616 loops=1)

-> Sort (cost=3499863.27..3523695.89 rows=9533049 width=50) (actual time=165468.022..168908.828 rows=9494165 loops=1)

Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto

-> Seq Scan on acct_2010_25 (cost=0.00..352648.10 rows=9533049 width=50) (actual time=0.038..50860.391 rows=9494165 loops=1)

Filter: ((stamp_inserted >= '2010-06-20 10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21 10:10:00'::timestamp without time zone))

Total runtime: 278791.661 ms

(9 registros)

Another one just summing bytes (still low):

=> explain analyze SELECT ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto,SUM("bytes") FROM "acct_2010_25" WHERE "stamp_inserted">='2010-06-20 10:10' AND "stamp_inserted"<'2010-06-21 10:10' GROUP BY ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto LIMIT 50 OFFSET 0;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=3395202.50..3395213.12 rows=50 width=42) (actual time=106261.359..106261.451 rows=50 loops=1)

-> GroupAggregate (cost=3395202.50..3602225.48 rows=974226 width=42) (actual time=106261.357..106261.435 rows=50 loops=1)

-> Sort (cost=3395202.50..3419558.14 rows=9742258 width=42) (actual time=106261.107..106261.169 rows=176 loops=1)

Sort Key: ip_src, port_src, ip_dst, port_dst, tcp_flags, ip_proto

-> Seq Scan on acct_2010_25 (cost=0.00..367529.72 rows=9742258 width=42) (actual time=0.073..8058.598 rows=9494165 loops=1)

Filter: ((stamp_inserted >= '2010-06-20 10:10:00'::timestamp without time zone) AND (stamp_inserted < '2010-06-21 10:10:00'::timestamp without time zone))

Total runtime: 109911.882 ms

(7 registros)

The server has 2 Intel(R) Xeon(R) CPU E5430 @ 2.66GHz and 16GB RAM.

I'm using PostgreSQL 8.1.18 default config from Centos 5.5 (just increased checkpoint_segments to 50).

What can I change to increase performance?

Thanks in advance.

Cheers.

--
Sergio Roberto Charpinel Jr.

В списке pgsql-performance по дате отправления:

Предыдущее

От: "Kevin Grittner"
Дата: 21 июня 2010 г., 13:54:02
Сообщение: Re: HashAggregate slower than sort?

Следующее

От: "Davor J."
Дата: 21 июня 2010 г., 14:52:46
Сообщение: Re: Slow function in queries SELECT clause.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Low perfomance SUM and Group by large databse

Предыдущее

Следующее