Re: [PERFORM] Very poor read performance, query independent

From: Charles Nadeau
Subject: Re: [PERFORM] Very poor read performance, query independent
Date:
Msg-id: CADFyZw7JZ5RGzB5aZYOtfnUzw7pT2is_+ughn3yyVkTq-XZxmQ@mail.gmail.com
In response to: Re: [PERFORM] Very poor read performance, query independent (Igor Neyman <ineyman@perceptron.com>)
List: pgsql-performance
Igor,

I set work_mem to 12GB, restarted PostgreSQL, repeated the same "explain (analyze, buffers)..." as above, and the read throughput was very low, at most 10MB/s. All the sorting operations are now done in memory.
I lowered work_mem back to 6GB, restarted PostgreSQL, repeated the same "explain (analyze, buffers)..." as above, and the read throughput was again very low, at most 10MB/s. The first sorting operation is still done in memory, the second one to disk. I think I need about 4GB to do all sorts in memory.
One thing I remembered from Friday's "explain (analyze, buffers)...": I had set both temp_buffers and work_mem to 6GB, as I read your two messages one after the other. So I decided to try again: I set work_mem=6GB and temp_buffers=6GB, restarted PostgreSQL, repeated the same "explain (analyze, buffers)..." as above, and the read throughput was again very low, at most 10MB/s. The first sorting operation is still done in memory, the second one to disk.
For the last test, I brought work_mem and temp_buffers back to their original values. The read throughput is still low.
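(For what it's worth, work_mem is a per-session setting, so the test can be repeated without restarting PostgreSQL; temp_buffers can also be changed per session, but only before any temporary table has been used in that session. A minimal sketch, reusing the 12GB value from the test above:

flows=# SET work_mem = '12GB';            -- per-session only, no restart needed
flows=# SHOW work_mem;                    -- confirm the value actually in effect
flows=# EXPLAIN (ANALYZE, BUFFERS) ... ;  -- rerun the same query as above
flows=# RESET work_mem;                   -- back to the postgresql.conf value
)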
In all cases, about 20 minutes after the query starts, it starts writing to disk furiously. Here are the peak write values as reported by dstat:
work_mem=12GB, temp_buffers=8MB: peak of 393MB/s
work_mem=6GB, temp_buffers=8MB: peak of 579MB/s
work_mem=6GB, temp_buffers=6GB: peak of 418MB/s
work_mem=1468006kB, temp_buffers=8MB: peak of 61MB/s
Also, at peak write, the server almost ran out of memory: the cache almost drops to 0 and it starts swapping.
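If it helps to cross-check what dstat reports, the cumulative spill to temporary files can also be read from the standard pg_stat_database view (the counters are totals since the last statistics reset). A small sketch, assuming the database is the same one used in the prompts below:

flows=# SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_spill
flows-#   FROM pg_stat_database
flows-#  WHERE datname = current_database();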

This query is a bit extreme in terms of sorting. Maybe I should benchmark by counting all the records of my biggest table, as Mark Kirkwood suggested. I'll do some more tests and post the results back to the mailing list.
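Something like this is how I understand Mark's suggestion, i.e. a sequential-scan-bound count of the biggest table (flows, about 590 million rows) to measure raw read throughput; just a sketch of what I plan to run:

flows=# \timing on
flows=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM flows;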
Thanks!

Charles

On Fri, Jul 14, 2017 at 9:13 PM, Igor Neyman <ineyman@perceptron.com> wrote:

From: Charles Nadeau [mailto:charles.nadeau@gmail.com]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman <ineyman@perceptron.com>
Cc: Jeff Janes <jeff.janes@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

 

Igor,

 

Initially temp_buffers was left at its default value (8MB). Watching the content of the directory that stores the temporary files, I found that I need at most 21GB of temporary file space. Should I set temp_buffers to 21GB?
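Rather than watching the temporary-file directory by hand, I believe setting log_temp_files would make PostgreSQL log the size of every temporary file when it is removed (0 means log all of them; it is a superuser/postgresql.conf setting). A sketch, assuming ALTER SYSTEM is acceptable here:

flows=# ALTER SYSTEM SET log_temp_files = 0;   -- log every temporary file and its size
flows=# SELECT pg_reload_conf();               -- picked up on reload, no restart needed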

Here is the explain you requested with work_mem set to 6GB:

flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
      SELECT
         dstaddr
      FROM
         dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

                                                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual time=2227678.194..2227678.217 rows=50 loops=1)
         Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
         ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) (actual time=2227678.192..2227678.202 rows=50 loops=1)
               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
               Sort Method: quicksort  Memory: 654395kB
               Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
               ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
                     Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
                     ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
                           Sort Method: external merge  Disk: 3049216kB
                           Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
                           ->  Gather  (cost=30060688.07..48003007.07 rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 loops=1)
                                 Workers Planned: 12
                                 Workers Launched: 12
                                 Buffers: shared hit=728798037 read=82974833
                                 ->  Hash Semi Join  (cost=30059688.07..47951761.31 rows=220376 width=20) (actual time=1268845.181..2007864.725 rows=7057357 loops=13)
                                       Hash Cond: (flows.dstaddr = flows_1.dstaddr)
                                       Buffers: shared hit=728795193 read=82974833
                                       ->  Nested Loop  (cost=0.03..17891246.86 rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
                                             Buffers: shared hit=590692229 read=14991777
                                             ->  Parallel Seq Scan on flows  (cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 rows=45371630 loops=13)
                                                   Buffers: shared hit=860990 read=14991777
                                             ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=589831190)
                                                   Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)
                                                   Heap Fetches: 0
                                                   Buffers: shared hit=589831203
                                       ->  Hash  (cost=30059641.47..30059641.47 rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)
                                             Buckets: 4194304 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
                                             Buffers: shared hit=138102964 read=67983056
                                             ->  HashAggregate  (cost=30059561.64..30059601.56 rows=13305 width=4) (actual time=1265248.165..1267432.083 rows=3803508 loops=13)
                                                   Group Key: flows_1.dstaddr
                                                   Buffers: shared hit=138102964 read=67983056
                                                   ->  Nested Loop Anti Join  (cost=0.00..29729327.92 rows=660467447 width=4) (actual time=0.389..1201072.707 rows=125838232 loops=13)
                                                         Join Filter: (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
                                                         Rows Removed by Join Filter: 503353617
                                                         Buffers: shared hit=138102964 read=67983056
                                                         ->  Seq Scan on flows flows_1  (cost=0.00..17836152.73 rows=661128576 width=4) (actual time=0.322..343152.274 rows=589831190 loops=13)
                                                               Buffers: shared hit=138102915 read=67983056
                                                         ->  Materialize  (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2 loops=7667805470)
                                                               Buffers: shared hit=13
                                                               ->  Seq Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual time=0.006..0.007 rows=4 loops=13)
                                                                     Buffers: shared hit=13
 Planning time: 0.941 ms
 Execution time: 2228345.171 ms
(48 rows)

 

With work_mem at 6GB, I noticed that for the first 20 minutes the query was running, the I/O wait was much lower, hovering around 3%; then it jumped to 45% until almost the end of the query.
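In case it is useful for the next run: with track_io_timing enabled, EXPLAIN (ANALYZE, BUFFERS) also prints "I/O Timings" for each node, which should show directly where that I/O wait is spent (the setting is superuser-only and adds a little measurement overhead). A sketch:

flows=# SET track_io_timing = on;
flows=# EXPLAIN (ANALYZE, BUFFERS) ... ;   -- same query as above, now with I/O Timings per node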

 

flowscompact and dstexterne are actually views. I use views to simplify query writing and to "abstract" queries that are used often in other queries. flowscompact is a view built on the table flows (about 590 million rows); it only keeps the most often used fields.

flows=# \d+ flowscompact;
                        View "public.flowscompact"
  Column   |           Type           | Modifiers | Storage | Description
-----------+--------------------------+-----------+---------+-------------
 flow_id   | bigint                   |           | plain   |
 sysuptime | bigint                   |           | plain   |
 exaddr    | ip4                      |           | plain   |
 dpkts     | integer                  |           | plain   |
 doctets   | bigint                   |           | plain   |
 first     | bigint                   |           | plain   |
 last      | bigint                   |           | plain   |
 srcaddr   | ip4                      |           | plain   |
 dstaddr   | ip4                      |           | plain   |
 srcport   | integer                  |           | plain   |
 dstport   | integer                  |           | plain   |
 prot      | smallint                 |           | plain   |
 tos       | smallint                 |           | plain   |
 tcp_flags | smallint                 |           | plain   |
 timestamp | timestamp with time zone |           | plain   |
View definition:
 SELECT flowstimestamp.flow_id,
    flowstimestamp.sysuptime,
    flowstimestamp.exaddr,
    flowstimestamp.dpkts,
    flowstimestamp.doctets,
    flowstimestamp.first,
    flowstimestamp.last,
    flowstimestamp.srcaddr,
    flowstimestamp.dstaddr,
    flowstimestamp.srcport,
    flowstimestamp.dstport,
    flowstimestamp.prot,
    flowstimestamp.tos,
    flowstimestamp.tcp_flags,
    flowstimestamp."timestamp"
   FROM flowstimestamp;

mynetworks is a table having one column and 4 rows; it contains the list of our networks:

flows=# select * from mynetworks;
     ipaddr
----------------
(4 rows)
flows=# \d+ mynetworks;
                    Table "public.mynetworks"
 Column | Type | Modifiers | Storage | Stats target | Description
--------+------+-----------+---------+--------------+-------------
 ipaddr | ip4r |           | plain   |              |
Indexes:
    "mynetworks_ipaddr_idx" gist (ipaddr)

dstexterne is a view listing all the destination IPv4 addresses not inside our network; it has one column and 3.8 million rows.

flows=# \d+ dstexterne;
              View "public.dstexterne"
 Column  | Type | Modifiers | Storage | Description
---------+------+-----------+---------+-------------
 dstaddr | ip4  |           | plain   |
View definition:
 SELECT DISTINCT flowscompact.dstaddr
   FROM flowscompact
     LEFT JOIN mynetworks ON mynetworks.ipaddr >> flowscompact.dstaddr::ip4r
  WHERE mynetworks.ipaddr IS NULL;
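To spell out the semantics: the LEFT JOIN ... IS NULL pattern above is an anti-join (keep only the destination addresses contained in none of our networks), which could equivalently be written with NOT EXISTS; this is only meant to illustrate the intent, not to suggest the planner would handle it differently here:

SELECT DISTINCT flowscompact.dstaddr
  FROM flowscompact
 WHERE NOT EXISTS (
         SELECT 1
           FROM mynetworks
          WHERE mynetworks.ipaddr >> flowscompact.dstaddr::ip4r);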

Thanks!

 

Charles

 

Charles,

 

Don’t change temp_buffers.

Try to increase work_mem even more, say work_mem='12GB', because it's still using disk for sorting (starting around the 20th minute, as you noticed).

See if this:

“Sort Method: external merge  Disk: 3049216kB”

goes away.

Igor

 



