[HACKERS] measuring the impact of increasing WAL segment size

From: Tomas Vondra
Subject: [HACKERS] measuring the impact of increasing WAL segment size
Msg-id: 2c6ac18b-8094-7a42-114c-ad5e1708cd89@2ndquadrant.com
Replies: Re: [HACKERS] measuring the impact of increasing WAL segment size (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

A few months ago there was a discussion about increasing the default WAL 
segment size [1]. For various reasons we ended up only allowing values 
up to 1GB for --with-wal-segsize in configure, one of the reasons being 
the absence of sufficient data about the performance impact.
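
For context, that is the build-time knob in question. A minimal sketch of 
building a server with a non-default segment size (the option takes the 
size in megabytes; 256 here is just an example value):

    # illustrative only - build a server with 256MB WAL segments
    ./configure --with-wal-segsize=256 --prefix=$HOME/pg-walseg-256
    make -j4 && make install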

In that thread I promised to do some benchmarking to provide some hard 
data, and that's what this post is about. The benchmarks I ended up 
doing are not as extensive as I originally proposed, though.

What I have tested:

* 4 different hardware configurations (both spinning rust and flash)
* 2 workloads (tpcb-like and simple-update from pgbench)
* 3 scales (50, 300 and 2000)
* flushing enabled/disabled

I was interested in the impact of the flushing added in PostgreSQL 9.5, 
so "flushing" here refers to the *_flush_after GUCs. "Enabled" means the 
default values, while "disabled" means everything set to 0.
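
In other words, the "disabled" case is simply all of those GUCs set to 
zero, e.g. (a minimal sketch, assuming a running cluster):

    psql -c "ALTER SYSTEM SET checkpoint_flush_after = 0"
    psql -c "ALTER SYSTEM SET bgwriter_flush_after = 0"
    psql -c "ALTER SYSTEM SET backend_flush_after = 0"
    psql -c "ALTER SYSTEM SET wal_writer_flush_after = 0"
    pg_ctl -D "$PGDATA" reload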

For each combination of parameters I've done a single 4-hour pgbench 
run, which means about 30 days of total runtime (22 days of "clean" 
runtime, not counting initialization etc.). This somewhat explains why I 
scaled down the range of workloads to test.
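
Each run boiled down to roughly this (the client/thread counts here are 
my own illustrative values, not the ones actually used):

    # initialize one of the scales, then run a 4-hour (14400s) benchmark
    createdb bench
    pgbench -i -s 300 bench
    pgbench -b simple-update -c 16 -j 8 -T 14400 bench   # or -b tpcb-like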

I've been collecting various database/system metrics (sar, pg_stat_*, 
...); in total it's about 10GB of compressed data, and likely more than 
100GB uncompressed. This post only presents some basic summary and 
charts, so let me know if you're interested in the raw data. Additional 
charts are available at [2].
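
For illustration, the collection looks roughly like this (a sketch only; 
the view and interval are examples, not the actual collection scripts):

    # record system stats in binary form, snapshot a pg_stat_* view every 5s
    sar -o sar.bin 5 > /dev/null 2>&1 &
    while true; do
        psql -X -c "COPY (SELECT now(), * FROM pg_stat_bgwriter) \
            TO STDOUT (FORMAT csv)" bench >> pg_stat_bgwriter.csv
        sleep 5
    done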

As mentioned, I've done the same tests on 4 different configurations, so 
here are some basic details:

1) i5-2500k-ssd-raid

     CPU: Intel i5-2500k (4 cores, released 2011)
     RAM: 8GB
     storage: 6 x Intel S3700 100GB SSD (RAID0, swraid)
     kernel: 4.10

2) xeon-e5-2620v4-nvme

     CPU: 2x Intel Xeon e5-2620 v4 (8/16 cores, released 2016)
     RAM: 32GB
     storage: Intel 750 SSD (NVMe, 400GB)
     kernel: 4.10

3) xeon-e5-2620v4-sata-raid

     CPU: 2x Intel Xeon e5-2620 v4 (8/16 cores, released 2016)
     RAM: 32GB
     storage: 3 x 7.2k SATA drives
     kernel: 4.10

4) xeon-e5450-sas-raid

     CPU: 2x Intel Xeon E5450 (4 cores, released 2006)
     RAM: 16GB
     storage: 6x 10k 146GB SAS (RAID 10, HP P400 with 512MB BBWC)
     kernel: 4.10

For all configurations everything (WAL, data) was placed on a single 
filesystem - mostly for simplicity, but also because it shows the "worst 
case" impact.


Most of the charts are uninteresting, as there is almost no impact from 
either the WAL segment size changes or (disabling the) flushing. The 
changes in average tps are typically within 1-2%, and if there's a trend 
it usually shows a tiny improvement for larger WAL segments.

This is true in particular for all tests on configurations with flash 
storage; simple-update-50-xeon-e5-2620v4-nvme.eps is a nice example of 
such a boring chart.


Now, let's look at the interesting charts ...

1) scales 300/2000 on SAS RAID (RAID controller with 512MB write cache)

     * simple-update-2000-xeon-e5450-sas-raid.eps
     * simple-update-300-xeon-e5450-sas-raid.eps
     * tpcb-like-2000-xeon-e5450-sas-raid.eps

I'm not sure what's happening in those charts, but I suspect it's mostly 
a case of overloaded storage, as the machine has only 16GB of RAM, so 
scale 2000 does not fit into RAM (and the smaller scales do behave much 
more reasonably on this hardware).

2) pretty much everything on the software SATA RAID

     * simple-update-2000-xeon-e5-2620v4-sata-raid.eps
     * simple-update-300-xeon-e5-2620v4-sata-raid.eps
     * simple-update-50-xeon-e5-2620v4-sata-raid.eps
     * tpcb-like-2000-xeon-e5-2620v4-sata-raid.eps
     * tpcb-like-300-xeon-e5-2620v4-sata-raid.eps
     * tpcb-like-50-xeon-e5-2620v4-sata-raid.eps

I don't dare to make absolute judgments based on just one pgbench run, 
but the basic trends seem to be fairly clear:

a) The flushing has a significant impact on average tps, in some cases 
reducing the throughput by 30-40%.

The primary reason for this is of course that the regular flushing 
significantly increases the number of fsyncs, which has a serious impact 
on SATA RAID.

Granted - this chart does not show latency, so it's not a complete 
picture. Also, if you care about raw OLTP performance you're probably 
already running on flash, where this does not seem to be an issue. It's 
also not an issue if you have a RAID controller with a write cache, 
which can absorb those writes. And of course, those machines have 
reasonable dirty_background_bytes values (like 64MB or less).

But it's something to be aware of and watch for, and perhaps a reason to 
consider disabling the flushing (and instead tuning page cache eviction).
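
For reference, the kernel writeback limit mentioned above can be set 
like this (the value just illustrates "64MB or less", not a 
recommendation):

    # requires root; 67108864 bytes = 64MB
    sysctl -w vm.dirty_background_bytes=67108864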

b) The "flushing enabled" case seems to be much more sensitive to WAL 
segment size increases. The throughput drops a bit (by 10-20%) for some 
segment sizes and then recovers. The behavior seems to be smooth (not 
just a sudden drop for one segment size), but where it happens varies 
depending on the scale and test type (tpcb-like / simple-update).

There is almost no such impact in the "flushing disabled" cases.

Similarly to the SAS RAID config (with 512MB write cache on the RAID 
controller), the largest scale behaves a bit unpredictably. I assume the 
reasons are the same - overloaded spinning rust.

regards


[1] 
https://www.postgresql.org/message-id/flat/CA%2BTgmoZctR8Sqvgxp2-_fncsgvQSCaYZJ7e%2BoF8XnNLnJwOQ8Q%40mail.gmail.com

[2] https://github.com/tvondra/wal-segment-size-tests

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

