Thread: Postgres IO sweet spot


Postgres IO sweet spot

From: Riaan Stander
Good day

We host our own Postgres (v17) server on-prem as the backbone of our SaaS 
application. It's a fairly busy OLTP application with a database-per-tenant 
strategy. This obviously complicates our setup.
Our hosting platform is as follows:
3 x Host Servers running Microsoft Storage Spaces in a 3-way mirror
Ubuntu VM hosting Postgres

A few months ago we had some severe performance issues, with lots of 
queries and write operations just pending. After some deep 
investigation we realized that disk IO was causing the issue. Using 
iostat we could see that write await was above 30 ms, sometimes spiking 
much higher. This was resolved by moving our backups (made with Veeam) 
off the primary and onto a slave on other infrastructure. Our current 
happy state, where clients are not experiencing issues, is an iostat 
write await of 5 ms or lower.
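
For reference, those await figures come from iostat's extended output. A minimal way to watch them live (an assumption: column names such as w_await vary slightly across sysstat versions):

# Extended per-device stats every 5 seconds; w_await is the average
# write latency in ms, including time spent queued.
iostat -x 5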

All was good for a few months, until recently the issue started 
again. This time it could not be the backups. We had various hardware 
vendors involved, but at some point it came to light that the Storage 
Spaces tier is all mechanical disks, with NVMe used only for Storage 
Spaces journaling and caching. There are now discussions about 
upgrading the drives to SSD, but my concern is that this is not 
guaranteed to solve the issue. Especially with the 3-way mirror, it 
seems all writes must be acknowledged by the other hosts before 
returning, so latency is almost impossible to remove.

So now my question. I started running some IO tests using fio, 
pg_test_fsync and pg_test_timing. Before we spend days or months trying 
to tune Postgres settings, I'm trying to find some definitive published 
information about what IO numbers I should expect when running plain 
hardware tests with Postgres completely out of the loop. I've seen some 
information suggesting a write latency of 1 ms or less is what you want 
for WAL. My logic says that if you have a stiffy (floppy) drive for 
storage you can tune around it, but you still have a stiffy drive.

These are the tests I've run so far; a sequential WAL variant is sketched after the list.
1. WAL-Style Latency Test (4K random sync writes)
fio --name=wal-latency --filename=$TESTDIR/fio_wal_test --size=2G 
--rw=randwrite --bs=4k --iodepth=1 --ioengine=libaio --direct=1 
--fsync=1 --runtime=60 --group_reporting

2. Random Read IOPS Test (index lookup simulation)
fio --name=index-read --filename=$TESTDIR/fio_index_test --size=8G 
--rw=randread --bs=4k --iodepth=32 --ioengine=libaio --direct=1 
--runtime=60 --group_reporting

3. Mixed OLTP Test (70% read / 30% write)
fio --name=oltp-mixed --filename=$TESTDIR/fio_oltp_mixed --size=8G 
--rw=randrw --rwmixread=70 --bs=8k --iodepth=32 --ioengine=libaio 
--direct=1 --runtime=60 --group_reporting

4. Checkpoint Burst Test (sequential write pressure)
fio --name=checkpoint-burst --filename=$TESTDIR/fio_checkpoint 
--size=20G --rw=write --bs=1M --iodepth=64 --ioengine=libaio --direct=1 
--runtime=60 --group_reporting

5. PostgreSQL fsync Code Path Test
pg_test_fsync -f $TESTDIR/pg_test_fsync

6. Timer / Scheduling Jitter Test
pg_test_timing -d 3
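
Since WAL is written as sequential appends and, on Linux, is synced with fdatasync by default (wal_sync_method = fdatasync), a sequential variant of test 1 may track pg_test_fsync more closely. A minimal sketch, assuming the same $TESTDIR:

fio --name=wal-seq --filename=$TESTDIR/fio_wal_seq --size=2G --rw=write --bs=8k --iodepth=1 --ioengine=libaio --direct=1 --fdatasync=1 --runtime=60 --group_reporting

To compare runs consistently, the same job can emit JSON and have tail latency pulled out with jq (an assumption: recent fio versions report completion latency in clat_ns; older ones report clat in microseconds):

fio --name=wal-seq --filename=$TESTDIR/fio_wal_seq --size=2G --rw=write --bs=8k --iodepth=1 --ioengine=libaio --direct=1 --fdatasync=1 --runtime=60 --group_reporting --output-format=json | jq '.jobs[0].write.clat_ns.percentile."99.000000" / 1000000'  # p99 write latency in ms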

Regards
Riaan




Re: Postgres IO sweet spot

From: Riaan Stander
Good day all

Just following up to see if there is any advice from the community. My original post was very long, but I just wanted to paint the full picture.
In summary, I want to find out if anybody has concrete advice on storage devices that are acceptable for use with Postgres, especially regarding latency. I've highlighted some tests I've run, but I need to interpret the numbers correctly.
Any feedback on the following will help (a query for checking observed WAL sync latency is sketched after this list):
  • Acceptable write IO latency
    • WAL
    • Data
    • Temp
    • ...
  • Acceptable read IO latency
  • Any other storage/drive related advice
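
For the WAL side specifically, Postgres can report what it actually observes. With track_wal_io_timing = on (it adds a little timing overhead), pg_stat_wal accumulates sync counts and times, from which average sync latency follows. A minimal sketch:

psql -c "SELECT wal_sync, wal_sync_time, round((wal_sync_time / nullif(wal_sync, 0))::numeric, 3) AS avg_sync_ms FROM pg_stat_wal;"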

Regards
Riaan






Re: Postgres IO sweet spot

From: Greg Sabino Mullane
On Tue, Feb 10, 2026 at 6:13 PM Riaan Stander <rstander@exa.co.za> wrote:
> 3 x Host Servers running Microsoft Storage Spaces in a 3-way mirror

That's an expensive way to provide some HA. What's the business requirement? How does that tie into Postgres? Might be able to do it in other ways.

> but at some point it came to light that the Storage Spaces tier is all mechanical disks, with NVMe used only for Storage
> Spaces journaling and caching. There are now discussions about upgrading the drives to SSD, but my concern is that this is not guaranteed
> to solve the issue. Especially with the 3-way mirror, it seems all writes must be acknowledged by the other hosts before returning, so latency is almost
> impossible to remove.

Yikes! Yes, SSD would be a big win. It's orders of magnitude faster, and just removes so many problems.

> So now my question. I started running some IO tests using fio, pg_test_fsync and pg_test_timing. Before we spend days or months trying to tune Postgres settings, I'm trying to find some definitive published information about what IO numbers I should expect when running plain hardware tests with Postgres completely out of the loop.

Sorry, I have no numbers to provide you there, but I cannot imagine any amount of tuning is going to be as big a win as going to SSD.

Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: Postgres IO sweet spot

From: Riaan Stander
> That's an expensive way to provide some HA. What's the business requirement? How does that tie into Postgres? Might be able to do it in other ways.
We used to run a SAN shared between our host servers, but it was replaced with Storage Spaces. I think they don't trust Postgres's native HA capabilities and want some hardware-level guarantee.

> Yikes! Yes, SSD would be a big win. It's orders of magnitude faster, and just removes so many problems.
I assume it will help, but I fear that the overhead of the 3-way mirror is not going to be solved by just adding SSD. I'm trying to get them to instead deploy direct-attached NVMe/SSD in each host and then use Postgres HA from there.

> Sorry, I have no numbers to provide you there, but I cannot imagine any amount of tuning is going to be as big a win as going to SSD.
It does take a lot of convincing and arguing though, so concrete numbers help get the point across.
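
For what it's worth, pgbench is one way to produce such before/after numbers on a comparable workload; a minimal sketch, assuming a scratch database named bench (the scale factor and client counts are placeholders to adjust):

pgbench -i -s 100 bench
pgbench -c 16 -j 4 -T 300 -P 10 bench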

Thanks for the response

Re: Postgres IO sweet spot

From: Merlin Moncure
On Tue, Feb 17, 2026 at 4:42 PM Riaan Stander <rstander@exa.co.za> wrote:
> > That's an expensive way to provide some HA. What's the business requirement? How does that tie into Postgres? Might be able to do it in other ways.
>
> We used to run a SAN shared between our host servers, but it was replaced with Storage Spaces. I think they don't trust Postgres's native HA capabilities and want some hardware-level guarantee.
>
> > Yikes! Yes, SSD would be a big win. It's orders of magnitude faster, and just removes so many problems.
>
> I assume it will help, but I fear that the overhead of the 3-way mirror is not going to be solved by just adding SSD. I'm trying to get them to instead deploy direct-attached NVMe/SSD in each host and then use Postgres HA from there.
>
> > Sorry, I have no numbers to provide you there, but I cannot imagine any amount of tuning is going to be as big a win as going to SSD.
>
> It does take a lot of convincing and arguing though, so concrete numbers help get the point across.
>
> Thanks for the response

Spinning disks + cache was the most common configuration before SSDs came along. Burst performance is great, but if you overwhelm the cache, write performance can fall off a cliff. This sounds like exactly what is happening to you; moving the backups off just bought you some time. Direct-attached SSD will completely smoke your current setup.
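
One way to test the cache-exhaustion theory (a sketch; the --size here is a guess meant to exceed the NVMe journal/cache tier, so adjust it to your hardware) is a sustained random-write job, watching whether latency falls off a cliff partway through the run:

fio --name=cache-exhaust --filename=$TESTDIR/fio_sustained --size=100G --rw=randwrite --bs=8k --iodepth=16 --ioengine=libaio --direct=1 --time_based --runtime=1800 --group_reporting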

> I think they don't trust Postgres's native HA capabilities and want some hardware-level guarantee.

What is this, 2005? Properly configured HS/SR (hot standby / streaming replication) setups are incredibly robust and are the default configuration for Amazon RDS and many, many other platforms. Reading between the lines, it sounds like your storage team bought overpriced garbage and is refusing to admit it's not getting the job done. Postgres failover gets tested routinely across a vast array of systems; how many times has your exact configuration been tested?

Is your storage shared with other systems? Do you have any pgbench numbers for reference? What are your commit rates? (See xact_commit in pg_stat_database, tracked over time; a sampling sketch is below.)
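
A minimal sketch for the commit-rate tracking, diffing successive samples by hand to get commits per interval:

while sleep 60; do
    psql -At -c "SELECT now()::timestamptz(0), sum(xact_commit) FROM pg_stat_database;"
done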

merlin