Discussion: How to prioritise walsender reading from pg_wal over WAL writes?
Hello,

I've got a Postgres master node that receives a lot of writes, with WAL written at 100MB/sec or more at times. When these load spikes happen, streaming replication starts lagging. It looks like the lag happens at the sending stage and is limited by the throughput of the master's pg_wal partition. It's an SSD RAID-1 now, but the behaviour was the same when it was an HDD RAID-1.

I tried to prioritize reads using the deadline scheduler, but even extreme values don't change the situation: iostat shows more bytes written than read while the device is busy, often at 90-100%.

Writers connect via a transaction-pooling-mode pgbouncer, so I can tune the number of parallel connections. Replication works fine when I limit it to 3, which is quite low. That works for me so far, but it looks really inflexible, as e.g. I'll have to allocate a separate pgbouncer server pool for less eager apps and for humans.

I increased wal_buffers to 256MB hoping that it would reduce the disk load, but it probably only helps against initial lag accumulation; once the lag is there it's not going to help. ionice'ing walsender to best-effort -2 didn't help either.

Any ideas how to prioritize walsender reads over writes from the WAL writer and backends, even if there are multiple quite active ones?

The problem happens only occasionally, so if you ask for more details it may take some time to reply. Sorry about this.

Best, Alex
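P.S. For reference, this is roughly what the tuning attempts above looked like (a sketch, not exact transcripts; the pgrep pattern for finding walsenders and the exact ionice level are assumptions):

    # increase wal_buffers (needs a server restart to take effect):
    psql -c "ALTER SYSTEM SET wal_buffers = '256MB';"

    # reprioritize walsender I/O to the best-effort class, level 2
    # (walsender backends carry "walsender" in their process title):
    for pid in $(pgrep -f 'walsender'); do
        ionice -c 2 -n 2 -p "$pid"
    done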
If it helps, here are the details of the hardware config.
The controller is AVAGO MegaRAID SAS 9361-4i, the SSDs are INTEL SSDSC2KG960G8 (configured as a RAID-1). Current scheduler used is deadline. Currently XFS is mounted without nobarrier, but I'm going to set that when there's high load next time to see how it affects throughput.

Properties of the array:

VD6 Properties :
==============
Strip Size = 256 KB
Number of Blocks = 1874329600
VD has Emulated PD = Yes
Span Depth = 1
Number of Drives Per Span = 2
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disk's Default
Encryption = None
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
Creation Date = 28-10-2020
Creation Time = 05:17:16 PM
Emulation type = default
Cachebypass size = Cachebypass-64k
Cachebypass Mode = Cachebypass Intelligent
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 600605b00af03650272c641ca30c3196

Best, Alex
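P.S. The dump above is MegaRAID CLI output; if anyone wants to check the same thing on their controller, it came from something like the following (the controller and VD numbers are specific to my setup):

    # show all properties of virtual drive 6 on controller 0:
    storcli64 /c0/v6 show all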
On Wed, 2020-11-18 at 11:54 +0000, Alexey Bashtanov wrote:
> If it helps, here are the details of the hardware config.
> The controller is AVAGO MegaRAID SAS 9361-4i,
> the SSDs are INTEL SSDSC2KG960G8 (configured as a RAID-1).
> Current scheduler used is deadline.
> Currently XFS is mounted without nobarrier, but I'm going to set that
> when there's high load next time to see how it affects throughput.
> [...]

Why?? WAL buffers have the most recent information, so that would result in unnecessary delay and I/O.

You'd have to hack the code, but I wonder what leads you to this interesting requirement.

Yours,
Laurenz Albe
--
+43-670-6056265
CYBERTEC PostgreSQL International GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com
Sorry Laurenz,

My reply did not get threaded appropriately. My original question was here:
https://www.postgresql.org/message-id/a74d5732-60fd-d18b-05fd-7b2b97099f19%40imap.cc

I'd like to prioritize the walsender so that replication doesn't lag too much. Otherwise, when I have load spikes on the master, the standby lags, sometimes by hundreds of gigabytes.

Best, Alex
On Wed, 2020-11-18 at 12:15 +0000, Alexey Bashtanov wrote:
> My reply did not get threaded appropriately.
> My original question was here:
> https://www.postgresql.org/message-id/a74d5732-60fd-d18b-05fd-7b2b97099f19%40imap.cc
> I'd like to prioritize the walsender so that replication doesn't lag too much.
> Otherwise, when I have load spikes on the master, the standby lags,
> sometimes by hundreds of gigabytes.

I would first determine where the bottleneck is.

Is it really the walsender, or is it on the network or in the standby server's replay? Check the difference between "sent_lsn" and "replay_lsn" from "pg_stat_replication" and pg_current_wal_lsn() on the primary.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
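P.S.: Concretely, something along these lines on the primary shows at which stage the lag accumulates (a sketch assuming PostgreSQL 10 or later, where pg_stat_replication has these column names):

    # per-standby lag broken down by stage, measured in WAL bytes:
    psql -c "
    SELECT application_name,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) AS send_lag,
           pg_size_pretty(pg_wal_lsn_diff(sent_lsn, flush_lsn))            AS flush_lag,
           pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn))          AS replay_lag
    FROM pg_stat_replication;"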
> I would first determine where the bottleneck is.
>
> Is it really the walsender, or is it on the network or in the standby server's replay?

It is really the walsender, and it really is the performance of the WAL storage on the master.

> Check the difference between "sent_lsn" and "replay_lsn" from "pg_stat_replication" and
> pg_current_wal_lsn() on the primary.

Yes, I've checked these numbers; the lagging one is sent_lsn. It doesn't look like it's hitting network capacity either. When we moved it to an NVMe drive as a short-term solution, it worked fine.

Best, Alex
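P.S. This is roughly how I watch the device while a spike is happening (the device name here is just an example from my setup):

    # extended per-device stats, refreshed every 5 seconds;
    # during spikes %util sits at 90-100% and written bytes exceed read bytes
    iostat -x sda 5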