Re: WAL replay is too slow on secondary server

From OMPRAKASH SAHU
Subject Re: WAL replay is too slow on secondary server
Date
Msg-id CAOZWJqNR3dxnwn+HGPszQB8BY67_E=eoa7SzArL=t=PMOtUAMQ@mail.gmail.com
In response to Re: WAL replay is too slow on secondary server  (Shubhang Joshi <shubhangjoshi2405@gmail.com>)
List pgsql-admin
Hi Everyone,

Thank you for the suggestions.

I have changed a few settings on the DB side, on the secondary only. As of yesterday it seems fine; I will keep monitoring it.

Below are the changes:

wal_decode_buffer_size
maintenance_io_concurrency
bgwriter_delay
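
For the monitoring, a minimal Python sketch of how the replay backlog and replay rate could be computed on the standby. It assumes the LSN strings come from `SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();` sampled at two points in time. The helper names are hypothetical, and none of this was run as part of this thread.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits,
    a slash, then the low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replay_backlog_bytes(receive_lsn: str, replay_lsn: str) -> int:
    """WAL bytes already received by the standby but not yet replayed."""
    return lsn_to_int(receive_lsn) - lsn_to_int(replay_lsn)


def replay_rate(replay_lsn_before: str, replay_lsn_after: str, seconds: float) -> float:
    """Average replay speed in bytes per second between two samples."""
    return (lsn_to_int(replay_lsn_after) - lsn_to_int(replay_lsn_before)) / seconds
```

If the backlog keeps growing while the replay rate stays flat, the parameter changes are not helping; if the backlog shrinks over successive samples, replay is catching up.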

I also checked with AWS support whether micro-bursting was happening, but according to them the allocation is sufficient.


Regards,
OM




On Fri, 31 Oct 2025, 09:54 Shubhang Joshi, <shubhangjoshi2405@gmail.com> wrote:

Hi OM,
Hi Laurenz,

Thank you for your insights.

I apologize for my previous suggestion regarding network speed; upon further review, it was not the correct cause in this scenario.

Based on the current observations and system metrics, the accumulation of WAL on the standby server points to disk I/O limitations during replay, not network speed. CPU and RAM usage remain low, and WAL traffic is reaching the replica without delay, but replay/apply on disk is slow.

The root cause appears to be disk subsystem performance and the single-threaded nature of WAL replay in PostgreSQL recovery. Optimizing disk throughput or reconfiguring memory may help, but network latency does not seem to be affecting this scenario.
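
To illustrate the distinction, here is a rough Python sketch that separates transfer lag from replay lag. The query runs on the primary and uses the `pg_stat_replication` view together with `pg_wal_lsn_diff()`; the function name `classify_lag` and the 16 MiB threshold are assumptions chosen for the example, not values from this thread.

```python
# Run on the primary. Both lag columns are byte counts.
LAG_QUERY = """
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS transfer_lag_bytes,
       pg_wal_lsn_diff(flush_lsn, replay_lsn)           AS replay_lag_bytes
FROM pg_stat_replication;
"""

# Hypothetical threshold: below this, lag is treated as noise.
DEFAULT_THRESHOLD = 16 * 1024 * 1024


def classify_lag(transfer_lag_bytes: int,
                 replay_lag_bytes: int,
                 threshold_bytes: int = DEFAULT_THRESHOLD) -> str:
    """Return which stage holds most of the lag.

    'replay'   - WAL has reached the standby but is not applied yet,
                 which points at replay speed (disk I/O, single replay process).
    'transfer' - WAL has not reached the standby's disk yet,
                 which points at the network or the standby's WAL writes.
    'ok'       - neither lag exceeds the threshold.
    """
    if replay_lag_bytes >= threshold_bytes and replay_lag_bytes > transfer_lag_bytes:
        return "replay"
    if transfer_lag_bytes >= threshold_bytes:
        return "transfer"
    return "ok"
```

In the situation described here, WAL piling up on the standby corresponds to a small transfer lag and a large replay lag, which this sketch would report as "replay".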

Regards,
Shubhang


On Thu, 30 Oct 2025 at 17:45, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2025-10-30 at 17:08 +0530, Shubhang Joshi wrote:
> On Thu, 30 Oct, 2025, 10:07 am OMPRAKASH SAHU, <sahuop2121@gmail.com> wrote:
> > We have a PostgreSQL cluster set up using Patroni.
> > The DB is used by a heavy transactional application. The problem is that WAL replay on the replica server is too slow.
> > We have increased the IOPS to 6k and the throughput to 600 on the NVMe EBS volume of the WAL directory, and to 10k and 800 on the data directory.
> >
> > But WAL is still accumulating on the replica as before, and WAL apply shows no improvement.
>
> Please check the network speed — we faced a similar issue earlier, and it turned out to be related to network performance.
> Kindly verify the network latency with your network team as well.

If WAL is piling up on the standby, how can network speed be the problem?

Yours,
Laurenz Albe
