Обсуждение: Streaming Replication replay lag

Поиск
Список
Период
Сортировка

Streaming Replication replay lag

От
Nikhil Shetty
Дата:
Hi,

We have one Primary and three Standby Postgresql setup. Two standby in DC and one standby in DR.

We have configured streaming replication.When there are a lot of write transactions in the master, we observed that DR lags behind the master for 5-6 hours. We initially thought this could be because of the slow network between DC and DR and this is normal but when we saw the replay_lsn and receive_lsn , they had a huge difference.

DR standby was receiving the WAL's in time and there was no delay, the delay was in replaying the lag. Can anybody point me to where I can start with the investigation?
DC (Master and 2 Standby) and DR(Standby) have the same setup in terms of storage and some database configuration parameters but DR has a little less CPU and RAM.

Attaching replay and receive wal difference.

Вложения

Re: Streaming Replication replay lag

От
MichaelDBA
Дата:
Check replication settings:
max_standby_archive_delay
max_standby_streaming_delay
hot_standby_feedback

These can delay wals being applied to the replica while queries are 
still running.

Regards,
Michael Vitale

Nikhil Shetty wrote on 11/24/2020 9:24 AM:
> Hi,
>
> We have one Primary and three Standby Postgresql setup. Two standby in 
> DC and one standby in DR.
>
> We have configured streaming replication.When there are a lot of write 
> transactions in the master, we observed that DR lags behind the master 
> for 5-6 hours. We initially thought this could be because of the slow 
> network between DC and DR and this is normal but when we saw the 
> replay_lsn and receive_lsn , they had a huge difference.
>
> DR standby was receiving the WAL's in time and there was no delay, the 
> delay was in replaying the lag. Can anybody point me to where I can 
> start with the investigation?
> DC (Master and 2 Standby) and DR(Standby) have the same setup in terms 
> of storage and some database configuration parameters but DR has a 
> little less CPU and RAM.
>
> Attaching replay and receive wal difference.
>




Re: Streaming Replication replay lag

От
Laurenz Albe
Дата:
On Tue, 2020-11-24 at 19:54 +0530, Nikhil Shetty wrote:
> We have one Primary and three Standby Postgresql setup. Two standby in DC and one standby in DR.
> 
> We have configured streaming replication.When there are a lot of write transactions
>  in the master, we observed that DR lags behind the master for 5-6 hours. We initially
>  thought this could be because of the slow network between DC and DR and this is normal
>  but when we saw the replay_lsn and receive_lsn , they had a huge difference.
> 
> DR standby was receiving the WAL's in time and there was no delay, the delay was in
>  replaying the lag. Can anybody point me to where I can start with the investigation?
> DC (Master and 2 Standby) and DR(Standby) have the same setup in terms of storage
>  and some database configuration parameters but DR has a little less CPU and RAM.

If there is a delay replaying WAL, the cause is probably a replication conflict.

Set "hot_standby = off" on the standby to avoid the problem totally.

If you want queries on the standby, set "max_standby_streaming_delay" to 0
to keep replication from falling behind.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Streaming Replication replay lag

От
Nikhil Shetty
Дата:
Hi Michael,

Sure, I will check on this parameter but i don't think there are any queries running on th DR. If there are no queries running on the server then will these parameters make any difference?

Thanks and Regards,
Nikhil

On Tue, Nov 24, 2020 at 8:08 PM MichaelDBA <MichaelDBA@sqlexec.com> wrote:
Check replication settings:
max_standby_archive_delay
max_standby_streaming_delay
hot_standby_feedback

These can delay wals being applied to the replica while queries are
still running.

Regards,
Michael Vitale

Nikhil Shetty wrote on 11/24/2020 9:24 AM:
> Hi,
>
> We have one Primary and three Standby Postgresql setup. Two standby in
> DC and one standby in DR.
>
> We have configured streaming replication.When there are a lot of write
> transactions in the master, we observed that DR lags behind the master
> for 5-6 hours. We initially thought this could be because of the slow
> network between DC and DR and this is normal but when we saw the
> replay_lsn and receive_lsn , they had a huge difference.
>
> DR standby was receiving the WAL's in time and there was no delay, the
> delay was in replaying the lag. Can anybody point me to where I can
> start with the investigation?
> DC (Master and 2 Standby) and DR(Standby) have the same setup in terms
> of storage and some database configuration parameters but DR has a
> little less CPU and RAM.
>
> Attaching replay and receive wal difference.
>

Re: Streaming Replication replay lag

От
Nikhil Shetty
Дата:
Hi Laurenz,

We don't have any queries running on the DR server so setting "hot_standby=off" can be done and we will observe the WAL replay but ss there a way to identify that replay lag is because of the queries running on DR. I don't see any long running queries in pg_stat_activity to suggest that queries are not allowing the replay to happen.

Thanks & Regards,
Nikhil

On Tue, Nov 24, 2020 at 9:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2020-11-24 at 19:54 +0530, Nikhil Shetty wrote:
> We have one Primary and three Standby Postgresql setup. Two standby in DC and one standby in DR.
>
> We have configured streaming replication.When there are a lot of write transactions
>  in the master, we observed that DR lags behind the master for 5-6 hours. We initially
>  thought this could be because of the slow network between DC and DR and this is normal
>  but when we saw the replay_lsn and receive_lsn , they had a huge difference.
>
> DR standby was receiving the WAL's in time and there was no delay, the delay was in
>  replaying the lag. Can anybody point me to where I can start with the investigation?
> DC (Master and 2 Standby) and DR(Standby) have the same setup in terms of storage
>  and some database configuration parameters but DR has a little less CPU and RAM.

If there is a delay replaying WAL, the cause is probably a replication conflict.

Set "hot_standby = off" on the standby to avoid the problem totally.

If you want queries on the standby, set "max_standby_streaming_delay" to 0
to keep replication from falling behind.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: Streaming Replication replay lag

От
Laurenz Albe
Дата:
On Wed, 2020-11-25 at 11:02 +0530, Nikhil Shetty wrote:
> 
> On Tue, Nov 24, 2020 at 9:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Tue, 2020-11-24 at 19:54 +0530, Nikhil Shetty wrote:
> > > We have one Primary and three Standby Postgresql setup. Two standby in DC and one standby in DR.
> > > 
> > > We have configured streaming replication.When there are a lot of write transactions
> > >  in the master, we observed that DR lags behind the master for 5-6 hours. We initially
> > >  thought this could be because of the slow network between DC and DR and this is normal
> > >  but when we saw the replay_lsn and receive_lsn , they had a huge difference.
> > > 
> > > DR standby was receiving the WAL's in time and there was no delay, the delay was in
> > >  replaying the lag. Can anybody point me to where I can start with the investigation?
> > > DC (Master and 2 Standby) and DR(Standby) have the same setup in terms of storage
> > >  and some database configuration parameters but DR has a little less CPU and RAM.
> > 
> > If there is a delay replaying WAL, the cause is probably a replication conflict.
> > 
> > Set "hot_standby = off" on the standby to avoid the problem totally.
> > 
> > If you want queries on the standby, set "max_standby_streaming_delay" to 0
> > to keep replication from falling behind.
>
> We don't have any queries running on the DR server so setting "hot_standby=off"
> can be done and we will observe the WAL replay but ss there a way to identify
> that replay lag is because of the queries running on DR. I don't see any long
> running queries in pg_stat_activity to
> suggest that queries are not allowing the replay to happen.

You could look for conflicts in "pg_stat_database" on the standby.

If that is not the cause, perhaps the network is too slow for the amount of
WAL, or the standby cannot replay fast enough (the startup process is a
single thread).

Yours,
Laurenz Albe




Re: Streaming Replication replay lag

От
Nikhil Shetty
Дата:
Hi Laurenz,

Thank you ,we  will monitor this.

Thanks and Regards,
Nikhil

On Wed, Nov 25, 2020 at 8:21 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2020-11-25 at 11:02 +0530, Nikhil Shetty wrote:
>
> On Tue, Nov 24, 2020 at 9:35 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Tue, 2020-11-24 at 19:54 +0530, Nikhil Shetty wrote:
> > > We have one Primary and three Standby Postgresql setup. Two standby in DC and one standby in DR.
> > >
> > > We have configured streaming replication.When there are a lot of write transactions
> > >  in the master, we observed that DR lags behind the master for 5-6 hours. We initially
> > >  thought this could be because of the slow network between DC and DR and this is normal
> > >  but when we saw the replay_lsn and receive_lsn , they had a huge difference.
> > >
> > > DR standby was receiving the WAL's in time and there was no delay, the delay was in
> > >  replaying the lag. Can anybody point me to where I can start with the investigation?
> > > DC (Master and 2 Standby) and DR(Standby) have the same setup in terms of storage
> > >  and some database configuration parameters but DR has a little less CPU and RAM.
> >
> > If there is a delay replaying WAL, the cause is probably a replication conflict.
> >
> > Set "hot_standby = off" on the standby to avoid the problem totally.
> >
> > If you want queries on the standby, set "max_standby_streaming_delay" to 0
> > to keep replication from falling behind.
>
> We don't have any queries running on the DR server so setting "hot_standby=off"
> can be done and we will observe the WAL replay but ss there a way to identify
> that replay lag is because of the queries running on DR. I don't see any long
> running queries in pg_stat_activity to
> suggest that queries are not allowing the replay to happen.

You could look for conflicts in "pg_stat_database" on the standby.

If that is not the cause, perhaps the network is too slow for the amount of
WAL, or the standby cannot replay fast enough (the startup process is a
single thread).

Yours,
Laurenz Albe