Обсуждение: Lag clarification with Sync Replication

Поиск
Список
Период
Сортировка

Lag clarification with Sync Replication

От
Raj kumar
Дата:
I have made a synchronous replication setup with PG12.
During data loading of 200 gig in master node, I found that there is a lag that it shows on the slave. But, ideally for sync replication, lag should be always 0 right.
Because commit never happens in master before updating the slave.
 log_delay
------------
 871.549671
(1 row) log_delay
------------
 871.557246
(1 row) log_delay
-----------
         0
(1 row) log_delay
-----------
         0
(1 row)

Query Used for checking Standby Replication Lag:
psql -d perfbench -p 5432 -c "SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;"

Setting Used in PG12

 cat postgresql.conf |grep synchronous_

synchronous_commit = remote_write                # synchronization level;

synchronous_standby_names = 'ANY 1 (standby1,standby2)'        # standby servers that provide sync rep


cat postgresql.conf |grep primary_conn

primary_conninfo = 'user=edbrepl host= 10.12.11.101 port=5432 application_name=standby1 '


Thanks,
Raj Kumar

Re: Lag clarification with Sync Replication

От
Scott Ribe
Дата:
Your query is unrelated to whether or not the commit has completed in the master.




Re: Lag clarification with Sync Replication

От
Keith
Дата:


On Fri, May 22, 2020 at 12:36 AM Raj kumar <rajkumar820999@gmail.com> wrote:
I have made a synchronous replication setup with PG12.
During data loading of 200 gig in master node, I found that there is a lag that it shows on the slave. But, ideally for sync replication, lag should be always 0 right.
Because commit never happens in master before updating the slave.
 log_delay
------------
 871.549671
(1 row) log_delay
------------
 871.557246
(1 row) log_delay
-----------
         0
(1 row) log_delay
-----------
         0
(1 row)

Query Used for checking Standby Replication Lag:
psql -d perfbench -p 5432 -c "SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;"

Setting Used in PG12

 cat postgresql.conf |grep synchronous_

synchronous_commit = remote_write                # synchronization level;

synchronous_standby_names = 'ANY 1 (standby1,standby2)'        # standby servers that provide sync rep


cat postgresql.conf |grep primary_conn

primary_conninfo = 'user=edbrepl host= 10.12.11.101 port=5432 application_name=standby1 '


Thanks,
Raj Kumar


For synchronous replication, and any replication really, I would recommend monitoring byte lag from the primary. This is much more accurate and, if the count of replicas goes down from an expected number, you'll know it's been disconnected as well. Monitoring replication from the replica can be done, but it has caveats and shouldn't be the only way it's monitored

Wrote a short post on this a while ago. See relavant updates further down - https://www.keithf4.com/monitoring_streaming_slave_lag/

Re: Lag clarification with Sync Replication

От
Rui DeSousa
Дата:


On May 22, 2020, at 12:36 AM, Raj kumar <rajkumar820999@gmail.com> wrote:

. But, ideally for sync replication, lag should be always 0 right.

Incorrect.  Synchronous replication means that a commit will not return until it has been safely written to disk on the primary and the replica.  That means the transaction is written to WAL file on both primary and replica.  On the primary, the transaction is also visible to transactions with a later xmin.  On the replica, the transaction has been recorded in the WAL; it still needs to get applied to the database for it to become visible to read transactions.

Also, changes could occur faster on the primary where the replica could be lagging behind — either by lagging in receiving the updates or writing the information to the WAL.  What will happen is when the commit is iussed; the session will hang until replica catches up and writes the all the data need safely to the WAL.

I would look at pg_stat_replication view on the primary/upstream provider.  Here’s a query that make it easier.

Sent_lag — data that still needs to be sent to the replica
Flush_lag —data that still has not been fsync’d to WAL files yet
Replay_lag — data that still need to be applied to the database so that it can be queried

select pg_stat_replication.client_addr
  , pg_stat_replication.application_name
  , pg_stat_replication.sync_priority
  , pg_stat_replication.sync_state
  , pg_stat_replication.state
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), sent_lsn))
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))
    end as sent_lag
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), flush_lsn))
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) 
    end as flush_lag
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), replay_lsn)) 
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn))
    end as replay_lag
from pg_stat_replication
;

Re: Lag clarification with Sync Replication

От
Raj kumar
Дата:
Thanks Rui, Keith and Scott for nice explanation and blog :)

Thanks,
Raj


On Fri, May 22, 2020 at 10:18 PM Rui DeSousa <rui@crazybean.net> wrote:


On May 22, 2020, at 12:36 AM, Raj kumar <rajkumar820999@gmail.com> wrote:

. But, ideally for sync replication, lag should be always 0 right.

Incorrect.  Synchronous replication means that a commit will not return until it has been safely written to disk on the primary and the replica.  That means the transaction is written to WAL file on both primary and replica.  On the primary, the transaction is also visible to transactions with a later xmin.  On the replica, the transaction has been recorded in the WAL; it still needs to get applied to the database for it to become visible to read transactions.

Also, changes could occur faster on the primary where the replica could be lagging behind — either by lagging in receiving the updates or writing the information to the WAL.  What will happen is when the commit is iussed; the session will hang until replica catches up and writes the all the data need safely to the WAL.

I would look at pg_stat_replication view on the primary/upstream provider.  Here’s a query that make it easier.

Sent_lag — data that still needs to be sent to the replica
Flush_lag —data that still has not been fsync’d to WAL files yet
Replay_lag — data that still need to be applied to the database so that it can be queried

select pg_stat_replication.client_addr
  , pg_stat_replication.application_name
  , pg_stat_replication.sync_priority
  , pg_stat_replication.sync_state
  , pg_stat_replication.state
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), sent_lsn))
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))
    end as sent_lag
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), flush_lsn))
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) 
    end as flush_lag
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), replay_lsn)) 
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn))
    end as replay_lag
from pg_stat_replication
;

Re: Lag clarification with Sync Replication

От
Laurenz Albe
Дата:
On Fri, 2020-05-22 at 12:48 -0400, Rui DeSousa wrote:
> > On May 22, 2020, at 12:36 AM, Raj kumar <rajkumar820999@gmail.com> wrote:
> > 
> > . But, ideally for sync replication, lag should be always 0 right.
> 
> Incorrect.  Synchronous replication means that a commit will not return until it has been safely written to disk on
theprimary and the replica.  That means the transaction is written to WAL file on
 
> both primary and replica.  On the primary, the transaction is also visible to transactions with a later xmin.  On the
replica,the transaction has been recorded in the WAL; it still needs to get
 
> applied to the database for it to become visible to read transactions.

If you set "synchronous_commit = remote_apply", the commit will only return
when the change has been replayed on the synchronous standby server.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com