Обсуждение: sync_standbys_defined and pg_stat_replication
For failover to work correctly, if someone changes the GUC synchronous_standby_names to enable sync replication, then we need to understand the exact moment when backends will begin to block in order to correctly determine when we can failover without data loss. There's an older mailing list thread that discusses one aspect of this https://www.postgresql.org/message-id/flat/CABrsG8j3kPD%2Bkbbsx_isEpFvAgaOBNGyGpsqSjQ6L8vwVUaZAQ%40mail.gmail.com I've also gone through the code for SyncRepWaitForLSN() and worked backwards to where the checkpointer sets sync_standbys_defined. But I have a question which I couldn't answer so far. It looks like sync_standbys_defined is only updated by the checkpointer process. Is there a short period of time where the pg_stat_replication view would show sync_state=sync and state=streaming, but the checkpointer has not yet updated sync_standbys_defined? I'm wondering if this is a race condition where COMMITs are not being blocked for replication but external tools which rely on pg_stat_replication would think it's safe to failover with zero data loss? -Jeremy
On Mon, 6 Oct 2025 22:59:33 -0700 Jeremy Schneider <schneider@ardentperf.com> wrote: > For failover to work correctly, if someone changes the GUC > synchronous_standby_names to enable sync replication, then we need to > understand the exact moment when backends will begin to block in order > to correctly determine when we can failover without data loss. > > There's an older mailing list thread that discusses one aspect of this > > https://www.postgresql.org/message-id/flat/CABrsG8j3kPD%2Bkbbsx_isEpFvAgaOBNGyGpsqSjQ6L8vwVUaZAQ%40mail.gmail.com > > I've also gone through the code for SyncRepWaitForLSN() and worked > backwards to where the checkpointer sets sync_standbys_defined. But I > have a question which I couldn't answer so far. > > It looks like sync_standbys_defined is only updated by the > checkpointer process. Is there a short period of time where the > pg_stat_replication view would show sync_state=sync and > state=streaming, but the checkpointer has not yet updated > sync_standbys_defined? > > I'm wondering if this is a race condition where COMMITs are not being > blocked for replication but external tools which rely on > pg_stat_replication would think it's safe to failover with zero data > loss? FYI - some more details on the background of my question are here https://github.com/cloudnative-pg/cloudnative-pg/issues/8790 I'm running Jepsen tests of a new CNPG feature (quorum failover) and Jepsen picked up data loss when I ran it in conjuction with CNPG's "preferred" dataDurability setting and I'm theorizing it may be related to this delay with SyncRepWaitForLSN() starting to block COMMITs. The "preferred durability" configuration is the equivalent to "Max Availability" mode with Oracle Data Guard Broker; if anyone is curious I have a table comparing Oracle modes to patroni/cnpg configs in this blog: https://ardentperf.com/2025/10/05/testing-cloudnativepg-preferred-data-durability/ -Jeremy
On Tue, 7 Oct 2025 at 08:59, Jeremy Schneider <schneider@ardentperf.com> wrote: > For failover to work correctly, if someone changes the GUC > synchronous_standby_names to enable sync replication, then we need to > understand the exact moment when backends will begin to block in order > to correctly determine when we can failover without data loss. There is an early out in SyncRepWaitForLSN() when WalSndCtl->sync_standbys_status has SYNC_STANDBY_DEFINED unset. That flag gets set by the checkpointer in SyncRepUpdateSyncStandbysDefined() via CheckpointWriteDelay() among other places. But only when it's not executing a fast checkpoint or it's not behind on checkpoints. In other words, synchronous_standby_names will not become effective until checkpointer has some downtime. While this is a small problem on its own, there is no way to check if this has happened or not. For the config update getting delayed the fix seems simple - just do the config update unconditionally. Patch attached. For the other problem, my thinking is to provide a new function that allows a user to check if synchronous replication is active. Ideally this function would give other information also needed by cluster managers. Specifically when a replica is removed from synchronous standby names we would need still need to consider that replica as a potential synchronous replica until a quorum matching the current synchronous_standby_names setting overtakes the last LSN confirmed by a replica matching the removed name. To illustrate the situation where this is needed, consider s_s_n = 'ANY 1 (A B)'. While this setting is active we have to check latest replicated LSN from both A and B to know which one to promote. Lets say transaction X is replicated to A and confirmed, but not yet to B. Now A is removed so s_s_n becomes 'ANY 1 (B)'. Based on this setting it is always safe to promote B, but until B receives the LSN that was on primary when synchronous_standby_names was changed, it might not have all the data. This one is possible to work around by checking the relevant values from pg_stat_replication, but it would be nice to have a neater interface. My proposal is something like this: postgres=# SELECT * FROM pg_sync_replication_status(); is_active | synchronous_standby_names | has_quorum -----------+---------------------------+------------ t | ANY 1 (A B) | f (1 row) Thoughts? Regards, Ants Aasma
Вложения
On Wed, 8 Oct 2025 20:38:53 +0300 Ants Aasma <ants.aasma@cybertec.at> wrote: > > For the other problem, my thinking is to provide a new function that > allows a user to check if synchronous replication is active. > > Ideally this function would give other information also needed by > cluster managers. Specifically when a replica is removed from > synchronous standby names we would need still need to consider that > replica as a potential synchronous replica until a quorum matching the > current synchronous_standby_names setting overtakes the last LSN > confirmed by a replica matching the removed name. > > ... > > My proposal is something like this: > > postgres=# SELECT * FROM pg_sync_replication_status(); > is_active | synchronous_standby_names | has_quorum > -----------+---------------------------+------------ > t | ANY 1 (A B) | f > (1 row) Would it make sense to also add a fourth column in this proposed new function with the effective LSN from the last time synchronous_standby_names was changed? I'm also thinking about changes to the synchronous_commit setting; if we wanted the highest level of correctness, should we also capture the LSN of that change and reflect it in this view? I haven't thought this through fully yet. Should external tools use procs corresponding with synchronous_commit setting for promotion decisions? (pg_last_wal_receive_lsn vs pg_last_wal_replay_lsn) This also relates to Andrey Borodin's other thread about adding direct access to the write LSN on replicas for promotion decisions. https://www.postgresql.org/message-id/flat/0B44E464-BA62-4056-9465-3320DD2D0302%40yandex-team.ru -Jeremy