Обсуждение: sync_standbys_defined and pg_stat_replication

Поиск
Список
Период
Сортировка

sync_standbys_defined and pg_stat_replication

От
Jeremy Schneider
Дата:
For failover to work correctly, if someone changes the GUC
synchronous_standby_names to enable sync replication, then we need to
understand the exact moment when backends will begin to block in order
to correctly determine when we can failover without data loss.

There's an older mailing list thread that discusses one aspect of this

https://www.postgresql.org/message-id/flat/CABrsG8j3kPD%2Bkbbsx_isEpFvAgaOBNGyGpsqSjQ6L8vwVUaZAQ%40mail.gmail.com

I've also gone through the code for SyncRepWaitForLSN() and worked
backwards to where the checkpointer sets sync_standbys_defined. But I
have a question which I couldn't answer so far.

It looks like sync_standbys_defined is only updated by the checkpointer
process. Is there a short period of time where the pg_stat_replication
view would show sync_state=sync and state=streaming, but the
checkpointer has not yet updated sync_standbys_defined?

I'm wondering if this is a race condition where COMMITs are not being
blocked for replication but external tools which rely on
pg_stat_replication would think it's safe to failover with zero data
loss?

-Jeremy



Re: sync_standbys_defined and pg_stat_replication

От
Jeremy Schneider
Дата:
On Mon, 6 Oct 2025 22:59:33 -0700
Jeremy Schneider <schneider@ardentperf.com> wrote:

> For failover to work correctly, if someone changes the GUC
> synchronous_standby_names to enable sync replication, then we need to
> understand the exact moment when backends will begin to block in order
> to correctly determine when we can failover without data loss.
> 
> There's an older mailing list thread that discusses one aspect of this
> 
> https://www.postgresql.org/message-id/flat/CABrsG8j3kPD%2Bkbbsx_isEpFvAgaOBNGyGpsqSjQ6L8vwVUaZAQ%40mail.gmail.com
> 
> I've also gone through the code for SyncRepWaitForLSN() and worked
> backwards to where the checkpointer sets sync_standbys_defined. But I
> have a question which I couldn't answer so far.
> 
> It looks like sync_standbys_defined is only updated by the
> checkpointer process. Is there a short period of time where the
> pg_stat_replication view would show sync_state=sync and
> state=streaming, but the checkpointer has not yet updated
> sync_standbys_defined?
> 
> I'm wondering if this is a race condition where COMMITs are not being
> blocked for replication but external tools which rely on
> pg_stat_replication would think it's safe to failover with zero data
> loss?

FYI - some more details on the background of my question are here

https://github.com/cloudnative-pg/cloudnative-pg/issues/8790

I'm running Jepsen tests of a new CNPG feature (quorum failover) and
Jepsen picked up data loss when I ran it in conjuction with CNPG's
"preferred" dataDurability setting and I'm theorizing it may be related
to this delay with SyncRepWaitForLSN() starting to block COMMITs. The 
"preferred durability" configuration is the equivalent to "Max
Availability" mode with Oracle Data Guard Broker; if anyone is curious I
have a table comparing Oracle modes to patroni/cnpg configs in this
blog:

https://ardentperf.com/2025/10/05/testing-cloudnativepg-preferred-data-durability/

-Jeremy



Re: sync_standbys_defined and pg_stat_replication

От
Ants Aasma
Дата:
On Tue, 7 Oct 2025 at 08:59, Jeremy Schneider <schneider@ardentperf.com> wrote:
> For failover to work correctly, if someone changes the GUC
> synchronous_standby_names to enable sync replication, then we need to
> understand the exact moment when backends will begin to block in order
> to correctly determine when we can failover without data loss.

There is an early out in SyncRepWaitForLSN() when
WalSndCtl->sync_standbys_status has SYNC_STANDBY_DEFINED unset. That
flag gets set by the checkpointer in
SyncRepUpdateSyncStandbysDefined() via CheckpointWriteDelay() among
other places. But only when it's not executing a fast checkpoint or
it's not behind on checkpoints.

In other words, synchronous_standby_names will not become effective
until checkpointer has some downtime. While this is a small problem on
its own, there is no way to check if this has happened or not.

For the config update getting delayed the fix seems simple - just do
the config update unconditionally. Patch attached.

For the other problem, my thinking is to provide a new function that
allows a user to check if synchronous replication is active.

Ideally this function would give other information also needed by
cluster managers. Specifically when a replica is removed from
synchronous standby names we would need still need to consider that
replica as a potential synchronous replica until a quorum matching the
current synchronous_standby_names setting overtakes the last LSN
confirmed by a replica matching the removed name.

To illustrate the situation where this is needed, consider s_s_n =
'ANY 1 (A B)'. While this setting is active we have to check latest
replicated LSN from both A and B to know which one to promote. Lets
say transaction X is replicated to A and confirmed, but not yet to B.
Now A is removed so s_s_n becomes 'ANY 1 (B)'. Based on this setting
it is always safe to promote B, but until B receives the LSN  that was
on primary when synchronous_standby_names was changed, it might not
have all the data. This one is possible to work around by checking the
relevant values from pg_stat_replication, but it would be nice to have
a neater interface.

My proposal is something like this:

postgres=# SELECT * FROM pg_sync_replication_status();
 is_active | synchronous_standby_names | has_quorum
-----------+---------------------------+------------
 t         | ANY 1 (A B)               | f
(1 row)

Thoughts?

Regards,
Ants Aasma

Вложения

Re: sync_standbys_defined and pg_stat_replication

От
Jeremy Schneider
Дата:
On Wed, 8 Oct 2025 20:38:53 +0300
Ants Aasma <ants.aasma@cybertec.at> wrote:

> 
> For the other problem, my thinking is to provide a new function that
> allows a user to check if synchronous replication is active.
> 
> Ideally this function would give other information also needed by
> cluster managers. Specifically when a replica is removed from
> synchronous standby names we would need still need to consider that
> replica as a potential synchronous replica until a quorum matching the
> current synchronous_standby_names setting overtakes the last LSN
> confirmed by a replica matching the removed name.
> 
> ...
> 
> My proposal is something like this:
> 
> postgres=# SELECT * FROM pg_sync_replication_status();
>  is_active | synchronous_standby_names | has_quorum
> -----------+---------------------------+------------
>  t         | ANY 1 (A B)               | f
> (1 row)

Would it make sense to also add a fourth column in this proposed new
function with the effective LSN from the last time
synchronous_standby_names was changed?

I'm also thinking about changes to the synchronous_commit setting; if we
wanted the highest level of correctness, should we also capture the LSN
of that change and reflect it in this view? I haven't thought this
through fully yet.

Should external tools use procs corresponding with synchronous_commit
setting for promotion decisions? (pg_last_wal_receive_lsn vs
pg_last_wal_replay_lsn) This also relates to Andrey Borodin's other
thread about adding direct access to the write LSN on replicas for
promotion decisions.

https://www.postgresql.org/message-id/flat/0B44E464-BA62-4056-9465-3320DD2D0302%40yandex-team.ru

-Jeremy