Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

Поиск
Список
Период
Сортировка
От SATYANARAYANA NARLAPURAM
Тема Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers
Дата
Msg-id CAHg+QDc-3bPT52yqB-_J-_8bc6DzpxVVzvQMA99XV_0tqOR9wg@mail.gmail.com
обсуждение исходный текст
Ответ на Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers  (Jeff Davis <pgsql@j-davis.com>)
Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список pgsql-hackers
Consider a cluster formation where we have a Primary(P), Sync Replica(S1), and multiple async replicas for disaster recovery and read scaling (within the region and outside the region). In this setup, S1 is the preferred failover target in an event of the primary failure. When a transaction is committed on the primary, it is not acknowledged to the client until the primary gets an acknowledgment from the sync standby that the WAL is flushed to the disk (assume synchrnous_commit configuration is remote_flush). However, walsenders corresponds to the async replica on the primary don't wait for the flush acknowledgment from the primary and send the WAL to the async standbys (also any logical replication/decoding clients). So it is possible for the async replicas and logical client ahead of the sync replica. If a failover is initiated in such a scenario, to bring the formation into a healthy state we have to either
  1.  run the pg_rewind on the async replicas for them to reconnect with the new primary or
  2. collect the latest WAL across the replicas and feed the standby. 
Both these operations are involved, error prone, and can cause multiple minutes of downtime if done manually. In addition, there is a window where the async replicas can show the data that was neither acknowledged to the client nor committed on standby. Logical clients if they are ahead may need to reseed the data as no easy rewind option for them.

I would like to propose a GUC send_Wal_after_quorum_committed which when set to ON, walsenders corresponds to async standbys and logical replication workers wait until the LSN is quorum committed on the primary before sending it to the standby. This not only simplifies the post failover steps but avoids unnecessary downtime for the async replicas. Thoughts?

Thanks,
Satya




On Sun, Dec 5, 2021 at 8:35 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
Hi,

It looks like the logical replication subscribers are receiving the
quorum uncommitted transactions even before the synchronous (sync)
standbys. Most of the times it is okay, but it can be a problem if the
primary goes down/crashes (while the primary is in SyncRepWaitForLSN)
before the quorum commit is achieved (i.e. before the sync standbys
receive the committed txns from the primary) and the failover is to
happen on to the sync standby. The subscriber would have received the
quorum uncommitted txns whereas the sync standbys didn't. After the
failover, the new primary (the old sync standby) would be behind the
subscriber i.e. the subscriber will be seeing the data that the new
primary can't. Is there a way the subscriber can get back  to be in
sync with the new primary? In other words, can we reverse the effects
of the quorum uncommitted txns on the subscriber? Naive way is to do
it manually, but it doesn't seem to be elegant.

We have performed a small experiment to observe the above behaviour
with 1 primary, 1 sync standby and 1 subscriber:
1) Have a wait loop in SyncRepWaitForLSN (a temporary hack to
illustrate the standby receiving the txn a bit late or fail to
receive)
2) Insert data into a table on the primary
3) The primary waits i.e. the insert query hangs (because of the wait
loop hack ()) before the local txn is sent to the sync standby,
whereas the subscriber receives the inserted data.
4) If the primary crashes/goes down and unable to come up, if the
failover happens to sync standby (which didn't receive the data that
got inserted on tbe primary), the subscriber would see the data that
the sync standby can't.

This looks to be a problem. A possible solution is to let the
subscribers receive the txns only after the primary achieves quorum
commit (gets out of the SyncRepWaitForLSN or after all sync standbys
received the txns). The logical replication walsenders can wait until
the quorum commit is obtained and then can send the WAL. A new GUC can
be introduced to control this, default being the current behaviour.

Thoughts?

Thanks Satya (cc-ed) for the use-case and off-list discussion.

Regards,
Bharath Rupireddy.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: pl/pgsql feature request: shorthand for argument and local variable references
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Make relfile tombstone files conditional on WAL level