Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers
Дата
Msg-id CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers  (SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>)
Список pgsql-hackers
Hi,

It looks like the logical replication subscribers are receiving the
quorum uncommitted transactions even before the synchronous (sync)
standbys. Most of the times it is okay, but it can be a problem if the
primary goes down/crashes (while the primary is in SyncRepWaitForLSN)
before the quorum commit is achieved (i.e. before the sync standbys
receive the committed txns from the primary) and the failover is to
happen on to the sync standby. The subscriber would have received the
quorum uncommitted txns whereas the sync standbys didn't. After the
failover, the new primary (the old sync standby) would be behind the
subscriber i.e. the subscriber will be seeing the data that the new
primary can't. Is there a way the subscriber can get back  to be in
sync with the new primary? In other words, can we reverse the effects
of the quorum uncommitted txns on the subscriber? Naive way is to do
it manually, but it doesn't seem to be elegant.

We have performed a small experiment to observe the above behaviour
with 1 primary, 1 sync standby and 1 subscriber:
1) Have a wait loop in SyncRepWaitForLSN (a temporary hack to
illustrate the standby receiving the txn a bit late or fail to
receive)
2) Insert data into a table on the primary
3) The primary waits i.e. the insert query hangs (because of the wait
loop hack ()) before the local txn is sent to the sync standby,
whereas the subscriber receives the inserted data.
4) If the primary crashes/goes down and unable to come up, if the
failover happens to sync standby (which didn't receive the data that
got inserted on tbe primary), the subscriber would see the data that
the sync standby can't.

This looks to be a problem. A possible solution is to let the
subscribers receive the txns only after the primary achieves quorum
commit (gets out of the SyncRepWaitForLSN or after all sync standbys
received the txns). The logical replication walsenders can wait until
the quorum commit is obtained and then can send the WAL. A new GUC can
be introduced to control this, default being the current behaviour.

Thoughts?

Thanks Satya (cc-ed) for the use-case and off-list discussion.

Regards,
Bharath Rupireddy.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dilip Kumar
Дата:
Сообщение: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Следующее
От: Mark Dilger
Дата:
Сообщение: Re: Optionally automatically disable logical replication subscriptions on error