Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

From Kyotaro Horiguchi
Subject Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers
Msg-id 20220111.173027.655878819168411223.horikyota.ntt@gmail.com
In reply to Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers  (SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>)
List pgsql-hackers
At Fri, 7 Jan 2022 09:44:15 -0800, SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote in 
> On Fri, Jan 7, 2022 at 12:27 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> > One is to serialize WAL sending (of course it is unacceptable at all)
> > or another is to send WAL to all standbys at once then make the
> > decision after making sure receiving replies from all standbys (this
> > is no longer quorum commit in another sense..)
> >
> 
> There is no need to serialize sending the WAL among sync standbys. The only
> serialization required is first to all the sync replicas and then to sync
> replicas if any. Once an LSN is quorum committed, no failover subsystem
> initiates an automatic failover such that the LSN is lost (data loss)

In PostgreSQL, the set of sync standbys is determined ex post facto.
The standbys that are first to report having caught up to a given
commit are the ones called "sync standbys" for that commit.
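
As a rough illustration of "ex post facto" (a toy model, not PostgreSQL
code), the quorum for a commit can be thought of as whichever standbys
happen to acknowledge first. The function name quorum_for_commit and
the reply times are made up for this sketch.

```python
def quorum_for_commit(reply_times, n):
    """Return the names of the n standbys whose acks arrive first.

    reply_times maps a standby name to the (hypothetical) time at which
    its acknowledgement for this commit reaches the primary.
    """
    ordered = sorted(reply_times, key=reply_times.get)
    return set(ordered[:n])


# Same three standbys, two commits, different arrival order.
commit_a = {"s1": 1.0, "s2": 3.0, "s3": 2.0}
commit_b = {"s1": 4.0, "s2": 1.5, "s3": 2.5}

print(sorted(quorum_for_commit(commit_a, 2)))
print(sorted(quorum_for_commit(commit_b, 2)))
```

In this model the membership of the "sync" set differs between the two
commits, which is the sense in which it is only known after the fact.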

We could maintain a fixed set of sync standbys based on the set of
sync standbys at a past commit, but that implies performance
degradation even if not a single standby is gone.

If we send WAL only to the fixed set of sync standbys, then when any
of those standbys is gone, the primary is forced to wait until some
timeout expires.  The same commit would finish immediately if WAL had
also been sent to out-of-quorum standbys.
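
The difference can be sketched with a small model. Everything here is
an assumption for illustration: the function names, the reply times in
milliseconds, and the timeout parameter, which merely stands in for
"some timeout" above and is not a specific PostgreSQL setting.

```python
import math


def wait_fixed_set(reply_times, fixed_set, timeout):
    """Wait until every member of fixed_set has replied, or the timeout."""
    latest = max(reply_times.get(s, math.inf) for s in fixed_set)
    return min(latest, timeout)


def wait_any_n(reply_times, n, timeout):
    """Wait until any n standbys have replied, or the timeout."""
    arrived = sorted(reply_times.values())
    if len(arrived) < n:
        return timeout
    return min(arrived[n - 1], timeout)


# s1 is gone, so it never replies and is absent from the dict.
reply_times = {"s2": 2.0, "s3": 3.0}
timeout = 30000.0

print(wait_fixed_set(reply_times, {"s1", "s2"}, timeout))
print(wait_any_n(reply_times, 2, timeout))
```

With the fixed set {s1, s2} the commit waits for the whole timeout,
while waiting for any two replies completes as soon as s3 answers.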

> > So I'm afraid that there's no sensible solution to avoid the
> > hiding-forerunner problem on quorum commit.
> 
> Could you elaborate on the problem here?

If a primary has received a response for LSN=X from N standbys, that
fact doesn't guarantee that none of the other standbys has reached the
same LSN.  If one of the standbys that has not yet responded has
already reached LSN=X+10 but its response has not arrived at the
primary for some reason, the truly fastest standby is hidden from the
primary.
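
A minimal sketch of that situation, with X = 100 and made-up standby
names (the dictionaries and the helper furthest are hypothetical, not
anything from the PostgreSQL source):

```python
def furthest(lsns):
    """Return (standby, lsn) for the entry with the highest LSN."""
    name = max(lsns, key=lsns.get)
    return name, lsns[name]


# What each standby has actually reached.
true_lsn = {"s1": 100, "s2": 100, "s3": 110}
# What the primary has heard so far; the reply from s3 is still in flight.
reported = {"s1": 100, "s2": 100}

seen = furthest(reported)
actual = furthest(true_lsn)
# Standbys that are ahead of anything the primary knows about them.
hidden = {s: lsn for s, lsn in true_lsn.items() if lsn > reported.get(s, 0)}

print(seen, actual, hidden)
```

From the primary's point of view the quorum of two is satisfied at
LSN 100, yet s3 is already at 110 and does not show up in its view.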

Even if the primary examines the responses from all standbys, it is
uncertain whether the responses reflect the truly current state of the
standbys.  Thus if we want to guarantee that no standby that has not
yet responded goes beyond LSN=X, there is no way other than to refrain
from sending WAL beyond X. In that case, we need to serialize the
period from WAL sending to response reception, which would lead to
critical performance degradation.
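
To give a feel for the cost, here is a deliberately crude model that
compares sending each chunk of WAL only after the previous reply has
arrived with streaming chunks back to back. The function names, the
single-standby assumption, and the numbers (in microseconds) are all
invented for the sketch; it ignores bandwidth limits and flush time.

```python
def serialized_time(n_chunks, send_time, round_trip):
    """Each chunk is sent only after the reply to the previous one arrives."""
    return n_chunks * (send_time + round_trip)


def pipelined_time(n_chunks, send_time, round_trip):
    """Chunks are streamed back to back; only the last reply is waited for."""
    return n_chunks * send_time + round_trip


n_chunks, send_time, round_trip = 1000, 10, 1000

print(serialized_time(n_chunks, send_time, round_trip))
print(pipelined_time(n_chunks, send_time, round_trip))
```

Under these assumptions the serialized variant pays one round trip per
chunk, so its total time grows with the network latency rather than
with the amount of WAL.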


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


