Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
From | Bharath Rupireddy
Subject | Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Date |
Msg-id | CALj2ACVUa8WddVDS20QmVKNwTbeOQqy4zy59NPzh8NnLipYZGw@mail.gmail.com
In reply to | Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers (SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>)
Responses | Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) (Nathan Bossart <nathandbossart@gmail.com>)
List | pgsql-hackers
On Thu, Jan 6, 2022 at 1:29 PM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>
> Consider a cluster formation where we have a Primary (P), a Sync Replica (S1), and multiple async replicas for disaster recovery and read scaling (within the region and outside the region). In this setup, S1 is the preferred failover target in the event of a primary failure. When a transaction is committed on the primary, it is not acknowledged to the client until the primary gets an acknowledgment from the sync standby that the WAL is flushed to disk (assume the synchronous_commit configuration is remote_flush). However, the walsenders on the primary corresponding to the async replicas don't wait for that flush acknowledgment and send the WAL to the async standbys (and to any logical replication/decoding clients). So it is possible for the async replicas and logical clients to be ahead of the sync replica. If a failover is initiated in such a scenario, to bring the formation into a healthy state we have to either:
>
> 1. run pg_rewind on the async replicas for them to reconnect with the new primary, or
> 2. collect the latest WAL across the replicas and feed the standby.
>
> Both these operations are involved, error prone, and can cause multiple minutes of downtime if done manually. In addition, there is a window where the async replicas can show data that was neither acknowledged to the client nor committed on the standby. Logical clients, if they are ahead, may need to reseed the data, as there is no easy rewind option for them.
>
> I would like to propose a GUC send_Wal_after_quorum_committed which, when set to ON, makes the walsenders corresponding to async standbys and logical replication workers wait until the LSN is quorum committed on the primary before sending it to the standby. This not only simplifies the post-failover steps but avoids unnecessary downtime for the async replicas. Thoughts?

Thanks Satya and others for the inputs. Here's the v1 patch that basically allows async walsenders to wait until the sync standbys report their flush LSN back to the primary. Please let me know your thoughts. A rough sketch of the core idea follows the benchmark numbers below.

I've done pgbench testing to see if the patch causes any problems. I ran the tests twice; there isn't much difference in transactions per second (tps), although there's a delay in the async standby receiving the WAL. After all, that's the feature we are pursuing. [1]

[1]
HEAD or WITHOUT PATCH:

./pgbench -c 10 -t 500 -P 10 testdb
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 500
number of transactions actually processed: 5000/5000
latency average = 247.395 ms
latency stddev = 74.409 ms
initial connection time = 13.622 ms
tps = 39.713114 (without initial connection time)

PATCH:

./pgbench -c 10 -t 500 -P 10 testdb
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 500
number of transactions actually processed: 5000/5000
latency average = 251.757 ms
latency stddev = 72.846 ms
initial connection time = 13.025 ms
tps = 39.315862 (without initial connection time)
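For illustration, here is a minimal sketch of the core idea, not the v1 patch itself. The GUC name send_wal_after_quorum_committed follows the proposal upthread, and GetQuorumFlushLsn() is a hypothetical helper standing in for however the sync-standby flush LSN is read (the sync-rep machinery already tracks it in shared memory):

/*
 * Minimal sketch only, not the v1 patch: clamp the WAL send pointer of an
 * async walsender to the LSN already flushed by the sync standbys.
 *
 * Hypothetical names, not in PostgreSQL as of this thread:
 *   - send_wal_after_quorum_committed: the proposed bool GUC
 *   - GetQuorumFlushLsn(): helper returning the sync-standby flush LSN;
 *     the existing sync-rep code keeps this in shared memory
 *     (WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH], read under SyncRepLock).
 */
#include "postgres.h"

#include "access/xlogdefs.h"        /* XLogRecPtr */

extern bool send_wal_after_quorum_committed;    /* proposed GUC */
extern XLogRecPtr GetQuorumFlushLsn(void);      /* hypothetical helper */

static XLogRecPtr
ClampSendPtrToQuorumFlush(XLogRecPtr sendRqstPtr)
{
    if (send_wal_after_quorum_committed)
    {
        XLogRecPtr  quorumFlushPtr = GetQuorumFlushLsn();

        /*
         * Never stream past what the sync standbys have flushed; the
         * walsender main loop simply retries once more WAL qualifies.
         */
        if (sendRqstPtr > quorumFlushPtr)
            sendRqstPtr = quorumFlushPtr;
    }

    return sendRqstPtr;
}

An async walsender would apply such a clamp when computing how far it may send, so it keeps streaming normally and only stalls while the sync standbys lag.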
TEST SETUP:
primary in region 1
async standby 1 in region 1, i.e. close to the primary
sync standby 1 in region 2
sync standby 2 in region 3
an archive location in region 4, different from the primary and standby regions

Note that I intentionally kept the sync standbys in regions far from the primary, because that makes them receive WAL a bit late by default, which works well for our testing. A sketch of the replication settings this topology assumes follows the pgbench setup below.

PGBENCH SETUP:
./psql -d postgres -c "drop database testdb"
./psql -d postgres -c "create database testdb"
./pgbench -i -s 100 testdb
./psql -d testdb -c "\dt"
./psql -d testdb -c "SELECT pg_size_pretty(pg_database_size('testdb'))"
./pgbench -c 10 -t 500 -P 10 testdb
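For completeness, the primary's replication settings such a topology implies would look roughly like this; the standby names are made up for illustration, and the last line is the GUC proposed in this thread, not an existing setting:

# primary's postgresql.conf (illustrative; standby names are hypothetical)
synchronous_commit = on            # commit waits for sync standby flush
synchronous_standby_names = 'FIRST 2 (sync_standby_1, sync_standby_2)'
# proposed in this thread (hypothetical name, off by default):
# send_wal_after_quorum_committed = on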
Regards,
Bharath Rupireddy.

Attachments