Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
From | Bharath Rupireddy
Subject | Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Date |
Msg-id | CALj2ACVUa8WddVDS20QmVKNwTbeOQqy4zy59NPzh8NnLipYZGw@mail.gmail.com
In reply to | Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers (SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>)
Responses | Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) (Nathan Bossart <nathandbossart@gmail.com>)
List | pgsql-hackers
On Thu, Jan 6, 2022 at 1:29 PM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>
> Consider a cluster formation where we have a Primary (P), a Sync Replica (S1), and multiple async replicas for disaster recovery and read scaling (within the region and outside the region). In this setup, S1 is the preferred failover target in the event of a primary failure. When a transaction is committed on the primary, it is not acknowledged to the client until the primary gets an acknowledgment from the sync standby that the WAL is flushed to disk (assume the synchronous_commit configuration is remote_flush). However, the walsenders on the primary corresponding to the async replicas don't wait for that flush acknowledgment and send the WAL to the async standbys (and to any logical replication/decoding clients). So it is possible for the async replicas and logical clients to be ahead of the sync replica. If a failover is initiated in such a scenario, to bring the formation into a healthy state we have to either:
>
> 1. run pg_rewind on the async replicas for them to reconnect with the new primary, or
> 2. collect the latest WAL across the replicas and feed the standby.
>
> Both these operations are involved, error prone, and can cause multiple minutes of downtime if done manually. In addition, there is a window where the async replicas can show data that was neither acknowledged to the client nor committed on the standby. Logical clients, if they are ahead, may need to reseed the data, as there is no easy rewind option for them.
>
> I would like to propose a GUC send_Wal_after_quorum_committed which, when set to ON, makes the walsenders corresponding to async standbys and logical replication workers wait until the LSN is quorum committed on the primary before sending it to the standby. This not only simplifies the post-failover steps but avoids unnecessary downtime for the async replicas. Thoughts?

Thanks Satya and others for the inputs. Here's the v1 patch that basically allows async walsenders to wait until the sync standbys report their flush LSN back to the primary. Please let me know your thoughts. A rough sketch of the core idea follows the benchmark numbers below.

I've done pgbench testing to see if the patch causes any problems. I ran the tests twice; there isn't much difference in transactions per second (tps), although there's a delay in the async standby receiving the WAL. After all, that's the feature we are pursuing. [1]

[1]
HEAD or WITHOUT PATCH:

./pgbench -c 10 -t 500 -P 10 testdb
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 500
number of transactions actually processed: 5000/5000
latency average = 247.395 ms
latency stddev = 74.409 ms
initial connection time = 13.622 ms
tps = 39.713114 (without initial connection time)

PATCH:

./pgbench -c 10 -t 500 -P 10 testdb
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 500
number of transactions actually processed: 5000/5000
latency average = 251.757 ms
latency stddev = 72.846 ms
initial connection time = 13.025 ms
tps = 39.315862 (without initial connection time)
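For illustration, here is a minimal sketch of the core idea, not the v1 patch itself. The GUC name send_wal_after_quorum_committed follows the proposal upthread, and GetQuorumFlushLsn() is a hypothetical helper standing in for however the sync-standby flush LSN is read (the sync-rep machinery already tracks it in shared memory):

/*
 * Minimal sketch only, not the v1 patch: clamp the WAL send pointer of an
 * async walsender to the LSN already flushed by the sync standbys.
 *
 * Hypothetical names, not in PostgreSQL as of this thread:
 *   - send_wal_after_quorum_committed: the proposed bool GUC
 *   - GetQuorumFlushLsn(): helper returning the sync-standby flush LSN;
 *     the existing sync-rep code keeps this in shared memory
 *     (WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH], read under SyncRepLock).
 */
#include "postgres.h"

#include "access/xlogdefs.h"        /* XLogRecPtr */

extern bool send_wal_after_quorum_committed;    /* proposed GUC */
extern XLogRecPtr GetQuorumFlushLsn(void);      /* hypothetical helper */

static XLogRecPtr
ClampSendPtrToQuorumFlush(XLogRecPtr sendRqstPtr)
{
    if (send_wal_after_quorum_committed)
    {
        XLogRecPtr  quorumFlushPtr = GetQuorumFlushLsn();

        /*
         * Never stream past what the sync standbys have flushed; the
         * walsender main loop simply retries once more WAL qualifies.
         */
        if (sendRqstPtr > quorumFlushPtr)
            sendRqstPtr = quorumFlushPtr;
    }

    return sendRqstPtr;
}

An async walsender would apply such a clamp when computing how far it may send, so it keeps streaming normally and only stalls while the sync standbys lag.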
TEST SETUP:
primary in region 1
async standby 1 in region 1, i.e. close to the primary
sync standby 1 in region 2
sync standby 2 in region 3
an archive location in region 4, different from the primary and standby regions

Note that I intentionally kept the sync standbys in regions far from the primary, because that makes them receive WAL a bit late by default, which works well for our testing. A sketch of the replication settings this topology assumes follows the pgbench setup below.

PGBENCH SETUP:
./psql -d postgres -c "drop database testdb"
./psql -d postgres -c "create database testdb"
./pgbench -i -s 100 testdb
./psql -d testdb -c "\dt"
./psql -d testdb -c "SELECT pg_size_pretty(pg_database_size('testdb'))"
./pgbench -c 10 -t 500 -P 10 testdb
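For completeness, the primary's replication settings such a topology implies would look roughly like this; the standby names are made up for illustration, and the last line is the GUC proposed in this thread, not an existing setting:

# primary's postgresql.conf (illustrative; standby names are hypothetical)
synchronous_commit = on            # commit waits for sync standby flush
synchronous_standby_names = 'FIRST 2 (sync_standby_1, sync_standby_2)'
# proposed in this thread (hypothetical name, off by default):
# send_wal_after_quorum_committed = on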
Regards,
Bharath Rupireddy.

Attachments