Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

From: Bharath Rupireddy
Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Msg-id CALj2ACWoB3k0kfjA7JxJgskVGGeE3jWzmGRPjP9QTRSCgSjhOg@mail.gmail.com
In reply to: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication (Dilip Kumar <dilipbalaut@gmail.com>)
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication (Andrey Borodin <x4mmm@yandex-team.ru>)
List: pgsql-hackers
On Tue, Apr 26, 2022 at 11:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Mon, 2022-04-25 at 19:51 +0530, Bharath Rupireddy wrote:
> > With synchronous replication typically all the transactions (txns)
> > first locally get committed, then streamed to the sync standbys and
> > the backend that generated the transaction will wait for ack from sync
> > standbys. While waiting for ack, it may happen that the query or the
> > txn gets canceled (QueryCancelPending is true) or the waiting backend
> > is asked to exit (ProcDiePending is true). In either of these cases,
> > the wait for ack gets canceled and leaves the txn in an inconsistent
> > state [...]
> >
> > Here's a proposal (mentioned previously by Satya [1]) to avoid the
> > above problems:
> > 1) Wait a configurable amount of time before canceling the sync
> > replication by the backends, i.e. delay processing of
> > QueryCancelPending and ProcDiePending. Introduce a new timeout GUC,
> > synchronous_replication_naptime_before_cancel; when set, it lets
> > the backends wait for the ack before canceling the synchronous
> > replication, so that the transaction can be available in sync standbys
> > as well.
> > 2) Wait for sync standbys to catch up upon restart after the crash or
> > in the next txn after the old locally committed txn was canceled.
>
> While this may mitigate the problem, I don't think it will deal with
> all the cases which could cause a transaction to end up committed locally,
> but not on the synchronous standby.  I think that only using the full
> power of two-phase commit can make this bulletproof.

I'm not sure 2PC is the recommended approach for postgres HA with sync
replication. The documentation says that "PREPARE TRANSACTION" and other
2PC commands are "intended for use by external transaction management
systems" [1], and they are used with explicit transactions, whereas the
txns in a postgres HA setup with sync replication need not always be
explicit txns. Am I missing something here?

> Is it worth adding additional complexity that is not a complete solution?

The proposed approach helps avoid some common problems that arise in
simple scenarios within sync replication, such as cancelling a
long-running query while the backend is waiting in SyncRepWaitForLSN.
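
To make the cancel-during-wait scenario concrete, here is a toy model of the
idea in the quoted proposal. It is only a sketch and not PostgreSQL code:
simulate_sync_wait, WaitResult and the millisecond "ticks" are hypothetical
names invented for illustration, and naptime_ms merely stands in for the
proposed synchronous_replication_naptime_before_cancel GUC. With a grace
period of 0, a cancel request ends the wait immediately; with a nonzero
grace period, an ack that arrives in time still wins.

```c
#include <assert.h>

/* Outcome of the simulated wait for a sync-standby acknowledgement. */
typedef enum
{
    WAIT_ACKED,         /* standby confirmed the commit before any cancel took effect */
    WAIT_CANCELED,      /* wait abandoned: txn is committed locally only */
    WAIT_STILL_WAITING  /* simulation bound reached with neither ack nor cancel */
} WaitResult;

/*
 * Toy model of a backend waiting for a sync-standby ack (hypothetical,
 * not the real SyncRepWaitForLSN). Time advances in 1 ms ticks.
 *
 *   ack_at_ms     tick at which the standby ack arrives (negative: never)
 *   cancel_at_ms  tick at which a cancel/die request arrives (negative: never)
 *   naptime_ms    grace period before a pending cancel is honored;
 *                 0 models cancelling the wait immediately
 *   max_ms        last tick to simulate
 */
WaitResult
simulate_sync_wait(int ack_at_ms, int cancel_at_ms, int naptime_ms, int max_ms)
{
    assert(naptime_ms >= 0 && max_ms >= 0);

    for (int now = 0; now <= max_ms; now++)
    {
        /* An ack that has arrived always ends the wait successfully. */
        if (ack_at_ms >= 0 && now >= ack_at_ms)
            return WAIT_ACKED;

        /* A pending cancel is honored only after the grace period. */
        if (cancel_at_ms >= 0 && now >= cancel_at_ms + naptime_ms)
            return WAIT_CANCELED;
    }

    return WAIT_STILL_WAITING;
}
```

For example, simulate_sync_wait(50, 10, 0, 1000) gives up at tick 10 and
returns WAIT_CANCELED, whereas simulate_sync_wait(50, 10, 100, 1000) keeps
waiting and returns WAIT_ACKED at tick 50. If the ack never arrives, the
grace period only delays the cancel, which is why this mitigates the problem
rather than eliminating it.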

[1] https://www.postgresql.org/docs/devel/sql-prepare-transaction.html

Regards,
Bharath Rupireddy.


