Re: Skipping logical replication transactions on subscriber side

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoD==vwyNWsrndRSbQWEnHujpOTC059u-arTFF3hxAye5g@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Dec 15, 2021 at 1:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 15, 2021 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Dec 14, 2021 at 2:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > On Tue, Dec 14, 2021 at 3:23 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > > > While the worker is skipping one of the skip transactions specified by
> > > > the user and immediately if the user specifies another skip
> > > > transaction while the skipping of the transaction is in progress this
> > > > new value will be reset by the worker while clearing the skip xid. I
> > > > felt once the worker has identified the skip xid and is about to skip
> > > > the xid, the worker can acquire a lock to prevent concurrency issues:
> > >
> > > That's a good point.
> > > If only the last_error_xid could be skipped, then this wouldn't be an
> > > issue, right?
> > > If a different xid to skip is specified while the worker is currently
> > > skipping a transaction, should that even be allowed?
> > >
> >
> > We don't expect such usage but yes, it could happen and seems not
> > good. I thought we can acquire Share lock on pg_subscription during
> > the skip but not sure it's a good idea. It would be better if we can
> > find a way to allow users to specify only XID that has failed.
> >
>
> Yeah, but as we don't have a definite way to allow specifying only
> failed XID, I think it is better to use share lock on that particular
> subscription. We are already using it for add/update rel state (see,
> AddSubscriptionRelState, UpdateSubscriptionRelState), so this will be
> another place to use a similar technique.

Yes, but that seems to mean we disallow users from changing skip_xid
while the apply worker is skipping changes, so we would end up with
the same problem we have discussed so far:

In the current patch, we don't clear skip_xid at prepare time but do
that at commit-prepared time. But we cannot keep holding the lock until
commit-prepared comes because we don't know when it will come. It's
possible that another conflict occurs before then. We also cannot
clear skip_xid only at prepare time because that doesn't solve the
concurrency problem at commit-prepared time. So if my understanding is
correct, we need to both clear skip_xid and release the lock at prepare
time, and commit the prepared (empty) transaction at commit-prepared
time (I assume that we prepare even empty transactions).

Suppose that at prepare time we clear skip_xid (and release the lock)
and then prepare the transaction. If the server crashes right after
clearing skip_xid, skip_xid is already cleared but the transaction
will be sent again, so the user has to specify skip_xid again. So let's
reverse the order: we prepare the transaction and then clear skip_xid.
But if the server crashes between the two steps, the transaction won't
be sent again, yet skip_xid is left behind and the user has to clear
it. The leftover skip_xid could be cleared automatically at
commit-prepared time if the XID in the commit-prepared message matches
skip_xid, but this doesn't actually solve the concurrency problem: if
the user changed skip_xid before commit-prepared, we would end up
clearing the value the user set. So we might want to hold the lock
until we clear skip_xid, but we want to avoid that, as I explained at
the start. It seems we have gone around in a loop.
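
To make the two orderings concrete, here is a toy Python model (not
PostgreSQL code). It assumes each step is individually durable, that a
crash can only land between the two steps, and that the transaction is
sent again only when the prepare did not survive. The names Subscriber
and run, and the XID 741, are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Subscriber:
    # Hypothetical stand-in for the subscriber's durable state.
    skip_xid: Optional[int] = None  # value the user stored in the catalog
    prepared: bool = False          # whether the (empty) transaction was prepared


def run(order: Tuple[str, str], crash_between: bool, xid: int = 741):
    """Run the two prepare-time steps in `order`, optionally crashing
    after the first one. Returns (skip_xid left behind, resent?)."""
    sub = Subscriber(skip_xid=xid)

    def clear() -> None:
        sub.skip_xid = None

    def prepare() -> None:
        sub.prepared = True

    steps = {"clear": clear, "prepare": prepare}
    first, second = order
    steps[first]()
    if not crash_between:
        steps[second]()

    # Assumption: the transaction is resent only if it was never prepared.
    resent = not sub.prepared
    return sub.skip_xid, resent


# Clear first, crash: skip_xid is gone but the transaction comes again.
print(run(("clear", "prepare"), crash_between=True))
# Prepare first, crash: no resend, but a stale skip_xid is left behind.
print(run(("prepare", "clear"), crash_between=True))
```

Neither ordering is clean under a crash; the model only shows which
kind of manual cleanup the user is left with in each case.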

Among these ideas, clearing skip_xid and then preparing the
transaction sounds better. Or we might want to revisit the idea of
storing skip_xid in shmem (e.g., ReplicationState) instead of the
catalog.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


