Re: Skipping logical replication transactions on subscriber side

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Skipping logical replication transactions on subscriber side
Дата
Msg-id CAD21AoBnxmRGMv5CZHVosp+g-vUzZtcJ0ecxfvk+aLEqTKFa-g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Wed, Dec 8, 2021 at 2:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 7, 2021 at 5:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I'll submit the patch tomorrow.
> >
> > While updating the patch, I realized that skipping a transaction that
> > is prepared on the publisher will be tricky a bit;
> >
> > First of all, since skip-xid is in pg_subscription catalog, we need to
> > do a catalog update in a transaction and commit it to disable it. I
> > think we need to set origin-lsn and timestamp of the transaction being
> > skipped to the transaction that does the catalog update. That is,
> > during skipping the (not prepared) transaction, we skip all
> > data-modification changes coming from the publisher, do a catalog
> > update, and commit the transaction. If we do the catalog update in the
> > next transaction after skipping the whole transaction, skip_xid could
> > be left in case of a server crash between them.
> >
>
> But if we haven't updated origin_lsn/timestamp before the crash, won't
> it request the same transaction again from the publisher? If so, it
> will be again able to skip it because skip_xid is still not updated.

Yes. I mean that if we update origin_lsn and origin_timestamp when
committing the skipped transaction and then update the catalog in the
next transaction it doesn't work in case of a crash. But it's not
possible in the first place since the first transaction is empty and
we cannot set origin_lsn and origin_timestamp to it.

>
> > Also, we cannot set
> > origin-lsn and timestamp to an empty transaction.
> >
>
> But won't we update the catalog for skip_xid in that case?

Yes. Probably my explanation was not clear. Even if we skip all
changes of the transaction, the transaction doesn't become empty since
we update the catalog.

>
> Do we see any advantage of updating the skip_xid in the same
> transaction vs. doing it in a separate transaction? If not then
> probably we can choose either of those ways and add some comments to
> indicate the possibility of doing it another way.

I think that since the skipped transaction is always empty there is
always one transaction. What we need to consider is when we update
origin_lsn and origin_timestamp. In non-prepared transaction cases,
the only option is when updating the catalog.

>
> > In prepared transaction cases, I think that when handling a prepare
> > message, we need to commit the transaction to update the catalog,
> > instead of preparing it. And at the commit prepared and rollback
> > prepared time, we skip it since there is not the prepared transaction
> > on the subscriber.
> >
>
> Can't we think of just allowing prepare in this case and updating the
> skip_xid only at commit time? I see that in this case, we would be
> doing prepare for a transaction that has no changes but as such cases
> won't be common, isn't that acceptable?

In this case, we will end up committing both the prepared (empty)
transaction and the transaction that updates the catalog, right? If
so, since these are separate transactions it can be a problem in case
of a crash between these two commits.

>
> > Currently, handling rollback prepared already
> > behaves so; it first checks whether we have prepared the transaction
> > or not and skip it if haven’t. So I think we need to do that also for
> > commit prepared case. With that, this requires protocol changes so
> > that the subscriber can get prepare-lsn and prepare-time when handling
> > commit prepared.
> >
> > So I’m writing a separate patch to add prepare-lsn and timestamp to
> > commit_prepared message, which will be a building block for skipping
> > prepared transactions. Actually, I think it’s beneficial even today;
> > we can skip preparing the transaction if it’s an empty transaction.
> > Although the comment it’s not a common case, I think that it could
> > happen quite often in some cases:
> >
> >     * XXX, We can optimize such that at commit prepared time, we first check
> >     * whether we have prepared the transaction or not but that doesn't seem
> >     * worthwhile because such cases shouldn't be common.
> >     */
> >
> > For example, if the publisher has multiple subscriptions and there are
> > many prepared transactions that modify the particular table subscribed
> > by one publisher, many empty transactions are replicated to other
> > subscribers.
> >
>
> I think this is not clear to me. Why would one have multiple
> subscriptions for the same publication? I thought it is possible when
> say some publisher doesn't publish any data of prepared transaction
> say because the corresponding action is not published or something
> like that. I don't deny that someday we want to optimize this case but
> it might be better if we don't need to do it along with this patch.

I imagined that the publisher has two publications (say pub-A and
pub-B) that publishes a diferent set of relations in the database and
there are two subscribers that are subscribing to either one
publication (e.g, subscriber-A subscribes to pub-A and subscriber-B
subscribes to pub-B). If many prepared transactions happen on the
publisher and these transactions modify only relations published by
pub-A,  both subscriber-A and subscriber-B would prepare the same
number of transactions but all of them in subscriber-B is empty.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Is it correct to update db state in control file as "shutting down" during end-of-recovery checkpoint?
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: GUC flags