Re: Skipping logical replication transactions on subscriber side

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoDHCAD9anS6ZzNyG1-B7F_BvzyiNfZ=O-tMEdwmKhfW8A@mail.gmail.com
In reply to Re: Skipping logical replication transactions on subscriber side  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Jan 26, 2022 at 11:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jan 25, 2022 at 8:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
> > > <david.g.johnston@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >>
> > > >> Yeah, I think it's a good idea to clear the subskipxid after the first
> > > >> transaction regardless of whether the worker skipped it.
> > > >>
> > > >
> > > > > So basically instead of stopping the worker with an error you
> > > > > suggest having the worker continue applying changes (after
> > > > > resetting subskipxid, and - arguably - the ?_error_* fields).
> > > > > Log the transaction xid mis-match as a warning in the log file
> > > > > as opposed to an error.
> > >
> > > Agreed, I think it's better to log a warning than to raise an error.
> > > In the case where the user specified the wrong XID, the worker should
> > > fail again due to the same error.
> > >
> >
> > IIUC, the proposal is to compare the skip_xid with the very
> > transaction the apply worker received to apply and raise a warning if
> > it doesn't match with skip_xid and then continue. This seems like a
> > reasonable idea but can we guarantee that it is always the first
> > transaction that we want to skip? We seem to guarantee that we won't
> > get something again once it is written durably/flushed on the
> > subscriber side. I guess here it can happen that before the errored
> > transaction, there is some empty xact, or maybe part of the stream
> > (consider streaming transactions) of some xact, or there could be
> > other cases as well where the server will send those xacts again.
>
> Good point.
>
> I guess that in the situation the worker entered an error loop, we can
> guarantee that the worker fails while applying the first non-empty
> transaction since starting logical replication. And the transaction is
> what we’d like to skip. If the transaction that can be applied without
> an error is resent after a restart, it’s a problem of logical
> replication. As you pointed out, it's possible that there are some
> empty transactions before the transaction in question since we don't
> advance replication origin LSN if the transaction is empty. Also,
> probably the same is true for a streamed transaction that is rolled
> back or ROLLBACK-PREPARED transactions. So, we can also skip clearing
> subskipxid if the transaction is empty? That is, we make sure to clear
> it after applying the first non-empty transaction. We would need to
> carefully think about this solution otherwise ALTER SUBSCRIPTION SKIP
> ends up not working at all in some cases.
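
To make the rule proposed in the quoted paragraph concrete, here is a minimal
sketch in Python rather than PostgreSQL code. It is only a model: `Xact`,
`Subscription`, and `handle_xact` are hypothetical names, `skip_xid` stands in
for subskipxid, and "applying" a transaction just records its XID. In a real
worker, applying the transaction in the mismatch case would presumably fail
again with the original error.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Xact:
    xid: int
    # An empty list models an empty transaction (no changes to apply).
    changes: List[str] = field(default_factory=list)


@dataclass
class Subscription:
    skip_xid: Optional[int] = None  # stands in for subskipxid
    applied: List[int] = field(default_factory=list)
    skipped: List[int] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def handle_xact(sub: Subscription, xact: Xact) -> None:
    """Process one incoming transaction under the proposed rule."""
    if not xact.changes:
        # Empty transactions may be resent after a restart, so they
        # must not consume (clear) the skip request.
        return
    if sub.skip_xid is None:
        sub.applied.append(xact.xid)
        return
    if xact.xid == sub.skip_xid:
        sub.skipped.append(xact.xid)
    else:
        # Wrong XID: warn and apply normally instead of raising an error.
        sub.warnings.append(
            f"skip xid {sub.skip_xid} does not match "
            f"first non-empty transaction {xact.xid}"
        )
        sub.applied.append(xact.xid)
    # Clear after the first non-empty transaction, skipped or not.
    sub.skip_xid = None


sub = Subscription(skip_xid=740)
for x in [Xact(738), Xact(740, ["INSERT"]), Xact(741, ["UPDATE"])]:
    handle_xact(sub, x)
```

In this run the empty transaction 738 leaves the skip request in place, 740 is
skipped, and 741 is applied normally once the request has been cleared.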

We probably also need to consider the case where the tablesync worker
has entered an error loop and the user wants to skip that transaction.
The apply worker is running at the same time, but it should not clear
subskipxid in that case. Similarly, the tablesync worker should not
clear subskipxid when it is the apply worker that needs to skip a
transaction.
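
One way to picture this constraint is the small Python model below. It rests
on an assumption that is not part of the proposal: that the skip request
somehow records which kind of worker it is meant for. The `target` field and
every name in the sketch are hypothetical, and how a real implementation would
tell the two workers apart is left open here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WorkerKind(Enum):
    APPLY = "apply"
    TABLESYNC = "tablesync"


@dataclass
class SkipRequest:
    xid: int
    target: WorkerKind  # hypothetical: which worker should consume the request


@dataclass
class SubscriptionState:
    skip: Optional[SkipRequest] = None  # stands in for subskipxid


def clear_skip_if_owner(state: SubscriptionState, worker: WorkerKind) -> bool:
    """Called by a worker after its first non-empty transaction.

    Clears the skip request only when this worker is its target.
    Returns True if the request was cleared.
    """
    if state.skip is None or state.skip.target is not worker:
        return False
    state.skip = None
    return True


state = SubscriptionState(SkipRequest(xid=740, target=WorkerKind.TABLESYNC))
cleared_by_apply = clear_skip_if_owner(state, WorkerKind.APPLY)
still_set = state.skip is not None
cleared_by_tablesync = clear_skip_if_owner(state, WorkerKind.TABLESYNC)
```

Here the apply worker finishes a transaction first but leaves the request
untouched, so it is still available when the tablesync worker reaches the
transaction it needs to skip.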

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


