Re: Skipping logical replication transactions on subscriber side

From Masahiko Sawada
Subject Re: Skipping logical replication transactions on subscriber side
Date
Msg-id CAD21AoBdEcyXKMCMws7HjcYDbbPyq_KfUbCnTX84rDeP45Hbrw@mail.gmail.com
In response to Re: Skipping logical replication transactions on subscriber side  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Jan 26, 2022 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 26, 2022 at 12:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jan 26, 2022 at 1:43 PM David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> > >
> > > > We probably should just provide an option for the user to specify
> > > > "subrelid".  If null, only the main apply worker will skip the
> > > > given xid, otherwise only the worker tasked with syncing that
> > > > particular table will do so.  It might take a sequence of ALTER
> > > > SUBSCRIPTION SET commands to get a broken initial table
> > > > synchronization to load completely but at least there will not be
> > > > any surprises as to which tables had transactions skipped and
> > > > which did not.
> >
> > That would work, but I’m concerned about whether users can specify it
> > properly. Also, we would need to change the errcontext message
> > generated by apply_error_callback() so the user can tell whether the
> > error occurred in the apply worker or in a tablesync worker.
> >
> > Or, as another idea, since an error during table synchronization is
> > not common and in practice could be resolved by truncating the table
> > and restarting the synchronization, there might not be much need for
> > this, and we could support it only for apply worker errors.
> >
>
> Yes, that is what I also have in mind. We can always extend this
> feature to the tablesync process later, because tablesync can fail
> not only for the specified skip_xid but also for many other reasons
> during the initial copy.

I'll update the patch accordingly to test and verify this approach.

In the meantime, I’d like to discuss possible ways of storing the
error XID somewhere the worker can see it even after a restart. It has
been proposed that the worker update the catalog when an error
occurs, but that was criticized because updating the catalog in such a
situation is not a good idea.

The next idea I considered was to store the error XID somewhere in
shmem (e.g., ReplicationState). But in principle it requires at least
as many entries as there are subscriptions, not
max_logical_replication_workers. Since we don’t know that number at
startup time, we would need to use DSM or a cache with a fixed number
of entries. That seems like overkill to me.

The third idea, which is slightly better than the others, is to have
the launcher process, not the worker process, update the catalog: when
an error occurs, the apply worker stores the error XID (and maybe its
subscription OID) in its LogicalRepWorker entry, and the launcher
updates the corresponding entry of the pg_subscription catalog before
launching workers. After the worker restarts, it clears the error XID
in the catalog if it successfully applies the transaction with the
error XID. The user can enable the transaction-skipping behavior with
a command such as ALTER SUBSCRIPTION SKIP ENABLED. The user cannot
enable the skipping behavior if the error XID is not set. If the
skipping behavior is enabled and the error XID is a valid value, the
worker skips the transaction and then clears both the error XID and
the skipping flag in the catalog.
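
To make the intended flow of this third idea easier to review, here is
a rough, self-contained sketch of the state transitions in plain C. It
is not patch code: SubCatalogEntry, WorkerSlot, and every function name
below are hypothetical stand-ins for the pg_subscription fields, the
LogicalRepWorker entry, and the launcher/worker logic described above,
and Xid is just a stand-in for TransactionId.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;            /* stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)    /* stand-in for InvalidTransactionId */

/* Hypothetical per-subscription state kept in the catalog. */
typedef struct SubCatalogEntry
{
    Xid  error_xid;      /* XID of the transaction that failed, if any */
    bool skip_enabled;   /* set by the user to request skipping */
} SubCatalogEntry;

/* Hypothetical field in the worker's shared-memory entry. */
typedef struct WorkerSlot
{
    Xid reported_error_xid;
} WorkerSlot;

/* Apply worker: on error, remember the failing XID in shared memory. */
static void
worker_report_error(WorkerSlot *slot, Xid xid)
{
    slot->reported_error_xid = xid;
}

/* Launcher: before relaunching, copy the reported XID into the catalog. */
static void
launcher_sync_catalog(SubCatalogEntry *cat, WorkerSlot *slot)
{
    if (slot->reported_error_xid != INVALID_XID)
    {
        cat->error_xid = slot->reported_error_xid;
        slot->reported_error_xid = INVALID_XID;
    }
}

/* User command: refused unless an error XID has been recorded. */
static bool
user_enable_skip(SubCatalogEntry *cat)
{
    if (cat->error_xid == INVALID_XID)
        return false;
    cat->skip_enabled = true;
    return true;
}

/* Restarted worker: skip only the recorded transaction, only if enabled. */
static bool
worker_should_skip(const SubCatalogEntry *cat, Xid xid)
{
    return cat->skip_enabled &&
           cat->error_xid != INVALID_XID &&
           cat->error_xid == xid;
}

/* Worker: after applying or skipping the recorded XID, reset the state. */
static void
worker_finish_xact(SubCatalogEntry *cat, Xid xid)
{
    if (cat->error_xid != INVALID_XID && cat->error_xid == xid)
    {
        cat->error_xid = INVALID_XID;
        cat->skip_enabled = false;
    }
}
```

The point of the sketch is that skipping requires both a recorded error
XID and an explicit user action, and that both pieces of state are
cleared as soon as the transaction in question has been applied or
skipped, so no other transaction can be skipped by accident.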

With this idea, we don’t need a complex mechanism to store the error
XID for each subscription, and we can ensure that only the transaction
in question is skipped. But my concern is that the launcher updates
the catalog. Since it doesn’t connect to any database, it probably
cannot open the catalog indexes (because that requires looking up
pg_class). Therefore, we have to use in-place updates here. Through
quick tests, I’ve confirmed that using heap_inplace_update() to update
the error XID on pg_subscription tuples seems to work, but I’m not
sure whether using an in-place update here is a legitimate approach.

What do you think? Any other ideas?


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


