Re: Skipping logical replication transactions on subscriber side
From | Masahiko Sawada |
---|---|
Subject | Re: Skipping logical replication transactions on subscriber side |
Date | |
Msg-id | CAD21AoBdEcyXKMCMws7HjcYDbbPyq_KfUbCnTX84rDeP45Hbrw@mail.gmail.com |
In reply to | Re: Skipping logical replication transactions on subscriber side (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Wed, Jan 26, 2022 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 26, 2022 at 12:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jan 26, 2022 at 1:43 PM David G. Johnston
> > <david.g.johnston@gmail.com> wrote:
> > >
> > > We probably should just provide an option for the user to specify
> > > "subrelid". If null, only the main apply worker will skip the given
> > > xid, otherwise only the worker tasked with syncing that particular
> > > table will do so. It might take a sequence of ALTER SUBSCRIPTION SET
> > > commands to get a broken initial table synchronization to load
> > > completely but at least there will not be any surprises as to which
> > > tables had transactions skipped and which did not.
> >
> > That would work but I’m concerned about whether users can specify it
> > properly. Also, we would need to change the errcontext message
> > generated by apply_error_callback() so the user can know whether the
> > error occurred in the apply worker or a tablesync worker.
> >
> > Or, as another idea, since an error during table synchronization is
> > not common and could be resolved by truncating the table and
> > restarting the synchronization in practice, there might be no need
> > for this much and we can support it only for apply worker errors.
> >
>
> Yes, that is what I have also in mind. We can always extend this
> feature for tablesync process because it can not only fail for the
> specified skip_xid but also for many other reasons during the initial
> copy.

I'll update the patch accordingly to test and verify this approach.

In the meantime, I’d like to discuss possible ideas for storing the error XID somewhere the worker can see it even after a restart. It has been proposed that the worker updates the catalog when an error occurs, which was criticized because updating the catalog in such a situation is not a good idea. The next idea I considered was to store the error XID somewhere in shmem (e.g., ReplicationState). But it requires, in principle, at least as many entries as there are subscriptions, not max_logical_replication_workers. Since we don’t know that number at startup time, we need to use DSM or a cache with a fixed number of entries. It seems overkill to me.

The third idea, which is slightly better than the others, is to have the launcher process update the catalog, not the worker process; when an error occurs, the apply worker stores the error XID (and maybe its subscription OID) into its LogicalRepWorker entry, and the launcher updates the corresponding entry of the pg_subscription catalog before launching workers. After the worker restarts, it clears the error XID on the catalog if it successfully applied the transaction with the error XID. The user can enable the transaction-skipping behavior with a query, say ALTER SUBSCRIPTION SKIP ENABLED. The user cannot enable the skipping behavior if the error XID is not set. If the skipping behavior is enabled and the error XID is a valid value, the worker skips the transaction and then clears both the error XID and the skipping-behavior flag on the catalog.

With this idea, we don’t need a complex mechanism to store the error XID for each subscription and can ensure that only the transaction in question is skipped. But my concern is that the launcher updates the catalog. Since it doesn’t connect to any database, it probably cannot open the catalog indexes (because that requires looking up pg_class). Therefore, we have to use in-place updates here. Through quick tests, I’ve confirmed that using heap_inplace_update() to update the error XID on pg_subscription tuples seems to work, but I'm not sure that using an in-place update here is a legitimate approach.

What do you think? Any ideas?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
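
The hand-off described in the third idea above can be modelled as a small state machine. The following is only an illustrative sketch in Python, not PostgreSQL code: the class names `SubscriptionRow` and `WorkerSlot`, their fields, and the functions `worker_apply`, `launcher_restart`, and `enable_skip` are all hypothetical stand-ins for a pg_subscription tuple, a LogicalRepWorker entry, and the worker, launcher, and ALTER SUBSCRIPTION steps. It ignores the in-place-update question entirely and only shows who writes which field, and when.

```python
from dataclasses import dataclass

INVALID_XID = 0  # stands in for an invalid/unset transaction ID


@dataclass
class SubscriptionRow:
    """Hypothetical model of a pg_subscription row (field names are assumptions)."""
    error_xid: int = INVALID_XID
    skip_enabled: bool = False


@dataclass
class WorkerSlot:
    """Hypothetical model of the worker's LogicalRepWorker entry in shared memory."""
    error_xid: int = INVALID_XID


def worker_apply(row: SubscriptionRow, slot: WorkerSlot, xid: int, fails: bool) -> str:
    """Apply one remote transaction; returns 'skipped', 'applied', or 'error'."""
    if row.skip_enabled and row.error_xid == xid:
        # Skip only the transaction in question, then clear both fields.
        row.error_xid = INVALID_XID
        row.skip_enabled = False
        return "skipped"
    if fails:
        # The worker does not touch the catalog on error; it only
        # records the failing XID in its shared-memory slot.
        slot.error_xid = xid
        return "error"
    if row.error_xid == xid:
        # The previously failing transaction now applied successfully.
        row.error_xid = INVALID_XID
    return "applied"


def launcher_restart(row: SubscriptionRow, slot: WorkerSlot) -> None:
    """Before relaunching a worker, the launcher copies the XID to the catalog."""
    if slot.error_xid != INVALID_XID:
        row.error_xid = slot.error_xid
        slot.error_xid = INVALID_XID


def enable_skip(row: SubscriptionRow) -> None:
    """Models the user enabling skipping; refused if no error XID is recorded."""
    if row.error_xid == INVALID_XID:
        raise ValueError("no error XID recorded; cannot enable skipping")
    row.skip_enabled = True
```

In this model a failing transaction leaves the catalog row untouched until `launcher_restart` runs, `enable_skip` is rejected until an error XID has been recorded, and a skip clears both the XID and the flag so that no later transaction is skipped by accident.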