Re: Conflict detection for update_deleted in logical replication
From: Amit Kapila
Subject: Re: Conflict detection for update_deleted in logical replication
Msg-id: CAA4eK1LA7mnvKT9L8Nx_h+0TCvq-Ob2BGPO1bQKhkGHtoZKsow@mail.gmail.com
In reply to: RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses: Re: Conflict detection for update_deleted in logical replication
List: pgsql-hackers
On Wed, Sep 10, 2025 at 9:08 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, September 9, 2025 7:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Sep 9, 2025 at 11:47 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> > wrote:
> >
> > A few other comments:
> > 1.
> > + /*
> > +  * Return if the launcher has not initialized oldest_nonremovable_xid.
> > +  *
> > +  * It might seem feasible to directly check the conflict detection
> > +  * slot.xmin instead of relying on the launcher to assign the worker's
> > +  * oldest_nonremovable_xid; however, that could lead to a race condition
> > +  * where slot.xmin is set to InvalidTransactionId immediately after the
> > +  * check. In such cases, oldest_nonremovable_xid would no longer be
> > +  * protected by a replication slot and could become unreliable if a
> > +  * wraparound occurs.
> > +  */
> > + if (!TransactionIdIsValid(nonremovable_xid))
> > +     return;
> >
> > I understand the reason why the worker's oldest_nonremovable_xid is
> > assigned by the launcher, but all this makes the code complex to
> > understand. Isn't there any other way to achieve it? One naïve and
> > inefficient way could be to just restart the worker, probably after
> > updating its retentionactive flag. I am not suggesting making this
> > change; it is just a brainstorming point.
>
> I'll keep thinking about it, and for now, I've added a comment mentioning
> that rebooting is a simpler solution.
>

Say we have an LWLock, LogicalRepRetentionLock. The launcher acquires it in
SHARED mode as soon as it encounters the first subscription with
retain_dead_tuples set during the traversal of the sublist, and releases it
only after updating xmin outside the sublist traversal. The worker then
acquires it in EXCLUSIVE mode while resuming retention, for the period until
it fetches the slot's xmin. This will close the above race condition, but
acquiring an LWLock while traversing the subscription list is not advisable,
as that will make the traversal uninterruptible. The other possibility is to
use some heavy-weight lock here, say a lock on the pg_subscription catalog,
but that has the drawback that it can conflict with DDL operations. Yet
another way is to invent a new lock type for this. OTOH, the simple strategy
of letting the apply worker restart to resume retention will keep the
handling simpler. We do something similar at the start of the apply worker
if we find that some subscription parameter has changed. I think we need
more opinions on this matter.

One other comment:
+ if (!MySubscription->retentionactive)
+ {
+     rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION;
+     process_rdt_phase_transition(rdt_data, false);
+     return;
+ }

It is better that the caller processes the next phase; otherwise, this looks
a bit ad hoc. Please check other places similarly.

--
With Regards,
Amit Kapila.
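[Editor's note] To make the locking idea above concrete, here is a minimal
sketch of the two sides under that proposal. Only LogicalRepRetentionLock
(the name Amit posits) and the standard LWLock API come from the message;
launcher_compute_xmin(), update_conflict_slot_xmin(), get_conflict_slot_xmin(),
and the worker field usage are hypothetical stand-ins for the patch's actual
code.

/*
 * Sketch only. LogicalRepRetentionLock is assumed to be registered as a
 * new named LWLock; the helpers marked "assumed" do not exist in the tree.
 */
#include "postgres.h"
#include "access/transam.h"
#include "catalog/pg_subscription.h"
#include "nodes/pg_list.h"
#include "replication/worker_internal.h"
#include "storage/lwlock.h"

/* Launcher side: compute the conflict detection slot's xmin. */
static void
launcher_compute_xmin(List *sublist)
{
	TransactionId candidate_xmin = InvalidTransactionId;
	bool		holding = false;
	ListCell   *lc;

	foreach(lc, sublist)
	{
		Subscription *sub = (Subscription *) lfirst(lc);

		if (sub->retaindeadtuples && !holding)
		{
			/*
			 * First retain_dead_tuples subscription: from here until the
			 * slot's xmin is updated below, a worker cannot observe an
			 * intermediate InvalidTransactionId.
			 */
			LWLockAcquire(LogicalRepRetentionLock, LW_SHARED);
			holding = true;
		}

		/* ... accumulate candidate_xmin from the workers ... */
	}

	/* Update slot.xmin outside the traversal, then release. */
	update_conflict_slot_xmin(candidate_xmin);	/* assumed helper */
	if (holding)
		LWLockRelease(LogicalRepRetentionLock);
}

/* Worker side: resuming retention. */
static void
worker_resume_retention(void)
{
	TransactionId slot_xmin;

	/*
	 * EXCLUSIVE mode blocks until the launcher has finished its traversal
	 * and xmin update, closing the race where slot.xmin is reset to
	 * InvalidTransactionId right after the worker checks it.
	 */
	LWLockAcquire(LogicalRepRetentionLock, LW_EXCLUSIVE);
	slot_xmin = get_conflict_slot_xmin();	/* assumed helper */
	if (TransactionIdIsValid(slot_xmin))
		MyLogicalRepWorker->oldest_nonremovable_xid = slot_xmin;	/* field from the patch */
	LWLockRelease(LogicalRepRetentionLock);
}

The sketch also makes the stated drawback visible: the launcher holds the
lock in SHARED mode across the entire sublist traversal, which is exactly
the uninterruptible window that makes an LWLock unattractive here.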
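[Editor's note] On the second comment, one possible shape of the suggested
restructuring, as a sketch rather than the patch's actual code: the RDT_*
phase value, process_rdt_phase_transition(), rdt_data, and
MySubscription->retentionactive come from the quoted hunk; the type, function
names, and the driver loop's termination condition are assumptions.

#include "postgres.h"
/* RetainDeadTuplesData, RetainConflictInfoPhase, etc. as declared in the patch. */

/* A phase handler only records the next phase and returns... */
static void
handle_wait_phase(RetainDeadTuplesData *rdt_data)
{
	if (!MySubscription->retentionactive)
	{
		/* Request the next phase; let the caller process it. */
		rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION;
		return;
	}

	/* ... remaining work for this phase ... */
}

/* ... and the caller owns the transition loop. */
static void
drive_rdt_phases(RetainDeadTuplesData *rdt_data)
{
	RetainConflictInfoPhase prev;	/* assumed enum name */

	do
	{
		prev = rdt_data->phase;
		process_rdt_phase_transition(rdt_data, false);
	} while (rdt_data->phase != prev);	/* stop once no handler advances */
}

The benefit is that every phase change flows through one driver, so the
state machine is visible in a single place instead of being traced through
recursive calls out of each handler.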