Re: Conflict detection for update_deleted in logical replication
From: Amit Kapila
Subject: Re: Conflict detection for update_deleted in logical replication
Msg-id: CAA4eK1LA7mnvKT9L8Nx_h+0TCvq-Ob2BGPO1bQKhkGHtoZKsow@mail.gmail.com
In reply to: RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses: Re: Conflict detection for update_deleted in logical replication
List: pgsql-hackers
On Wed, Sep 10, 2025 at 9:08 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, September 9, 2025 7:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Sep 9, 2025 at 11:47 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> > wrote:
> >
> > A few other comments:
> > 1.
> > + /*
> > +  * Return if the launcher has not initialized oldest_nonremovable_xid.
> > +  *
> > +  * It might seem feasible to directly check the conflict detection
> > +  * slot.xmin instead of relying on the launcher to assign the worker's
> > +  * oldest_nonremovable_xid; however, that could lead to a race condition
> > +  * where slot.xmin is set to InvalidTransactionId immediately after the
> > +  * check. In such cases, oldest_nonremovable_xid would no longer be
> > +  * protected by a replication slot and could become unreliable if a
> > +  * wraparound occurs.
> > +  */
> > + if (!TransactionIdIsValid(nonremovable_xid))
> > +     return;
> >
> > I understand the reason why the worker's oldest_nonremovable_xid is
> > assigned by the launcher, but all this makes the code complex to
> > understand. Isn't there any other way to achieve it? One naïve and
> > inefficient way could be to just restart the worker, probably after
> > updating its retentionactive flag. I am not suggesting making this
> > change; it is just a brainstorming point.
>
> I'll keep thinking about it, and for now, I've added a comment mentioning
> that rebooting is a simpler solution.
>

Say we have an LWLock, LogicalRepRetentionLock. The launcher acquires it in
SHARED mode as soon as it encounters the first subscription with
retain_dead_tuples set during the traversal of the sublist, and releases it
only after updating xmin outside the sublist traversal. The worker then
acquires it in EXCLUSIVE mode while resuming retention, for the period until
it fetches the slot's xmin. This will close the above race condition, but
acquiring an LWLock while traversing the subscription list is not advisable,
as that will make the traversal uninterruptible. The other possibility is to
use some heavy-weight lock here, say a lock on the pg_subscription catalog,
but that has the drawback that it can conflict with DDL operations. Yet
another way is to invent a new lock type for this. OTOH, the simple strategy
of letting the apply worker restart to resume retention will keep the
handling simpler. We do something similar at the start of the apply worker
if we find that some subscription parameter has changed. I think we need
more opinions on this matter.

One other comment:
+ if (!MySubscription->retentionactive)
+ {
+     rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION;
+     process_rdt_phase_transition(rdt_data, false);
+     return;
+ }

It is better that the caller processes the next phase; otherwise, this looks
a bit ad hoc. Please check other places similarly.

--
With Regards,
Amit Kapila.
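[Editor's note] To make the locking idea above concrete, here is a minimal
sketch of the two sides under that proposal. Only LogicalRepRetentionLock
(the name Amit posits) and the standard LWLock API come from the message;
launcher_compute_xmin(), update_conflict_slot_xmin(), get_conflict_slot_xmin(),
and the worker field usage are hypothetical stand-ins for the patch's actual
code.

/*
 * Sketch only. LogicalRepRetentionLock is assumed to be registered as a
 * new named LWLock; the helpers marked "assumed" do not exist in the tree.
 */
#include "postgres.h"
#include "access/transam.h"
#include "catalog/pg_subscription.h"
#include "nodes/pg_list.h"
#include "replication/worker_internal.h"
#include "storage/lwlock.h"

/* Launcher side: compute the conflict detection slot's xmin. */
static void
launcher_compute_xmin(List *sublist)
{
	TransactionId candidate_xmin = InvalidTransactionId;
	bool		holding = false;
	ListCell   *lc;

	foreach(lc, sublist)
	{
		Subscription *sub = (Subscription *) lfirst(lc);

		if (sub->retaindeadtuples && !holding)
		{
			/*
			 * First retain_dead_tuples subscription: from here until the
			 * slot's xmin is updated below, a worker cannot observe an
			 * intermediate InvalidTransactionId.
			 */
			LWLockAcquire(LogicalRepRetentionLock, LW_SHARED);
			holding = true;
		}

		/* ... accumulate candidate_xmin from the workers ... */
	}

	/* Update slot.xmin outside the traversal, then release. */
	update_conflict_slot_xmin(candidate_xmin);	/* assumed helper */
	if (holding)
		LWLockRelease(LogicalRepRetentionLock);
}

/* Worker side: resuming retention. */
static void
worker_resume_retention(void)
{
	TransactionId slot_xmin;

	/*
	 * EXCLUSIVE mode blocks until the launcher has finished its traversal
	 * and xmin update, closing the race where slot.xmin is reset to
	 * InvalidTransactionId right after the worker checks it.
	 */
	LWLockAcquire(LogicalRepRetentionLock, LW_EXCLUSIVE);
	slot_xmin = get_conflict_slot_xmin();	/* assumed helper */
	if (TransactionIdIsValid(slot_xmin))
		MyLogicalRepWorker->oldest_nonremovable_xid = slot_xmin;	/* field from the patch */
	LWLockRelease(LogicalRepRetentionLock);
}

The sketch also makes the stated drawback visible: the launcher holds the
lock in SHARED mode across the entire sublist traversal, which is exactly
the uninterruptible window that makes an LWLock unattractive here.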
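[Editor's note] On the second comment, one possible shape of the suggested
restructuring, as a sketch rather than the patch's actual code: the RDT_*
phase value, process_rdt_phase_transition(), rdt_data, and
MySubscription->retentionactive come from the quoted hunk; the type, function
names, and the driver loop's termination condition are assumptions.

#include "postgres.h"
/* RetainDeadTuplesData, RetainConflictInfoPhase, etc. as declared in the patch. */

/* A phase handler only records the next phase and returns... */
static void
handle_wait_phase(RetainDeadTuplesData *rdt_data)
{
	if (!MySubscription->retentionactive)
	{
		/* Request the next phase; let the caller process it. */
		rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION;
		return;
	}

	/* ... remaining work for this phase ... */
}

/* ... and the caller owns the transition loop. */
static void
drive_rdt_phases(RetainDeadTuplesData *rdt_data)
{
	RetainConflictInfoPhase prev;	/* assumed enum name */

	do
	{
		prev = rdt_data->phase;
		process_rdt_phase_transition(rdt_data, false);
	} while (rdt_data->phase != prev);	/* stop once no handler advances */
}

The benefit is that every phase change flows through one driver, so the
state machine is visible in a single place instead of being traced through
recursive calls out of each handler.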