Re: Conflict detection for update_deleted in logical replication

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: Conflict detection for update_deleted in logical replication
Дата	9 сентября 14:00:43
Msg-id	CAA4eK1+POfiOGDaGQ=Mab5R8Vgd8k1N9vKkqMay+jYQKT3w+cA@mail.gmail.com обсуждение исходный текст
Ответ на	RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Ответы	RE: Conflict detection for update_deleted in logical replication
Список	pgsql-hackers

Дерево обсуждения

On Tue, Sep 9, 2025 at 11:47 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Here is V71 patch set which addressed above comments and [1].
>

IIUC, this patch after stopping the retention, it immediately starts
retrying to resume by transitioning through various phases. This can
consume CPU and network resources, if the apply_worker takes a long
time to catch up.

Few other comments:
1.
+ /*
+ * Return if the launcher has not initialized oldest_nonremovable_xid.
+ *
+ * It might seem feasible to directly check the conflict detection
+ * slot.xmin instead of relying on the launcher to assign the worker's
+ * oldest_nonremovable_xid; however, that could lead to a race condition
+ * where slot.xmin is set to InvalidTransactionId immediately after the
+ * check. In such cases, oldest_nonremovable_xid would no longer be
+ * protected by a replication slot and could become unreliable if a
+ * wraparound occurs.
+ */
+ if (!TransactionIdIsValid(nonremovable_xid))
+ return;

I understand the reason why we get assigned the worker's
non_removable_xid from launcher but all this makes the code complex to
understand. Isn't there any other way to achieve it? One naïve and
inefficient way could be to just restart the worker probably after
updating its retentionactive flag. I am not suggesting to make this
change but just a brainstorming point.

2. The function should_stop_conflict_info_retention() is invoked from
wait_for_local_flush() and then it can lead further state
transitioning which doesn't appear neat and makes code difficult to
understand.

3.
+ /*
+ * If conflict info retention was previously stopped due to a timeout, and
+ * the time required to advance the non-removable transaction ID has now
+ * decreased to within acceptable limits, resume the rentention.
+ */
+ if (!MySubscription->retentionactive)
+ {
+ rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION;
+ process_rdt_phase_transition(rdt_data, false);
+ return;
+ }

In this check, where do we check the time has come in acceptable
range? Can you update comments to make it clear?

--
With Regards,
Amit Kapila.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Conflict detection for update_deleted in logical replication