Re: Conflict detection for update_deleted in logical replication
От | Amit Kapila |
---|---|
Тема | Re: Conflict detection for update_deleted in logical replication |
Дата | |
Msg-id | CAA4eK1+POfiOGDaGQ=Mab5R8Vgd8k1N9vKkqMay+jYQKT3w+cA@mail.gmail.com обсуждение исходный текст |
Ответ на | RE: Conflict detection for update_deleted in logical replication ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Ответы |
RE: Conflict detection for update_deleted in logical replication
|
Список | pgsql-hackers |
On Tue, Sep 9, 2025 at 11:47 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is V71 patch set which addressed above comments and [1]. > IIUC, this patch after stopping the retention, it immediately starts retrying to resume by transitioning through various phases. This can consume CPU and network resources, if the apply_worker takes a long time to catch up. Few other comments: 1. + /* + * Return if the launcher has not initialized oldest_nonremovable_xid. + * + * It might seem feasible to directly check the conflict detection + * slot.xmin instead of relying on the launcher to assign the worker's + * oldest_nonremovable_xid; however, that could lead to a race condition + * where slot.xmin is set to InvalidTransactionId immediately after the + * check. In such cases, oldest_nonremovable_xid would no longer be + * protected by a replication slot and could become unreliable if a + * wraparound occurs. + */ + if (!TransactionIdIsValid(nonremovable_xid)) + return; I understand the reason why we get assigned the worker's non_removable_xid from launcher but all this makes the code complex to understand. Isn't there any other way to achieve it? One naïve and inefficient way could be to just restart the worker probably after updating its retentionactive flag. I am not suggesting to make this change but just a brainstorming point. 2. The function should_stop_conflict_info_retention() is invoked from wait_for_local_flush() and then it can lead further state transitioning which doesn't appear neat and makes code difficult to understand. 3. + /* + * If conflict info retention was previously stopped due to a timeout, and + * the time required to advance the non-removable transaction ID has now + * decreased to within acceptable limits, resume the rentention. + */ + if (!MySubscription->retentionactive) + { + rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION; + process_rdt_phase_transition(rdt_data, false); + return; + } In this check, where do we check the time has come in acceptable range? Can you update comments to make it clear? -- With Regards, Amit Kapila.
В списке pgsql-hackers по дате отправления: