RE: Avoid retaining conflict-related data when no tables are subscribed
От | Zhijie Hou (Fujitsu) |
---|---|
Тема | RE: Avoid retaining conflict-related data when no tables are subscribed |
Дата | |
Msg-id | TY4PR01MB16907424D996A8930F6D293AF9406A@TY4PR01MB16907.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Avoid retaining conflict-related data when no tables are subscribed (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: Avoid retaining conflict-related data when no tables are subscribed
|
Список | pgsql-hackers |
On Friday, August 29, 2025 12:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 28, 2025 at 7:54 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Hi, > > > > My colleague Nisha reported an issue to me off-list: dead tuples can't > > be removed when retain_dead_tuples is enabled for a subscription with no > tables. > > > > This appears to stem from the inability to advance the non-removable > > transaction ID when AllTablesyncsReady() returns false. Since this > > function returns false when no tables are present, which leads to > > unnecessary data retention until a table is added to the subscription. > > > > Since dead tuples don't need to be retained when no tables are > > subscribed, here is a patch to fix it, modifying AllTablesyncsReady() > > to allows no tables to be treated as a ready state when explicitly requested. > > > > Few comments: > ============ > Aren't following two paragraphs in comments contradict each other: > > * It is safe to add new tables with initial states to the subscription > * after this check because any changes applied to these tables should > * have a WAL position greater than the rdt_data->remote_lsn. > + * > + * Advancing the transaction ID is also necessary when no tables are > + * subscribed, as it prevents unnecessary retention of dead tuples. > +Although > + * it seem feasible to skip all phases and directly assign > +candidate_xid to > + * oldest_nonremovable_xid in the RDT_GET_CANDIDATE_XID phase > when no > +tables > + * are currently subscribed, this approach is unsafe. This is because > +new > + * tables may be added to the subscription after the initial table > +check, > + * requiring tuples deleted before candidate_xid for conflict > +detection in > + * upcoming transactions. Therefore, it remains necessary to wait for > +all > + * concurrent transactions to be fully applied. > */ > > In the first para, the comments say that it is okay to add tables after this check > and in the second para, it says that is not okay? I have removed the 1st para because it's inaccurate. > > 2. > + * If the subscription has no tables, return the value determined by > + * 'ready_if_no_tables'. > + * > + * Otherwise, return whether all the tables for the subscription are in > + the > + * READY state. > * > * Note: This function is not suitable to be called from outside of apply or > * tablesync workers because MySubscription needs to be already initialized. > */ > bool > -AllTablesyncsReady(void) > +AllTablesyncsReady(bool ready_if_no_tables) > > This change serves the purpose but I find it makes the API complex to > understand because now it needs to make decisions based on different states > depending on the boolean parameter passed. Can we introduce a new API for > the empty subscription case? Added a new function HasSubscriptionRelationsCached() as suggested. Attached is the V2 patch which addresses the above comments and fixes a typo. Apart from the original reported issue, we noticed another point that can be improved during the implementation of update_deleted patches: When a disabled subscription is created with retain_dead_tuples set to true, the launcher is not woken up immediately, which may lead to delays in creating the conflict detection slot and cause user confusion. So, I prepared the 0002 patch to fix this issue. Best Regards, Hou zj
Вложения
В списке pgsql-hackers по дате отправления: