Re: Proposal: Conflict log history table for Logical Replication
| От | shveta malik |
|---|---|
| Тема | Re: Proposal: Conflict log history table for Logical Replication |
| Дата | |
| Msg-id | CAJpy0uC0ZWgHOivJ102A1fMkppwK3RuSMafRPKyjwkmJrjhVUw@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: Proposal: Conflict log history table for Logical Replication (shveta malik <shveta.malik@gmail.com>) |
| Ответы |
Re: Proposal: Conflict log history table for Logical Replication
|
| Список | pgsql-hackers |
On Wed, Nov 12, 2025 at 3:14 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Nov 12, 2025 at 2:40 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Nov 12, 2025 at 12:21 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Fri, Sep 26, 2025 at 4:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > I agree that marking tables with a flag to easily exclude them during > > > publishing would be cleaner. In the current patch, for an ALL-TABLES > > > publication, we scan pg_subscription for each table in pg_class to > > > check its subconflicttable and decide whether to ignore it. But since > > > this only happens during create/alter subscription and refresh > > > publication, the overhead should be acceptable. > > > > Thanks for your opinion. > > > > > Introducing a ‘NON_PUBLISHABLE_TABLE’ option would be a good > > > enhancement but since we already have the EXCEPT list built in a > > > separate thread, that might be sufficient for now. IMO, such > > > conflict-tables should be marked internally (for example, with a > > > ‘non_publishable’ or ‘conflict_log_table’ flag) so they can be easily > > > identified within the system, without requiring users to explicitly > > > specify them in EXCEPT or as NON_PUBLISHABLE_TABLE. I would like to > > > see what others think on this. > > > For the time being, the current implementation looks fine, considering > > > it runs only during a few publication-related DDL operations. > > > > +1 > > > > Here is the rebased patch, changes apart from rebasing it > > 1) Dropped the conflict history table during drop subscription > > 2) Added test cases for testing the conflict history table behavior > > with CREATE/ALTER/DROP subscription > > Thanks. > > > TODO: > > 1) Need more thoughts on the table schema whether we need to capture > > more items or shall we drop some fields if we think those are not > > necessary. > > Yes, this needs some more thoughts. I will review. > > I feel since design is somewhat agreed upon, we may handle > code-correction/completion. I have not looked at the rebased patch > yet, but here are a few comments based on old-version. > > Few observations related to publication. > ------------------------------ > > (In the below comments, clt/CLT implies Conflict Log Table) > > 1) > 'select pg_relation_is_publishable(clt)' returns true for conflict-log table. > > 2) > '\d+ clt' shows all-tables publication name. I feel we should not > show that for clt. > > 3) > I am able to create a publication for clt table, should it be allowed? > > create subscription sub1 connection '...' publication pub1 > WITH(conflict_log_table='clt'); > create publication pub3 for table clt; > > 4) > Is there a reason we have not made '!IsConflictHistoryRelid' check as > part of is_publishable_class() itself? If we do so, other code-logics > will also get clt as non-publishable always (and will solve a few of > the above issues I think). IIUC, there is no place where we want to > mark CLT as publishable or is there any? > > 5) Also, I feel we can add some documentation now to help others to > understand/review the patch better without going through the long > thread. > > > Few observations related to conflict-logging: > ------------------------------ > 1) > I found that for the conflicts which ultimately result in Error, we do > not insert any conflict-record in clt. > > a) > Example: insert_exists, update_Exists > create table tab1 (i int primary key, j int); > sub: insert into tab1 values(30,10); > pub: insert into tab1 values(30,10); > ERROR: conflict detected on relation "public.tab1": conflict=insert_exists > No record in clt. > > sub: > <some pre-data needed> > update tab1 set i=40 where i = 30; > pub: update tab1 set i=40 where i = 20; > ERROR: conflict detected on relation "public.tab1": conflict=update_exists > No record in clt. > > b) > Another question related to this is, since these conflicts (which > results in error) keep on happening until user resolves these or skips > these or 'disable_on_error' is set. Then are we going to insert these > multiple times? We do count these in 'confl_insert_exists' and > 'confl_update_exists' everytime, so it makes sense to log those each > time in clt as well. Thoughts? > > 2) > Conflicts where row on sub is missing, local_ts incorrectly inserted. > It is '2000-01-01 05:30:00+05:30'. Should it be Null or something > indicating that it is not applicable for this conflict-type? > > Example: delete_missing, update_missing > pub: > insert into tab1 values(10,10); > insert into tab1 values(20,10); > sub: delete from tab1 where i=10; > pub: delete from tab1 where i=10; > 3) We also need to think how we are going to display the info in case of multiple_unique_conflicts as there could be multiple local and remote tuples conflicting for one single operation. Example: create table conf_tab (a int primary key, b int unique, c int unique); sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4); pub: insert into conf_tab values (2,3,4); ERROR: conflict detected on relation "public.conf_tab": conflict=multiple_unique_conflicts DETAIL: Key already exists in unique index "conf_tab_pkey", modified locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30. Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4). Key already exists in unique index "conf_tab_b_key", modified locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30. Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4). Key already exists in unique index "conf_tab_c_key", modified locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30. Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4). CONTEXT: processing remote data for replication origin "pg_16392" during message type "INSERT" for replication target relation "public.conf_tab" in transaction 781, finished at 0/017FDDA0 Currently in clt, we have singular terms such as 'key_tuple', 'local_tuple', 'remote_tuple'. Shall we have multiple rows inserted? But it does not look reasonable to have multiple rows inserted for a single conflict raised. I will think more about this. thanks Shveta
В списке pgsql-hackers по дате отправления: