Discussion: Add an exception_behaviour parameter to REFRESH PUBLICATION
Hi,

The LR subscription has an option, disable_on_error, that allows the user to define the exception behaviour in case of an error during replication: Postgres can either disable the subscription or let it retry the operation later. That seems fine for the subscription as a whole.

Since the introduction of ADD/DROP TABLE in the ALTER PUBLICATION command, we have an option to sync/resync individual tables. It may be useful when a row filter has changed, or to ensure we have correct data if the conflict stats have detected potential issues.

So, when executing REFRESH PUBLICATION, or even performing the initial synchronisation of dozens of tables, Postgres may stop the whole process if a single table has synchronisation issues: it may be a constraint violation, or we forgot to truncate a specific table on the subscriber and got a conflict.

So, for the sake of not disabling a subscription that may be replicating a massive database, and not interrupting the synchronisation process, it makes sense to introduce a parameter, like a reloption, that could override the current exception behaviour. For example, exception_behaviour could be 'disable', 'retry', or 'skip'. The last value would allow skipping errors during a table copy, checking the sync statuses later, fixing the issues and repeating the process for the problematic tables.

In terms of SQL, it may look like the following:

ALTER SUBSCRIPTION mysub
    REFRESH PUBLICATION WITH (exception_behaviour = 'skip');

Does it make sense?

--
regards, Andrei Lepikhov, pgEdge
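[To make the proposed follow-up step concrete: after a REFRESH with the hypothetical exception_behaviour = 'skip', the tables needing attention could be found through the existing pg_subscription_rel catalog. A minimal sketch ('mysub' is just an example name; today there is no dedicated failure state value, which is part of what this proposal would have to add):

-- List the subscription's tables that have not reached the READY
-- state, e.g. because their initial copy was skipped after an error:
SELECT sr.srrelid::regclass AS tbl, sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE s.subname = 'mysub'
  AND sr.srsubstate <> 'r';   -- 'r' = SUBREL_STATE_READY
]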
On Mon, Feb 16, 2026 at 6:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> The LR subscription has an option, disable_on_error, that allows the
> user to define the exception behaviour in case of an error during
> replication: Postgres can either disable the subscription or let it
> retry the operation later. That seems fine for the subscription as a
> whole.
>
> Since the introduction of ADD/DROP TABLE in the ALTER PUBLICATION
> command, we have an option to sync/resync individual tables. It may be
> useful when a row filter has changed, or to ensure we have correct data
> if the conflict stats have detected potential issues.
>
> So, when executing REFRESH PUBLICATION, or even performing the initial
> synchronisation of dozens of tables, Postgres may stop the whole
> process if a single table has synchronisation issues: it may be a
> constraint violation, or we forgot to truncate a specific table on the
> subscriber and got a conflict.
>
> So, for the sake of not disabling a subscription that may be
> replicating a massive database, and not interrupting the
> synchronisation process, it makes sense to introduce a parameter, like
> a reloption, that could override the current exception behaviour.
> For example, exception_behaviour could be 'disable', 'retry', or
> 'skip'. The last value would allow skipping errors during a table copy,
> checking the sync statuses later, fixing the issues and repeating the
> process for the problematic tables.
>
> In terms of SQL, it may look like the following:
>
> ALTER SUBSCRIPTION mysub
>     REFRESH PUBLICATION WITH (exception_behaviour = 'skip');
>

It will lead to the apply worker skipping all future changes to that table, as we skip applying changes until the table is in the READY state. So, all changes from transactions will get applied except the ones for tables whose copy we skipped, which could lead to inconsistency. I think the better way to allow skipping the copy of initial data for particular tables is to somehow provision copy_data = off for a set of tables.

--
With Regards,
Amit Kapila.
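[For reference, copy_data already exists as a refresh option, but it currently applies to all newly added tables at once; the per-table variant is the part that would be new. With copy_data = false, the added tables are registered as already synchronised, so subsequent changes are applied rather than skipped:

-- Existing, all-or-nothing form: refresh without copying initial data
-- for any of the newly added tables; backfill them by other means.
ALTER SUBSCRIPTION mysub
    REFRESH PUBLICATION WITH (copy_data = false);
]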
On 17/2/26 05:07, Amit Kapila wrote:
> On Mon, Feb 16, 2026 at 6:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> ALTER SUBSCRIPTION mysub
>>     REFRESH PUBLICATION WITH (exception_behaviour = 'skip');
>>
>
> It will lead to the apply worker skipping all future changes to that
> table, as we skip applying changes until the table is in the READY
> state. So, all changes from transactions will get applied except the
> ones for tables whose copy we skipped, which could lead to
> inconsistency. I think the better way to allow skipping the copy of
> initial data for particular tables is to somehow provision
> copy_data = off for a set of tables.

Hmm, in my mind, a FAIL table state should be introduced to let users know that a specific table has not been synchronised and that they need to check it and repeat a smaller part of the job.
Or do you mean that a synchronising table might already contain some data, and that it is impossible to undo the sync and repeat it?

--
regards, Andrei Lepikhov, pgEdge
On Tue, Feb 17, 2026 at 9:52 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 17/2/26 05:07, Amit Kapila wrote:
> > On Mon, Feb 16, 2026 at 6:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >> ALTER SUBSCRIPTION mysub
> >>     REFRESH PUBLICATION WITH (exception_behaviour = 'skip');
> >>
> >
> > It will lead to the apply worker skipping all future changes to that
> > table, as we skip applying changes until the table is in the READY
> > state. So, all changes from transactions will get applied except the
> > ones for tables whose copy we skipped, which could lead to
> > inconsistency. I think the better way to allow skipping the copy of
> > initial data for particular tables is to somehow provision
> > copy_data = off for a set of tables.
>
> Hmm, in my mind, a FAIL table state should be introduced to let users
> know that a specific table has not been synchronised and that they need
> to check it and repeat a smaller part of the job.
> Or do you mean that a synchronising table might already contain some
> data, and that it is impossible to undo the sync and repeat it?
>

If we introduce a new state like FAIL for a table, then in that state the apply worker should skip new updates for that table (see should_apply_changes_for_rel()) until the copy is successful. So, all other changes in future transactions will keep getting applied, except for the tables that have the failed status. I think this could lead to inconsistency while replicating data.

--
With Regards,
Amit Kapila.
On 17/2/26 06:11, Amit Kapila wrote:
> On Tue, Feb 17, 2026 at 9:52 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> If we introduce a new state like FAIL for a table, then in that state
> the apply worker should skip new updates for that table (see
> should_apply_changes_for_rel()) until the copy is successful. So, all
> other changes in future transactions will keep getting applied, except
> for the tables that have the failed status. I think this could lead to
> inconsistency while replicating data.

Thanks for your answer, but I still don't get the idea.

The case I am talking about is the following:
In the absence of DDL propagation, a good palliative is to DROP a table from the publication, perform the necessary ALTER TABLEs on both sides, and ADD the table back to the publication.

I only proposed that if the REFRESH PUBLICATION that re-introduced such a table fails, Postgres should complain and remove the table from the subscription, as if the 'REFRESH PUBLICATION' command had never been executed. Where is the inconsistency?

--
regards, Andrei Lepikhov, pgEdge
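[For concreteness, the schema-change workflow described above might look like the following; publication, subscription, table and column names are illustrative:

-- On the publisher: take the table out of the publication.
ALTER PUBLICATION mypub DROP TABLE t1;

-- Apply the same schema change on both sides manually,
-- since DDL is not propagated by logical replication:
ALTER TABLE t1 ADD COLUMN note text;

-- On the publisher: put the table back.
ALTER PUBLICATION mypub ADD TABLE t1;

-- On the subscriber: pick up the change and resynchronise the table.
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
]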
On Tue, Feb 17, 2026 at 1:39 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 17/2/26 06:11, Amit Kapila wrote:
> > On Tue, Feb 17, 2026 at 9:52 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > If we introduce a new state like FAIL for a table, then in that state
> > the apply worker should skip new updates for that table (see
> > should_apply_changes_for_rel()) until the copy is successful. So, all
> > other changes in future transactions will keep getting applied,
> > except for the tables that have the failed status. I think this could
> > lead to inconsistency while replicating data.
>
> Thanks for your answer, but I still don't get the idea.
>
> The case I am talking about is the following:
> In the absence of DDL propagation, a good palliative is to DROP a table
> from the publication, perform the necessary ALTER TABLEs on both sides,
> and ADD the table back to the publication.
>
> I only proposed that if the REFRESH PUBLICATION that re-introduced such
> a table fails, Postgres should complain and remove the table from the
> subscription, as if the 'REFRESH PUBLICATION' command had never been
> executed. Where is the inconsistency?
>

The key point is that the apply worker will continue even when the tablesync worker has failed and skipped the relation from syncing. Consider an example where the user adds two tables, t1_pk and t1_fk, to the publication and then performs REFRESH. Further, consider that the application maintains a primary/foreign key relationship between these two tables, which means the app will delete rows from both the PK and the FK table. In such a situation, say the REFRESH command is executed, the table sync for t1_fk fails, and we mark that table's status as FAILED; then future replication will perform operations only on the t1_pk table, even when the same transaction has deletes on both t1_pk and t1_fk. This is because we don't apply operations on a table whose initial sync is not finished yet, which will be true for t1_fk.

--
With Regards,
Amit Kapila.
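[To spell the scenario out as a sketch; the schema and data values are invented for illustration:

-- Tables replicated from publisher to subscriber:
CREATE TABLE t1_pk (id int PRIMARY KEY);
CREATE TABLE t1_fk (id int PRIMARY KEY,
                    ref_id int REFERENCES t1_pk (id));

-- A single application transaction on the publisher:
BEGIN;
DELETE FROM t1_fk WHERE ref_id = 42;  -- skipped on the subscriber:
                                      -- t1_fk never reached READY
DELETE FROM t1_pk WHERE id = 42;      -- applied on the subscriber
COMMIT;

Since the apply worker runs with session_replication_role = replica, the FK constraint does not stop the partial apply: the subscriber ends up with t1_fk rows referencing a t1_pk row that no longer exists.]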
On 19/2/26 07:44, Amit Kapila wrote:
> On Tue, Feb 17, 2026 at 1:39 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> ...
>> I only proposed that if the REFRESH PUBLICATION that re-introduced such
>> a table fails, Postgres should complain and remove the table from the
>> subscription, as if the 'REFRESH PUBLICATION' command had never been
>> executed. Where is the inconsistency?
> ...
> In such a situation, say the REFRESH command is executed, the table
> sync for t1_fk fails, and we mark that table's status as FAILED; then
> future replication will perform operations only on the t1_pk table,
> even when the same transaction has deletes on both t1_pk and t1_fk.
> This is because we don't apply operations on a table whose initial sync
> is not finished yet, which will be true for t1_fk.

Thank you for clarifying. A proper 'soft' exception should revert all changes made by REFRESH PUBLICATION on the subscriber's side. However, AlterSubscription_refresh updates and commits the publication's state in pg_catalog before synchronisation begins in the apply worker. Implementing a full revert would require a significant redesign and may be more appropriate for enterprise solutions seeking this level of functionality.

Additionally, automation scripts may depend on subscription statistics, particularly 'sync_seq_error_count' and 'sync_table_error_count'. I was using an older Postgres version and did not realise that newer versions now provide clear indicators when synchronisation issues occur.

--
regards, Andrei Lepikhov, pgEdge
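[As a closing illustration, an automation script of the kind mentioned above could poll the cumulative subscription statistics; a minimal sketch, assuming a server version that actually exposes the error counters named in this thread in pg_stat_subscription_stats:

-- Check whether table or sequence synchronisation errors have
-- accumulated for the subscription since the stats were last reset:
SELECT subname, sync_table_error_count, sync_seq_error_count
FROM pg_stat_subscription_stats
WHERE subname = 'mysub';
]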