Обсуждение: Add into REFRESH PUBLICATION parameter exception_behaviour

Поиск
Список
Период
Сортировка

Add into REFRESH PUBLICATION parameter exception_behaviour

От
Andrei Lepikhov
Дата:
Hi,

The LR Subscription has an option disable_on_error that allows the user 
to define the exception behaviour in case of an error during 
replication: Postgres can disable the subscription or let it retry the 
operation later. It seems ok for the subscription as a whole.


Since the introduction of 'ADD/DROP table' into the ALTER PUBLICATION 
command, we have an option to sync/resync some table - it may be useful 
in case of row filter changed or to ensure we have correct data if 
conflict stats has detected potential issues.
So, when executing REFRESH PUBLICATION or even performing the initial 
synchronisation of dozens of tables, Postgres may stop the whole process 
if a table has synchronisation issues - it may be a constraint, or we 
forget to truncate a specific table on the subscriber and get a conflict.


So, for the sake of not disabling the whole subscription that may be 
replicating a massive database and not interrupting the synchronisation 
process, it makes sense to introduce a parameter, like reloption, that 
could override the current exception behaviour.
For example, exception_behaviour can be 'disable', 'retry', or 'skip'. 
The last value allows skipping errors during a table copy, checking sync 
statuses later, fixing issues and repeating the process for problematic 
tables.

In terms of SQL, it may look like the following:

ALTER SUBSCRIPTION mysub
   REFRESH PUBLICATION (WITH exception_behaviour = ‘skip’);

Does it make sense?

-- 
regards, Andrei Lepikhov,
pgEdge




Re: Add into REFRESH PUBLICATION parameter exception_behaviour

От
Amit Kapila
Дата:
On Mon, Feb 16, 2026 at 6:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> The LR Subscription has an option disable_on_error that allows the user
> to define the exception behaviour in case of an error during
> replication: Postgres can disable the subscription or let it retry the
> operation later. It seems ok for the subscription as a whole.
>
>
> Since the introduction of 'ADD/DROP table' into the ALTER PUBLICATION
> command, we have an option to sync/resync some table - it may be useful
> in case of row filter changed or to ensure we have correct data if
> conflict stats has detected potential issues.
> So, when executing REFRESH PUBLICATION or even performing the initial
> synchronisation of dozens of tables, Postgres may stop the whole process
> if a table has synchronisation issues - it may be a constraint, or we
> forget to truncate a specific table on the subscriber and get a conflict.
>
>
> So, for the sake of not disabling the whole subscription that may be
> replicating a massive database and not interrupting the synchronisation
> process, it makes sense to introduce a parameter, like reloption, that
> could override the current exception behaviour.
> For example, exception_behaviour can be 'disable', 'retry', or 'skip'.
> The last value allows skipping errors during a table copy, checking sync
> statuses later, fixing issues and repeating the process for problematic
> tables.
>
> In terms of SQL, it may look like the following:
>
> ALTER SUBSCRIPTION mysub
>    REFRESH PUBLICATION (WITH exception_behaviour = ‘skip’);
>

It will lead to skipping all future changes to that table by apply
worker as we skip applying till the table is in READY state. So, all
changes for transactions will get applied but the ones where we
skipped copy which could lead to inconsistency. I think the better way
to allow skip copying of initial data for particular tables is to
someway provision copy_data = off for a set of tables.

--
With Regards,
Amit Kapila.



Re: Add into REFRESH PUBLICATION parameter exception_behaviour

От
Andrei Lepikhov
Дата:
On 17/2/26 05:07, Amit Kapila wrote:
> On Mon, Feb 16, 2026 at 6:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> ALTER SUBSCRIPTION mysub
>>     REFRESH PUBLICATION (WITH exception_behaviour = ‘skip’);
>>
> 
> It will lead to skipping all future changes to that table by apply
> worker as we skip applying till the table is in READY state. So, all
> changes for transactions will get applied but the ones where we
> skipped copy which could lead to inconsistency. I think the better way
> to allow skip copying of initial data for particular tables is to
> someway provision copy_data = off for a set of tables.
Hmm, in my mind, there should be a FAIL table state introduced to let 
users know that a specific table has not been synchronised, and they 
need to check and repeat a smaller part of the job.
Or do you mean that a synchronising table might already contain some 
data, and that it is impossible to undo the sync and repeat it?

-- 
regards, Andrei Lepikhov,
pgEdge



Re: Add into REFRESH PUBLICATION parameter exception_behaviour

От
Amit Kapila
Дата:
On Tue, Feb 17, 2026 at 9:52 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 17/2/26 05:07, Amit Kapila wrote:
> > On Mon, Feb 16, 2026 at 6:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >> ALTER SUBSCRIPTION mysub
> >>     REFRESH PUBLICATION (WITH exception_behaviour = ‘skip’);
> >>
> >
> > It will lead to skipping all future changes to that table by apply
> > worker as we skip applying till the table is in READY state. So, all
> > changes for transactions will get applied but the ones where we
> > skipped copy which could lead to inconsistency. I think the better way
> > to allow skip copying of initial data for particular tables is to
> > someway provision copy_data = off for a set of tables.
> Hmm, in my mind, there should be a FAIL table state introduced to let
> users know that a specific table has not been synchronised, and they
> need to check and repeat a smaller part of the job.
> Or do you mean that a synchronising table might already contain some
> data, and that it is impossible to undo the sync and repeat it?
>

If we introduce a new state like FAIL for a table, then in that state
apply_worker should skip the new updates for that table (see
should_apply_changes_for_rel()) till the copy is successful. So, all
other changes in future transactions will keep getting applied except
for tables that have failed status. I think this could lead to
inconsistency while replicating data.

--
With Regards,
Amit Kapila.



Re: Add into REFRESH PUBLICATION parameter exception_behaviour

От
Andrei Lepikhov
Дата:
On 17/2/26 06:11, Amit Kapila wrote:
> On Tue, Feb 17, 2026 at 9:52 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> If we introduce a new state like FAIL for a table, then in that state
> apply_worker should skip the new updates for that table (see
> should_apply_changes_for_rel()) till the copy is successful. So, all
> other changes in future transactions will keep getting applied except
> for tables that have failed status. I think this could lead to
> inconsistency while replicating data.

Thanks for your answer, but I still don't get the idea.

The case I am talking about is the following:
In the absence of DDL propagation, it is a good palliative to DROP a 
table from publication, perform the necessary ALTER TABLEs on both 
sides, and ADD the table back to the publication.

I only proposed that if the REFRESH PUBLICATION that re-introduced such 
a table fails, complain and remove it from the subscription, as if you 
had never executed the 'REFRESH PUBLICATION' command. Where is the 
inconsistency?

-- 
regards, Andrei Lepikhov,
pgEdge



Re: Add into REFRESH PUBLICATION parameter exception_behaviour

От
Amit Kapila
Дата:
On Tue, Feb 17, 2026 at 1:39 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 17/2/26 06:11, Amit Kapila wrote:
> > On Tue, Feb 17, 2026 at 9:52 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > If we introduce a new state like FAIL for a table, then in that state
> > apply_worker should skip the new updates for that table (see
> > should_apply_changes_for_rel()) till the copy is successful. So, all
> > other changes in future transactions will keep getting applied except
> > for tables that have failed status. I think this could lead to
> > inconsistency while replicating data.
>
> Thanks for your answer, but I still don't get the idea.
>
> The case I am talking about is the following:
> In the absence of DDL propagation, it is a good palliative to DROP a
> table from publication, perform the necessary ALTER TABLEs on both
> sides, and ADD the table back to the publication.
>
> I only proposed that if the REFRESH PUBLICATION that re-introduced such
> a table fails, complain and remove it from the subscription, as if you
> had never executed the 'REFRESH PUBLICATION' command. Where is the
> inconsistency?
>

The key point is that the apply worker will continue even when the
table_sync worker has failed and skipped the relation from syncing.
Consider an example, where the user adds two tables t1_pk, t1_fk to
the publication and then performs REFRESH. In this case, further
consider, the application maintains the primary/foreign key relation
between these two tables which means the app will delete rows from
both pk and fk tables.

In such a situation, say when REFRESH command is executed and table
sync for t1_fk failed and we mark that table status as FAILED, then
the future replication will perform operations only on t_pk table even
though the same transaction has delete on both t1_pk and t_fk tables.
This is because we don't apply operations on a table for which initial
sync is not finished yet which will be true for t_fk.

--
With Regards,
Amit Kapila.



Re: Add into REFRESH PUBLICATION parameter exception_behaviour

От
Andrei Lepikhov
Дата:
On 19/2/26 07:44, Amit Kapila wrote:
> On Tue, Feb 17, 2026 at 1:39 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>> ...
>> I only proposed that if the REFRESH PUBLICATION that re-introduced such
>> a table fails, complain and remove it from the subscription, as if you
>> had never executed the 'REFRESH PUBLICATION' command. Where is the
>> inconsistency?
> ...
> In such a situation, say when REFRESH command is executed and table
> sync for t1_fk failed and we mark that table status as FAILED, then
> the future replication will perform operations only on t_pk table even
> though the same transaction has delete on both t1_pk and t_fk tables.
> This is because we don't apply operations on a table for which initial
> sync is not finished yet which will be true for t_fk.
Thank you for clarifying.

A proper 'soft' exception should revert all changes made by REFRESH 
PUBLICATION on the subscriber's side. However, AlterSubscription_refresh 
updates and commits the publication's state in pg_catalog before 
synchronisation begins in the apply worker. Implementing a full revert 
would require significant redesign and may be more appropriate for 
enterprise solutions seeking this level of functionality.

Additionally, automation scripts may depend on subscription statistics, 
particularly 'sync_seq_error_count' and 'sync_table_error_count'. I was 
using an older Postgres version and did not realise that newer versions 
now provide clear indicators when synchronisation issues occur.

-- 
regards, Andrei Lepikhov,
pgEdge