Обсуждение: Add an option to skip loading missing publication to avoid logical replication failure

Поиск
Список
Период
Сортировка
Hi,

Currently ALTER SUBSCRIPTION ... SET PUBLICATION will break the
logical replication in certain cases. This can happen as the apply
worker will get restarted after SET PUBLICATION, the apply worker will
use the existing slot and replication origin corresponding to the
subscription. Now, it is possible that before restart the origin has
not been updated and the WAL start location points to a location prior
to where PUBLICATION pub exists which can lead to such an error. Once
this error occurs, apply worker will never be able to proceed and will
always return the same error.

There was discussion on this and Amit had posted a patch to handle
this at [2]. Amit's patch does continue using a historic snapshot but
ignores publications that are not found for the purpose of computing
RelSyncEntry attributes. We won't mark such an entry as valid till all
the publications are loaded without anything missing. This means we
won't publish operations on tables corresponding to that publication
till we found such a publication and that seems okay.
I have added an option skip_not_exist_publication to enable this
operation only when skip_not_exist_publication is specified as true.
There is no change in default behavior when skip_not_exist_publication
is specified as false.

But one thing to note with the patch (with skip_not_exist_publication
option) is that replication of few WAL entries will be skipped till
the publication is loaded like in the below example:
-- Create table in publisher and subscriber
create table t1(c1 int);
create table t2(c1 int);

-- Create publications
create publication pub1 for table t1;
create publication pub2 for table t2;

-- Create subscription
create subscription test1 connection 'dbname=postgres host=localhost
port=5432' publication pub1, pub2;

-- Drop one publication
drop publication pub1;

-- Insert in the publisher
insert into t1 values(11);
insert into t2 values(21);

-- Select in subscriber
postgres=# select * from t1;
 c1
----
(0 rows)

postgres=# select * from t2;
 c1
----
 21
(1 row)

-- Create the dropped publication in publisher
create publication pub1 for table t1;

-- Insert in the publisher
insert into t1 values(12);
postgres=# select * from t1;
 c1
----
 11
 12
(2 rows)

-- Select data in subscriber
postgres=# select * from t1; -- record with value 11 will be missing
in subscriber
 c1
----
 12
(1 row)

Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BT-ETXeRM4DHWzGxBpKafLCp__5bPA_QZfFQp7-0wj4Q%40mail.gmail.com

Regards,
Vignesh

Вложения
On Mon, 19 Feb 2024 at 12:48, vignesh C <vignesh21@gmail.com> wrote:
>
> Hi,
>
> Currently ALTER SUBSCRIPTION ... SET PUBLICATION will break the
> logical replication in certain cases. This can happen as the apply
> worker will get restarted after SET PUBLICATION, the apply worker will
> use the existing slot and replication origin corresponding to the
> subscription. Now, it is possible that before restart the origin has
> not been updated and the WAL start location points to a location prior
> to where PUBLICATION pub exists which can lead to such an error. Once
> this error occurs, apply worker will never be able to proceed and will
> always return the same error.
>
> There was discussion on this and Amit had posted a patch to handle
> this at [2]. Amit's patch does continue using a historic snapshot but
> ignores publications that are not found for the purpose of computing
> RelSyncEntry attributes. We won't mark such an entry as valid till all
> the publications are loaded without anything missing. This means we
> won't publish operations on tables corresponding to that publication
> till we found such a publication and that seems okay.
> I have added an option skip_not_exist_publication to enable this
> operation only when skip_not_exist_publication is specified as true.
> There is no change in default behavior when skip_not_exist_publication
> is specified as false.

I have updated the patch to now include changes for pg_dump, added few
tests, describe changes and added documentation changes. The attached
v2 version patch has the changes for the same.

Regards,
Vignesh

Вложения