Discussion: Conflict detection and logging in logical replication

Conflict detection and logging in logical replication

From: "Zhijie Hou (Fujitsu)"
Date:
Hi hackers,
Cc people involved in the original thread[1].

I am starting a new thread to share and discuss the implementation of
conflict detection and logging in logical replication, as well as the
collection of statistics related to these conflicts.

In the original conflict resolution thread [1], we decided to split this
work into multiple patches to facilitate incremental progress towards
supporting conflict resolution in logical replication. This phased
approach will allow us to address simpler tasks first. The overall work
plan involves:

1. Conflict detection: detect and log conflicts such as 'insert_exists',
   'update_differ', 'update_missing', and 'delete_missing'.
2. Simple built-in resolution strategies such as 'apply(remote_apply)'
   and 'skip(keep_local)'.
3. Monitoring of conflicts and resolutions, via statistics or a history
   table.

Based on the feedback received at PGConf.dev and the discussion in the
conflict resolution thread, items 1 and 3 are useful on their own, so we
are starting a separate thread for them.

Here are the basic designs for the detection and statistics:

- Details of conflict detection

We add a new parameter, detect_conflict, to the CREATE SUBSCRIPTION and
ALTER SUBSCRIPTION commands. This parameter decides whether the
subscription performs conflict detection. By default, conflict detection
is off for a subscription.

When conflict detection is enabled, additional logging is triggered in the
following conflict scenarios:
insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint.
update_differ: Updating a row that was previously modified by another origin.
update_missing: The tuple to be updated is missing.
delete_missing: The tuple to be deleted is missing.

For an insert_exists conflict, the log can include the origin and commit
timestamp of the conflicting key when track_commit_timestamp is enabled.
An update_differ conflict can be detected only when
track_commit_timestamp is enabled.
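
As a rough illustration of the four scenarios above, here is a minimal, self-contained sketch in C. It is not code from the patch: the type and function names (ConflictType, LocalRowState, classify_conflict) are hypothetical, and the origin check is modelled as a plain integer comparison. The origin_known flag stands in for track_commit_timestamp being enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Conflict kinds named in the proposal; CONFLICT_NONE is added for this sketch. */
typedef enum ConflictType
{
    CONFLICT_NONE,
    CONFLICT_INSERT_EXISTS,
    CONFLICT_UPDATE_DIFFER,
    CONFLICT_UPDATE_MISSING,
    CONFLICT_DELETE_MISSING
} ConflictType;

typedef enum ChangeKind { CHANGE_INSERT, CHANGE_UPDATE, CHANGE_DELETE } ChangeKind;

/* Hypothetical summary of what the apply side found locally. */
typedef struct LocalRowState
{
    bool row_exists;    /* INSERT: a row with the same unique key exists;
                         * UPDATE/DELETE: the target row was found */
    bool origin_known;  /* false when track_commit_timestamp is off */
    int  last_origin;   /* origin that last modified the local row */
} LocalRowState;

/* Map a remote change plus the local row state to a conflict kind. */
ConflictType
classify_conflict(ChangeKind change, const LocalRowState *local, int remote_origin)
{
    assert(local != NULL);

    switch (change)
    {
        case CHANGE_INSERT:
            return local->row_exists ? CONFLICT_INSERT_EXISTS : CONFLICT_NONE;
        case CHANGE_UPDATE:
            if (!local->row_exists)
                return CONFLICT_UPDATE_MISSING;
            /* Without origin information, update_differ cannot be detected. */
            if (local->origin_known && local->last_origin != remote_origin)
                return CONFLICT_UPDATE_DIFFER;
            return CONFLICT_NONE;
        case CHANGE_DELETE:
            return local->row_exists ? CONFLICT_NONE : CONFLICT_DELETE_MISSING;
    }
    return CONFLICT_NONE;
}
```

Note that an UPDATE against a row whose origin is unknown is classified as no conflict here, which mirrors the statement that update_differ depends on track_commit_timestamp.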

Regarding insert_exists conflicts, the current design is to pass
noDupErr=true to ExecInsertIndexTuples() so that a duplicate key
violation does not raise an error immediately. After calling
ExecInsertIndexTuples(), if there was any potential conflict in the
unique indexes, we report an ERROR for the insert_exists conflict along
with additional information (origin, commit timestamp, key value) for
the conflicting row. An alternative is to pre-check for a duplicate key
violation before applying the INSERT operation, but this could introduce
overhead for each INSERT even in the absence of conflicts.
We welcome any alternative viewpoints on this matter.
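
The deferred-error idea can be sketched with a toy in-memory table. This is only an illustration under stated assumptions: ToyTable, ToyRow, InsertConflict and toy_insert_no_dup_err are hypothetical names, and the linear scan stands in for the uniqueness check that a unique index performs as part of insertion anyway. The point is that a duplicate is handed back to the caller, who can then build a detailed report, instead of failing on the spot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ROWS 8

/* Toy row: a unique key plus the metadata the report wants to show. */
typedef struct ToyRow { int key; int origin; long commit_ts; } ToyRow;
typedef struct ToyTable { ToyRow rows[TOY_MAX_ROWS]; size_t nrows; } ToyTable;

/* Details of the existing row, filled in only when a duplicate is found. */
typedef struct InsertConflict { bool found; ToyRow existing; } InsertConflict;

/*
 * Insert without raising on a duplicate key. Returns true if the row was
 * stored. On a duplicate, returns false and copies the existing row into
 * *conflict so the caller can report origin, commit timestamp and key.
 */
bool
toy_insert_no_dup_err(ToyTable *table, ToyRow incoming, InsertConflict *conflict)
{
    assert(table != NULL && conflict != NULL);

    conflict->found = false;
    for (size_t i = 0; i < table->nrows; i++)
    {
        if (table->rows[i].key == incoming.key)
        {
            conflict->found = true;
            conflict->existing = table->rows[i];
            return false;
        }
    }

    assert(table->nrows < TOY_MAX_ROWS);
    table->rows[table->nrows++] = incoming;
    return true;
}
```

In this shape, the non-conflicting path does no work beyond the uniqueness check that insertion needs anyway, which is the overhead argument made against a separate pre-check.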

- Details of statistics collection

We add columns (insert_exists_count, update_differ_count,
update_missing_count, delete_missing_count) to the view
pg_stat_subscription_workers to show information about the conflicts
that occur while applying logical replication changes.

Conflicts are tracked when the track_conflict option of the subscription
is enabled. Additionally, update_differ can be detected only when
track_commit_timestamp is enabled.
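
The per-subscription counters described above could be modelled as follows. This is a minimal sketch, not the patch's data structure: SubConflictCounts, StatConflictKind and count_conflict are hypothetical names, and only the four counter names come from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum StatConflictKind
{
    STAT_INSERT_EXISTS,
    STAT_UPDATE_DIFFER,
    STAT_UPDATE_MISSING,
    STAT_DELETE_MISSING
} StatConflictKind;

/* One counter per proposed view column. */
typedef struct SubConflictCounts
{
    long insert_exists_count;
    long update_differ_count;
    long update_missing_count;
    long delete_missing_count;
} SubConflictCounts;

/* Bump the matching counter, but only when tracking is enabled. */
void
count_conflict(SubConflictCounts *counts, bool tracking_enabled, StatConflictKind kind)
{
    assert(counts != NULL);

    if (!tracking_enabled)
        return;

    switch (kind)
    {
        case STAT_INSERT_EXISTS:  counts->insert_exists_count++;  break;
        case STAT_UPDATE_DIFFER:  counts->update_differ_count++;  break;
        case STAT_UPDATE_MISSING: counts->update_missing_count++; break;
        case STAT_DELETE_MISSING: counts->delete_missing_count++; break;
    }
}
```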


The patches for the above features are attached.
Suggestions and comments are highly appreciated.

[1] https://www.postgresql.org/message-id/CAA4eK1LgPyzPr_Vrvvr4syrde4hyT%3DQQnGjdRUNP-tz3eYa%3DGQ%40mail.gmail.com

Best Regards,
Hou Zhijie



RE: Conflict detection and logging in logical replication

From: "Zhijie Hou (Fujitsu)"
Date:
On Friday, June 21, 2024 3:47 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> - The detail of the conflict detection
>
> We add a new parameter detect_conflict for CREATE and ALTER subscription
> commands. This new parameter will decide if subscription will go for
> conflict detection. By default, conflict detection will be off for a
> subscription.
>
> When conflict detection is enabled, additional logging is triggered in the
> following conflict scenarios:
> insert_exists: Inserting a row that violates a NOT DEFERRABLE unique
> constraint.
> update_differ: Updating a row that was previously modified by another origin.
> update_missing: The tuple to be updated is missing.
> delete_missing: The tuple to be deleted is missing.
>
> For insert_exists conflict, the log can include origin and commit
> timestamp details of the conflicting key with track_commit_timestamp
> enabled. And update_differ conflict can only be detected when
> track_commit_timestamp is enabled.
>
> Regarding insert_exists conflicts, the current design is to pass
> noDupErr=true in ExecInsertIndexTuples() to prevent immediate error
> handling on duplicate key violation. After calling
> ExecInsertIndexTuples(), if there was any potential conflict in the
> unique indexes, we report an ERROR for the insert_exists conflict along
> with additional information (origin, committs, key value) for the
> conflicting row. Another way for this is to conduct a pre-check for
> duplicate key violation before applying the INSERT operation, but this
> could introduce overhead for each INSERT even in the absence of conflicts.
> We welcome any alternative viewpoints on this matter.

When testing the patch, I noticed a bug: when reporting the conflict
after calling ExecInsertIndexTuples(), we might find the tuple that we
just inserted and report it. (We should only report a conflict if there
are other conflicting tuples which were not inserted by us.) Here is a
new patch which fixes this and also fixes a compile warning reported by
CFbot.
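
The self-match problem can be illustrated with a toy lookup. This is a sketch under assumptions, not the fix in the patch: ToyTuple and find_other_conflicting_tuple are hypothetical, and an integer tid stands in for whatever tuple identity the real code compares.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tuple: an identifier plus the unique key value. */
typedef struct ToyTuple { int tid; int key; } ToyTuple;

/*
 * Look for a tuple with the given key other than the one we just inserted
 * (self_tid). Returns its array index, or -1 if the only match is our own
 * tuple or there is no match at all.
 */
int
find_other_conflicting_tuple(const ToyTuple *tuples, size_t ntuples,
                             int key, int self_tid)
{
    assert(tuples != NULL || ntuples == 0);

    for (size_t i = 0; i < ntuples; i++)
    {
        if (tuples[i].key != key)
            continue;
        if (tuples[i].tid == self_tid)
            continue;           /* our own insert: not a conflict */
        return (int) i;
    }
    return -1;
}
```

Without the self_tid check, a lookup by key after the insert would always find at least one row, so every insert would look like a conflict.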

Best Regards,
Hou zj




Re: Conflict detection and logging in logical replication

From: shveta malik
Date:
On Mon, Jun 24, 2024 at 7:39 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> When testing the patch, I noticed a bug that when reporting the conflict
> after calling ExecInsertIndexTuples(), we might find the tuple that we
> just inserted and report it. (We should only report a conflict if there
> are other conflicting tuples which were not inserted by us.) Here is a new patch
> which fixed this and fixed a compile warning reported by CFbot.
>

Thanks for the patch. A few comments:

1) A few typos:
Commit msg of patch001: iolates --> violates
execIndexing.c: ingored --> ignored

2) Commit msg of stats patch: "The commit adds columns in view
pg_stat_subscription_workers to shows"
-- "pg_stat_subscription_workers" --> "pg_stat_subscription_stats"

3) I feel chapter '31.5. Conflicts' in the docs should also mention
conflict detection, or point to the page where it is already described.

thanks
Shveta



Re: Conflict detection and logging in logical replication

From: Nisha Moond
Date:
On Mon, Jun 24, 2024 at 7:39 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> When testing the patch, I noticed a bug that when reporting the conflict
> after calling ExecInsertIndexTuples(), we might find the tuple that we
> just inserted and report it. (We should only report a conflict if there
> are other conflicting tuples which were not inserted by us.) Here is a new patch
> which fixed this and fixed a compile warning reported by CFbot.
>
Thank you for the patch!
A review comment: The patch does not detect 'update_differ' conflicts
when the Publisher has a non-partitioned table and the Subscriber has
a partitioned version.

Here’s a simple failing test case:
Pub: create table tab (a int primary key, b int not null, c varchar(5));

Sub: create table tab (a int not null, b int not null, c varchar(5))
partition by range (b);
alter table tab add constraint tab_pk primary key (a, b);
create table tab_1 partition of tab for values from (minvalue) to (100);
create table tab_2 partition of tab for values from (100) to (maxvalue);

With the above setup, if the Subscriber table has a tuple with its own
origin, the 'update_differ' conflict is not detected for an incoming
remote update from the Publisher.

--
Thanks,
Nisha



RE: Conflict detection and logging in logical replication

From: "Zhijie Hou (Fujitsu)"
Date:
On Monday, June 24, 2024 8:35 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> 
> On Mon, Jun 24, 2024 at 7:39 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > When testing the patch, I noticed a bug that when reporting the
> > conflict after calling ExecInsertIndexTuples(), we might find the
> > tuple that we just inserted and report it. (We should only report a
> > conflict if there are other conflicting tuples which were not inserted
> > by us.) Here is a new patch which fixed this and fixed a compile warning
> > reported by CFbot.
> >
> Thank you for the patch!
> A review comment: The patch does not detect 'update_differ' conflicts when
> the Publisher has a non-partitioned table and the Subscriber has a partitioned
> version.

Thanks for reporting the issue!

Here is a new version of the patch set which fixes this issue. I also
fixed some typos and improved the logical replication conflict
documentation based on the comments from Shveta [1].

[1] https://www.postgresql.org/message-id/CAJpy0uABSf15E%2BbMDBRCpbFYo0dh4N%3DEtpv%2BSNw6RMy8ohyrcQ%40mail.gmail.com

Best Regards,
Hou zj
