Conflict detection and logging in logical replication

From: Zhijie Hou (Fujitsu)
Subject: Conflict detection and logging in logical replication
Msg-id: OS0PR01MB5716352552DFADB8E9AD1D8994C92@OS0PR01MB5716.jpnprd01.prod.outlook.com
List: pgsql-hackers
Hi hackers,
CC'ing the people involved in the original thread [1].

I am starting a new thread to share and discuss the implementation of
conflict detection and logging in logical replication, as well as the
collection of statistics related to these conflicts.

In the original conflict resolution thread [1], we decided to split this
work into multiple patches to make incremental progress toward supporting
conflict resolution in logical replication. This phased approach lets us
address the simpler tasks first. The overall work plan is:
1. Conflict detection: detect and log conflicts such as 'insert_exists',
   'update_differ', 'update_missing', and 'delete_missing'.
2. Simple built-in resolution strategies such as 'apply(remote_apply)' and
   'skip(keep_local)'.
3. Monitoring of conflicts and resolutions through statistics or a history
   table.

Based on the feedback received at PGConf.dev and the discussion in the
conflict resolution thread, features 1 and 3 are valuable on their own,
so we are starting a separate thread for them.

Here is the basic design for conflict detection and statistics collection:

- Conflict detection details

We add a new parameter, detect_conflict, to the CREATE SUBSCRIPTION and
ALTER SUBSCRIPTION commands. This parameter controls whether the
subscription performs conflict detection. By default, conflict detection
is off for a subscription.

When conflict detection is enabled, additional logging is triggered in the
following conflict scenarios:
insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint.
update_differ: Updating a row that was previously modified by another origin.
update_missing: The tuple to be updated is missing.
delete_missing: The tuple to be deleted is missing.

For an insert_exists conflict, the log can include the origin and commit
timestamp of the conflicting key when track_commit_timestamp is enabled.
An update_differ conflict can only be detected when track_commit_timestamp
is enabled.
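
To make the four cases concrete, here is a minimal Python model of how an apply worker might classify one incoming change. It is a sketch under stated assumptions, not code from the patches: `classify`, `LocalRow`, and the origin strings are hypothetical names; `local` stands for the existing row found by unique key (for INSERT) or by replica identity (for UPDATE/DELETE); and a `None` origin marks a locally made change.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Conflict(Enum):
    INSERT_EXISTS = "insert_exists"
    UPDATE_DIFFER = "update_differ"
    UPDATE_MISSING = "update_missing"
    DELETE_MISSING = "delete_missing"


@dataclass
class LocalRow:
    # Hypothetical: origin that last modified the row; None means a local change.
    origin: Optional[str]


def classify(op: str, local: Optional[LocalRow], apply_origin: str,
             track_commit_timestamp: bool) -> Optional[Conflict]:
    """Return the conflict type for one remote change, or None if there is none.

    `local` is the existing row found by unique key (INSERT) or by replica
    identity (UPDATE/DELETE); None means no such row exists.
    """
    if op == "INSERT":
        return Conflict.INSERT_EXISTS if local is not None else None
    if op == "UPDATE":
        if local is None:
            return Conflict.UPDATE_MISSING
        # The last modifier's origin is only known with track_commit_timestamp,
        # so update_differ cannot be detected without it.
        if track_commit_timestamp and local.origin != apply_origin:
            return Conflict.UPDATE_DIFFER
        return None
    if op == "DELETE":
        return Conflict.DELETE_MISSING if local is None else None
    raise ValueError(f"unsupported operation: {op}")
```

For example, in this model an UPDATE that finds a row last written locally is classified as update_differ only when track_commit_timestamp is on; with it off, the same update yields no conflict.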

Regarding insert_exists conflicts, the current design is to pass
noDupErr=true to ExecInsertIndexTuples() so that a duplicate key violation
does not raise an error immediately. After ExecInsertIndexTuples()
returns, if any unique index reported a potential conflict, we report an
ERROR for the insert_exists conflict along with additional information
about the conflicting row (origin, commit timestamp, and key value). An
alternative is to pre-check for duplicate key violations before applying
the INSERT, but that would add overhead to every INSERT, even when no
conflict exists. We welcome other viewpoints on this.
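
The trade-off between the two approaches can be illustrated with a toy Python model that uses a single unique index, represented as a dictionary, and counts index probes. `UniqueIndex`, its `no_dup_err` flag, `InsertExistsConflict`, and both `apply_insert_*` functions are hypothetical stand-ins, not the executor API; `no_dup_err` only mimics the idea behind noDupErr=true (the duplicate is handed back instead of raising right away) and does not reproduce how ExecInsertIndexTuples() actually works.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


class InsertExistsConflict(Exception):
    """Stands in for the ERROR reported for an insert_exists conflict."""

    def __init__(self, key: Any, existing: Row):
        super().__init__(f"insert_exists: key {key!r} already exists "
                         f"(origin={existing['origin']!r}, "
                         f"committs={existing['committs']!r})")
        self.key = key
        self.existing = existing


class UniqueIndex:
    """Toy unique index (key -> row) that counts how often it is probed."""

    def __init__(self) -> None:
        self.entries: Dict[Any, Row] = {}
        self.probes = 0

    def lookup(self, key: Any) -> Optional[Row]:
        self.probes += 1
        return self.entries.get(key)

    def insert(self, key: Any, row: Row, no_dup_err: bool) -> Optional[Row]:
        """Insert row under key. On a duplicate, raise at once unless
        no_dup_err is set, in which case return the conflicting row."""
        self.probes += 1
        existing = self.entries.get(key)
        if existing is None:
            self.entries[key] = row
            return None
        if no_dup_err:
            return existing
        raise KeyError(key)  # plain duplicate-key error, no conflict details


def apply_insert_deferred(idx: UniqueIndex, key: Any, row: Row) -> None:
    """Insert first with no_dup_err=True, then inspect the result."""
    existing = idx.insert(key, row, no_dup_err=True)
    if existing is not None:
        raise InsertExistsConflict(key, existing)


def apply_insert_prechecked(idx: UniqueIndex, key: Any, row: Row) -> None:
    """Look the key up before every insert."""
    existing = idx.lookup(key)
    if existing is not None:
        raise InsertExistsConflict(key, existing)
    idx.insert(key, row, no_dup_err=False)
```

In the conflict-free case the deferred variant probes the index once per INSERT while the pre-check variant probes it twice, which is the overhead described above; when a conflict does occur, both raise the same conflict details.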

- Statistics collection details

We add columns (insert_exists_count, update_differ_count,
update_missing_count, delete_missing_count) to the
pg_stat_subscription_workers view to show information about conflicts
that occur while applying logical replication changes.

Conflicts are tracked when the subscription's track_conflict option is
enabled. Additionally, update_differ can only be detected when
track_commit_timestamp is enabled.
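
A minimal Python sketch of the counters behind the proposed columns, assuming one set of counters per subscription. `SubscriptionStats`, `report`, and `view_row` are hypothetical; only the column names and the track_conflict and track_commit_timestamp settings come from the proposal.

```python
from collections import Counter
from dataclasses import dataclass, field

CONFLICT_TYPES = ("insert_exists", "update_differ",
                  "update_missing", "delete_missing")


@dataclass
class SubscriptionStats:
    track_conflict: bool
    track_commit_timestamp: bool
    counts: Counter = field(default_factory=Counter)

    def report(self, conflict_type: str) -> None:
        """Count one detected conflict, subject to the subscription settings."""
        if conflict_type not in CONFLICT_TYPES:
            raise ValueError(f"unknown conflict type: {conflict_type}")
        if not self.track_conflict:
            return  # tracking is off for this subscription
        if conflict_type == "update_differ" and not self.track_commit_timestamp:
            return  # mirrors the rule that update_differ needs commit timestamps
        self.counts[conflict_type] += 1

    def view_row(self) -> dict:
        """Return the four counters under the proposed column names."""
        return {f"{t}_count": self.counts[t] for t in CONFLICT_TYPES}
```

Keeping one counter per conflict type, rather than a single total, is what lets each proposed column be reported independently.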


The patches for the above features are attached.
Suggestions and comments are highly appreciated.

[1] https://www.postgresql.org/message-id/CAA4eK1LgPyzPr_Vrvvr4syrde4hyT%3DQQnGjdRUNP-tz3eYa%3DGQ%40mail.gmail.com

Best Regards,
Hou Zhijie

