Re: Conflict Detection and Resolution

From Nisha Moond
Subject Re: Conflict Detection and Resolution
Date
Msg-id CABdArM7Uos6x5cRiOGUUi-O41kZy6eiXUS+zw_UGTO9t9LAuWQ@mail.gmail.com
In reply to Re: Conflict Detection and Resolution  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Conflict Detection and Resolution
List pgsql-hackers
On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > On 5/23/24 08:36, shveta malik wrote:
> > > Hello hackers,
> > >
> > > Please find the proposal for Conflict Detection and Resolution (CDR)
> > > for Logical replication.
> > > <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
> > > below details.>
> > >
> > > Introduction
> > > ================
> > > In case the node is subscribed to multiple providers, or when local
> > > writes happen on a subscriber, conflicts can arise for the incoming
> > > changes.  CDR is the mechanism to automatically detect and resolve
> > > these conflicts depending on the application and configurations.
> > > CDR is not applicable to the initial table sync. If conflicting data
> > > already exists locally in the table, the table sync worker will fail.
> > > Please find the details on CDR in apply worker for INSERT, UPDATE and
> > > DELETE operations:
> > >
> >
> > Which architecture are you aiming for? Here you talk about multiple
> > providers, but the wiki page mentions active-active. I'm not sure how
> > much this matters, but it might.
>
> Currently, we are targeting the multiple-providers case, but ideally it
> should work for active-active as well. If, during further discussion and
> implementation, we find cases that do not work in a straightforward way
> for active-active, our primary focus will remain on first implementing
> it for the multiple-providers architecture.
>
> >
> > Also, what kind of consistency you expect from this? Because none of
> > these simple conflict resolution methods can give you the regular
> > consistency models we're used to, AFAICS.
>
> Could you please explain this a little more?
>
> >
> > > INSERT
> > > ================
> > > To resolve an INSERT conflict on the subscriber, it is important to
> > > find the conflicting row (if any) before we attempt an insertion. The
> > > index search preference will be:
> > > First check for replica identity (RI) index.
> > >   - if not found, check for the primary key (PK) index.
> > >     - if not found, then check for unique indexes (individual ones or
> > > added by unique constraints)
> > >          - if unique index also not found, skip CDR
> > >
> > > Note: if no RI index, PK, or unique index is found but
> > > REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
> > > The reason is that even though a row can be identified with
> > > REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
> > > rows. Hence, we should not go for conflict detection in such a case.
> > >
> >
> > It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is
> > allowed to have duplicate values? It just means the upstream is sending
> > the whole original row, there can still be a PK/UNIQUE index on both the
> > publisher and subscriber.
>
> Yes, right. Sorry for the confusion. I meant the same, i.e. in the
> absence of an RI index, PK, or unique index, tables can have duplicates.
> So even in the presence of a replica identity (FULL in this case) but in
> the absence of a unique/primary index, CDR will be skipped for INSERT.
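The index search preference described above can be sketched as follows. This is an illustrative Python sketch, not PostgreSQL code; the function name and parameters are hypothetical stand-ins for the catalog lookups the apply worker would do.

```python
# Hypothetical sketch of the index search preference for INSERT conflict
# detection: RI index first, then PK, then any unique index; with none of
# these, CDR is skipped (even under REPLICA IDENTITY FULL, since such
# tables may contain duplicate rows).

def choose_conflict_index(replica_identity_index, primary_key_index,
                          unique_indexes):
    """Return the index to probe for an INSERT conflict, or None to skip CDR."""
    if replica_identity_index is not None:
        return replica_identity_index
    if primary_key_index is not None:
        return primary_key_index
    if unique_indexes:
        return unique_indexes[0]
    return None  # no suitable index: skip conflict detection
```

For example, `choose_conflict_index(None, "pk_idx", ["uniq_idx"])` picks the primary key index because no replica identity index is present.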
>
> >
> > > In case of replica identity ‘nothing’ and in absence of any suitable
> > > index (as defined above), CDR will be skipped for INSERT.
> > >
> > > Conflict Type:
> > > ----------------
> > > insert_exists: A conflict is detected when the table has the same
> > > value for a key column as the new value in the incoming row.
> > >
> > > Conflict Resolution
> > > ----------------
> > > a) latest_timestamp_wins:    The change with later commit timestamp wins.
> > > b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
> > > c) apply:   Always apply the remote change.
> > > d) skip:    Remote change is skipped.
> > > e) error:   Error out on conflict. Replication is stopped, manual
> > > action is needed.
> > >
> >
> > Why not have some support for user-defined conflict resolution
> > methods, allowing more complex behavior (e.g. merging the rows in
> > some way, perhaps even with datatype-specific logic)?
>
> Initially, for the sake of simplicity, we are targeting support for
> built-in resolvers only. But we do plan to work on user-defined
> resolvers as well; we shall propose that separately.
>
> >
> > > The change will be converted to 'UPDATE' and applied if the decision
> > > is in favor of applying remote change.
> > >
> > > Commit timestamp information must be available on the subscriber
> > > when latest_timestamp_wins or earliest_timestamp_wins is chosen as
> > > the resolution method. Thus ‘track_commit_timestamp’ must be enabled
> > > on the subscriber; otherwise, configuring these timestamp-based
> > > resolution methods will result in an error.
> > >
> > > Note: If the user has chosen latest_timestamp_wins or
> > > earliest_timestamp_wins, and the remote and local timestamps are the
> > > same, then it will go by system identifier: the change from the node
> > > with the higher system identifier wins. This ensures that the same
> > > change is picked on all the nodes.
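The insert_exists resolution strategies (a)–(e), including the system-identifier tie-break, can be sketched as below. The resolver names mirror the proposal; the function, its parameters, and the return values are illustrative assumptions, not actual PostgreSQL interfaces.

```python
# Hedged sketch of insert_exists conflict resolution. A remote win is
# applied as an UPDATE of the existing local row, per the proposal.

def resolve_insert_exists(resolver, remote_ts, local_ts,
                          remote_sysid, local_sysid):
    """Return 'apply_as_update' or 'skip'; raise for the 'error' resolver."""
    if resolver == "apply":
        return "apply_as_update"
    if resolver == "skip":
        return "skip"
    if resolver == "error":
        raise RuntimeError("insert_exists conflict: manual action needed")
    if resolver in ("latest_timestamp_wins", "earliest_timestamp_wins"):
        if remote_ts == local_ts:
            # Tie-break by system identifier, so that every node
            # deterministically picks the same change.
            remote_wins = remote_sysid > local_sysid
        elif resolver == "latest_timestamp_wins":
            remote_wins = remote_ts > local_ts
        else:  # earliest_timestamp_wins
            remote_wins = remote_ts < local_ts
        return "apply_as_update" if remote_wins else "skip"
    raise ValueError(f"unknown resolver: {resolver}")
```

Note how the timestamp-based branches assume commit timestamps are available, which is why ‘track_commit_timestamp’ must be enabled for them.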
> >
> > How is this going to deal with the fact that commit LSN and timestamps
> > may not correlate perfectly? That is, commits may happen with LSN1 <
> > LSN2 but with T1 > T2.
>
> Are you pointing to the issue where a session/txn takes its
> 'xactStopTimestamp' earlier but is delayed in inserting its commit
> record into the XLOG, while another session/txn that takes its
> timestamp slightly later manages to insert its record into the XLOG
> sooner, making LSN and timestamp order disagree? In that scenario, the
> commit timestamp may not reflect the actual commit order, and thus
> timestamp-based resolvers may make wrong decisions. Or do you mean
> something else?
>
> If this is the problem you are referring to, then I think it needs a
> fix on the publisher side. Let me think more about it. Kindly let me
> know if you have ideas on how to tackle it.
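As a toy illustration of the anomaly discussed above: commit records can reach the WAL in a different order than their commit timestamps were taken, so LSN order and timestamp order need not agree. The numbers below are made up purely for illustration.

```python
# Two transactions: T1 takes its commit timestamp first but its WAL
# insert is delayed; T2 takes its timestamp later but hits the WAL first.
commits = [
    {"xact": "T1", "ts": 100, "lsn": 2},  # earlier timestamp, later LSN
    {"xact": "T2", "ts": 101, "lsn": 1},  # later timestamp, earlier LSN
]

by_lsn = [c["xact"] for c in sorted(commits, key=lambda c: c["lsn"])]
by_ts = [c["xact"] for c in sorted(commits, key=lambda c: c["ts"])]

# The two orders disagree: WAL (LSN) order is T2, T1, while commit
# timestamp order is T1, T2 -- the LSN1 < LSN2 but T1 > T2 case.
print(by_lsn, by_ts)
```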
>
> > >
> > >  UPDATE
> > > ================
> > >
> > > Conflict Detection Method:
> > > --------------------------------
> > > Origin conflict detection: The ‘origin’ info, obtained from the
> > > commit timestamp generated for the incoming txn at the source node,
> > > is used to detect conflicts. To compare the remote origin with the
> > > local one, we must have origin information for local txns as well,
> > > which can be obtained from the commit timestamp after enabling
> > > ‘track_commit_timestamp’ locally.
> > > One drawback here is that the ‘origin’ information cannot be
> > > obtained once the row is frozen and the commit-timestamp info has
> > > been removed by vacuum. For a frozen row, conflicts cannot be
> > > raised, and thus the incoming changes will be applied in all cases.
> > >
> > > Conflict Types:
> > > ----------------
> > > a) update_differ: The origin of the incoming update differs from
> > > that of the local row, i.e. the row has already been updated locally
> > > or by a different node.
> > > b) update_missing: No row matches the incoming update's key; the
> > > remote is trying to update a row that does not exist locally.
> > > c) update_deleted: No row matches the incoming update's key because
> > > the row has already been deleted. This conflict type is raised only
> > > if the deleted row is still detectable, i.e. it has not yet been
> > > removed by VACUUM. If VACUUM has already removed the row, this
> > > conflict cannot be detected; it will be detected as update_missing
> > > and will follow the default or configured resolver for
> > > update_missing.
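The three UPDATE conflict types can be summarized in a small classification sketch. This is illustrative only: the row is modeled as a plain dict, and the origin/deleted fields stand in for information that would really come from ‘track_commit_timestamp’ and the not-yet-vacuumed tuple.

```python
# Illustrative classification of UPDATE conflict types (a)-(c).

def classify_update_conflict(local_row, remote_origin):
    """Return the conflict type for an incoming update, or None."""
    if local_row is None:
        # No matching row at all. A row deleted and already vacuumed away
        # is indistinguishable from this case, so a stale update_deleted
        # degrades to update_missing.
        return "update_missing"
    if local_row.get("deleted"):
        # Deleted row still detectable (not yet removed by VACUUM).
        return "update_deleted"
    if local_row["origin"] != remote_origin:
        # Row already updated locally or by a different node.
        return "update_differ"
    return None  # same origin: no conflict
```

The fallback from update_deleted to update_missing after vacuum is exactly why the two types cannot be distinguished reliably, as questioned below.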
> > >
> >
> > I don't understand why update_missing and update_deleted should be
> > different, especially considering the distinction is not detected
> > reliably. And also, even if we happen to find the row, the associated
> > TOAST data may have already been removed. So why would this matter?
>
> Here, we are trying to tackle the case where the row was 'recently'
> deleted, i.e. a concurrent UPDATE and DELETE on pub and sub. The user
> may want to opt for a different resolution in such a case, as opposed
> to the one where the corresponding row was never present in the first
> place. A row deleted long ago may not fall into this category, as there
> is a higher chance it has already been removed by vacuum, making it
> equivalent to the update_missing case.
>
> Regarding the "TOAST column" case for deleted rows, we may need to dig
> deeper. Thanks for bringing up this case. Let me analyze it further.
>
I tested a simple case with a table containing one TOAST column and
found that when a tuple with a TOAST column is deleted, both the tuple
and the corresponding pg_toast entries are marked as dead but are not
removed immediately. The main tuple and its pg_toast entry are
permanently removed only during vacuum: first the main table's dead
tuples are vacuumed, followed by those of the secondary TOAST relation
(if any).
Please let us know if you have a specific scenario in mind where the
TOAST column data is deleted immediately upon the 'delete' operation,
rather than during vacuum, which we might be missing.

Thanks,
Nisha


