Re: Conflict Detection and Resolution

From Tomas Vondra
Subject Re: Conflict Detection and Resolution
Date
Msg-id a3a70a19-a35e-426c-8646-0898cdc207c8@enterprisedb.com
In reply to Re: Conflict Detection and Resolution  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Conflict Detection and Resolution
List pgsql-hackers
On 6/10/24 10:54, Amit Kapila wrote:
> On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 5/27/24 07:48, shveta malik wrote:
>>> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>> Which architecture are you aiming for? Here you talk about multiple
>>>> providers, but the wiki page mentions active-active. I'm not sure how
>>>> much this matters, but it might.
>>>
>>> Currently, we are working on the multiple-providers case, but ideally
>>> it should work for active-active as well. During further discussion
>>> and the implementation phase, if we find cases that will not work in a
>>> straightforward way for active-active, our primary focus will remain
>>> on first implementing it for the multiple-providers architecture.
>>>
>>>>
>>>> Also, what kind of consistency do you expect from this? Because none of
>>>> these simple conflict resolution methods can give you the regular
>>>> consistency models we're used to, AFAICS.
>>>
>>> Can you please explain a little bit more on this.
>>>
>>
>> I was referring to the well established consistency models / isolation
>> levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
>> guarantees the application developer can expect, what anomalies can
>> happen, etc.
>>
>> I don't think any such isolation level can be implemented with simple
>> conflict resolution methods like last-update-wins etc. For example,
>> consider an active-active setup where both nodes do
>>
>>   UPDATE accounts SET balance=balance+1000 WHERE id=1
>>
>> This will inevitably lead to a conflict, and while last-update-wins
>> resolves this "consistently" on both nodes (i.e. both end up with the
>> same result), it's essentially a lost update.
>>
> 
> The idea to solve such conflicts is using the delta apply technique
> where the delta from both sides will be applied to the respective
> columns. We do plan to target this as a separate patch. Now, if the
> basic conflict resolution and delta apply both can't go in one
> release, we shall document such cases clearly to avoid misuse of the
> feature.
> 

Perhaps, but it's not like having delta conflict resolution (or even
CRDT as a more generic variant) would lead to a regular consistency
model in a distributed system. At least I don't think it can achieve
that, because of the asynchronicity.
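The lost-update scenario from the quoted example, and the delta-apply idea, can be sketched in a toy model (hypothetical Python, not PostgreSQL code; the timestamps and helper names are made up for illustration):

```python
# Toy model: both nodes start with balance = 500 and concurrently run
#   UPDATE accounts SET balance = balance + 1000 WHERE id = 1

start = 500
node_a = {"balance": start + 1000, "ts": 101}  # hypothetical commit timestamps
node_b = {"balance": start + 1000, "ts": 102}

def last_update_wins(local, remote):
    # Full-row resolution: keep the row image with the later commit timestamp.
    return remote if remote["ts"] > local["ts"] else local

# Both nodes converge on node_b's row image, but one +1000 is silently lost.
merged = last_update_wins(node_a, node_b)
assert merged["balance"] == 1500  # not the intended 2500

def delta_apply(local_balance, remote_old, remote_new):
    # Delta resolution: apply the remote *change*, not the remote row image.
    return local_balance + (remote_new - remote_old)

# Applying node_b's delta (+1000) on top of node_a's result keeps both updates.
assert delta_apply(node_a["balance"], start, node_b["balance"]) == 2500
```

The point being that both approaches converge, but only the delta variant preserves the intent of both transactions.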

Consider a table with a "CHECK (amount < 1000)" constraint, and an update
that sets (amount = amount + 900) on two nodes. AFAIK there's no way to
reconcile this using delta (or any other) conflict resolution.
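A toy sketch of why no per-row resolution can save this (hypothetical Python, using the values from the example above):

```python
# Toy model: CHECK (amount < 1000), both nodes start at amount = 50 and
# concurrently apply  amount = amount + 900.

LIMIT = 1000
start = 50

def check_ok(amount):
    return amount < LIMIT

# Each node's local update passes the constraint in isolation.
node_a = start + 900   # 950, passes the CHECK
node_b = start + 900   # 950, passes the CHECK
assert check_ok(node_a) and check_ok(node_b)

# Delta resolution then folds in the remote increment on each node ...
merged = node_a + (node_b - start)   # 950 + 900 = 1850

# ... and no resolution honors both committed updates and the constraint.
assert not check_ok(merged)
```

Each commit was individually valid, so by the time the conflict is detected, asynchronously, both transactions have already been acknowledged.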

Which does not mean we should not have some form of conflict resolution,
as long as we know what the goal is. I simply don't want to spend time
working on this, add a lot of complex code, and then realize it doesn't
give us a consistency model that makes sense.

Which leads me back to my original question - what is the consistency
model you expect to get from this (possibly when combined with some
other pieces)?

>> This is a very simplistic example of course, I recall there are various
>> more complex examples involving foreign keys, multi-table transactions,
>> constraints, etc. But in principle it's a manifestation of the same
>> inherent limitation of conflict detection and resolution etc.
>>
>> Similarly, I believe this affects not just active-active, but also the
>> case where one node aggregates data from multiple publishers. Maybe not
>> to the same extent / it might be fine for that use case,
>>
> 
> I am not sure how much of a problem it is for a general logical
> replication solution, but we do intend to work on solving such problems
> in a step-wise manner. Trying to attempt everything in one patch
> doesn't seem advisable to me.
> 

I didn't say it needs to be done in one patch. I asked for someone to
explain what the goal is - the consistency model observed by the users.

>>
>  but you said
>> the end goal is to use this for active-active. So I'm wondering what's
>> the plan, there.
>>
> 
> I think at this stage we are not ready for active-active because,
> leaving aside this feature, we need many other features like
> replication of all commands/objects (DDL replication, replication of
> large objects, etc.), global sequences, and some sort of global
> two_phase transaction management for data consistency. So, it would be
> better to consider logical replication cases first, intending to extend
> them to active-active when we have the other required pieces.
> 

We're not ready for active-active, sure. And I'm not saying a conflict
resolution would make us ready. The question is what consistency model
we'd like to get from the active-active, and whether conflict resolution
can get us there ...

As for the other missing bits (DDL replication, large objects, global
sequences), I think those are somewhat independent of the question I'm
asking. And some of the stuff is also somewhat optional - for example I
think it'd be fine to not support large objects or global sequences.

>> If I'm writing an application for active-active using this conflict
>> handling, what assumptions can I make? Can I just do stuff as if on
>> a single node, or do I need to be super conscious about the zillion ways
>> things can misbehave in a distributed system?
>>
>> My personal opinion is that the closer this will be to the regular
>> consistency levels, the better. If past experience taught me anything,
>> it's very hard to predict how distributed systems with eventual
>> consistency behave, and even harder to actually test the application in
>> such environment.
>>
> 
> I don't think in any way this can enable users to start writing
> applications for active-active workloads. For something like what you
> are saying, we probably need a global transaction manager (or a global
> two_pc) as well to allow transactions to behave as they do on a
> single node or achieve similar consistency levels. With such
> transaction management, we can allow transactions to commit on a node
> only when it doesn't lead to a conflict on the peer node.
> 

But the wiki linked in the first message says:

   CDR is an important and necessary feature for active-active
   replication.

But if I understand your response correctly, you're saying active-active
should probably use a global transaction manager etc., which would
prevent conflicts - but that seems to make CDR unnecessary. Or do I
understand it wrong?

FWIW I don't think we'd need global components, there are ways to do
distributed snapshots using timestamps (for example), which would give
us snapshot isolation.
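As a toy illustration of a timestamp-based snapshot (hypothetical Python sketch, not a design proposal): each row version carries the commit timestamp of its transaction, and a snapshot is just a timestamp that decides visibility, with no global coordinator involved.

```python
# Hypothetical multi-version store: committed row versions tagged with
# their transaction's commit timestamp.
versions = [
    {"key": "x", "value": 1, "commit_ts": 10},
    {"key": "x", "value": 2, "commit_ts": 20},
    {"key": "x", "value": 3, "commit_ts": 30},
]

def read(key, snapshot_ts):
    # A version is visible if it committed at or before the snapshot
    # timestamp; return the newest visible version.
    visible = [v for v in versions
               if v["key"] == key and v["commit_ts"] <= snapshot_ts]
    return max(visible, key=lambda v: v["commit_ts"])["value"] if visible else None

assert read("x", 25) == 2    # a snapshot at ts=25 sees the ts=20 version
assert read("x", 5) is None  # a snapshot predating every version sees nothing
```

Of course, making this correct across nodes requires bounding clock uncertainty, which is where the hard part lies.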


>> In any case, if there are any differences compared to the usual
>> behavior, it needs to be very clearly explained in the docs.
>>
> 
> I agree that docs should be clear about the cases that this can and
> can't support.
> 
>>>>
>>>> How is this going to deal with the fact that commit LSN and timestamps
>>>> may not correlate perfectly? That is, commits may happen with LSN1 <
>>>> LSN2 but with T1 > T2.
>>>
>>> Are you pointing to the issue where a session/txn has taken its
>>> 'xactStopTimestamp' timestamp earlier but is delayed in inserting the
>>> record into XLOG, while another session/txn which took its timestamp
>>> slightly later managed to insert its record into XLOG sooner than
>>> session 1, making LSNs and timestamps out of sync? Going by this
>>> scenario, the commit timestamp may not be reflective of the actual
>>> commit order, and thus timestamp-based resolvers may make wrong
>>> decisions. Or do you mean something else?
>>>
>>> If this is the problem you are referring to, then I think this needs a
>>> fix on the publisher side. Let me think more about it. Kindly let me
>>> know if you have ideas on how to tackle it.
>>>
>>
>> Yes, this is the issue I'm talking about. We're acquiring the timestamp
>> while not holding the lock to reserve space in WAL, so the commit
>> timestamp and the commit LSN may not actually correlate.
>>
>> Consider this example I discussed with Amit last week:
>>
>> node A:
>>
>>   XACT1: UPDATE t SET v = 1;    LSN1 / T1
>>
>>   XACT2: UPDATE t SET v = 2;    LSN2 / T2
>>
>> node B
>>
>>   XACT3: UPDATE t SET v = 3;    LSN3 / T3
>>
>> And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion),
>> and T2 < T3 < T1. Now consider that the messages may arrive in different
>> orders, due to async replication. Unfortunately, this would lead to
>> different results of the conflict resolution:
>>
>>   XACT1 - XACT2 - XACT3 => v=3 (T3 wins)
>>
>>   XACT3 - XACT1 - XACT2 => v=2 (T2 wins)
>>
>> Now, I realize there's a flaw in this example - the (T1 > T2) inversion
>> can't actually happen, because these transactions have a dependency, and
>> thus won't commit concurrently. XACT1 will complete its commit before
>> XACT2 starts to commit. And with a monotonic clock (which is a requirement
>> for any timestamp-based resolution), that should guarantee (T1 < T2).
>>
>> However, I doubt this is sufficient to declare victory. It's more likely
>> that there still are problems, but the examples are likely more complex
>> (changes to multiple tables, etc.).
>>
> 
> Fair enough, I think we need to analyze this more to find actual
> problems or in some way try to prove that there is no problem.
> 
>> I vaguely remember there were more issues with timestamp inversion, but
>> those might have been related to parallel apply etc.
>>
> 
> Okay, so considering there are problems due to timestamp inversion, I
> think the solution to that problem would probably be to somehow
> generate the commit LSN and timestamp in order. I don't have a solution
> at this stage but will think more about both the actual problem and the
> solution. In the meantime, if you get a chance to recall the place
> where you have seen such a problem, please share it with us. It would
> be helpful.
> 

I think the solution to this would be to acquire the timestamp while
reserving the WAL space (because that happens in LSN order). The clock
would need to be monotonic (easy enough with CLOCK_MONOTONIC), but also
cheap. AFAIK that's the main reason why it's currently done outside the
critical section - gettimeofday() may be quite expensive. There's a
concept of a hybrid clock, combining physical time and a logical
counter, which I think might be useful independently of CDR ...
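The hybrid clock idea could look roughly like this toy Python sketch (hypothetical; the class and numbers are made up, and a real hybrid logical clock would also merge timestamps received from remote nodes):

```python
# Toy hybrid clock: a (physical, logical) pair that stays strictly
# monotonic even if the physical clock stalls between two local events.

class HybridClock:
    def __init__(self, now):
        self.wall = 0
        self.logical = 0
        self.now = now  # injected physical-clock function (assumption)

    def tick(self):
        # Called for every local event, e.g. when reserving WAL space.
        pt = self.now()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1  # physical clock didn't advance; bump counter
        return (self.wall, self.logical)

# Even with a frozen physical clock, timestamps are strictly increasing.
clock = HybridClock(now=lambda: 100)
t1 = clock.tick()
t2 = clock.tick()
assert t1 == (100, 0) and t2 == (100, 1) and t2 > t1
```

Since comparing and bumping a counter is cheap, something along these lines might be affordable even inside the critical section.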

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


