Re: [HACKERS] MERGE SQL Statement for PG11

Поиск
Список
Период
Сортировка
От Pavan Deolasee
Тема Re: [HACKERS] MERGE SQL Statement for PG11
Дата
Msg-id CABOikdN4us_ExRc=q3CUnvHbKUiRJt-wmQX-cLaysC=rihd0=w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] MERGE SQL Statement for PG11  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: [HACKERS] MERGE SQL Statement for PG11
Список pgsql-hackers


On Fri, Feb 9, 2018 at 6:53 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Feb 7, 2018 at 7:51 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> I understand getting EPQ semantics right is very important. Can you please
> (once again) summarise your thoughts on what you think is the *most*
> appropriate behaviour? I can then think how much efforts might be involved
> in that. If the efforts are disproportionately high, we can discuss if
> settling for some not-so-nice semantics, like we apparently did for
> partition key updates.

I personally believe that the existing EPQ semantics are already
not-so-nice. They're what we know, though, and we haven't actually had
any real world complaints, AFAIK.

I agree.
 

My concern is mostly just that MERGE manages to behave in a way that
actually "provides a single SQL statement that can conditionally
INSERT, UPDATE or DELETE rows, a task that would otherwise require
multiple procedural language statements", as the docs put it. As long
as MERGE manages to do something as close to that high level
description as possible in READ COMMITTED mode (with our current
semantics for multiple statements in RC taken as the baseline), then
I'll probably be happy.

IMO it will be quite hard, if not impossible, to guarantee the same semantics to a single statement MERGE and multi statement UPDATE/DELETE/INSERT in RC mode. For example, the multi statement model will execute each statement with a new MVCC snapshot and hence the rows visible to individual statement may vary. Whereas in MERGE, everything runs with a single snapshot. There could be other such subtle differences.
 

Some novel new behavior -- "EPQ with a twist"-- is clearly necessary.
I feel a bit uneasy about it because anything that anybody suggests is
likely to be at least a bit arbitrary (EPQ itself is kind of
arbitrary). We only get to make a decision on how "EPQ with a twist"
will work once, and that should be a decision that is made following
careful deliberation. Ambiguity is much more likely to kill a patch
than a specific technical defect, at least in my experience. Somebody
can usually just fix a technical defect.

While I agree, I think we need to make these decisions in a time bound fashion. If there is too much ambiguity, then it's not a bad idea to settle for throwing appropriate errors instead of providing semantically wrong answers, even in some remote corner case.
 



> TBH I did not consider partitioning any less complex and it was indeed very
> complex, requiring at least 3 reworks by me. And from what I understood, it
> would have been a blocker too. So is subquery handling and RLS. That's why I
> focused on addressing those items while you and Simon were still debating
> EPQ semantics.

Sorry if I came across as dismissive of that effort. That was
certainly not my intention. I am pleasantly surprised that you've
managed to move a number of things forward rather quickly.

I'll rephrase: while it would probably have been a blocker in theory
(I didn't actually weigh in on that), I doubted that it would actually
end up doing so in practice (and it now looks like I was right to
doubt that, since you got it done). It was a theoretical blocker, as
opposed to an open item that could drag on indefinitely despite
everyone's best efforts. Obviously details matter, and obviously there
are a lot of details to get right outside of RC semantics, but it
seems wise to focus on the big risk that is EPQ/RC conflict handling.

Ok. I am now back from holidays and I will too start thinking about this. I've also requested a colleague to help us with comparing it against Oracle's behaviour. N That's not a gold standard for us, but knowing how other major databases handle RC conflicts, is not a bad idea.

I see the following important areas and as long as we have a consistent and coherent handling of these cases, we should not have difficulty agreeing on a outcome.

1. Concurrent UPDATE does not affect MATCHED  case. The WHEN conditions may or may not be affected.
2. Concurrently UPDATEd tuple fails the join qual and the current source tuple no longer matches with the updated target tuple that the EPQ is set for. It matches no other target tuple either. So a MATCHED case is turned into a NOT MATCHED case.
3. Concurrently UPDATEd tuple fails the join qual and the current source tuple no longer matches with the updated target tuple that the EPQ is set for. But it matches some other target tuple. So it's still a MATCHED case, but with different target tuple(s).
4. Concurrent UPDATE/INSERT creates a matching target tuple for a source tuple, thus turning a NOT MATCHED case to a MATCHED case.
5. Concurrent DELETE turns a MATCHED case into NOT MATCHED case

Any other case that I am missing? Assuming all cases are covered, what should we do in each of these cases, so that there is no or very little ambiguity and the outcome seems consistent (at the very least as far as MERGE goes and hopefully with regular UPDATE/DELETE handling)

I think #1 is pretty straight forward. We should start from the top, re-evaluate the WHEN conditions again and execute the first matching action.

For #2, it seems natural that we skip the MATCHED actions and execute the NOT MATCHED action for the current source tuple. But the trouble is how to differentiate between #2 and #3. I'm not sure if we can really distinguish between these cases i.e. given the current source tuple, does another matching target tuple exists? If another matching target tuple exists, we must not invoke the NOT MATCHED action. Otherwise we might end up executing NOT MATCHED as well as MATCHED action for the same source tuple, which seems weird.

#4 looks similar to INSERT ON CONFLICT and one may argue that we should detect concurrent inserts/updates and execute the MATCHED action. But I don't know if that can be done in a reasonable way. It will probably require us to have a primary key on the table to detect those conflicts. I think we should just let the operation fail, like a regular INSERT.

It seems natural that #5 should skip the MATCHED action and instead execute the first satisfying NOT MATCHED action. But it's outcome may depend on how we handle #2 and #3 so that they are all consistent. If we allow #2 and #3 to error out, whenever there is ambiguity, we should do the same for #5.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrey Borodin
Дата:
Сообщение: [WiP] GiST intrapage indexing
Следующее
От: Nikhil Sontakke
Дата:
Сообщение: Re: [HACKERS] logical decoding of two-phase transactions