Re: UPSERT wiki page, and SQL MERGE syntax

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: UPSERT wiki page, and SQL MERGE syntax
Дата
Msg-id CAM3SWZT7jS6nS2vQyxwV6k7MfPorAqK4DJa0r9dCiE=qBS09dQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: UPSERT wiki page, and SQL MERGE syntax  (Kevin Grittner <kgrittn@ymail.com>)
Ответы Re: UPSERT wiki page, and SQL MERGE syntax
Список pgsql-hackers
On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> I have two questions I hope you can clarify.  I'm having trouble
> parsing what this statement means:
>
>> ... the SQL standard does not require that MERGE be atomic in the
>> sense of atomically providing either an INSERT or UPDATE, ...
>
> My understanding is that the standard logically requires (without
> concern for implementation details) that the second specified table
> (via table name or subquery -- which could be a VALUES statement)
> is scanned, and for each row it attempts to match a row in the
> target table.  That will either match or not, according to the
> boolean expression in the ON clause.  You can have one WHEN MATCHED
> THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
> If both clauses are present, I believe that it is guaranteed that
> one or the other (but not both) will fire for each row in the
> second table.  The usual guarantees for each isolation level must
> not be violated (although an implementation is not required to
> generate anomalies which could not happen with serial execution).

That seems like a statement that isn't getting to the heart of the
matter. Which is: simple UPSERT cases will get things like duplicate
key violations under concurrency with MERGE, in SQL Server, Oracle,
and DB2. I think serialization failures are inevitable and appropriate
when not using READ COMMITTED mode with MVCC, but these duplicate
violations are sort of like READ COMMITTED serialization failures in
disguise. Or they would be, if MERGE (as described by the standard, or
in practice) promised to care about atomicity in that sense. It
doesn't, so they're off the hook. I think we should care about
atomicity (in the sense of always getting an insert or update, never
an error at RC) for the simple reason that that's what people want.

It seems like your remarks here don't acknowledge the reality: That
there are slightly different ideas of "each row" that are in play
here.

> So as usual for a transaction involving multiple tables,
> serialization anomalies are possible if the transaction isolation
> level is REPEATABLE READ or less.  Is that what you're getting at,
> or something else?

My complaint is quite straightforward, really. I don't want to have to
tell users to do this: http://stackoverflow.com/a/22777749

At the same time, I also don't want to have to live with the
consequences of implementing a MERGE that does not exhibit that
behavior. Which is to say, the consequences of doing something like
selectively using different types of snapshots (MVCC or dirty - the
two different ideas of "each row" that are in tension) based on the
exact clauses used. That seems like asking for trouble, TBH.

>> -- Predicate within UPDATE auxiliary statement
>> -- (row is still locked when the UPDATE predicate
>> -- isn't satisfied):
>> INSERT INTO upsert(key, val) VALUES(1, 'insert')
>>   -- CONFLICTING() carries forward effects of both INSERT and UPDATE BEFORE row-level triggers
>>   ON CONFLICT UPDATE SET val = CONFLICTING(val)
>>   -- Predicate has interesting semantics visibility-wise:
>>   WHERE val != 'delete';
>
> Can you explain what the WHERE clause there does?

Sure. Having already locked the tuple on the basis on finding a
conflict TID, but before an UPDATE, the qual is evaluated using
EvalPlanQual rescanning. The predicate must be satisfied in order for
the UPDATE to actually proceed. If, in this instance, the locked tuple
happened to have a val of 'delete', then we would not UPDATE at all.
It would still be locked, though (possibly without being visible to
our MVCC snapshot, just like when we might do that when we do UPDATE).

There are some interesting implications visibility-wise here that must
be considered. These are discussed on the Wiki page:
https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29

Obviously higher isolation levels just throw an error here instead.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: José Luis Tallón
Дата:
Сообщение: Re: DDL Damage Assessment
Следующее
От: David G Johnston
Дата:
Сообщение: Re: Trailing comma support in SELECT statements