Обсуждение: UPSERT wiki page, and SQL MERGE syntax

Поиск

Список

Период

Сортировка

UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

01 октября 2014 г., 23:52:30

Having been surprisingly successful at advancing our understanding of
arguments for and against various approaches to "value locking", I
decided to try the same thing out elsewhere. I have created a
general-purpose UPSERT wiki page.

The page is: https://wiki.postgresql.org/wiki/UPSERT

Right now, I would like to address the less contentious but still
unresolved question of whether or not we should use the SQL-standard
MERGE syntax instead of the non-standard INSERT ... ON CONFLICT UPDATE
syntax that my patch proposes.

Note that I'm trying to direct the conversation along certain lines:
Is MERGE the right syntax for this particular effort ("UPSERT")? And,
in particular, not: Is SQL MERGE useful? I think that it is useful,
but is a largely unrelated feature.

Please correct the Wiki page if I have failed to credit SQL MERGE with
some advantage that someone stated at some point. I don't recall any
arguments for it, other than that it is in the SQL standard, but maybe
I missed one.

In general, add to this page, and edit it as you see fit. It'll be
useful to centralize the references, discussion and state of the patch
in one agreed upon place, as the patch continues to evolve.
-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Craig Ringer

Дата:

02 октября 2014 г., 04:44:51

On 10/02/2014 07:52 AM, Peter Geoghegan wrote:
> Having been surprisingly successful at advancing our understanding of
> arguments for and against various approaches to "value locking", I
> decided to try the same thing out elsewhere. I have created a
> general-purpose UPSERT wiki page.
> 
> The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks. That'll help keep things moving forward rather than around in
circles.

> In general, add to this page, and edit it as you see fit. It'll be
> useful to centralize the references, discussion and state of the patch
> in one agreed upon place, as the patch continues to evolve.

I added a summary of the status quo of upsert in Pg as it stands, and a
brief discussion of the state in other RDBMSes.

I'd love it if someone who knows MySQL better could add info on MySQL's
ON DUPLICATE KEY feature - advantages/problems, etc.

I've added a few points to the goals section:

- Any new upsert approach must be a usability improvement on the status
quo; we don't want to introduce subtle behaviour or unnecessary foot-guns.

- if possible, upsert of multiple values is desirable. We currently have
to loop on a per-value basis.

... and some miscellaneous edits/formatting changes.

I've also added sections for the other options:

* A custom syntax
https://wiki.postgresql.org/wiki/UPSERT#Custom_syntax

and

* Adopting an existing non-standard syntax
https://wiki.postgresql.org/wiki/UPSERT#Adopting_an_existing_non-standard_syntax

which I'd appreciate it if you could fill out based on your existing
research and notes. I don't think it makes sense for me to write those
when you've already done the required study/note taking and just need to
transfer it over.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: UPSERT wiki page, and SQL MERGE syntax

От

Heikki Linnakangas

Дата:

02 октября 2014 г., 07:07:57

On 10/02/2014 02:52 AM, Peter Geoghegan wrote:
> Having been surprisingly successful at advancing our understanding of
> arguments for and against various approaches to "value locking", I
> decided to try the same thing out elsewhere. I have created a
> general-purpose UPSERT wiki page.
>
> The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks!

Could you write down all of the discussed syntaxes, using a similar 
notation we use in the manual, with examples on how to use them? And 
some examples on what is possible with some syntaxes, and not with 
others? That would make it a lot easier to compare them.

- Heikki

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

02 октября 2014 г., 07:54:23

On Thu, Oct 2, 2014 at 12:07 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Could you write down all of the discussed syntaxes, using a similar notation
> we use in the manual, with examples on how to use them? And some examples on
> what is possible with some syntaxes, and not with others? That would make it
> a lot easier to compare them.

I've started off by adding varied examples of the use of the existing
proposed syntax. I'll expand on this soon.


-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

02 октября 2014 г., 19:46:23

On Thu, Oct 2, 2014 at 12:54 AM, Peter Geoghegan <pg@heroku.com> wrote:
> I've started off by adding varied examples of the use of the existing
> proposed syntax. I'll expand on this soon.

I spent some time today expanding on the details, and commenting on
the issues around the custom syntax (exactly what it does, issues
around ambiguous conflict-on unique indexes, etc).

Also, "Challenges, issues" has been updated - I've added some
"secondary concerns". These are non-contentious items that I think are
of concern, and would like to highlight now.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

03 октября 2014 г., 20:20:40

Peter Geoghegan <pg@heroku.com> wrote:

> The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks!

I have two questions I hope you can clarify.  I'm having trouble
parsing what this statement means:

> ... the SQL standard does not require that MERGE be atomic in the
> sense of atomically providing either an INSERT or UPDATE, ...

My understanding is that the standard logically requires (without
concern for implementation details) that the second specified table
(via table name or subquery -- which could be a VALUES statement)
is scanned, and for each row it attempts to match a row in the
target table.  That will either match or not, according to the
boolean expression in the ON clause.  You can have one WHEN MATCHED
THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
If both clauses are present, I believe that it is guaranteed that
one or the other (but not both) will fire for each row in the
second table.  The usual guarantees for each isolation level must
not be violated (although an implementation is not required to
generate anomalies which could not happen with serial execution).
So as usual for a transaction involving multiple tables,
serialization anomalies are possible if the transaction isolation
level is REPEATABLE READ or less.  Is that what you're getting at,
or something else?

Regarding this example:

> -- Predicate within UPDATE auxiliary statement
> -- (row is still locked when the UPDATE predicate
> -- isn't satisfied):
> INSERT INTO upsert(key, val) VALUES(1, 'insert')
>   -- CONFLICTING() carries forward effects of both INSERT and UPDATE BEFORE row-level triggers
>   ON CONFLICT UPDATE SET val = CONFLICTING(val)
>   -- Predicate has interesting semantics visibility-wise:
>   WHERE val != 'delete'; 

Can you explain what the WHERE clause there does?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

03 октября 2014 г., 21:01:26

On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> I have two questions I hope you can clarify.  I'm having trouble
> parsing what this statement means:
>
>> ... the SQL standard does not require that MERGE be atomic in the
>> sense of atomically providing either an INSERT or UPDATE, ...
>
> My understanding is that the standard logically requires (without
> concern for implementation details) that the second specified table
> (via table name or subquery -- which could be a VALUES statement)
> is scanned, and for each row it attempts to match a row in the
> target table.  That will either match or not, according to the
> boolean expression in the ON clause.  You can have one WHEN MATCHED
> THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
> If both clauses are present, I believe that it is guaranteed that
> one or the other (but not both) will fire for each row in the
> second table.  The usual guarantees for each isolation level must
> not be violated (although an implementation is not required to
> generate anomalies which could not happen with serial execution).

That seems like a statement that isn't getting to the heart of the
matter. Which is: simple UPSERT cases will get things like duplicate
key violations under concurrency with MERGE, in SQL Server, Oracle,
and DB2. I think serialization failures are inevitable and appropriate
when not using READ COMMITTED mode with MVCC, but these duplicate
violations are sort of like READ COMMITTED serialization failures in
disguise. Or they would be, if MERGE (as described by the standard, or
in practice) promised to care about atomicity in that sense. It
doesn't, so they're off the hook. I think we should care about
atomicity (in the sense of always getting an insert or update, never
an error at RC) for the simple reason that that's what people want.

It seems like your remarks here don't acknowledge the reality: That
there are slightly different ideas of "each row" that are in play
here.

> So as usual for a transaction involving multiple tables,
> serialization anomalies are possible if the transaction isolation
> level is REPEATABLE READ or less.  Is that what you're getting at,
> or something else?

My complaint is quite straightforward, really. I don't want to have to
tell users to do this: http://stackoverflow.com/a/22777749

At the same time, I also don't want to have to live with the
consequences of implementing a MERGE that does not exhibit that
behavior. Which is to say, the consequences of doing something like
selectively using different types of snapshots (MVCC or dirty - the
two different ideas of "each row" that are in tension) based on the
exact clauses used. That seems like asking for trouble, TBH.

>> -- Predicate within UPDATE auxiliary statement
>> -- (row is still locked when the UPDATE predicate
>> -- isn't satisfied):
>> INSERT INTO upsert(key, val) VALUES(1, 'insert')
>>   -- CONFLICTING() carries forward effects of both INSERT and UPDATE BEFORE row-level triggers
>>   ON CONFLICT UPDATE SET val = CONFLICTING(val)
>>   -- Predicate has interesting semantics visibility-wise:
>>   WHERE val != 'delete';
>
> Can you explain what the WHERE clause there does?

Sure. Having already locked the tuple on the basis on finding a
conflict TID, but before an UPDATE, the qual is evaluated using
EvalPlanQual rescanning. The predicate must be satisfied in order for
the UPDATE to actually proceed. If, in this instance, the locked tuple
happened to have a val of 'delete', then we would not UPDATE at all.
It would still be locked, though (possibly without being visible to
our MVCC snapshot, just like when we might do that when we do UPDATE).

There are some interesting implications visibility-wise here that must
be considered. These are discussed on the Wiki page:
https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29

Obviously higher isolation levels just throw an error here instead.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

03 октября 2014 г., 21:47:47

Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> I'm having trouble parsing what this statement means:
>>
>>> ... the SQL standard does not require that MERGE be atomic in
>>> the sense of atomically providing either an INSERT or UPDATE, ...

> ... always getting an insert or update, never an error ...

I've never seen "atomic" used to mean "you never get an error"
before.  Perhaps it would be clearer to replace "atomic" with
"error-free" or some such.  It certainly would be less confusing
for me.  "Atomic" in most cases is a property that can be enforced
by generating an error where it would otherwise be violated.

For example: http://en.wikipedia.org/wiki/Atomicity_%28database_systems%29
 "Although implementations vary depending on factors such as  concurrency issues, the principle of atomicity — i.e.
complete success or complete failure — remain." 

When you are saying "atomic" you mean something quite different.

> My complaint is quite straightforward, really. I don't want to
> have to tell users to do this: http://stackoverflow.com/a/22777749

I think you are confusing syntax with semantics.  I grant that a
reasonable person could have concerns about using the MERGE syntax
to implement the semantics you want in the special case that an
appropriate unique index exists, but pretending that the semantics
can't be achieved with that syntax is just silly.

> At the same time, I also don't want to have to live with the
> consequences of implementing a MERGE that does not exhibit that
> behavior. Which is to say, the consequences of doing something
> like selectively using different types of snapshots (MVCC or
> dirty - the two different ideas of "each row" that are in
> tension) based on the exact clauses used. That seems like asking
> for trouble, TBH.

Now *that* is getting more to a real issue.  We routinely pick very
different plans based on the presence or absence of an index, and
we use special snapshots in the course of executing many DML
statements (if FK triggers are fired), but this would be the first
place I can think of that the primary DML statement behaves that
way.  You and a couple others have expressed concern about that,
but it seems pretty vague and hand-wavey.  If a different execution
node is used for the special behavior, and that node is generated
based on the unique index being used, I'm really having problems
seeing where the problem lies.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

03 октября 2014 г., 22:42:09

On Fri, Oct 3, 2014 at 2:44 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> I've never seen "atomic" used to mean "you never get an error"
> before.

> When you are saying "atomic" you mean something quite different.

Perhaps I should have been more careful on that point. Just to be
crystal clear: I don't intend that you'll never get an error (some
cases clearly *should* error). But I want confidence that reasonable,
simple UPSERTs will never error due to a duplicate violation that they
thought the statement was supposed to avoid. That's the point of
UPSERT. Guarantees matter. I would like to make guarantees that are at
least as good as the upsert looping subxact pattern.

Teradata refers to an "atomic UPSERT". It also has a MERGE command.

>> My complaint is quite straightforward, really. I don't want to
>> have to tell users to do this: http://stackoverflow.com/a/22777749
>
> I think you are confusing syntax with semantics.  I grant that a
> reasonable person could have concerns about using the MERGE syntax
> to implement the semantics you want in the special case that an
> appropriate unique index exists, but pretending that the semantics
> can't be achieved with that syntax is just silly.

I am not pretending that it can't be done - just, as I said, that to
do so would have bad consequences when time came to generalize the
syntax to support more advanced cases.

>> At the same time, I also don't want to have to live with the
>> consequences of implementing a MERGE that does not exhibit that
>> behavior. Which is to say, the consequences of doing something
>> like selectively using different types of snapshots (MVCC or
>> dirty - the two different ideas of "each row" that are in
>> tension) based on the exact clauses used. That seems like asking
>> for trouble, TBH.
>
> Now *that* is getting more to a real issue.

Yes, it is. I am opposed to using the MERGE syntax for this *because*
MERGE is useful (independently useful, when done properly), not
because it is not useful.

> We routinely pick very
> different plans based on the presence or absence of an index, and
> we use special snapshots in the course of executing many DML
> statements (if FK triggers are fired), but this would be the first
> place I can think of that the primary DML statement behaves that
> way.  You and a couple others have expressed concern about that,
> but it seems pretty vague and hand-wavey.  If a different execution
> node is used for the special behavior, and that node is generated
> based on the unique index being used, I'm really having problems
> seeing where the problem lies.

We need to avoid duplicate violations. The authoritative source of
truth here is a unique index, because the presence or absence of a
conclusively-visible conflict tuple controls whether our insert fails
or succeeds respectively. We need a dirty snapshot to see tuples not
in our MVCC snapshot, since they need to be handled too. Otherwise,
when we fail to handle them, the MERGE's INSERT has a dup violation
(which is not what I want, for practical reasons).

If we're seq scanning a table, even with a dirty snapshot, there is no
authoritative source of truth. Heap tuples will be inserted
concurrently, and we'll have no idea what to do about it without
reference to a unique index as a choke point/single source of final
truth about what is and isn't going to be duplicated.

As implemented, my UPSERT patch will have UPSERTs not really use their
MVCC snapshot for simple cases, just like INSERTs in general. UPDATEs
always use their MVCC snapshot, as the ModifyTable node pulls up
tuples from an ordinary relation scan. UPSERT needs a DirtySnapshot
scan of the unique index to find duplicates, because that's always how
we find duplicates. At the same time, MERGE is not at liberty to not
work just because of the absence of a unique index (plus even row
locking must be prepared for conflicts). And since the MERGE is driven
by an outer join - *not* an INSERT-like search for duplicates - we'll
find ourselves providing one behavior when detecting one case, and
another when detecting the other, with subtle and confusing
differences between each case. If we "fake" an outer join by doing an
INSERT-like search for duplicates in a unique index (to make sure we
see duplicates), then we get into trouble elsewhere. Like, we cannot
let users not ask for an INSERT in their MERGE, because we'd be
conclusively deciding to *not* (say) UPDATE on the basis of the
absence of a value in a unique index, as opposed to the presence.

Suddenly, we're in the business of proving a negative....or are we?.
My head hurts.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

03 октября 2014 г., 22:43:25

On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Yes, it is. I am opposed to using the MERGE syntax for this *because*
> MERGE is useful (independently useful, when done properly), not
> because it is not useful.

As I've mentioned, there is also the practical argument:  No one else
does this. That suggests to me that it's hard.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

03 октября 2014 г., 22:49:22

On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> We routinely pick very
>> different plans based on the presence or absence of an index, and
>> we use special snapshots in the course of executing many DML
>> statements (if FK triggers are fired)

Apart from FK snapshots, we also use dirty snapshots for unique index
enforcement, obviously. That doesn't mean that it's the "command/xact
snapshot", though. It's just a special constraint enforcement
mechanism.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Robert Haas

Дата:

06 октября 2014 г., 15:35:35

On Fri, Oct 3, 2014 at 4:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> ... the SQL standard does not require that MERGE be atomic in the
>> sense of atomically providing either an INSERT or UPDATE, ...
>
> My understanding is that the standard logically requires (without
> concern for implementation details) that the second specified table
> (via table name or subquery -- which could be a VALUES statement)
> is scanned, and for each row it attempts to match a row in the
> target table.  That will either match or not, according to the
> boolean expression in the ON clause.  You can have one WHEN MATCHED
> THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
> If both clauses are present, I believe that it is guaranteed that
> one or the other (but not both) will fire for each row in the
> second table.  The usual guarantees for each isolation level must
> not be violated (although an implementation is not required to
> generate anomalies which could not happen with serial execution).
> So as usual for a transaction involving multiple tables,
> serialization anomalies are possible if the transaction isolation
> level is REPEATABLE READ or less.  Is that what you're getting at,
> or something else?

I think the problem is that it's not possible to respect the "usual
guarantees" even at READ COMMITTED level when performing an INSERT OR
UPDATE operation (however spelled).  You may find that there's a tuple
with the same PK which is committed but not visible to the snapshot
you took at the beginning of the statement.  An INSERT would fail; an
UPDATE would see no rows and do nothing.  IOW, *neither* operation
succeeds according to its classic semantics; to succeed, the INSERT OR
UPDATE has to do something altogether novel.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

06 октября 2014 г., 18:05:08

On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think the problem is that it's not possible to respect the "usual
> guarantees" even at READ COMMITTED level when performing an INSERT OR
> UPDATE operation (however spelled).

That's definitely the main problem. Also, there could be garden
variety race conditions.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

08 октября 2014 г., 20:26:42

On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think the problem is that it's not possible to respect the "usual
> guarantees" even at READ COMMITTED level when performing an INSERT OR
> UPDATE operation (however spelled).  You may find that there's a tuple
> with the same PK which is committed but not visible to the snapshot
> you took at the beginning of the statement.

Can you please comment on this, Kevin? It would be nice to converge on
an agreement on syntax here (without necessarily working out the exact
details right away, including for example the exact spelling of what
I'm calling WITHIN). Since Simon has changed his mind on MERGE (while
still wanting to retain a superficial flavor of MERGE,  a flavor that
does not include the MERGE keyword), I think that leaves you as the
only advocate for the MERGE syntax.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

08 октября 2014 г., 21:04:56

Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:

>> I think the problem is that it's not possible to respect the "usual
>> guarantees" even at READ COMMITTED level when performing an INSERT OR
>> UPDATE operation (however spelled).  You may find that there's a tuple
>> with the same PK which is committed but not visible to the snapshot
>> you took at the beginning of the statement.
>
> Can you please comment on this, Kevin? It would be nice to converge on
> an agreement on syntax here

Robert said "however spelled" -- which I take to mean that he at
least sees that the MERGE-like UPSERT syntax can be turned into the
desired semantics.  I have no idea why anyone would think otherwise.

Although the last go-around does suggest that there is at least one
point of difference on the semantics.  You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE.  That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.  That doesn't
seem as clean to me as saying (for example):

UPSERT targettable t USING (VALUES (123, 100)) x(id, quantity) ON t.id = x.id WHEN NOT MATCHED   INSERT (id, quantity,
inserted_at)VALUES (x.id, x.quantity, now()) WHEN MATCHED   UPDATE SET quantity = t.quantity + x.quantity, updated_at =
now();

As I understand it you are proposing that would be:

INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON CONFLICT WITHIN targettable_pkey
UPDATESET quantity = quantity + CONFLICTING(quantity), updated_at = now();

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

08 октября 2014 г., 22:52:02

On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Although the last go-around does suggest that there is at least one
> point of difference on the semantics.  You seem to want to fire the
> BEFORE INSERT triggers before determining whether this will be an
> INSERT or an UPDATE.  That seems like a bad idea to me, but if the
> consensus is that we want to do that, it does argue for your plan
> of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.

> As I understand it you are proposing that would be:
>
> INSERT INTO targettable(key, quantity, inserted_at)
>   VALUES(123, quantity, now())
>   ON CONFLICT WITHIN targettable_pkey
>     UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now();

That's right, yes.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Marti Raudsepp

Дата:

08 октября 2014 г., 23:31:05

On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Although the last go-around does suggest that there is at least one
>> point of difference on the semantics.  You seem to want to fire the
>> BEFORE INSERT triggers before determining whether this will be an
>> INSERT or an UPDATE.  That seems like a bad idea to me

Indeed, the current behavior breaks even the canonical "keep track of
how many posts are in a thread" trigger example because INSERT
triggers are fired for an effective row UPDATE. I can't see how that's
acceptable.

> Well, it isn't that I'm doing it because I think that it is a great
> idea, with everything to recommend it. It's more like I don't see any
> practical alternative.

I proposed an alternative that avoids this surprise and might allow
some other benefits. Can you please look into that? Even a "clarify
this", "this couldn't possibly work" or "you're not a major
contributor so your arguments are invalid" response. ;)

I admit I initially misunderstood some details about the current
behavior. I can dig into the code and write a clearer and/or more
detailed spec if I had even *some* feedback.

Regards,
Marti

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

08 октября 2014 г., 23:46:21

On Wed, Oct 8, 2014 at 4:30 PM, Marti Raudsepp <marti@juffo.org> wrote:
> On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>> Although the last go-around does suggest that there is at least one
>>> point of difference on the semantics.  You seem to want to fire the
>>> BEFORE INSERT triggers before determining whether this will be an
>>> INSERT or an UPDATE.  That seems like a bad idea to me
>
> Indeed, the current behavior breaks even the canonical "keep track of
> how many posts are in a thread" trigger example because INSERT
> triggers are fired for an effective row UPDATE. I can't see how that's
> acceptable.

DB2 had the foresight to add the following restrictions to BEFORE ROW
triggers - they cannot:

"""
Contain any INSERT, DELETE, or UPDATE operations, nor invoke any
routine defined with MODIFIES SQL DATA, if it is not a compound SQL.
Contain any DELETE or UPDATE operations on the trigger subject table,
nor invoke any routine containing such operations, if it is a compound
SQL.
Reference a materialized query table defined with REFRESH IMMEDIATE
Reference a generated column other than the identity column in the NEW
transition variable.
"""

To get a sense of how doing fancy things in before triggers leads to
trouble, considering the hardening Kevin added in commit 6868ed74. In
short, use an AFTER trigger for this kind of thing, and all of these
issues go away.

>> Well, it isn't that I'm doing it because I think that it is a great
>> idea, with everything to recommend it. It's more like I don't see any
>> practical alternative.
>
> I proposed an alternative that avoids this surprise and might allow
> some other benefits. Can you please look into that?

I looked at that. You're talking about making the case where before
row triggers clearly are appropriate (when you just want to change the
tuple being inserted) fail, in order to make an arguably inappropriate
case for before row triggers work (when you don't change the tuple to
be inserted, but you do want to do some other thing). That seems
backwards. My proposed behavior seems far less surprising.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Marti Raudsepp

Дата:

09 октября 2014 г., 00:38:01

On Thu, Oct 9, 2014 at 2:46 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> Indeed, the current behavior breaks even the canonical "keep track of
>> how many posts are in a thread" trigger example
> use an AFTER trigger for this kind of thing, and all of these
> issues go away.

In the latest patches from CommitFest, the UPDATE case invokes
BEFORE&AFTER INSERT triggers, not AFTER UPDATE. Is that going to be
changed? If so, I concede that.

>> I proposed an alternative that avoids this surprise and might allow
>> some other benefits. Can you please look into that?
>
> I looked at that.

Thanks!

> You're talking about making the case where before
> row triggers clearly are appropriate (when you just want to change the
> tuple being inserted) fail

Only in case the trigger changes *key* columns necessary for atomicity
(i.e. from the WITHIN index). Other columns are fair game. The
restriction seems justifiable to me: it's unreasonable to be atomic
with respect to values that change mid-way.

* It can distinguish between BEFORE INSERT and BEFORE UPDATE triggers,
which your approach fundamentally cannot.
* It can probably avoid evaluating column defaults in the UPDATE case,
like nextval() for unrelated serial columns => reduced overhead
* The semantics match better with MERGE-like syntax that seems to come
up again and again.
* And it appears to solve (part?) of the problem you had with naming
key *colums* in the WITHIN clause, rather than using index name.

If you don't see any reasons why it can't be done, these benefits seem
clear to me. I think the tradeoffs at least warrant wider discussion.

Regards,
Marti

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

09 октября 2014 г., 00:47:50

On Wed, Oct 8, 2014 at 5:37 PM, Marti Raudsepp <marti@juffo.org> wrote:
> Only in case the trigger changes *key* columns necessary for atomicity
> (i.e. from the WITHIN index). Other columns are fair game. The
> restriction seems justifiable to me: it's unreasonable to be atomic
> with respect to values that change mid-way.

> If you don't see any reasons why it can't be done, these benefits seem
> clear to me. I think the tradeoffs at least warrant wider discussion.

I don't.  That's very surprising. One day, it will fail unexpectedly.
As proposed, the way BEFORE INSERT triggers fire almost forces users
to consider the issues up-front.

Note that the CONFLICTING() behavior with respect to BEFORE INSERT
triggers work's the same as MySQL's "INSERT ON DUPLICATE KEY UPDATE
foo = VALUES(foo)" thing. There was agreement that that was the right
behavior, it seemed.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Marti Raudsepp

Дата:

09 октября 2014 г., 01:12:59

On Thu, Oct 9, 2014 at 3:47 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Oct 8, 2014 at 5:37 PM, Marti Raudsepp <marti@juffo.org> wrote:
>> Only in case the trigger changes *key* columns necessary for atomicity
>> (i.e. from the WITHIN index). Other columns are fair game. The
>> restriction seems justifiable to me: it's unreasonable to be atomic
>> with respect to values that change mid-way.

Oh, one more consideration: I believe you will run into the same issue
if you want to implement BEFORE UPDATE triggers in any form. Skipping
BEFORE UPDATE entirely seems to violate POLA.

It's common for applications to e.g. use triggers to keep track of
latest modified time for a row. With your proposal, every query needs
to include logic for that to work.

>> If you don't see any reasons why it can't be done, these benefits seem
>> clear to me. I think the tradeoffs at least warrant wider discussion.
>
> I don't.  That's very surprising. One day, it will fail unexpectedly.
> As proposed, the way BEFORE INSERT triggers fire almost forces users
> to consider the issues up-front.

Not necessarily "up-front", as proposed it causes existing triggers to
change behavior when users adopt the upsert feature. And that adoption
may even be transparent to the user due to ORM magic.

There are potential surprises with both approaches. Personally I
prefer a hard error instead of silently skipping logic that previously
worked. And that seems to match general PostgreSQL philosophy too.

> Note that the CONFLICTING() behavior with respect to BEFORE INSERT
> triggers work's the same as MySQL's "INSERT ON DUPLICATE KEY UPDATE
> foo = VALUES(foo)" thing. There was agreement that that was the right
> behavior, it seemed.

MySQL gets away with lots of things, they have several other caveats
with triggers. I don't think it's a good example to follow wrt trigger
behavior.

Regards,
Marti

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

09 октября 2014 г., 01:25:16

On Wed, Oct 8, 2014 at 6:12 PM, Marti Raudsepp <marti@juffo.org> wrote:
> Oh, one more consideration: I believe you will run into the same issue
> if you want to implement BEFORE UPDATE triggers in any form. Skipping
> BEFORE UPDATE entirely seems to violate POLA.

Good thing that the patch doesn't do that, then. I clearly documented
this in a few places, including:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html

> It's common for applications to e.g. use triggers to keep track of
> latest modified time for a row. With your proposal, every query needs
> to include logic for that to work.

Wrong.

>>> If you don't see any reasons why it can't be done, these benefits seem
>>> clear to me. I think the tradeoffs at least warrant wider discussion.
>>
>> I don't.  That's very surprising. One day, it will fail unexpectedly.
>> As proposed, the way BEFORE INSERT triggers fire almost forces users
>> to consider the issues up-front.
>
> Not necessarily "up-front", as proposed it causes existing triggers to
> change behavior when users adopt the upsert feature. And that adoption
> may even be transparent to the user due to ORM magic.
>
> There are potential surprises with both approaches.

When you make the slightest effort to understand what my approach is,
I might take your remarks seriously.

>> Note that the CONFLICTING() behavior with respect to BEFORE INSERT
>> triggers work's the same as MySQL's "INSERT ON DUPLICATE KEY UPDATE
>> foo = VALUES(foo)" thing. There was agreement that that was the right
>> behavior, it seemed.
>
> MySQL gets away with lots of things, they have several other caveats
> with triggers. I don't think it's a good example to follow wrt trigger
> behavior.

No true Scotsman.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Marti Raudsepp

Дата:

09 октября 2014 г., 01:57:32

On Thu, Oct 9, 2014 at 4:25 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Oct 8, 2014 at 6:12 PM, Marti Raudsepp <marti@juffo.org> wrote:
>> Skipping
>> BEFORE UPDATE entirely seems to violate POLA.

> Good thing that the patch doesn't do that, then. I clearly documented
> this in a few places, including:
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html

I tried to understand the docs and found them somewhat confusing, so I
turned to testing the implementation you posted on 2014-10-07. It
shows no signs of running UPDATE triggers in the following case. So is
this just a bug or a missing feature?

create table evt_type (id serial primary key, name text unique,
evt_count int, update_count int default 0);
prepare upsert(text) as   INSERT into evt_type (name, evt_count) values ($1, 1)   on conflict within evt_type_name_key
UPDATE set evt_count=conflicting(evt_count)+1;
 
create or replace function ins() returns trigger language plpgsql as
$$begin raise notice 'trigger % %', TG_WHEN, TG_OP; return new; end$$;
create or replace function upd() returns trigger language plpgsql as
$$begin raise notice 'trigger % %', TG_WHEN, TG_OP;
new.update_count=new.update_count+1; return new; end$$;
create trigger ev1 before insert on evt_type execute procedure ins();
create trigger ev2 before update on evt_type execute procedure upd();
create trigger ev3 after insert on evt_type execute procedure ins();
create trigger ev4 after update on evt_type execute procedure upd();

db=# execute upsert('foo');
NOTICE:  trigger BEFORE INSERT
NOTICE:  trigger AFTER INSERT
INSERT 0 1
db=# execute upsert('foo');
NOTICE:  trigger BEFORE INSERT
NOTICE:  trigger AFTER INSERT
INSERT 0 0
marti=# table evt_type;id | name | evt_count | update_count
----+------+-----------+-------------- 1 | foo  |         2 |            0

>> MySQL gets away with lots of things, they have several other caveats
> No true Scotsman.

Eh? I'm just saying there may be good reasons not to imitate MySQL here.

Regards,
Marti

Re: UPSERT wiki page, and SQL MERGE syntax

От

Marti Raudsepp

Дата:

09 октября 2014 г., 02:05:38

On Thu, Oct 9, 2014 at 4:56 AM, Marti Raudsepp <marti@juffo.org> wrote:
> create trigger ev1 before insert on evt_type execute procedure ins();
> create trigger ev2 before update on evt_type execute procedure upd();
> create trigger ev3 after insert on evt_type execute procedure ins();
> create trigger ev4 after update on evt_type execute procedure upd();

Oooooops, I missed "for each row" here.
Sorry for wasting your time.

Regards,
Marti

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

09 октября 2014 г., 02:49:16

On Wed, Oct 8, 2014 at 7:04 PM, Marti Raudsepp <marti@juffo.org> wrote:
> Oooooops, I missed "for each row" here.

Note that I have an open item for what to do about statement level
triggers here: https://wiki.postgresql.org/wiki/UPSERT

I'm not satisfied with the current behavior. It needs more thought.
But I think row level triggers are fine.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Jim Nasby

Дата:

09 октября 2014 г., 23:39:04

On 10/8/14, 5:51 PM, Peter Geoghegan wrote:
> On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner<kgrittn@ymail.com>  wrote:
>> >Although the last go-around does suggest that there is at least one
>> >point of difference on the semantics.  You seem to want to fire the
>> >BEFORE INSERT triggers before determining whether this will be an
>> >INSERT or an UPDATE.  That seems like a bad idea to me, but if the
>> >consensus is that we want to do that, it does argue for your plan
>> >of making UPSERT a variant of the INSERT command.
> Well, it isn't that I'm doing it because I think that it is a great
> idea, with everything to recommend it. It's more like I don't see any
> practical alternative. We need the before row insert triggers to fire
> to figure out what to insert (or if we should update instead). No way
> around that. At the same time, those triggers are at liberty to do
> almost anything, and so in general we have no way of totally
> nullifying their effects (or side effects). Surely you see the
> dilemma.

FWIW, if each row was handled in a subtransaction then an insert that turned out to be an update could be rolled
back...but the performance impact of going that route might be pretty horrid. :( There's also the potential to get
stuckin a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into
anINSERT.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: UPSERT wiki page, and SQL MERGE syntax

От

Gavin Flower

Дата:

10 октября 2014 г., 00:00:38

On 10/10/14 12:38, Jim Nasby wrote:
> On 10/8/14, 5:51 PM, Peter Geoghegan wrote:
>> On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner<kgrittn@ymail.com>  
>> wrote:
>>> >Although the last go-around does suggest that there is at least one
>>> >point of difference on the semantics.  You seem to want to fire the
>>> >BEFORE INSERT triggers before determining whether this will be an
>>> >INSERT or an UPDATE.  That seems like a bad idea to me, but if the
>>> >consensus is that we want to do that, it does argue for your plan
>>> >of making UPSERT a variant of the INSERT command.
>> Well, it isn't that I'm doing it because I think that it is a great
>> idea, with everything to recommend it. It's more like I don't see any
>> practical alternative. We need the before row insert triggers to fire
>> to figure out what to insert (or if we should update instead). No way
>> around that. At the same time, those triggers are at liberty to do
>> almost anything, and so in general we have no way of totally
>> nullifying their effects (or side effects). Surely you see the
>> dilemma.
>
> FWIW, if each row was handled in a subtransaction then an insert that 
> turned out to be an update could be rolled back... but the performance 
> impact of going that route might be pretty horrid. :( There's also the 
> potential to get stuck in a loop where a BEFORE INSERT trigger turns 
> the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into an 
> INSERT.
Perhaps you need an UPSERT trigger?

Re: UPSERT wiki page, and SQL MERGE syntax

От

Robert Haas

Дата:

10 октября 2014 г., 18:05:34

On Wed, Oct 8, 2014 at 5:01 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Peter Geoghegan <pg@heroku.com> wrote:
>> On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I think the problem is that it's not possible to respect the "usual
>>> guarantees" even at READ COMMITTED level when performing an INSERT OR
>>> UPDATE operation (however spelled).  You may find that there's a tuple
>>> with the same PK which is committed but not visible to the snapshot
>>> you took at the beginning of the statement.
>>
>> Can you please comment on this, Kevin? It would be nice to converge on
>> an agreement on syntax here
>
> Robert said "however spelled" -- which I take to mean that he at
> least sees that the MERGE-like UPSERT syntax can be turned into the
> desired semantics.  I have no idea why anyone would think otherwise.
>
> Although the last go-around does suggest that there is at least one
> point of difference on the semantics.  You seem to want to fire the
> BEFORE INSERT triggers before determining whether this will be an
> INSERT or an UPDATE.

OK, I have a comment on this.

Anything we do about triggers will by definition be novel.  Right now,
we have INSERT, UPDATE, and DELETE.  If we add a new operation,
whether it's called UPSERT or MERGE or FROB, or if we add a flag to
insert that makes it possibly do something other than inserting (e.g.
INSERT OR UPDATE), or if we add a flag to update that makes it do
something other than updating (e.g. UPDATE or INSERT), then some
people's triggers are going to get broken by that change no matter how
we do it.  When the new operation is invoked, we can fire the insert
triggers, the update triggers, some new kind of trigger, or no trigger
at all - and any decision we make there will not in all cases be
backward-compatible.  We can try to minimize the damage (and we
probably should) but we can't make it go to zero.

> INSERT INTO targettable(key, quantity, inserted_at)
>   VALUES(123, quantity, now())
>   ON CONFLICT WITHIN targettable_pkey
>     UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now();

I actually like this syntax reasonably well in some ways, but I don't
like that we're mentioning the index name, and the CONFLICTING()
notation is decidedly odd.  Maybe something like this:

INSERT INTO targettable(key, quantity, inserted_at)  VALUES(123, quantity, now())  ON DUPLICATE (key)    UPDATE SET
quantity= quantity + OLD.quantity, updated_at = now();

(Perhaps OLD should reference the tuple already in the table, and NEW
the value from the VALUES clause.  That would be snazzy indeed.)

Also, how about making the SET clause optional, with the semantics
that we just update all of the fields for which a value is explicitly
specified:

INSERT INTO overwrite_with_abandon (key, value)   VALUES (42, 'meaning of life')   ON DUPLICATE (key) UPDATE;

While the ability to specify a SET clause there explicitly is useful,
I bet a lot of key-value store users would love the heck out of a
syntax that let them omit it when they want to overwrite.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 18:30:01

On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Anything we do about triggers will by definition be novel.  Right now,
> we have INSERT, UPDATE, and DELETE.  If we add a new operation,
> whether it's called UPSERT or MERGE or FROB, or if we add a flag to
> insert that makes it possibly do something other than inserting (e.g.
> INSERT OR UPDATE), or if we add a flag to update that makes it do
> something other than updating (e.g. UPDATE or INSERT), then some
> people's triggers are going to get broken by that change no matter how
> we do it.  When the new operation is invoked, we can fire the insert
> triggers, the update triggers, some new kind of trigger, or no trigger
> at all - and any decision we make there will not in all cases be
> backward-compatible.  We can try to minimize the damage (and we
> probably should) but we can't make it go to zero.

+1. It's inevitable that someone isn't going to be entirely happy with
whatever we do. Let's be realistic about that, while minimizing the
risk.

> I actually like this syntax reasonably well in some ways, but I don't
> like that we're mentioning the index name, and the CONFLICTING()
> notation is decidedly odd.  Maybe something like this:

People keep remarking that they don't like that you can (optionally)
name a unique index explicitly, and I keep telling them why I've done
it that way [1]. There is a trade-off here. I am willing to go another
way in that trade-off, but let's have a realistic conversation about
it.

> Also, how about making the SET clause optional, with the semantics
> that we just update all of the fields for which a value is explicitly
> specified:
>
> INSERT INTO overwrite_with_abandon (key, value)
>     VALUES (42, 'meaning of life')
>     ON DUPLICATE (key) UPDATE;
>
> While the ability to specify a SET clause there explicitly is useful,
> I bet a lot of key-value store users would love the heck out of a
> syntax that let them omit it when they want to overwrite.

While I initially like that idea, now I'm not so sure about it [2].
MySQL don't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE
command allows it, since it is defined as a DELETE followed by an
INSERT (or something like that), but I think that in general that
feature has been a disaster for them.

[1] http://www.postgresql.org/message-id/CAM3SWZQpGf6_ME6D1vWqdCFadD7Nprdx2JrE=Hcf=93KXeHY4g@mail.gmail.com

[2] http://www.postgresql.org/message-id/CAM3SWZT=VXBJ7QKAidAmYbU40aP10udSqOOqhViX3Ykj7WBv9A@mail.gmail.com
-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

10 октября 2014 г., 18:33:24

Robert Haas <robertmhaas@gmail.com> wrote:

> Anything we do about triggers will by definition be novel.  Right
> now, we have INSERT, UPDATE, and DELETE.  If we add a new
> operation, whether it's called UPSERT or MERGE or FROB, [ ... ]

If we call it MERGE, then we had better follow the rules the
standard lays out for triggers on a MERGE statement -- which is
that UPDATE triggers fire for rows updated and that INSERT triggers
fire for rows inserted.  That kinda makes sense, since the MERGE
command is almost like a row-by-row CASE statement allowing one or
the other.

If we make up our own command verb, we are free to do as we like.

> Maybe something like this:
>
> INSERT INTO targettable(key, quantity, inserted_at)
>   VALUES(123, quantity, now())
>   ON DUPLICATE (key)
>     UPDATE SET quantity = quantity + OLD.quantity, updated_at = now();

That seems a lot cleaner than the proposal on the Wiki page.  If we
go that route, it makes sense to fire the BEFORE INSERT triggers
before attempting the insert and then fire BEFORE UPDATE triggers
before attempting the UPDATE.  If either succeeds, I think we
should fire the corresponding AFTER triggers.  We already allow a
BEFORE triggers to run and then omit the triggering operation
without an error, so I don't see that as a problem.  This makes a
lot more sense to me than attempting to add a new UPSERT trigger
type.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 18:38:39

On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> That seems a lot cleaner than the proposal on the Wiki page.  If we
> go that route, it makes sense to fire the BEFORE INSERT triggers
> before attempting the insert and then fire BEFORE UPDATE triggers
> before attempting the UPDATE.  If either succeeds, I think we
> should fire the corresponding AFTER triggers.  We already allow a
> BEFORE triggers to run and then omit the triggering operation
> without an error, so I don't see that as a problem.  This makes a
> lot more sense to me than attempting to add a new UPSERT trigger
> type.

You realize that that's exactly what my patch does, right?

AFAICT the only confusion that my patch has is with statement-level
triggers, as discussed on the Wiki page. I think that the row-level
trigger behavior is fine; a lot more thought has gone into it.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

10 октября 2014 г., 18:47:28

Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:

>> I actually like this syntax reasonably well in some ways, but I don't
>> like that we're mentioning the index name, and the CONFLICTING()
>> notation is decidedly odd.
>
> People keep remarking that they don't like that you can (optionally)
> name a unique index explicitly, and I keep telling them why I've done
> it that way [1]. There is a trade-off here. I am willing to go another
> way in that trade-off, but let's have a realistic conversation about
> it.

We've all read that, and your repeated arguments for that point of
view.  We disagree and have said why.  What in that is not a
realistic conversation?

To restate: to do so is conflating the logical definition of the 
database with a particular implementation detail.  As just one 
reason that is a bad idea: we can look up unique indexes on the 
specified columns, but if we implement a other storage techniques 
where there is no such thing as a unique index on the columns, yet 
manage to duplicate the semantics (yes, stranger things have 
happened), people can't migrate to the new structure without 
rewriting their queries.  If the syntax references logical details 
(like column names) there is no need to rewrite.  We don't want to 
be painted into a corner.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 18:47:52

On Fri, Oct 10, 2014 at 11:38 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> That seems a lot cleaner than the proposal on the Wiki page.  If we
>> go that route, it makes sense to fire the BEFORE INSERT triggers
>> before attempting the insert and then fire BEFORE UPDATE triggers
>> before attempting the UPDATE.

By the way, there is no problem with failing to UPDATE, because we
lock rows ahead of the UPDATE. Once a row is locked, the UPDATE cannot
conflict. There is no danger of UPDATE before row-level triggers
firing without then updating (unless the xact aborts, but you know
what I mean). In general, there is no danger of triggers firing more
often than you might consider that they should, with the sole
exception of the fact that we always fire before insert row-level
triggers, even when an UPDATE is the ultimate outcome.

Restarting for conflicts (e.g. handling concurrent insertions with
promise tuples) necessitates a restart, but to the point after before
row-level insert triggers fire, and from a point before any other
triggers have the opportunity to fire.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 18:50:37

On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> We've all read that, and your repeated arguments for that point of
> view.  We disagree and have said why.  What in that is not a
> realistic conversation?

Because, as I've said many times, there are problems with naming
columns directly that need to be addressed. There are corner-cases.
Are you going to complain about those once I go implement something?
Let's talk about that.

I think we can just throw an error in these edge-cases, but let's
decide if we're comfortable with that.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

10 октября 2014 г., 18:58:32

Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

>> That seems a lot cleaner than the proposal on the Wiki page.  If we
>> go that route, it makes sense to fire the BEFORE INSERT triggers
>> before attempting the insert and then fire BEFORE UPDATE triggers
>> before attempting the UPDATE.  If either succeeds, I think we
>> should fire the corresponding AFTER triggers.  We already allow a
>> BEFORE triggers to run and then omit the triggering operation
>> without an error, so I don't see that as a problem.  This makes a
>> lot more sense to me than attempting to add a new UPSERT trigger
>> type.
>
> You realize that that's exactly what my patch does, right?

I haven't read the patch, but the discussion has made that clear,
yes.  Try to take agreement on a point gracefully, will ya?  ;-)

> AFAICT the only confusion that my patch has is with statement-level
> triggers, as discussed on the Wiki page. I think that the row-level
> trigger behavior is fine; a lot more thought has gone into it.

IMV it is clear that since the standard says that update or insert
operations which affect zero rows fire the corresponding trigger,
the statement level triggers for both should fire for an UPSERT,
even if the set of affected rows for one or the other (or both!) is
zero.  If we ever get transition relations, it will be easy to
check the count of affected rows or otherwise behave appropriately
based on what was done.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

10 октября 2014 г., 19:07:50

Peter Geoghegan <pg@heroku.com> wrote:

> There is no danger of UPDATE before row-level triggers firing
> without then updating (unless the xact aborts, but you know what
> I mean).

Well, let's make sure I do know what you mean.  If a BEFORE UPDATE
... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped
without error, right?  The BEFORE UPDATE triggers will run before
the UPDATE if a duplicate is found, and the return value will be
treated in the usual manner, replacing values and potentially
skipping the update?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Robert Haas

Дата:

10 октября 2014 г., 19:09:42

On Fri, Oct 10, 2014 at 2:29 PM, Peter Geoghegan <pg@heroku.com> wrote:
> People keep remarking that they don't like that you can (optionally)
> name a unique index explicitly, and I keep telling them why I've done
> it that way [1]. There is a trade-off here. I am willing to go another
> way in that trade-off, but let's have a realistic conversation about
> it.

I think what's realistic here is that the patch isn't going to get
committed the way you have it today, because nobody else likes that
design.  That may be harsh, but I think it's accurate.

But on the substance of the issue, I'm totally unconvinced by your
argument in the linked email.  Let's suppose you have this:

create table foo (a int, b int);
create unique index foo1 on foo (a);
create unique index foo2 on foo (a);
create unique index foo3 on foo (a) where b = 1;

Anything that conflicts in foo1 or foo2 is going to conflict in the
other one, and with foo3.  Anything that conflicts in foo3 is perhaps
also going to conflict in foo1 and foo2, or perhaps not.  Yet, if the
statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear
what to do: when we hit a conflict in any of those three indexes,
update the row.  No matter which index has the conflict, updating is
the right answer, and solves the conflict in all three indexes.  I
dunno what you'd expect someone to write in your syntax to solve this
problem - presumably either (1) they can list any of the indexes that
might conflict, (2) they must list all of the indexes that might
conflict, or (3) it just doesn't work.  Whatever the actual behavior
of your patch is today, it seems to me that the conflict is,
fundamentally, over a set of column values, NOT over a particular
index.  The index is simply a mechanism for noticing that conflict.

If you want to allow this to work with expression indexes, that's not
really a problem; you can let the user write INSERT .. ON CONFLICT
(upper(foo)) UPDATE ... and match that to the index.  It wouldn't
bother me if the first version of this feature couldn't target
expression indexes anyway, but if you want to include that, having the
user mention the actual expression rather than the index name
shouldn't make much difference.

>> Also, how about making the SET clause optional, with the semantics
>> that we just update all of the fields for which a value is explicitly
>> specified:
>>
>> INSERT INTO overwrite_with_abandon (key, value)
>>     VALUES (42, 'meaning of life')
>>     ON DUPLICATE (key) UPDATE;
>>
>> While the ability to specify a SET clause there explicitly is useful,
>> I bet a lot of key-value store users would love the heck out of a
>> syntax that let them omit it when they want to overwrite.
>
> While I initially like that idea, now I'm not so sure about it [2].
> MySQL don't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE
> command allows it, since it is defined as a DELETE followed by an
> INSERT (or something like that), but I think that in general that
> feature has been a disaster for them.

Your syntax allows the exact same thing; it simply require the user to
be more verbose in order to get that behavior.  If we think that
people wanting that behavior will be rare, then it's fine to require
them to spell it out when they want it.  If we think it will be the
overwhelming common application of this feature, and I do, then making
people spell it out when we could just as well infer it is pointless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

10 октября 2014 г., 19:16:33

Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

>> [discussion on committers rejecting the notion of a syntax 
>>  involving specification of an index name]

> as I've said many times, there are problems with naming
> columns directly that need to be addressed. There are corner-cases.
> Are you going to complain about those once I go implement something?

If I don't like what I see, yes.  I think it will be entirely
possible for you to write something along those lines that I won't
complain about.

> I think we can just throw an error in these edge-cases, but let's
> decide if we're comfortable with that.

If the list of columns does not allow a choice of a suitable
full-table unique index on only non-null columns, an error seems
appropriate.  What other problem cases do you see?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 19:33:55

On Fri, Oct 10, 2014 at 11:55 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> I haven't read the patch, but the discussion has made that clear,
> yes.  Try to take agreement on a point gracefully, will ya?  ;-)

Heh, sorry. I did literally mean what I said - it wasn't 100% clear to
me that you got that. It's safest to not make any assumptions like
that.

>> AFAICT the only confusion that my patch has is with statement-level
>> triggers, as discussed on the Wiki page. I think that the row-level
>> trigger behavior is fine; a lot more thought has gone into it.
>
> IMV it is clear that since the standard says that update or insert
> operations which affect zero rows fire the corresponding trigger,
> the statement level triggers for both should fire for an UPSERT,
> even if the set of affected rows for one or the other (or both!) is
> zero.  If we ever get transition relations, it will be easy to
> check the count of affected rows or otherwise behave appropriately
> based on what was done.

I think that you're probably right about that. Note that UPDATE rules
will not be applied to the "UPDATE portion" of the statement. Not sure
what to do about that, but I'm pretty sure that it's inevitable,
because the UPDATE is effectively the same top-level statement for the
purposes of rules.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 19:37:26

On Fri, Oct 10, 2014 at 12:04 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Peter Geoghegan <pg@heroku.com> wrote:
>
>> There is no danger of UPDATE before row-level triggers firing
>> without then updating (unless the xact aborts, but you know what
>> I mean).
>
> Well, let's make sure I do know what you mean.  If a BEFORE UPDATE
> ... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped
> without error, right?  The BEFORE UPDATE triggers will run before
> the UPDATE if a duplicate is found, and the return value will be
> treated in the usual manner, replacing values and potentially
> skipping the update?

That's exactly right, yes (in particular, you can return NULL from
before row update triggers and have that cancel the update in the
usual manner). And potentially, the final value could be affected by
both the before row insert and before row update triggers (by way of
CONFLICTING()). That's the most surprising aspect of what I've done
(although I would argue it's the least surprising behavior possible
given my constraints).

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 20:41:35

On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think what's realistic here is that the patch isn't going to get
> committed the way you have it today, because nobody else likes that
> design.  That may be harsh, but I think it's accurate.

I do think that's harsh, because it's unnecessary: I am looking for
the best design. If you want to propose alternatives, great, but there
is a reason why I've done things that way, and that should be
acknowledged. I too think naming the unique index is ugly as sin, and
have said as much multiple times. We're almost on the same page here.

> But on the substance of the issue, I'm totally unconvinced by your
> argument in the linked email.  Let's suppose you have this:
>
> create table foo (a int, b int);
> create unique index foo1 on foo (a);
> create unique index foo2 on foo (a);
> create unique index foo3 on foo (a) where b = 1;
>
> Anything that conflicts in foo1 or foo2 is going to conflict in the
> other one, and with foo3.  Anything that conflicts in foo3 is perhaps
> also going to conflict in foo1 and foo2, or perhaps not.  Yet, if the
> statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear
> what to do: when we hit a conflict in any of those three indexes,
> update the row.  No matter which index has the conflict, updating is
> the right answer, and solves the conflict in all three indexes.  I
> dunno what you'd expect someone to write in your syntax to solve this
> problem - presumably either (1) they can list any of the indexes that
> might conflict, (2) they must list all of the indexes that might
> conflict, or (3) it just doesn't work.

I expect them to name exactly one unique index. They should either do
that explicitly, or have one in mind and make the behavior implicit at
the risk of miscalculating (and having a surprising outcome). It
doesn't matter if it's foo1 or foo2 in this example (but foo3 is
different, obviously).

> Whatever the actual behavior
> of your patch is today, it seems to me that the conflict is,
> fundamentally, over a set of column values, NOT over a particular
> index.  The index is simply a mechanism for noticing that conflict.

I think that this is the kernel of our disagreement. The index is not
simply a mechanism for noticing that conflict. The user had better
have one unique index in mind to provoke taking the alternative path
in the event of a would-be unique violation. Clearly it doesn't matter
much in this particular example. But what if there were two partial
unique indexes, that were actually distinct, but only in terms of the
constant appearing in their predicate? And what about having that
changed by a before insert row-level trigger? Are we to defer deciding
which unique index to use at the last possible minute?

As always with this stuff, the devil is in the details. If we work out
the details, then I can come up with something that's acceptable to
everyone. Would you be okay with this never working with partial
unique indexes? That gives me something to work with.

> If you want to allow this to work with expression indexes, that's not
> really a problem; you can let the user write INSERT .. ON CONFLICT
> (upper(foo)) UPDATE ... and match that to the index.  It wouldn't
> bother me if the first version of this feature couldn't target
> expression indexes anyway, but if you want to include that, having the
> user mention the actual expression rather than the index name
> shouldn't make much difference.

I'm not that worried about expression indexes, actually. I'm mostly
worried about partial unique indexes, particularly when before insert
row-level triggers are in play (making *that* play nice with
expression indexes is harder still, but expression indexes on their
own are probably not that much of a problem).

>>> Also, how about making the SET clause optional, with the semantics
>>> that we just update all of the fields for which a value is explicitly
>>> specified:
>>>
>>> INSERT INTO overwrite_with_abandon (key, value)
>>>     VALUES (42, 'meaning of life')
>>>     ON DUPLICATE (key) UPDATE;

> Your syntax allows the exact same thing; it simply require the user to
> be more verbose in order to get that behavior.  If we think that
> people wanting that behavior will be rare, then it's fine to require
> them to spell it out when they want it.  If we think it will be the
> overwhelming common application of this feature, and I do, then making
> people spell it out when we could just as well infer it is pointless.

Did you consider my example? I think that people will like this idea,
too - that clearly isn't the only consideration, though. As you say,
it would be very easy to implement this. However, IMV, we shouldn't,
because it is hazardous. MySQL doesn't allow this, and they tend to
find expedients like this useful.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Jim Nasby

Дата:

10 октября 2014 г., 21:18:04

On 10/9/14, 6:59 PM, Gavin Flower wrote:
> On 10/10/14 12:38, Jim Nasby wrote:
>> On 10/8/14, 5:51 PM, Peter Geoghegan wrote:
>>> On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner<kgrittn@ymail.com> wrote:
>>>> >Although the last go-around does suggest that there is at least one
>>>> >point of difference on the semantics.  You seem to want to fire the
>>>> >BEFORE INSERT triggers before determining whether this will be an
>>>> >INSERT or an UPDATE.  That seems like a bad idea to me, but if the
>>>> >consensus is that we want to do that, it does argue for your plan
>>>> >of making UPSERT a variant of the INSERT command.
>>> Well, it isn't that I'm doing it because I think that it is a great
>>> idea, with everything to recommend it. It's more like I don't see any
>>> practical alternative. We need the before row insert triggers to fire
>>> to figure out what to insert (or if we should update instead). No way
>>> around that. At the same time, those triggers are at liberty to do
>>> almost anything, and so in general we have no way of totally
>>> nullifying their effects (or side effects). Surely you see the
>>> dilemma.
>>
>> FWIW, if each row was handled in a subtransaction then an insert that turned out to be an update could be rolled
back...but the performance impact of going that route might be pretty horrid. :( There's also the potential to get
stuckin a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into
anINSERT.
 
> Perhaps you need an UPSERT trigger?

I would think that a BEFORE UPSERT trigger would very likely want to know whether we were inserting or updating, which
basicallyputs us back where we started.
 

That said, since the use case for UPSERT is different than both INSERT and UPDATE maybe it would be a good idea to have
aseparate trigger for them anyway.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: UPSERT wiki page, and SQL MERGE syntax

От

Kevin Grittner

Дата:

10 октября 2014 г., 21:19:40

Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think what's realistic here is that the patch isn't going to get
>> committed the way you have it today, because nobody else likes that
>> design.  That may be harsh, but I think it's accurate.
>
> I do think that's harsh, because it's unnecessary: I am looking for
> the best design. If you want to propose alternatives, great, but there
> is a reason why I've done things that way, and that should be
> acknowledged. I too think naming the unique index is ugly as sin, and
> have said as much multiple times. We're almost on the same page here.

I hope you can adjust to the feedback, because it would be
disappointing to have this feature slip from the release, which is
what will happen if the index name remains part of the syntax.

> Would you be okay with this never working with partial unique
> indexes? That gives me something to work with.

That seems like the only sensible course, to me.

>> If you want to allow this to work with expression indexes, that's not
>> really a problem; you can let the user write INSERT .. ON CONFLICT
>> (upper(foo)) UPDATE ... and match that to the index.  It wouldn't
>> bother me if the first version of this feature couldn't target
>> expression indexes anyway, but if you want to include that, having the
>> user mention the actual expression rather than the index name
>> shouldn't make much difference.
>
> I'm not that worried about expression indexes, actually. I'm mostly
> worried about partial unique indexes, particularly when before insert
> row-level triggers are in play (making *that* play nice with
> expression indexes is harder still, but expression indexes on their
> own are probably not that much of a problem).

There is a problem if any column of the index allows nulls, since
that would allow multiple rows in the target table to match an
argument. Proving that an expression does not could be tricky.  I
suggest that, at least for a first cut, we restrict this to
matching an index on NOT NULL columns.

>>>> Also, how about making the SET clause optional, with the semantics
>>>> that we just update all of the fields for which a value is explicitly
>>>> specified:
>>>>
>>>> INSERT INTO overwrite_with_abandon (key, value)
>>>>    VALUES (42, 'meaning of life')
>>>>    ON DUPLICATE (key) UPDATE;
>
>> Your syntax allows the exact same thing; it simply require the user to
>> be more verbose in order to get that behavior.  If we think that
>> people wanting that behavior will be rare, then it's fine to require
>> them to spell it out when they want it.  If we think it will be the
>> overwhelming common application of this feature, and I do, then making
>> people spell it out when we could just as well infer it is pointless.

+1

> Did you consider my example? I think that people will like this idea,
> too - that clearly isn't the only consideration, though. As you say,
> it would be very easy to implement this. However, IMV, we shouldn't,
> because it is hazardous.

To quote the cited case:

> 1. Developer writes the query, and it works fine.
>
> 2. Some time later, the DBA adds an inserted_at column (those are
> common). The DBA is not aware of the existence of this particular
> query. The new column has a default value of now(), say.
>
> Should we UPDATE the inserted_at column to be NULL? Or (more
> plausibly) the default value filled in by the INSERT? Or leave it be?
> I think there is a case to be made for all of these behaviors, and
> that tension makes me prefer to not do this at all. It's like
> encouraging "SELECT *" queries in production, only worse.

That does point out the problem, in general, with carrying values
from a BEFORE INSERT trigger into an UPDATE.  Perhaps if the INSERT
fails the UPDATE phase should start with the values specified in
the first place, and not try to use anything returned by the BEFORE
INSERT triggers?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 21:24:55

On Fri, Oct 10, 2014 at 2:17 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> That said, since the use case for UPSERT is different than both INSERT and
> UPDATE maybe it would be a good idea to have a separate trigger for them
> anyway.

-1. I think that having a separate UPSERT trigger is a bad idea. I'll
need to change the statement-level behavior to match what Kevin
described (fires twice, always, when INSERT ON CONFLICT UPDATE is
used).

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

10 октября 2014 г., 22:03:20

On Fri, Oct 10, 2014 at 2:16 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Would you be okay with this never working with partial unique
>> indexes? That gives me something to work with.
>
> That seems like the only sensible course, to me.

Okay, great. If that's the consensus, then that's all I need to know.

>> I'm not that worried about expression indexes, actually. I'm mostly
>> worried about partial unique indexes, particularly when before insert
>> row-level triggers are in play

> There is a problem if any column of the index allows nulls, since
> that would allow multiple rows in the target table to match an
> argument. Proving that an expression does not could be tricky.  I
> suggest that, at least for a first cut, we restrict this to
> matching an index on NOT NULL columns.

Hmm. That could definitely be a problem. But the problem is clearly
the user's fault. It could allow multiple rows to "match", as you say
- but you're talking about a different concept of matching. Unlike
with certain other systems, that isn't a would-be unique violation,
and so it isn't a "match" in the sense I intend, and so you insert.
You don't match more than one row, because (in the sense peculiar to
this feature) there are no matches, and insertion really is the
correct outcome.

As a user, you have no more right to be surprised by this then by the
fact that you wouldn't see a dup violation with a similar vanilla
INSERT - some people are surprised by that, with regularity.

Maybe I need to emphasize the distinction in the documentation between
the two slightly different ideas of "matching" that are in tension
here. I think it would be a mistake to forcibly normalize things to
remove that distinction (by adding an artificial restriction), at any
rate.

>> Did you consider my example? I think that people will like this idea,
>> too - that clearly isn't the only consideration, though. As you say,
>> it would be very easy to implement this. However, IMV, we shouldn't,
>> because it is hazardous.
>
> To quote the cited case:
>
>> 1. Developer writes the query, and it works fine.
>>
>> 2. Some time later, the DBA adds an inserted_at column (those are
>> common). The DBA is not aware of the existence of this particular
>> query. The new column has a default value of now(), say.
>>
>> Should we UPDATE the inserted_at column to be NULL? Or (more
>> plausibly) the default value filled in by the INSERT? Or leave it be?
>> I think there is a case to be made for all of these behaviors, and
>> that tension makes me prefer to not do this at all. It's like
>> encouraging "SELECT *" queries in production, only worse.
>
> That does point out the problem, in general, with carrying values
> from a BEFORE INSERT trigger into an UPDATE.  Perhaps if the INSERT
> fails the UPDATE phase should start with the values specified in
> the first place, and not try to use anything returned by the BEFORE
> INSERT triggers?

I think that that behavior is far more problematic for numerous other
reasons, though. Potentially, you're not seeing what *really* caused
the conflict at all. I just don't think it's worth it to save a bit of
typing.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Matthew Woodcraft

Дата:

12 октября 2014 г., 12:37:12

On 2014-10-10 19:44, Kevin Grittner wrote:
> Peter Geoghegan <pg@heroku.com> wrote:
>> People keep remarking that they don't like that you can (optionally)
>> name a unique index explicitly, 

[...]

> To restate: to do so is conflating the logical definition of the 
> database with a particular implementation detail.  As just one 
> reason that is a bad idea: we can look up unique indexes on the 
> specified columns, but if we implement a other storage techniques 
> where there is no such thing as a unique index on the columns, yet 
> manage to duplicate the semantics (yes, stranger things have 
> happened), people can't migrate to the new structure without 
> rewriting their queries

Wouldn't it be good enough to define the 'WITHIN' as expecting a
unique-constraint name rather than an index name (even though those
happen to be the same strings)?

I think constraints are part of the logical definition of the database,
and a new storage technique which doesn't use indexes should still have
names for its unique constraints.

-M-

Re: UPSERT wiki page, and SQL MERGE syntax

От

Marko Tiikkaja

Дата:

12 октября 2014 г., 12:40:42

On 10/12/14, 2:36 PM, Matthew Woodcraft wrote:
> On 2014-10-10 19:44, Kevin Grittner wrote:
>> To restate: to do so is conflating the logical definition of the
>> database with a particular implementation detail.  As just one
>> reason that is a bad idea: we can look up unique indexes on the
>> specified columns, but if we implement a other storage techniques
>> where there is no such thing as a unique index on the columns, yet
>> manage to duplicate the semantics (yes, stranger things have
>> happened), people can't migrate to the new structure without
>> rewriting their queries
>
> Wouldn't it be good enough to define the 'WITHIN' as expecting a
> unique-constraint name rather than an index name (even though those
> happen to be the same strings)?
>
> I think constraints are part of the logical definition of the database,
> and a new storage technique which doesn't use indexes should still have
> names for its unique constraints.

What about partial indexes?  Indexes on expressions or functions calls?


.marko

Re: UPSERT wiki page, and SQL MERGE syntax

От

Matthew Woodcraft

Дата:

12 октября 2014 г., 12:47:53

On 2014-10-12 13:40, Marko Tiikkaja wrote:
> On 10/12/14, 2:36 PM, Matthew Woodcraft wrote:
>> On 2014-10-10 19:44, Kevin Grittner wrote:
>>> To restate: to do so is conflating the logical definition of the
>>> database with a particular implementation detail.  As just one
>>> reason that is a bad idea: we can look up unique indexes on the
>>> specified columns, but if we implement a other storage techniques
>>> where there is no such thing as a unique index on the columns, yet
>>> manage to duplicate the semantics (yes, stranger things have
>>> happened), people can't migrate to the new structure without
>>> rewriting their queries
>>
>> Wouldn't it be good enough to define the 'WITHIN' as expecting a
>> unique-constraint name rather than an index name (even though those
>> happen to be the same strings)?
>>
>> I think constraints are part of the logical definition of the database,
>> and a new storage technique which doesn't use indexes should still have
>> names for its unique constraints.
> 
> What about partial indexes?  Indexes on expressions or functions calls?

On this theory, you'd be allowed to use them with 'WITHIN' (or whatever
it would be called) if and when PostgreSQL gains the ability to create
and manage them using a form of the CONSTRAINT clause.

-M-

Re: UPSERT wiki page, and SQL MERGE syntax

От

Robert Haas

Дата:

13 октября 2014 г., 14:46:53

On Fri, Oct 10, 2014 at 4:41 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think what's realistic here is that the patch isn't going to get
>> committed the way you have it today, because nobody else likes that
>> design.  That may be harsh, but I think it's accurate.
>
> I do think that's harsh, because it's unnecessary: I am looking for
> the best design. If you want to propose alternatives, great, but there
> is a reason why I've done things that way, and that should be
> acknowledged. I too think naming the unique index is ugly as sin, and
> have said as much multiple times. We're almost on the same page here.

Come on.  You've had the syntax that way for a while, multiple people
have objected to it, and it didn't get changed.  When people renewed
their objections, you said that they were not having a "realistic
conversation".  I'm tired of getting critiqued because there's some
fine point of one of the scores of emails you've sent on this topic
that you feel I haven't adequately appreciated.  I would like to avoid
getting into a flame-war here where we spend a lot of time arguing
about who should have said what in exactly what way, but I think it is
fair to say that your emails have come across to me, and to a number
of other people here, as digging in your heels and refusing to budge.
I would go so far as to say that said perception is the primary reason
why this patch is making so little progress, far outstripping whatever
the actual technical issues are.

>> Whatever the actual behavior
>> of your patch is today, it seems to me that the conflict is,
>> fundamentally, over a set of column values, NOT over a particular
>> index.  The index is simply a mechanism for noticing that conflict.
>
> I think that this is the kernel of our disagreement. The index is not
> simply a mechanism for noticing that conflict. The user had better
> have one unique index in mind to provoke taking the alternative path
> in the event of a would-be unique violation. Clearly it doesn't matter
> much in this particular example. But what if there were two partial
> unique indexes, that were actually distinct, but only in terms of the
> constant appearing in their predicate? And what about having that
> changed by a before insert row-level trigger? Are we to defer deciding
> which unique index to use at the last possible minute?

I don't immediately see why this would cause a problem.  IIUC, the
scenario we're talking about is:

create table foo (a int, b int);
create unique index foo1 on foo (a) where b = 1;
create unique index foo2 on foo (a) where b = 2;

If the user issues INSERT .. ON DUPLICATE (a) UPDATE, we'll fire
before-insert triggers and then inspect the tuple to be inserted.  If
b is neither 1 nor 2, then we'll fail with an error saying that we
can't support ON DUPLICATE without a relevant index to enforce
uniqueness.  (This can presumably re-use the same error message that
would have popped out if the user done ON DUPLICATE (b), which is
altogether un-indexed.)  But if b is 1 or 2, then we'll search the
matching index for a conflicting tuple; finding none, we'll insert;
finding one, we'll do the update as per the user's instructions.

> As always with this stuff, the devil is in the details. If we work out
> the details, then I can come up with something that's acceptable to
> everyone. Would you be okay with this never working with partial
> unique indexes? That gives me something to work with.

If it can't be made to work with them in a reasonable fashion, I would
probably be OK with that, but so far I see no reason for such
pessimism.

>>>> Also, how about making the SET clause optional, with the semantics
>>>> that we just update all of the fields for which a value is explicitly
>>>> specified:
>>>>
>>>> INSERT INTO overwrite_with_abandon (key, value)
>>>>     VALUES (42, 'meaning of life')
>>>>     ON DUPLICATE (key) UPDATE;
>
>> Your syntax allows the exact same thing; it simply require the user to
>> be more verbose in order to get that behavior.  If we think that
>> people wanting that behavior will be rare, then it's fine to require
>> them to spell it out when they want it.  If we think it will be the
>> overwhelming common application of this feature, and I do, then making
>> people spell it out when we could just as well infer it is pointless.
>
> Did you consider my example? I think that people will like this idea,
> too - that clearly isn't the only consideration, though. As you say,
> it would be very easy to implement this. However, IMV, we shouldn't,
> because it is hazardous. MySQL doesn't allow this, and they tend to
> find expedients like this useful.

I'm considering your points in this area as well as I can, but as far
as I can tell you haven't actually set what's bad about it, just that
it might be hazardous in some way that you don't appear to have
specified, and that MySQL doesn't allow it.  I am untroubled by the
latter point; it is not our goal to confine our implementation to a
subset of MySQL.

But it would be fine to leave this kind of shortcut out of the initial
patch.  If other people agree that it's useful, it will get
re-proposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

13 октября 2014 г., 18:03:02

On Mon, Oct 13, 2014 at 7:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Come on.  You've had the syntax that way for a while, multiple people
> have objected to it, and it didn't get changed.  When people renewed
> their objections, you said that they were not having a "realistic
> conversation".  I'm tired of getting critiqued because there's some
> fine point of one of the scores of emails you've sent on this topic
> that you feel I haven't adequately appreciated.

We're in the fine point business. Obviously the issues with partial
unique indexes *are* pretty esoteric. But worrying about these edge
cases are the kind of thing that will get us an implementation of high
enough quality to commit. They're esoteric until they affect you.

> I would like to avoid
> getting into a flame-war here where we spend a lot of time arguing
> about who should have said what in exactly what way, but I think it is
> fair to say that your emails have come across to me, and to a number
> of other people here, as digging in your heels and refusing to budge.

I am not refusing to budge (in point of fact, on this question I have
already budged; see my remarks below). I am saying: Hey, there is a
reason why you're getting push back here. The reason isn't that I like
"WITHIN unique_index_name" so much - obviously I don't. Who could?

> I don't immediately see why this would cause a problem.  IIUC, the
> scenario we're talking about is:
>
> create table foo (a int, b int);
> create unique index foo1 on foo (a) where b = 1;
> create unique index foo2 on foo (a) where b = 2;
>
> If the user issues INSERT .. ON DUPLICATE (a) UPDATE, we'll fire
> before-insert triggers and then inspect the tuple to be inserted.  If
> b is neither 1 nor 2, then we'll fail with an error saying that we
> can't support ON DUPLICATE without a relevant index to enforce
> uniqueness.  (This can presumably re-use the same error message that
> would have popped out if the user done ON DUPLICATE (b), which is
> altogether un-indexed.)  But if b is 1 or 2, then we'll search the
> matching index for a conflicting tuple; finding none, we'll insert;
> finding one, we'll do the update as per the user's instructions.

Before row insert triggers might invalidate that conclusion at the
last possible moment. So you'd have to recheck, which is just messy.

>> As always with this stuff, the devil is in the details. If we work out
>> the details, then I can come up with something that's acceptable to
>> everyone. Would you be okay with this never working with partial
>> unique indexes? That gives me something to work with.
>
> If it can't be made to work with them in a reasonable fashion, I would
> probably be OK with that, but so far I see no reason for such
> pessimism.

At the very least I think that it would be a bad use of our resources
to bend over backwards to make this work for partial unique indexes,
which is what it would take. So I will proceed with the basic idea of
naming columns, while not having that ever resolve partial unique
indexes.

> I'm considering your points in this area as well as I can, but as far
> as I can tell you haven't actually set what's bad about it, just that
> it might be hazardous in some way that you don't appear to have
> specified, and that MySQL doesn't allow it.  I am untroubled by the
> latter point; it is not our goal to confine our implementation to a
> subset of MySQL.

I did - several times. I linked to the discussion [1]. I said "There
is a trade-off here. I am willing to go another way in that trade-off,
but let's have a realistic conversation about it". And Kevin
eventually said of not supporting partial unique indexes: "That seems
like the only sensible course, to me". At which point I agreed to do
it that way [2]. So you've already won this argument. All it took was
looking at my reasons for doing things that way from my perspective.
If there has been digging of heals, you should consider that it might
be for a reason, even if on balance you still disagree with my
conclusion (which was clearly not really a firm conclusion in this
instance anyway). That's all I mean here.

I have been successful at talking you out of various approaches to
this problem multiple times. This is not simple intransigence.

[1] http://www.postgresql.org/message-id/CAM3SWZQpGf6_ME6D1vWqdCFadD7Nprdx2JrE=Hcf=93KXeHY4g@mail.gmail.com

[2] http://www.postgresql.org/message-id/CAM3SWZStdChN6-ieJbc20OGD8TwmZ6-um6O8Gz2BOFzXn9YFVg@mail.gmail.com
-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Robert Haas

Дата:

16 октября 2014 г., 13:49:04

On Mon, Oct 13, 2014 at 2:02 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> If the user issues INSERT .. ON DUPLICATE (a) UPDATE, we'll fire
>> before-insert triggers and then inspect the tuple to be inserted.  If
>> b is neither 1 nor 2, then we'll fail with an error saying that we
>> can't support ON DUPLICATE without a relevant index to enforce
>> uniqueness.  (This can presumably re-use the same error message that
>> would have popped out if the user done ON DUPLICATE (b), which is
>> altogether un-indexed.)  But if b is 1 or 2, then we'll search the
>> matching index for a conflicting tuple; finding none, we'll insert;
>> finding one, we'll do the update as per the user's instructions.
>
> Before row insert triggers might invalidate that conclusion at the
> last possible moment. So you'd have to recheck, which is just messy.

I can't imagine that you'd decide which index to use and then change
your mind when you turn out to be wrong.  I think rather you'd compute
a list of possibly-applicable indexes based on the ON DUPLICATE column
list, and then, after firing before-insert triggers, check whether
there's one that will definitely work.  If there's a non-partial
unique index on the relevant columns, then you can put any single such
index into the list of possibly-usable indexes and leave the rest out;
otherwise, you include all the candidates and pick between them at
runtime.

If that seems too complicated, leave it out for v1: just insist that
there must be at least one unique non-partial index on the relevant
set of columns.

>> I'm considering your points in this area as well as I can, but as far
>> as I can tell you haven't actually set what's bad about it, just that
>> it might be hazardous in some way that you don't appear to have
>> specified, and that MySQL doesn't allow it.  I am untroubled by the
>> latter point; it is not our goal to confine our implementation to a
>> subset of MySQL.
>
> I did - several times. I linked to the discussion [1]. I said "There
> is a trade-off here. I am willing to go another way in that trade-off,
> but let's have a realistic conversation about it". And Kevin
> eventually said of not supporting partial unique indexes: "That seems
> like the only sensible course, to me". At which point I agreed to do
> it that way [2]. So you've already won this argument. All it took was
> looking at my reasons for doing things that way from my perspective.
> If there has been digging of heals, you should consider that it might
> be for a reason, even if on balance you still disagree with my
> conclusion (which was clearly not really a firm conclusion in this
> instance anyway). That's all I mean here.

There seems to be some confusion here.  This part was about this syntax:

>>>> INSERT INTO overwrite_with_abandon (key, value)
>>>>     VALUES (42, 'meaning of life')
>>>>     ON DUPLICATE (key) UPDATE;

That's a different issue from naming indexes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

16 октября 2014 г., 18:02:05

On Thu, Oct 16, 2014 at 6:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> If that seems too complicated, leave it out for v1: just insist that
> there must be at least one unique non-partial index on the relevant
> set of columns.

That's what I'll do.

> There seems to be some confusion here.  This part was about this syntax:
>
>>>>> INSERT INTO overwrite_with_abandon (key, value)
>>>>>     VALUES (42, 'meaning of life')
>>>>>     ON DUPLICATE (key) UPDATE;
>
> That's a different issue from naming indexes.

It is? In any case, I'm working on a revision with this syntax:

postgres=# insert into upsert values (1, 'Foo') on conflict (key)
update set val = conflicting(val);
INSERT 0 1
postgres=# insert into upsert values (1, 'Foo') on conflict (val)
update set val = conflicting(val);
ERROR:  42P10: could not infer which unique index to use from
expressions/columns provided for ON CONFLICT
LINE 1: insert into upsert values (1, 'Foo') on conflict (val) updat...                                            ^
HINT:  Partial unique indexes are not supported
LOCATION:  transformConflictClause, parse_clause.c:2365

Expression indexes work fine with this syntax.

I want to retain CONFLICTING(), although I'm thinking about changing
the spelling to EXCLUDED(). While CONFLICTING() is more or less a new
and unprecedented style of expression, and in general that's something
to be skeptical of, I think it's appropriate because what we want here
isn't quite like any existing expression. Using an alias-like syntax
is misleading, since it implies that are no effects carried from the
firing of before row insert triggers. It's also trickier to implement
alias-like referencing.

This is not a join, and I think suggesting that it is by using an
alias-like syntax to refer to excluded rows proposed for insertion
from the UPDATE is a mistake.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

16 октября 2014 г., 18:32:11

On Thu, Oct 16, 2014 at 11:01 AM, Peter Geoghegan <pg@heroku.com> wrote:
> It is? In any case, I'm working on a revision with this syntax:

By the way, in my next revision, barring any objections, the ON
CONFLICT (column/expression) syntax is mandatory in the case of ON
CONFLICT UPDATE, and optional in the case of ON CONFLICT IGNORE. If we
end up supporting exclusion constraints, it's pretty clear that
they're only useful with ON CONFLICT IGNORE anyway, and so I guess we
don't have to worry about naming those. I guess there would be some
advantage to naming an exclusion constraint directly even for the
IGNORE case (which is another reason I considered naming an index
directly), but it doesn't seem that important.

-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Robert Haas

Дата:

16 октября 2014 г., 18:39:14

On Thu, Oct 16, 2014 at 2:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I want to retain CONFLICTING(), although I'm thinking about changing
> the spelling to EXCLUDED(). While CONFLICTING() is more or less a new
> and unprecedented style of expression, and in general that's something
> to be skeptical of, I think it's appropriate because what we want here
> isn't quite like any existing expression. Using an alias-like syntax
> is misleading, since it implies that are no effects carried from the
> firing of before row insert triggers. It's also trickier to implement
> alias-like referencing.

I think that the general consensus was against that style.  I don't
like it, and IIRC a few other people commented as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

16 октября 2014 г., 19:00:15

On Thu, Oct 16, 2014 at 11:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 16, 2014 at 2:01 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I want to retain CONFLICTING(), although I'm thinking about changing
>> the spelling to EXCLUDED(). While CONFLICTING() is more or less a new
>> and unprecedented style of expression, and in general that's something
>> to be skeptical of, I think it's appropriate because what we want here
>> isn't quite like any existing expression. Using an alias-like syntax
>> is misleading, since it implies that are no effects carried from the
>> firing of before row insert triggers. It's also trickier to implement
>> alias-like referencing.
>
> I think that the general consensus was against that style.  I don't
> like it, and IIRC a few other people commented as well.

I think that's accurate, but I'd ask those that didn't like the style
to reconsider. Also, I am willing to use any type of special
expression, and any keyword or keywords. It doesn't have to be
function-like at all. But I believe that an alias-like syntax presents
certain difficulties.

There is the fact that this isn't a join, and so misleads users, which
I've explained. But if I have to add a range table entry for the alias
to make parse analysis of the UPDATE work in the more or less
conventional way (which is something I like a lot about the current
design), then that creates considerable headaches. I have to
discriminate against those in the optimizer, when time comes to
disallow params in the auxiliary plan. I have to think about each case
then, and I think that could add a lot more complexity than you'd
think. I'm not really sure.

Basically, it's difficult to make this work for technical reasons
precisely because what I have here isn't join-like. Can I easily
disallow "OLD.*" in a RETURNING clause (recall that we only project
inserted tuples, as always)? Even if you think that's okay, I'd rather
give an error message indicating that that makes no sense, which is
what happens right now.

Besides all that, there may also be an advantage to having something
similar to MySQL.
-- 
Peter Geoghegan

Re: UPSERT wiki page, and SQL MERGE syntax

От

Greg Stark

Дата:

19 октября 2014 г., 21:53:04

On Thu, Oct 16, 2014 at 8:00 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Basically, it's difficult to make this work for technical reasons
> precisely because what I have here isn't join-like. Can I easily
> disallow "OLD.*" in a RETURNING clause (recall that we only project
> inserted tuples, as always)? Even if you think that's okay, I'd rather
> give an error message indicating that that makes no sense, which is
> what happens right now.

Well OLD and NEW are also not joins yet we expose them this way. It
always seemed like a hack to me but better one hack than two different
inconsistent hacks, no?

Independent of all that though. CONFLICTING() isn't clear to me --
conflicting is a relative property of two or more objects which are
both (or all) conflicting with each other. Which one does
"CONFLICTING" refer to here?

-- 
greg

Re: UPSERT wiki page, and SQL MERGE syntax

От

Peter Geoghegan

Дата:

19 октября 2014 г., 23:26:15

On Sun, Oct 19, 2014 at 2:52 PM, Greg Stark <stark@mit.edu> wrote:
> Well OLD and NEW are also not joins yet we expose them this way. It
> always seemed like a hack to me but better one hack than two different
> inconsistent hacks, no?

In my opinion, no. Those hacks do not appear in the parse analysis of
ordinary optimizable statements, only utility statements concerning
rules. ChangeVarNodes() does actual rewriting of magic Vars for stored
rules during rewriting, much later. Within the parser code's
directory, only transformRuleStmt() knows about the magic RTE entries
appearing in rules (PRS2_OLD_VARNO and PRS2_NEW_VARNO), and that's in
the file parse_utilcmd.c. OLD.* and NEW.* are dynamically expanded
placeholders that only exist for the purposes of stored rules. You can
see why at the very least it's a PITA to teach more or less ordinary
parse analysis - my new invocation of transformUpdateStmt() - to care
about that. Note that there are hardly any changes to
transformUpdateStmt() in the patch, and as I've said, I like that a
lot.

Even with stored rules, we only see ChangeVarNodes() changing those
OLD.*/NEW.* placeholders (that have dummy relation RTEs) to refer to
actual, existing RTEs (I think that these are typically added by rule
invocation, but I'm not an expert on the rewriter). We're not talking
about creating a new RTE artificially during ordinary parse analysis
of conventional/optimizable statements. In the patch, parse analysis
thinks that CONFLICTING(my_var) is actually an ordinary reference to
target.my_var that happens to involve ConflictingExpr, which I think
is appropriate. Making available the post-before-insert-row-triggers
tuple is driven by new code called by ExecInsert(). The executor
drives all this. All I require from the planner is a dead simple
sequential scan based update plan, that I can use the EvalPlanQual()
infrastructure with (no more, no less), that doesn't attempt to apply
any optimization (or refer to something external) that isn't
compatible with how the plan needs to be used here.

I want to ensure that the optimizer gives me only that, and magic RTEs
that need to last until execution are bad news, in my (admittedly
non-expert) opinion.

I also happen to think that the two situations (stored rules versus
CONFLICTING()) are fundamentally dissimilar from a user's perspective,
as I've said.

> Independent of all that though. CONFLICTING() isn't clear to me --
> conflicting is a relative property of two or more objects which are
> both (or all) conflicting with each other. Which one does
> "CONFLICTING" refer to here?

I think you're right about that. I thought about changing the name to
"EXCLUDED()", which reflects that the tuple is the actual tuple
excluded from insertion. In particular, a version that carries forward
all of the effects of before row insert triggers. What do you think of
that?

-- 
Peter Geoghegan

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: UPSERT wiki page, and SQL MERGE syntax