Обсуждение: Syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Someone suggested to me that I solicit opinion on the chosen syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE on a separate thread. I'll do that here, while also giving a formal user-level definition of the feature. I'd like to solicit feedback from a wider set of people than those participating in the main thread, while avoiding talking about arcane details around locking which have dominated discussions up until this point. The patch proposed adds an optional ON DUPLICATE KEY LOCK FOR UPDATE clause. It also extends the RETURNING clause to support projecting "REJECTS". Idiomatic wCTE usage, through which it's possible to build an "UPSERT", might look like: CREATE TABLE foo ( a int4 PRIMARY KEY, b int4, c text ); WITH rej AS ( INSERT INTO foo (a, b, c) -- Multiple rows proposed for insertion - may insert or update each VALUES (44, 1078, 'insert'), (55, 1088, 'insert') ON DUPLICATE KEY LOCK FOR UPDATE RETURNING REJECTS a) UPDATE foo SET c = 'update' FROM rej WHERE foo.a = rej.a; This has what I like to call the fundamental upsert property: You always either get an insert, or an update. People are already incorrectly using wCTEs like this; I thought I'd make it work correctly. I believe that we'll be well served by the flexibility of what I've proposed. In many respects it is similar to SQL MERGE. For example, I could have deleted rather than updated, or I could have had a more complex predicate in the update clause, so that updates did not occur on all rejecting rows. While I'm not particularly attached to the syntax, I do think that the flexibility is a key strength. I recently suggested that rather than RETURNING REJECTS, we could have a REJECTING clause, which would see a DML statement project strictly the complement of what RETURNING projects in the same context. So perhaps you could also see what RETURNING would not have projected because a before row trigger returned NULL (i.e. when a before trigger indicates to not proceed with insertion). That is certainly more general, and so is perhaps preferable. It's also less verbose, and it seems less likely to matter that we'll need to make REJECTING a fully reserved keyword, as compared to REJECTS. (RETURNING is already a fully reserved keyword not described by the standard, so this makes a certain amount of sense to me). If nothing else, REJECTING is more terse than RETURNING REJECTS. Like the MySQL feature "INSERT...ON DUPLICATE KEY UPDATE", duplicates are defined as would-be violators of any and all unique indexes. Like the MySQL feature, the user is generally obligated to make sure they know ahead of time which unique index any violation may come from, or else they may update the wrong row (i.e. the row they ultimately update is not already locked). Unlike the MySQL feature, expert users have some capacity to recover if that problem is anticipated, because we also may project out the rejecting row's ctid. Notably the wCTE pattern, with a ctid join condition doesn't work very well, because, as src/backend/optimizer/path/tidpath.c says: * There is currently no special support for joins involving CTID; in* particular nothing corresponding to best_inner_indexscan(). Since it's* not very useful to store TIDs of one table in another table, there* doesn't seem tobe enough use-case to justify adding a lot of code* for that. You end up with a seqscan, not a tidscan, so I don't think every novice user is going to try this as a premature optimization, without appreciating the hazards of tid updates. Projecting the rejecting row's tid is an expert level feature, added mostly with things like multi-master replication conflict resolution in mind. Those use-cases will find this feature quite important, and will particularly value the flexibility. In fact, I think for that use-case, it's even more useful than SQL MERGE for a couple of reasons, in particular the capability to defer doing anything with the locked row until later commands. This feature is not supposed to fully satisfy those calling for SQL MERGE. I anticipate that we'll still get MERGE in a future release, and the implementation anticipates this as well. More formally, what the feature does is: * Ensure that a row is either inserted successfully, or that if an effort to do so was unsuccessful, the first conclusively committed tuple with a conflicting value is exclusive locked. Not all conflicting tuples are locked, just the first, although the order in which we check unique indexes for conflicts is well defined (at least with my implementation, where when we find a conflicting TID, it must really be the first one at that juncture, because all previous unique indexes are value locked). * Ensure that in READ COMMITTED mode, the locked row is always visible. Since we can get a lock on the tuple blamed for insertion not proceeding, it must be conclusively committed and not updated or deleted by anyone else, but in and of itself that isn't sufficient. The tuple's xact may logically be still-in-progress to our snapshot, and it would not be acceptable to have it be impossible to update for that reason. So there is a special case adjustment to the general semantics of MVCC snapshots, just to facilitate this feature. Higher isolation levels actively forbid themselves from proceeding when they observe that they'll have to avail of this special rule to update, by throwing a serialization failure. -- Peter Geoghegan
On 01/12/2014 06:42 AM, Peter Geoghegan wrote: > Someone suggested to me that I solicit opinion on the chosen syntax of > INSERT...ON DUPLICATE KEY LOCK FOR UPDATE on a separate thread. I'll > do that here, while also giving a formal user-level definition of the > feature. I'd like to solicit feedback from a wider set of people than > those participating in the main thread, while avoiding talking about > arcane details around locking which have dominated discussions up > until this point. After reading the following trimmed text I can see that the idea behind this solves a real user problem. It's not beautiful - but we've been rejecting things that solve the upsert problem for users for ages. This looks technically solid enough to do the job, useful, and reasonably usable with guidance from the documentation. My main question is how (if) this interacts with COPY. Would users need to continue to COPY to an UNLOGGED or TEMP table, then use this for an upsert-like operation into the target table? Or is it likely that COPY can be extended to expose similar functionality - or wrap this into a copy-on-duplicate-key-update ? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 01/11/2014 11:42 PM, Peter Geoghegan wrote: > I recently suggested that rather than RETURNING REJECTS, we could have > a REJECTING clause, which would see a DML statement project strictly > the complement of what RETURNING projects in the same context. So > perhaps you could also see what RETURNING would not have projected > because a before row trigger returned NULL (i.e. when a before trigger > indicates to not proceed with insertion). That is certainly more > general, and so is perhaps preferable. It's also less verbose, and it > seems less likely to matter that we'll need to make REJECTING a fully > reserved keyword, as compared to REJECTS. (RETURNING is already a > fully reserved keyword not described by the standard, so this makes a > certain amount of sense to me). If nothing else, REJECTING is more > terse than RETURNING REJECTS. I do not entirely understand what you are proposing here. Any example how this would look compared to your RETURNING REJECTS proposal? -- Andreas Karlsson
On Sun, Jan 12, 2014 at 8:12 AM, Andreas Karlsson <andreas@proxel.se> wrote: > On 01/11/2014 11:42 PM, Peter Geoghegan wrote: >> I recently suggested that rather than RETURNING REJECTS, we could have >> a REJECTING clause, which would see a DML statement project strictly >> the complement of what RETURNING projects in the same context. So >> perhaps you could also see what RETURNING would not have projected >> because a before row trigger returned NULL (i.e. when a before trigger >> indicates to not proceed with insertion). That is certainly more >> general, and so is perhaps preferable. It's also less verbose, and it >> seems less likely to matter that we'll need to make REJECTING a fully >> reserved keyword, as compared to REJECTS. (RETURNING is already a >> fully reserved keyword not described by the standard, so this makes a >> certain amount of sense to me). If nothing else, REJECTING is more >> terse than RETURNING REJECTS. > > I do not entirely understand what you are proposing here. Any example how > this would look compared to your RETURNING REJECTS proposal? It's very similar - REJECTING is a total generalization of what I already have. The difference is only that REJECTING is accepted in all contexts that RETURNING is, and not just with INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. So you could potentially have REJECTING project the slot proposed for insertion on an UPDATE where RETURNING would not. If for example a BEFORE ROW trigger fired, and returned NULL, perhaps it'd then be possible to project the slot as it was before being passed to the trigger. Perhaps there is no real demand for that, but, as I said, from a usability perspective it may be easier to reason about a feature that projects strictly the complement of what RETURNING would project in the same context. -- Peter Geoghegan
On 01/12/2014 11:20 PM, Peter Geoghegan wrote: > On Sun, Jan 12, 2014 at 8:12 AM, Andreas Karlsson <andreas@proxel.se> wrote: >> On 01/11/2014 11:42 PM, Peter Geoghegan wrote: >>> I recently suggested that rather than RETURNING REJECTS, we could have >>> a REJECTING clause, which would see a DML statement project strictly >>> the complement of what RETURNING projects in the same context. So >>> perhaps you could also see what RETURNING would not have projected >>> because a before row trigger returned NULL (i.e. when a before trigger >>> indicates to not proceed with insertion). That is certainly more >>> general, and so is perhaps preferable. It's also less verbose, and it >>> seems less likely to matter that we'll need to make REJECTING a fully >>> reserved keyword, as compared to REJECTS. (RETURNING is already a >>> fully reserved keyword not described by the standard, so this makes a >>> certain amount of sense to me). If nothing else, REJECTING is more >>> terse than RETURNING REJECTS. >> >> I do not entirely understand what you are proposing here. Any example how >> this would look compared to your RETURNING REJECTS proposal? > > It's very similar - REJECTING is a total generalization of what I > already have. The difference is only that REJECTING is accepted in all > contexts that RETURNING is, and not just with INSERT...ON DUPLICATE > KEY LOCK FOR UPDATE. So you could potentially have REJECTING project > the slot proposed for insertion on an UPDATE where RETURNING would > not. If for example a BEFORE ROW trigger fired, and returned NULL, > perhaps it'd then be possible to project the slot as it was before > being passed to the trigger. Perhaps there is no real demand for that, > but, as I said, from a usability perspective it may be easier to > reason about a feature that projects strictly the complement of what > RETURNING would project in the same context. So simply this? WITH rej AS ( INSERT INTO foo (a, b, c) VALUES (44, 1078, 'insert'), (55, 1088, 'insert') REJECTING a) UPDATE foo SET c = 'update' FROM rej WHERE foo.a = rej.a; Another question: have you given any thought on the case where you want to use both the successfully inserted tuples and the rejected and use in the CTE? Is that even something anyone would want? Would perhaps MERGE be more suited for that? -- Andreas Karlsson
On 1/12/14, 9:35 PM, Andreas Karlsson wrote: > On 01/12/2014 11:20 PM, Peter Geoghegan wrote: >> On Sun, Jan 12, 2014 at 8:12 AM, Andreas Karlsson <andreas@proxel.se> wrote: >>> On 01/11/2014 11:42 PM, Peter Geoghegan wrote: >>>> I recently suggested that rather than RETURNING REJECTS, we could have >>>> a REJECTING clause, which would see a DML statement project strictly >>>> the complement of what RETURNING projects in the same context. So >>>> perhaps you could also see what RETURNING would not have projected >>>> because a before row trigger returned NULL (i.e. when a before trigger >>>> indicates to not proceed with insertion). That is certainly more >>>> general, and so is perhaps preferable. It's also less verbose, and it >>>> seems less likely to matter that we'll need to make REJECTING a fully >>>> reserved keyword, as compared to REJECTS. (RETURNING is already a >>>> fully reserved keyword not described by the standard, so this makes a >>>> certain amount of sense to me). If nothing else, REJECTING is more >>>> terse than RETURNING REJECTS. >>> >>> I do not entirely understand what you are proposing here. Any example how >>> this would look compared to your RETURNING REJECTS proposal? >> >> It's very similar - REJECTING is a total generalization of what I >> already have. The difference is only that REJECTING is accepted in all >> contexts that RETURNING is, and not just with INSERT...ON DUPLICATE >> KEY LOCK FOR UPDATE. So you could potentially have REJECTING project >> the slot proposed for insertion on an UPDATE where RETURNING would >> not. If for example a BEFORE ROW trigger fired, and returned NULL, >> perhaps it'd then be possible to project the slot as it was before >> being passed to the trigger. Perhaps there is no real demand for that, >> but, as I said, from a usability perspective it may be easier to >> reason about a feature that projects strictly the complement of what >> RETURNING would project in the same context. > > So simply this? > > WITH rej AS ( > INSERT INTO foo (a, b, c) > VALUES (44, 1078, 'insert'), (55, 1088, 'insert') > REJECTING a) > UPDATE foo SET c = 'update' FROM rej WHERE foo.a = rej.a; > > Another question: have you given any thought on the case where you want to use both the successfully inserted tuples andthe rejected and use in the CTE? Is that even something anyone would want? Would perhaps MERGE be more suited for that? Well, a common case for INSERT RETURNING is to get your set of surrogate keys back; so I think users would want the abilityto RETURN what finally made it into the table. Also, if we want to support the case of identifying tuples where a BEFORE trigger disallowed the insert, we probably wantto expose that that's why those tuples were rejected (as opposed to them being rejected due to a duplicate key violation). -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Mon, Jan 13, 2014 at 2:20 PM, Jim Nasby <jim@nasby.net> wrote: > Well, a common case for INSERT RETURNING is to get your set of surrogate > keys back; so I think users would want the ability to RETURN what finally > made it into the table. Your update can also have a RETURNING clause. I'm not necessarily that attached to fully generalizing RETURNING REJECTS as REJECTING. It was just an idea. When an insert is rejected and you lock a conflicting row, it hardly matters what your surrogate key might have been had that insert succeeded. To get the surrogate key when it upsert inserts, do a regular INSERT....RETURNING..., and break the work up into multiple commands. That will almost always be sufficient, because you'll almost always know ahead of time where the conflict might be (certainly, the MySQL feature mandates that you do know). > Also, if we want to support the case of identifying tuples where a BEFORE > trigger disallowed the insert, we probably want to expose that that's why > those tuples were rejected (as opposed to them being rejected due to a > duplicate key violation). The ctid *won't* indicate a specific rejecting row then, I guess, which will do it. -- Peter Geoghegan
On 1/13/14, 5:05 PM, Peter Geoghegan wrote: > On Mon, Jan 13, 2014 at 2:20 PM, Jim Nasby <jim@nasby.net> wrote: >> Well, a common case for INSERT RETURNING is to get your set of surrogate >> keys back; so I think users would want the ability to RETURN what finally >> made it into the table. > > Your update can also have a RETURNING clause. I'm not necessarily that > attached to fully generalizing RETURNING REJECTS as REJECTING. It was > just an idea. When an insert is rejected and you lock a conflicting > row, it hardly matters what your surrogate key might have been had > that insert succeeded. > > To get the surrogate key when it upsert inserts, do a regular > INSERT....RETURNING..., and break the work up into multiple commands. > That will almost always be sufficient, because you'll almost always > know ahead of time where the conflict might be (certainly, the MySQL > feature mandates that you do know). As long as there's a way to get back what was ultimately inserted or updated that'd work... there might be some cases whereyou'd actually want to know what the result of the REJECTING command was (ie: did the update do something fancy?). Actually, you'd also want to know if triggers did anything. So we definitely want to keep the existing RETURNING behavior(sorry, I don't know offhand if you've kept that or not). >> Also, if we want to support the case of identifying tuples where a BEFORE >> trigger disallowed the insert, we probably want to expose that that's why >> those tuples were rejected (as opposed to them being rejected due to a >> duplicate key violation). > > The ctid *won't* indicate a specific rejecting row then, I guess, > which will do it. Yeah, the only other thing you might want is the name of the trigger that returned NULL... that would allow you to do somethingdifferent based on which trigger it was. Part of me thinks that'd be useful... part of me thinks it's just a foot-gun... -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net