Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
От | Peter Geoghegan |
---|---|
Тема | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE |
Дата | |
Msg-id | CAM3SWZTe2BjohvWmD5fAV1+3G-CQO4TzvBoSLwK0qG4wio_fwg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Stephen Frost <sfrost@snowman.net>) |
Список | pgsql-hackers |
Hi Stephen, On Fri, Sep 20, 2013 at 6:55 PM, Stephen Frost <sfrost@snowman.net> wrote: > I'm not sure I follow this completely- you're saying that a definition > of 'upsert' which includes having to lock rows which aren't in your > current snapshot (for reasons stated) isn't a useful one. Is the > implication that a useful definition of 'upsert' is that it *doesn't* > have to lock rows which aren't in your current snapshot, and if so, then > what would the semantics of that upsert look like? No, I'm suggesting that the useful semantics are that it does potentially lock rows not yet visible to our snapshot that have committed - the latest row version. I see no alternative (we can't throw a serialization failure at read committed isolation level), and Andres seemed to agree that this was the way forward. Robert described problems he saw with this a few years ago [1]. It *is* a problem (we need to think very carefully about it), but, as I've said, it is a problem that anyone implementing this feature for a Snapshot Isolation/MVCC database would have to deal with, and several have. So, what the patch does right now is (if you squint) analogous to how SELECT FOR UPDATE uses EvalPlanQual already. However, instead of re-verifying a qual, we're re-verifying that the value locking has identified the right tid (there will probably be a different one in the subsequent iteration, or maybe we *can* go insert this time). We need consensus across unique indexes to go ahead with insertion, but once we know that we can't (and have a tid to lock), value locks can go away - we'll know if anything has changed about the tid's logical row that we need to care about when row locking. Besides, holding value locks while row locking has deadlock hazards, and, because value locks only stop insertions *finishing*, holding on to them is at best pointless. The tid we get from locking, that points to a would-be duplicate heap tuple has always committed - otherwise we'd never return from locking, because that blocks pending the outcome of a duplicate-inserting-xact (and only returns the tid when that xact commits). Even though this tuple is known to be visible, it may be deleted in the interim before row locking, in which case restarting from before value locking is appropriate. It might also be updated, which would necessitate locking a later row version in order to prevent race conditions. But it seems dangerous, invasive, and maybe even generally impossible to try and wait for the transaction that updated to commit or abort so that we can lock that later version the usual way (the usual EvalPlanQual looping thing) - better to restart value locking. The fundamental observation about value locking (at least for any half-way reasonable implementation), that I'd like to emphasize, is that short of a radical overhaul that would have many downsides, it can only ever prevent insertion from *finishing*. The big picture of my design is that it tries to quickly grab value locks, release them and grab a row lock (or insert heap tuples, index tuples, and then release value locks). If row locking fails, it waits for the conflicter xact to finish, and then restarts before the value locking of the current slot. If you think that's kind of questionable, maybe you have a point, but consider: 1. How else are you going to handle it if row locking needs to handle conflicts? You might say "I can re-verify that no unique index columns were affected instead", and maybe you can, but what if that doesn't help because they *were* changed? Besides, doesn't this break the amcanunique contract? Surely judging what's really a duplicate is the AM's job. You're back to "I need to throw an error to get out of this but I have no good excuse to do so at read committed" -- you've lost the usual duplicate key error "excuse". I don't think you can expect holding the value locks throughout row locking to help, because, as I've said, that causes morally indefensible deadlocks, and besides, it doesn't stop what row locking would consider to be a conflict, it just stops insertion from *finishing*. 2. In the existing btree index insertion code, the order that retries occur in the event of unique index tuple insertion finding an unfinished conflicting xact *is* undefined. Yes, that's right - the first waiter is not guaranteed to be the first to get a second chance. It's not even particularly probable! See remarks from my last mail to Robert for more information. 3. People with a real purist's view on the (re)ordering of value locking must already think that EvalPlanQual() is completely ghetto for very similar reasons, and as such should just go use a higher isolation level. For the rest of us, what concurrency control anomaly can allowing this cause over and above what's already possible there? Are lock starvation risks actually appreciably raised at all? What Andres and Robert seem to expect generally - that value locks only be released when we the locker has a definitive answer - actually *can* be ensured at the higher isolation levels, where the system has license to bail out by throwing a serialization failure. The trick there is just to throw an error if the first *retry* at cross-index value locking is unsuccessful or blocks on a whole other xact -- a serialization error (and not a unique constraint violation error, as would often but not necessarily otherwise occur for non-upserters). Naturally, it could also happen that at > read committed, row locking throws a serialization failure (as is always mandated over using EvalPlanQual() or other monkeying around at higher isolation levels). > This I am generally in agreement with, to the extent that 'upsert' is > something we really want and we should figure out a way to get there > from here, but it wouldn't be the first time that we worked out a > better solution than existing implementations. So, another '+1' from me > wrt your working this issue and please don't get too discouraged that > there's a lot of pressure to find a magic bullet Thanks for the encouragement! [1] http://www.postgresql.org/message-id/AANLkTineR-rDFWENeddLg=GrkT+epMHk2j9X0YqpiTY8@mail.gmail.com -- Peter Geoghegan
В списке pgsql-hackers по дате отправления: