Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
From: Robert Haas
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date:
Msg-id: CA+TgmoaBXekYM5j8hbrvUSwV90Au8VPmwVTtCXba=CPaYHDHsw@mail.gmail.com
In reply to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>)
Responses: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Mon, Sep 23, 2013 at 7:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> It suits my purposes to have the value locks be held for only an
> instant, because:
>
> [ detailed explanation ]

I don't really disagree with any of that. TBH, I think the question of how long value locks (as you call them) are held is going to boil down to a question of how they end up being implemented. As I mentioned to you at PG Open (going through the details here for those following along at home), we could optimistically insert the new heap tuple, then go add index entries for it, and if we find a conflict, then instead of erroring out, mark the tuple we were inserting dead and try to update the conflicting tuple instead. In that implementation, if we find that we have to wait for some other transaction along the way, it's probably not worth reversing out the index entries already inserted: getting them into the index in the first place was a WAL-logged operation, and therefore relatively expensive, and IMHO it's most likely better to just hope things work out than to risk having to redo all of that.

On the other hand, if the locks are strictly in-memory, then the cost of releasing them all before we go to wait, and of reacquiring them after we finish waiting, is pretty low. There might be some modularity issues to work through there, but they might not turn out to be very painful, and the advantages you mention are certainly worth accruing if it turns out to be fairly straightforward.

Personally, I think that trying to keep it all in-memory is going to be hard. The problem is that we can't de-optimize regular inserts or updates to any significant degree to cater to this feature, because as valuable as this feature is, the number of times it gets used is still going to be a whole lot smaller than the number of times it doesn't get used.
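To make the "strictly in-memory value locks" idea concrete, here is a rough, hypothetical sketch: take lightweight locks on the proposed key values in a fixed order, and if we discover we must wait on some other transaction, drop them all before sleeping and retake them afterwards. All names here (ValueLockTable, insert_with_value_locks, wait_needed) are invented for illustration; this is not PostgreSQL code, just the control flow being discussed.

```python
import threading

class ValueLockTable:
    """Toy stand-in for a shared in-memory value-lock table."""
    def __init__(self):
        self._locks = {}
        self._mutex = threading.Lock()

    def _get(self, key):
        with self._mutex:
            return self._locks.setdefault(key, threading.Lock())

    def acquire(self, keys):
        held = []
        for k in sorted(keys):      # fixed order to avoid deadlock
            lock = self._get(k)
            lock.acquire()
            held.append(lock)
        return held

    def release(self, held):
        for lock in reversed(held):
            lock.release()

def insert_with_value_locks(table, index_keys, wait_needed):
    held = table.acquire(index_keys)
    try:
        if wait_needed():
            # Purely in-memory locks are cheap to drop before we sleep...
            table.release(held)
            # ... wait for the conflicting transaction here ...
            held = table.acquire(index_keys)   # ...and retake afterwards.
        return "inserted"
    finally:
        table.release(held)
```

The point of the sketch is only that release-and-reacquire is a few lock operations, versus reversing out already-WAL-logged index entries in the heap-first approach.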
Also, I tend to think that we might want to define the operation as a REPLACE-type operation with respect to a certain set of key columns; and so we'll do the insert-or-update behavior with respect only to the index on those columns and let the chips fall where they may with respect to any others. In that case this all becomes much less urgent.

> Even *considering* this is largely academic, though, because without
> some kind of miracle guaranteeing serial ordering, a miracle that
> doesn't allow for serialization failures and also doesn't seriously
> slow down, for example, updates (by making them care about value locks
> *before* they do anything, or in the case of HOT updates *at all*),
> all of this is _impossible_. So, I say let's just do the
> actually-serial-ordering for value lock acquisition with serialization
> failures where we're read committed. I've seriously considered what
> it would take to do it any other way so things would work how you and
> Andres expect for read committed, and it makes my head hurt, because
> apart from seeming unnecessary to me, it also seems completely
> hopeless.
>
> Am I being too negative here? Well, I guess that's possible. The fact
> is that it's really hard to judge, because all of this is really hard
> to reason about.

That's what I really don't like about it. Suppose we define the operation as REPLACE rather than INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. Then we could do something like this:

1. Try to insert a tuple. If no unique index conflicts occur, stop.
2. Note the identity of the conflicting tuple and mark the inserted heap tuple dead.
3. If the conflicting tuple's inserting transaction is still in progress, wait for the inserting transaction to end.
4. If the conflicting tuple is dead (e.g. because the inserter aborted), start over.
5. If the conflicting tuple's key columns no longer match the key columns of the REPLACE operation, start over.
6. If the conflicting tuple has a valid xmax, wait for the deleting or locking transaction to end. If xmax is still valid, follow the CTID chain to the updated tuple, let that be the new conflicting tuple, and resume from step 5.
7. Update the tuple, even though it may be invisible to our snapshot (a deliberate MVCC violation!).

While this behavior is admittedly wonky from an MVCC perspective, I suspect that it would make a lot of people happy.

>>> I respectfully
>>> suggest that that exact definition of upsert isn't a useful one,
>>> because other snapshot isolation/MVCC systems operating within the
>>> same constraints must have the same issues, and yet they manage to
>>> implement something that could be called upsert that people seem happy
>>> with.
>>
>> Yeah. I wonder how they do that.
>
> My guess is that they have some fancy snapshot type that is used by
> the equivalent of ModifyTable subplans, that is appropriately paranoid
> about the Halloween problem and so on. How that actually might work is
> far from clear, but it's a question that I have begun to consider. As
> I said, a concern is that it would be in tension with the generalized,
> composable syntax, where we don't explicitly have a "special update".
> I'd really like to hear other opinions, though.

The tension here feels fairly fundamental to me; I don't think our implementation is to blame. I think the problem isn't so much to figure out a clever trick that will make this all work in a truly elegant fashion as it is to decide exactly how we're going to compromise MVCC semantics in the least blatant way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
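[For readers following the thread: the seven-step REPLACE loop in Robert's mail can be sketched as a toy state machine. This is an illustrative model only, with invented names (HeapTuple, wait_for, replace) and a plain dict standing in for the unique index; the real steps involve heap/index machinery, XactLockTableWait, and CTID chains that a few lines of Python obviously cannot capture.]

```python
IN_PROGRESS, COMMITTED, ABORTED = range(3)

class HeapTuple:
    def __init__(self, key, value, xact_state=COMMITTED):
        self.key = key
        self.value = value
        self.xact_state = xact_state  # state of the inserting transaction
        self.dead = False
        self.locker = None            # xmax stand-in: deleting/locking xact
        self.next_version = None      # CTID-chain stand-in: newer version

def wait_for(tup):
    # Stand-in for waiting on another transaction; here it just resolves.
    if tup.xact_state == IN_PROGRESS:
        tup.xact_state = COMMITTED

def replace(index, key, value):
    """Toy model of the REPLACE retry loop against a dict 'index'."""
    while True:
        conflict = index.get(key)
        if conflict is None or conflict.dead:       # step 1: plain insert wins
            index[key] = HeapTuple(key, value)
            return "inserted"
        if conflict.xact_state == IN_PROGRESS:      # step 3: wait for inserter
            wait_for(conflict)
        if conflict.xact_state == ABORTED:          # step 4: inserter aborted,
            conflict.dead = True                    # start over
            continue
        if conflict.key != key:                     # step 5: key changed,
            continue                                # start over
        if conflict.locker is not None:             # step 6: wait for locker;
            wait_for(conflict.locker)               # if updated away, follow
            if conflict.next_version is not None:   # the version chain and
                index[key] = conflict.next_version  # retry from step 5
                continue
            conflict.locker = None
        conflict.value = value                      # step 7: update in place,
        return "updated"                            # snapshot notwithstanding
```

The deliberate MVCC violation lives entirely in step 7: the loop updates whatever committed version it finally lands on, visible to our snapshot or not.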