Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
From | Peter Geoghegan
Subject | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date |
Msg-id | CAM3SWZSdjNTnUWhxxXKutSQ6beXKUc5j2rRno+32+jt2tHx2QQ@mail.gmail.com
In reply to | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Stephen Frost <sfrost@snowman.net>)
          | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>)
          | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Robert Haas <robertmhaas@gmail.com>)
List | pgsql-hackers
On Tue, Sep 17, 2013 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Sep 14, 2013 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Note that today there is no guarantee that the original waiter for a
>> duplicate-inserting xact to complete will be the first one to get a
>> second chance
>
> ProcLockWakeup() only wakes as many waiters from the head of the queue
> as can all be granted the lock without any conflicts.  So I don't
> think there is a race condition in that path.

Right, but what about XactLockTableWait() itself? It only acquires a
ShareLock on the xid of the got-there-first inserter that potentially
hasn't yet committed/aborted. There will be no conflicts between
multiple second-chance-seeking blockers trying to acquire this lock
concurrently, and so in fact there is (what I guess you'd consider to
be) a race condition in the current btree insertion code. So my earlier
point about according an upsert implementation license to optimize the
ordering of retries across multiple unique indexes -- that it isn't
really inconsistent with the current code when dealing with only one
unique index insertion -- has not been invalidated.

EvalPlanQualFetch() and Do_MultiXactIdWait() also call
XactLockTableWait(), for similar reasons. In my patch, the later row
locking code used by INSERT...ON DUPLICATE KEY LOCK FOR UPDATE calls
XactLockTableWait() too.

>> So far it's
>> been a slippery slope type argument that can be equally well used to
>> argue against some facet of almost any substantial patch ever
>> proposed.
>
> I don't completely agree with that characterization, but you do have a
> point. Obviously, if the differences in the area of interruptibility,
> starvation, deadlock risk, etc. relative to the status quo can be made
> small enough, then those aren't reasons to reject the approach.

That all seems fair to me. That's the standard that I'd apply as a
reviewer myself.
> But I'm skeptical that you're going to be able to accomplish that,
> especially without adversely affecting maintainability. I think the
> way that you're proposing to use lwlocks here is sufficiently
> different from what the rest of the system does that it's going to be
> hard to avoid system-wide effects that can't easily be caught during
> code review;

Fair enough. In case it isn't already totally clear to someone, I
concede that it isn't going to be workable to hold even shared buffer
locks across all these operations. Let's get past that, though.

> and like Andres, I don't share your skepticism about
> alternative approaches.

Well, I expressed skepticism about one alternative approach in
particular, which is the promise tuples approach. Andres seems to think
that I'm overly concerned about bloat, but I'm not sure he appreciates
why I'm so sensitive to it in this instance. I'll be particularly
sensitive to it if value locks need to be held indefinitely rather than
there being a speculative grab-the-value-locks attempt (because that
increases the window in which another session can necessitate that we
retry at row locking time quite considerably - see below).

> I think the fundamental
> problem with UPSERT, MERGE, and this proposal is what happens when the
> conflicting tuple is present but not visible to your scan, either
> because it hasn't committed yet or because it has committed but is not
> visible to your snapshot.

Yeah, you're right. As I mentioned to Andres already, when row locking
happens and there is this kind of conflict, my approach is to retry from
scratch (go right back to before value lock acquisition) in the sort of
scenario that generally necessitates EvalPlanQual() looping, or to throw
a serialization failure where that's appropriate. After an unsuccessful
attempt at row locking there could well be an interim wait for another
xact to finish, before retrying (at read committed isolation level).
This is why I think that value locking/retrying should be cheap, and
should avoid bloat if at all possible.

Forgive me if I'm making a leap here, but it seems like what you're
saying is that the semantics of upsert that one might naturally expect
are *arguably* fundamentally impossible, because they entail potentially
locking a row that isn't current to your snapshot, and you cannot throw
a serialization failure at read committed. I respectfully suggest that
that exact definition of upsert isn't a useful one, because other
snapshot isolation/MVCC systems operating within the same constraints
must have the same issues, and yet they manage to implement something
that could be called upsert that people seem happy with.

> I also discovered, after it was committed, that it didn't help in the
> way I expected:
>
> http://www.postgresql.org/message-id/CA+TgmoY8P3sD=oUViG+xZjmZk5-phuNV39rtfyzUQxU8hJtZxw@mail.gmail.com

Well, at the time you didn't also provide raw commit latency benchmark
results for your hardware using a tool like pg_test_fsync, which I'd
consider absolutely essential to such a discussion. That's mostly or
entirely what the group commit stuff does - amortize that cost among
concurrently flushing transactions. Around this time, the patch was said
by Heikki to just relieve lock contention around WALWriteLock - the 9.2
release notes say much the same. I never understood it that way, though
Heikki disagreed with that [1]. Certainly, if relieving contention was
all the patch did, then you wouldn't expect the 9.3 commit_delay
implementation to help anyone, but it does: with a slow fsync, holding
the lock 50% *longer* can actually help tremendously. So I *always*
agreed with you that there was hardware where group commit would barely
help with a moderately sympathetic benchmark like the pgbench default.
Not that it matters much now.
> It's true that I didn't raise those concerns contemporaneously with
> the commit, but I didn't understand the situation well enough at that
> time to realize how narrow the benefit was.
>
> I've wished, on a number of occasions, to be able to add more lwlock
> primitives. The problem with that is that if everybody does it, we'll
> pretty soon end up with a mess.

I wouldn't go that far. The number of possible additional primitives
that are useful isn't that high, unless we decide that LWLocks are going
to be a fundamentally different thing, which I consider unlikely.

> http://www.postgresql.org/message-id/CA+Tgmob4YE_k5dpO0T07PNf1SOKPybo+wj4m4FryOS7Z4_yOzg@mail.gmail.com
>
> ...but nobody (including me) was very sure that was the right way
> forward, and it never went anywhere. However, I think the basic issue
> remains. I was sad to discover last week that Heikki handled this
> problem for the WAL scalability patch by basically copy-and-pasting
> much of the lwlock code and then hacking it up. I think we're well on
> our way to an unmaintainable mess already, and I don't want it to get
> worse. :-(

I hear what you're saying about LWLocks. I did follow the FlexLocks
stuff at the time myself. Obviously we aren't going to add new lwlock
operations if they have exactly no clients. However, I think that the
semantics implemented (weakening and strengthening of locks) may well be
handy somewhere else. So while I wouldn't go and commit that stuff on
the off chance that it will be useful, it's worth bearing in mind going
forward that it's quite possible to weaken/strengthen locks.

[1] http://www.postgresql.org/message-id/4FB0A673.7040002@enterprisedb.com

--
Peter Geoghegan