Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
От | Peter Geoghegan |
---|---|
Тема | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE |
Дата | |
Msg-id | CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>) |
Ответы |
Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
(Heikki Linnakangas <hlinnakangas@vmware.com>)
|
Список | pgsql-hackers |
On Wed, Dec 18, 2013 at 8:39 PM, Peter Geoghegan <pg@heroku.com> wrote: > Empirically, retrying because ExecInsertIndexTuples() returns some > recheckIndexes occurs infrequently, so maybe that makes all of this > okay. Or maybe it happens infrequently *because* we don't give up on > insertion when it looks like the current iteration is futile. Maybe > just inserting into every unique index, and then blocking on an xid > within ExecCheckIndexConstraints(), works out fairly and performs > reasonably in all common cases. It's pretty damn subtle, though, and I > worry about the worst case performance, and basic correctness issues > for these reasons. I realized that it's possible to create the problem that I'd previously predicted with "promise tuples" [1] some time ago, that are similar in some regards to what Heikki has here. At the time, Robert seemed to agree that this was a concern [2]. I have a very simple testcase attached, much simpler that previous testcases, that reproduces deadlock for the patch exclusion_insert_on_dup.2013_12_12.patch.gz at scale=1 frequently, and occasionally when scale=10 (for tiny, single-statement transactions). With scale=100, I can't get it to deadlock on my laptop (60 clients in all cases), at least in a reasonable time period. With the patch btreelock_insert_on_dup.2013_12_12.patch.gz, it will never deadlock, even with scale=1, simply because value locks are not held on to across row locking. This is why I characterized the locking as "opportunistic" on several occasions in months past. The test-case is actually much simpler than the one I describe in [1], and much simpler than all previous test-cases, as there is only one unique index, though the problem is essentially the same. It is down to old "value locks" held across retries - with "exclusion_...", we can't *stop* locking things from previous locking attempts (where a locking attempt is btree insertion with the UNIQUE_CHECK_PARTIAL flag), because dirty snapshots still see inserted-then-deleted-in-other-xact tuples. This deadlocking seems unprincipled and unjustified, which is a concern that I had all along, and a concern that Heikki seemed to share more recently [3]. This is why I felt strongly all along that value locks ought to be cheap to both acquire and _release_, and it's unfortunate that so much time was wasted on tangential issues, though I do accept some responsibility for that. So, I'd like to request as much scrutiny as possible from as wide as possible a cross section of contributors of this test case specifically. This feature's development is deadlocked on resolving this unprincipled deadlocking controversy. This is a relatively easy thing to have an opinion on, and I'd like to hear as many as possible. Is this deadlocking something we can live with? What is a reasonable path forward? Perhaps I am being pedantic in considering unnecessary deadlocking as ipso facto unacceptable (after all, MySQL lived with this kind of problem for long enough, even if it has gotten better for them recently), but there is a very real danger of painting ourselves into a corner with these concurrency issues. I aim to have the community understand ahead of time the exact set of trade-offs/semantics implied by our chosen implementation, whatever the outcome. That seems very important. I myself lean towards this being a blocker for the "exclusion_" approach at least as presented. Now, you might say to yourself "why should I assume that this isn't just attributable to btree page buffer locks being coarser than other approaches to value locking?". That's a reasonable point, and indeed it's why I avoided lower scale values in prior, more complicated test-cases, but that doesn't actually account for the problem highlighted: In this test-case we do not hold buffer locks across other buffer locks within a single backends (at least in any new way), nor do we lock rows while holding buffer locks within a single backend. Quite simply, the conventional btree value locking approach doesn't attempt to lock 2 things within a backend at the same time, and you need to do that to get a deadlock, so there are no deadlocks. Importantly, the "btree_..." implementation can release value locks. Thanks P.S. In the interest of reproducibility, I attach new revisions of each patch, even though there is no reason to believe that any changes since the last revision posted are significant to the test-case. There was some diff minimization, plus I incorporated some (but not all) unrelated feedback from Tom. It wasn't immediately obvious, at least to me, that "rejects" can be made to be an unreserved keyword, due to shift/reduce conflicts, but I did document AM changes. Hopefully this gives some indication of the essential nature or intent of my design that we may work towards refining (depending on the outcome of discussion here, of course). P.P.S. Be careful not to fall afoul of the shift/reduce conflicts when applying either patch on top of commit 1b4f7f93b4693858cb983af3cd557f6097dab67b. I'm working on a fix that allows a clean rebase. [1] http://www.postgresql.org/message-id/CAM3SWZRfrw+zXe7CKt6-QTCuvKQ-Oi7gnbBOPqQsvddU=9M7_g@mail.gmail.com [2] http://www.postgresql.org/message-id/CA+TgmobwDZSVcKWTmVNBxeHSe4LCnW6zon2soH6L7VoO+7tAzw@mail.gmail.com [3] http://www.postgresql.org/message-id/528B640F.50601@vmware.com -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления: