Re: A third lock method

From: Kevin Grittner
Subject: Re: A third lock method
Msg-id: 4B3BC24A020000250002DB1D@gw.wicourts.gov
In reply to: A third lock method ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses: Re: A third lock method
List: pgsql-hackers
Bruce Momjian wrote:

> I must be missing something but I thought the only problem with our
> existing snapshot system was that you could see a row updated after
> your snapshot was created, and that the solution to that was to
> abort the transaction that would see the new row. Can you tell me
> what I am missing?

Well, that's roughly on target as a 30,000 foot overview, although I think you're also working in an issue from the read committed implementation. The issue with read committed is that you start with one snapshot but may wind up combining views of data from that initial snapshot with one or more updated versions of rows from subsequent commits -- if you blocked on a conflicting write which was subsequently committed.

I recently started a thread which drifted into a discussion of that issue, but it's almost completely orthogonal to the issue of implementing truly serializable transactions -- the overlap is that a fix for the read committed anomalies might provide groundwork for an optimization to the serializable implementation. After re-reading the thread on the read committed issue and pondering it a bit more, I'm inclined to think that we should worry about that if and when serializable is working and we actually see problems that the aforementioned optimization might fix. (Unless someone cares enough about the read committed anomalies to want to champion that as a separate issue.)

What I'm trying to stay focused on is the serializable implementation. I'd settle for the traditional Strict Two-Phase Locking (S2PL) approach, but since there is a new technique which appears to perform much better than that for most workloads, and has most of the benefits of snapshot isolation (reads don't block writes and vice versa), I figure why not go for the gold.
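The read committed behavior described above can be sketched with a toy model (Python; all names here are hypothetical illustration, not PostgreSQL internals): a statement starts with one snapshot, blocks on a concurrent writer's row lock, and then proceeds with the updated version of that one row while the rest of the rows still come from the original snapshot.

```python
# Toy illustration of the read committed issue: the result mixes two
# database states. All names are hypothetical; this is not how the
# server is implemented.

def read_committed_scan(snapshot, concurrent_commits):
    """Return what a read committed statement might see: each row from
    the statement's initial snapshot, unless a concurrent writer
    committed a newer version of that row while we waited on its lock."""
    result = {}
    for key, value in snapshot.items():
        if key in concurrent_commits:
            # We blocked on the writer's lock; it committed, so this
            # one row is seen in its *new* version.
            result[key] = concurrent_commits[key]
        else:
            result[key] = value
    return result

snapshot = {"a": 1, "b": 1}        # both rows as of statement start
concurrent_commits = {"b": 2}      # another transaction updated b and committed
seen = read_committed_scan(snapshot, concurrent_commits)
print(seen)  # {'a': 1, 'b': 2} -- a mix of two database states
```

The anomaly is visible in the output: no single point in time ever held `a = 1` together with `b = 2` if the writer also changed `a` logic elsewhere; the statement's view is stitched from two states.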
You are right in that the new technique will basically work like snapshot isolation, but will roll back a transaction when there is one transaction which reads data which is modified by a concurrent transaction and writes data which is read by a concurrent transaction (assuming all are serializable).

Years of research by Dr. Alan Fekete et al. have established that snapshot isolation can show behavior inconsistent with fully serializable behavior only when there is a cycle of such read-write dependencies among transactions, the graph contains a "pivot" transaction which has dependencies in both directions (to the same or different transactions), and a transaction which is on the write end of such an edge commits first. The SSI technique doesn't attempt to build and analyze a complete graph of such dependencies, because the overhead would be prohibitive; rather, it looks for the "pivot" which must be present in such a graph.

The SSI technique requires that serializable transactions (but not transactions at other isolation levels) use a non-blocking SIREAD lock to track reads, so that the read-write dependencies among concurrent serializable transactions can be identified. The hard part is that this does require some implementation of predicate locking to prevent problems with phantom rows. Honestly, that is probably going to be over half the battle.

I had been nibbling around the edges of the serializable issue, trying to address optimizations which should help performance of serializable transactions by reducing the rollback rate; but I came to the conclusion that that was all premature optimization and a distraction. I think the thing to do is start with the hard part, so I've been reading up on practical techniques for implementing predicate locking, and reading through the PostgreSQL locking code, trying to get my head around this.
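The "pivot" test described above can be sketched in a few lines (a toy Python model under my own naming, not the actual SSI algorithm or PostgreSQL code): a transaction is a pivot if it has a read-write dependency edge both in and out.

```python
# Sketch of the pivot check: given rw-antidependency edges
# (reader, writer) -- meaning `reader` read data that `writer`
# concurrently modified -- a pivot is any transaction that appears
# on the reader side of one edge and the writer side of another.
# Hypothetical structures for illustration only.

def find_pivots(rw_edges):
    has_out = {reader for reader, writer in rw_edges}  # reads what a peer wrote
    has_in = {writer for reader, writer in rw_edges}   # wrote what a peer read
    return has_out & has_in

# T1 read what T2 wrote; T2 read what T3 wrote: T2 has an edge in
# each direction, so T2 is the pivot.
edges = {("T1", "T2"), ("T2", "T3")}
print(find_pivots(edges))  # {'T2'}
```

This is why SSI can avoid full cycle detection: every dangerous cycle must contain such a pivot, so watching for the pivot pattern alone is sufficient (at the cost of some false-positive rollbacks).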
Once this hardest part is solved, I really think that a workable serializable mode is not too far past it, and *then* would be the time to look at optimizations.

It's just a little bit of a stretch to call SILOCKs locks, because they don't actually block anything. They are used at various points to see where a transaction is reading data which has been modified by another transaction, or vice versa. And they do need to be kept until all concurrent transactions have completed. Other than those quirks, they behave pretty much like read locks, though, so it seems to make sense to use the locking system for them. The differences are such that I thought a new lock method might be appropriate. This thread is to try to solicit opinions on whether that makes sense to anyone but me. :-)

Once I sort out the subject issue, I'm about ready to try to start generating a very rough prototype of predicate locking. I don't want to start a discussion of those details on this thread, because it seems to me that a decision on the subject issue affects significant details about how I go about that.

Consider this the 5,000 foot overview. Previous threads contain much more, and references to more authoritative documents.

This is not a small effort. I'll be surprised if it comes in at less than 2,000 lines of code, plus documentation and tests. I'm not really expecting it to make the next release after 8.5. With some luck, maybe the next release after that. And that's all conditional on our CIO approving programming time for the effort. I'm currently trying to dig in far enough to provide more accurate estimates and timelines to facilitate approval. Hopefully I haven't just scared you off of your offer to help. ;-)

-Kevin
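The non-blocking behavior described above can be made concrete with a toy model (Python, all class and method names hypothetical): an SIREAD entry never makes a writer wait; it only lets the writer record a read-write dependency for the later pivot check.

```python
# Toy model of the SIREAD quirks: the "lock" blocks nothing, it is
# only consulted by writers to discover rw-dependencies, and entries
# must outlive the reading transaction, surviving until all concurrent
# transactions complete. Illustration only, not PostgreSQL's lock manager.

class SireadTable:
    def __init__(self):
        self.readers = {}     # tuple id -> set of transactions that read it
        self.rw_deps = set()  # (reader, writer) dependencies observed

    def siread(self, txn, tid):
        # Record the read; in the real design this entry is kept until
        # every concurrent serializable transaction has completed.
        self.readers.setdefault(tid, set()).add(txn)

    def write(self, txn, tid):
        # Never blocks: just flag a dependency for each concurrent reader,
        # feeding the pivot detection.
        for reader in self.readers.get(tid, ()):
            if reader != txn:
                self.rw_deps.add((reader, txn))

t = SireadTable()
t.siread("T1", tid=42)
t.write("T2", tid=42)   # proceeds immediately; only records the dependency
print(t.rw_deps)        # {('T1', 'T2')}
```

The quirks the text lists map directly onto this sketch: `write` returns without waiting (reads don't block writes), and `readers` entries cannot be dropped at commit the way ordinary read locks are.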