Обсуждение: Select For Update and Left Outer Join
[I previously posted this to pgsql-sql, but received no response as of yet... it's more of a development team oriented message in any case.] In ORMs like NHibernate, there are a few strategies for mapping inheritance to SQL. One of these is "Joined Subclass," which allows for the elimination of duplicate data and clean separation of class contents. With a class hierarchy such as this: Pet Dog : Pet Cat : Pet The query to get all the pets is as follows: select * from Pet left join Dog on Dog.Id = Pet.Id left join Cat on Cat.Id = Pet.Id Now suppose you want to lock to ensure that your Cat is not updated concurrently. You add FOR UPDATE, but then PostgreSQL gets upset and complains that locking on the nullable side of an outer join is not allowed. From our data model, we know that for every single Pet, there can never be a Dog or Cat that spontaneously appears, so locking in this case is totally safe. Unfortunately, PostgreSQL doesn't seem to provide any mechanism to lock just the rows involved in this query. Any advice? I'd be happy if such a thing was implemented in the engine, as it's supported by other databases without trouble. As another note, I'm one of the NHibernate developers and I'm working to get all the NHibernate tests working with PostgreSQL. The two significant cases I've had to avoid testing are the "FOR UPDATE" mentioned above and null characters in UTF strings. Storing a UTF "char" which defaults to zero doesn't work on PostgreSQL because it's apparently still using zero-terminated string functions. :( Aside from those two things, it looks like PostgreSQL is going to be passing all the tests soon, so that's good news. :) Patrick Earl
Patrick Earl <patearl@patearl.net> writes: > The query to get all the pets is as follows: > select * from Pet > left join Dog on Dog.Id = Pet.Id > left join Cat on Cat.Id = Pet.Id > Now suppose you want to lock to ensure that your Cat is not updated > concurrently. You add FOR UPDATE, but then PostgreSQL gets upset and > complains that locking on the nullable side of an outer join is not > allowed. Quite. What would it mean to lock the absence of a row? > From our data model, we know that for every single Pet, there can > never be a Dog or Cat that spontaneously appears, so locking in this > case is totally safe. You might know that, but you didn't explain how you know that or how the database could be expected to know it. regards, tom lane
On Sun, May 1, 2011 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Patrick Earl <patearl@patearl.net> writes: >> The query to get all the pets is as follows: > >> select * from Pet >> left join Dog on Dog.Id = Pet.Id >> left join Cat on Cat.Id = Pet.Id > >> Now suppose you want to lock to ensure that your Cat is not updated >> concurrently. You add FOR UPDATE, but then PostgreSQL gets upset and >> complains that locking on the nullable side of an outer join is not >> allowed. > > Quite. What would it mean to lock the absence of a row? > >> From our data model, we know that for every single Pet, there can >> never be a Dog or Cat that spontaneously appears, so locking in this >> case is totally safe. > > You might know that, but you didn't explain how you know that or how > the database could be expected to know it. > > regards, tom lane > I would argue that SELECT FOR UPDATE never locks on the absence of a row. For example, if I do: SELECT * FROM Table WHERE Column = 10 The existing rows are locked, but somebody could come along and add another unlocked row with Column = 10. Whether I'm specifying a secondary set of records (via a criteria involving a join) or a primary set of records (via a regular where criteria), FOR UPDATE always allows new rows to appear. The fact that new rows are not locked is common and expected. Whether they appear in the result set due to the join or due to the where clause seems unimportant to me. Patrick Earl
Patrick Earl <patearl@patearl.net> writes: > On Sun, May 1, 2011 at 4:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Quite. What would it mean to lock the absence of a row? > I would argue that SELECT FOR UPDATE never locks on the absence of a > row. For example, if I do: > SELECT * FROM Table WHERE Column = 10 > The existing rows are locked, but somebody could come along and add > another unlocked row with Column = 10. Addition of new rows certainly isn't supposed to be prevented by a SELECT FOR UPDATE, but that's not the problem here. What I *do* expect a SELECT FOR UPDATE to promise is that the rows it did return can't change or be deleted by someone else for the life of my transaction. This is not possible to promise for null-extended rows unless you somehow lock out addition of a matching row on the inside of the join. Without that, a row that looked like <pet fields, nulls> when you selected it might suddenly start looking like <pet fields, cat fields> due to someone else's modification. And after that, since you still haven't got a lock on the cat row, the cat fields could keep on changing. I'm prepared to believe that there are some applications where that can't happen due to other interlocking, or doesn't matter to the application, but SELECT FOR UPDATE really can't assume that. I think what you're proposing is to fundamentally break the semantics of SELECT FOR UPDATE for the sake of convenience. You didn't explain exactly why your application doesn't care about this, but I wonder whether it's because you know that a lock on the parent "pet" row is sufficient due to application coding rules. If so, you could just tell SELECT FOR UPDATE to only lock the "pet" rows, and be happy: select * from pet left join cat ... for update of pet; regards, tom lane
On Sun, May 1, 2011 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Addition of new rows certainly isn't supposed to be prevented by a > SELECT FOR UPDATE, but that's not the problem here. What I *do* expect > a SELECT FOR UPDATE to promise is that the rows it did return can't > change or be deleted by someone else for the life of my transaction. > This is not possible to promise for null-extended rows unless you > somehow lock out addition of a matching row on the inside of the join. > Without that, a row that looked like <pet fields, nulls> when you > selected it might suddenly start looking like <pet fields, cat fields> > due to someone else's modification. And after that, since you still > haven't got a lock on the cat row, the cat fields could keep on > changing. > > I'm prepared to believe that there are some applications where that > can't happen due to other interlocking, or doesn't matter to the > application, but SELECT FOR UPDATE really can't assume that. I think > what you're proposing is to fundamentally break the semantics of SELECT > FOR UPDATE for the sake of convenience. While I don't have access to the SQL standard to check if the semantics are documented, the semantics don't seem clear cut. The question is whether the thing that you've promised won't change is the result row of the query or the source rows that built the result. I would like to know how other databases handle this, but I do know that it works on MSSQL. Perhaps it's using something like the source row locking semantics I mentioned. > You didn't explain exactly why your application doesn't care about this, > but I wonder whether it's because you know that a lock on the parent > "pet" row is sufficient due to application coding rules. If so, you > could just tell SELECT FOR UPDATE to only lock the "pet" rows, and be > happy: > > select * from pet left join cat ... for update of pet; If I select a collection that contains both dogs and cats, run a polymorphic operation that affects dogs and cats differently, then save my results back, I would need locks on all rows, not just the pet information. When "parts" of a "row" are stored in different tables, the results from both tables need to be locked. I see a couple possible workarounds, neither of which are particularly appealing: * Run many queries, being careful to ensure the appropriate rows from each table are locked. * Lock all related tables. Perhaps the syntax could be extended to indicate to the database that it's safe to lock on just the selected rows. select * from pet left join cat ... for update of pet, cat; On a conceptual level, I still tend to think of select for update as "locking the data returned by the query." If no data is returned by the query, I don't get a lock on that non-data. Is this an area that is covered by the standard? Thanks for your thoughts on this Tom. Your time is much appreciated. Patrick Earl
Patrick Earl <patearl@patearl.net> writes: > On Sun, May 1, 2011 at 9:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Addition of new rows certainly isn't supposed to be prevented by a >> SELECT FOR UPDATE, but that's not the problem here. �What I *do* expect >> a SELECT FOR UPDATE to promise is that the rows it did return can't >> change or be deleted by someone else for the life of my transaction. > While I don't have access to the SQL standard to check if the > semantics are documented, the semantics don't seem clear cut. The specific syntax of SELECT FOR UPDATE isn't in the standard at all --- the standard considers FOR UPDATE to be a clause you can attach to DECLARE CURSOR, not a bare SELECT, and also the items that can be mentioned in FOR UPDATE OF are individual column names not table names. But ignoring that little detail, as best I can tell the standard only allows FOR UPDATE to be applied to columns for which the cursor output is guaranteed to be one-to-one with elements of the underlying table, ie you could say UPDATE tab SET col = <value> WHERE CURRENT OF <cursor> and expect that a single well-defined table cell would get updated. This is certainly not the case for columns in the nullable side of an outer join, where there might be no existing cell that could be updated. The question of whether the cell is locked against concurrent updates isn't something that the spec addresses, so far as I've ever seen; but it is perfectly clear that there should be something there that could be updated. So your proposal doesn't work from that standpoint either. regards, tom lane
On May 1, 2011, at 12:27 PM, Patrick Earl wrote: > In ORMs like NHibernate, there are a few strategies for mapping > inheritance to SQL. One of these is "Joined Subclass," which allows > for the elimination of duplicate data and clean separation of class > contents. > > With a class hierarchy such as this: > > Pet > Dog : Pet > Cat : Pet > > The query to get all the pets is as follows: > > select * from Pet > left join Dog on Dog.Id = Pet.Id > left join Cat on Cat.Id = Pet.Id Since FOR UPDATE seems to be a dead end here... Is that construct something that NHibernate natively understands? If so, could you use Postgres table inheritance insteadof joins? -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Okay, I finally had time to install a bunch of databases and see what the "industry" thinks. I tested the four seemingly most pertinent databases. The ability to "select for update" with a left outer join is present on these databases: * Oracle * SQL Server (not the same syntax, but does support the functionality) * MySQL It is not present in DB2, though I didn't check if it offered workarounds. Oracle and SQL Server did not limit the creation of the row that would fill the empty join. The ability to lock on outer joins is quite useful. I've even been contacted to ask if I was aware of any progress in this area. So, given that the majority of the important databases support it, I would propose that it would be a reasonable addition to PostgreSQL. It seems there is a difference of opinion about the locking model, but it seems that there are two possibilities: 1. Disallow it, hoping to save a naive user from an error. 2. Allow it, providing a significant piece of functionality. I would argue that the ability to perform a significant function such as this outweighs the needs of the naive user to be informed of his potential errors. Thanks for your consideration in this matter. Patrick Earl
On 11.07.2011 05:45, Patrick Earl wrote: > The ability to lock on outer joins is quite useful. I've even been > contacted to ask if I was aware of any progress in this area. 9.1 has a truly serializable isolation level, so I would suggest using that instead of SELECT FOR UPDATE. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 11.07.2011 05:45, Patrick Earl wrote: >> The ability to lock on outer joins is quite useful. I've even >> been contacted to ask if I was aware of any progress in this >> area. > > 9.1 has a truly serializable isolation level, so I would suggest > using that instead of SELECT FOR UPDATE. Heikki beat me to the big point, but I'll elaborate a bit. First, 9.1 is in beta testing, and will probably be released this summer Next, when using this feature be sure to use transactional annotations and set things up so that a transaction which fails with SQLSTATE 40001 is retried from the start. If you use serializable transactions consistently, you can drop all FOR UPDATE and FOR SHARE clauses, and most likely all explicit locks. (In our in-house testing I've so far found one place where we needed to take an explicit lock on a dummy table we created just to control access to a sequence -- sequences don't follow normal transactional semantics.) Third, review this section, and consider the performance tips there: http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-SERIALIZABLE I'd be interested in hearing how it goes. -Kevin
On 11.07.2011 18:44, Kevin Grittner wrote: > (In our in-house > testing I've so far found one place where we needed to take an > explicit lock on a dummy table we created just to control access to > a sequence -- sequences don't follow normal transactional > semantics.) Hmm, is that something we should do something about? Can you give an example of that? Not in 9.1, except in the docs if we don't mention that already, but in the future... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Jul 11, 2011, at 10:44 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> On 11.07.2011 05:45, Patrick Earl wrote: >>> The ability to lock on outer joins is quite useful. I've even >>> been contacted to ask if I was aware of any progress in this >>> area. >> >> 9.1 has a truly serializable isolation level, so I would suggest >> using that instead of SELECT FOR UPDATE. > > Heikki beat me to the big point, but I'll elaborate a bit. > > First, 9.1 is in beta testing, and will probably be released this > summer > > Next, when using this feature be sure to use transactional > annotations and set things up so that a transaction which fails with > SQLSTATE 40001 is retried from the start. If you use serializable > transactions consistently, you can drop all FOR UPDATE and FOR SHARE > clauses, and most likely all explicit locks. (In our in-house > testing I've so far found one place where we needed to take an > explicit lock on a dummy table we created just to control access to > a sequence -- sequences don't follow normal transactional > semantics.) > > Third, review this section, and consider the performance tips there: > > http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-SERIALIZABLE > > I'd be interested in hearing how it goes. I find these responses to be a bit off point. Not everyone can or will want to use SERIALIZABLE. The OP's point is thatwe - particularly Tom - have argued in the past that we shouldn't allow this because it's too ill-defined and/or confusing.Evidently our competition does not agree, and I think that's a point worth noting. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I find these responses to be a bit off point. Not everyone can or will > want to use SERIALIZABLE. The OP's point is that we - particularly > Tom - have argued in the past that we shouldn't allow this because > it's too ill-defined and/or confusing. Evidently our competition does > not agree, and I think that's a point worth noting. Has anyone looked into what the competition thinks the appropriate definition is, or whether they all agree on the details? regards, tom lane
Robert Haas <robertmhaas@gmail.com> wrote: > I find these responses to be a bit off point. The OP is basically looking for what Florian tried to implement. This is perhaps a *bit* off point, but arguably not more than pointing someone who is requesting planner hints in another direction. And someone thought the issues were related here: http://archives.postgresql.org/pgsql-hackers/2010-12/msg01792.php ;-) > Not everyone can or will want to use SERIALIZABLE. No argument on that. It's just that it is the only feature we have now (or soon) which solves the problem short of a table lock. > The OP's point is that we - particularly Tom - have argued in the > past that we shouldn't allow this because it's too ill-defined > and/or confusing. And I have argued that what Florian wanted would be a valuable addition. The approach foundered on technical details, although in re-reading the thread I'm wondering if it wouldn't make sense to dodge all that by having SELECT FOR UPDATE simple *do* a no-op UPDATE RETURNING. This would cause behavior matching Oracle and MS SQL Server (when the latter is using MVCC without S2PL). DB2 is more strict, acquiring a predicate lock over the selected range, but we can't be compatible with both behaviors at the same time. > Evidently our competition does not agree Neither on this nor on planner hints. ;-) -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > I'm wondering if it wouldn't make sense to dodge all that by > having SELECT FOR UPDATE simple *do* a no-op UPDATE RETURNING. Hmm. Patrick, would it be possible to change the PostgreSQL code for Hibernate to use UPDATE RETURNING instead of SELECT FOR UPDATE? That might allow portable Hibernate applications to work properly with recent PostgreSQL versions without going to SERIALIZABLE transactions. -Kevin
On Jul11, 2011, at 18:55 , Kevin Grittner wrote: > Robert Haas <robertmhaas@gmail.com> wrote: >> I find these responses to be a bit off point. > > The OP is basically looking for what Florian tried to implement. > This is perhaps a *bit* off point, but arguably not more than > pointing someone who is requesting planner hints in another > direction. And someone thought the issues were related here: > > http://archives.postgresql.org/pgsql-hackers/2010-12/msg01792.php Hm, I'm not so sure we're looking for the same thing here. It seems to me that what the OP (or actually Hibernate) wants are two related, but different, things. (A) First, for a way to UPDATE a row that was returned by a previous SELECT, without the need to know a set of fields which comprise a candidate or primary key. So far, this has nothing to do with locking, and everything with to do with convenience. The postgres way of doing that is including the ctid of all to-be-updated relations in the SELECT's target lest, and using UPDATE ... WHERE ctid = <ctid from select>. Not sure how that behaves if the row has been updated after the SELECT but before the UPDATE, though... (B) Secondly (but I don't know if this is even Hibernate's intention, I'm no Hibernate expert), it might that Hibernate is trying to get true serializability by doing S2PL, i.e. taking a lock on every row it reads. That seems like a rather unwise thing to do on postgres, because of the way we handle row locks. Part (B) has some relationship to what I tried to archive by changing the way REPEATABLE READ transactions and row locks interact. Though my intention wasn't full serializability, only enough protection to make user-space FOREIGN KEYS work safely for REPEATABLE READ transactions. @OP, could you explain whether it is (A) or (B) or both that Hibernate tries to archive with "FOR UPDATE". >> The OP's point is that we - particularly Tom - have argued in the >> past that we shouldn't allow this because it's too ill-defined >> and/or confusing. > > And I have argued that what Florian wanted would be a valuable > addition. The approach foundered on technical details, although in > re-reading the thread I'm wondering if it wouldn't make sense to > dodge all that by having SELECT FOR UPDATE simple *do* a no-op > UPDATE RETURNING. This would cause behavior matching Oracle and MS > SQL Server (when the latter is using MVCC without S2PL). DB2 is > more strict, acquiring a predicate lock over the selected range, but > we can't be compatible with both behaviors at the same time. That'd make FOR UPDATE much more expensive than it is today. As it stands, FOR UPDATE causes the page containing to tuple to be dirtied, but it doesn't require new index entries and generate no new tuple version which must be cleaned up by VACUUM. (To be fair, however, HOT would reduce the impact somewhat, but still...). BTW, the technical issues that prevented my patch from working correctly are tightly related to the issues that plague the combination of sub-transactions and row locks. Namely that UPDATE overwrites the information about previous lock holders, and the information stays gone even if the UPDATE later rolls back. But so far, how to fix that evades me, at least without major hacks or changes to the on-disk format. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> wrote: > Part (B) has some relationship to what I tried to archive by > changing the way REPEATABLE READ transactions and row locks > interact. Though my intention wasn't full serializability, only > enough protection to make user-space FOREIGN KEYS work safely for > REPEATABLE READ transactions. Florian, I know that you looked at Oracle's treatment of SELECT FOR UPDATE, so could you respond to Tom's question about the semantics of that? (From what you and Patrick have posted I gather that from a user visible logical perspective SELECT FOR UPDATE is the same as a no-op UPDATE RETURNING, although there may be performance differences. From Patrick's recent post I gather that MS SQL Server [at least in some configuration -- it has many settings which might affect this] behaves the same as Oracle in this regard; while DB2 is more strict, using a predicate lock on the selected range. But my take on that is second-hand, based on those posts and discussions with Oracle users a PGEast -- it'd be better for a report from someone who looked at it directly.) -Kevin
On Jul11, 2011, at 20:16 , Kevin Grittner wrote: > Florian Pflug <fgp@phlo.org> wrote: >> Part (B) has some relationship to what I tried to archive by >> changing the way REPEATABLE READ transactions and row locks >> interact. Though my intention wasn't full serializability, only >> enough protection to make user-space FOREIGN KEYS work safely for >> REPEATABLE READ transactions. > > Florian, I know that you looked at Oracle's treatment of SELECT FOR > UPDATE, so could you respond to Tom's question about the semantics > of that? (From what you and Patrick have posted I gather that from > a user visible logical perspective SELECT FOR UPDATE is the same as > a no-op UPDATE RETURNING, although there may be performance > differences. (CC'ing Tom now, hope thats OK) I can only comment with certainty on the behaviour of FOR UPDATE regarding serialization conflicts. There, Oracle treats FOR UPDATE exactly like UPDATE, i.e. UPDATE raises a serialization error if it encounters a row locked FOR UPDATE by a transaction invisible to the UPDATEing one. What Tom wanted to know, I believe, was whether FOR UPDATE locks only existing *rows* (i.e., locks nothing in case of a LEFT JOIN without a matching right row), or whether it actually locks the *fact* that no such row exists (i.e., prevents future inserts of matching rows). Now, I cannot comment on that with absolute certainty, and currently don't have an Oracle instance available to test, but I can say so much: I'd very *very*, *very* surprised if they did anything other than simply locking nothing in the case of a LEFT join without a matching right row. As far as I'm aware, Oracle simply doesn't do predicate locking, and doesn't do true serializability. Their SERIALIZABLE mode is actually snapshot isolation, just like ours used to be. It'd be very strange to do yet, but yet to do predicate locking when it comes to SELECT FOR UPDATE. > From Patrick's recent post I gather that MS SQL Server > [at least in some configuration -- it has many settings which might > affect this] Yeah MS-SQL really isn't the idea target for comparison here. You can override pretty much any lock that MS-SQL takes with a stronger or weaker one from what I've seen. I wouldn't be at all surprised if you could convince it to work either way by putting some (probably rather obscure) incantations into your SQL statements. best regards, Florian Pflug
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 11.07.2011 18:44, Kevin Grittner wrote: >> (In our in-house testing I've so far found one place where we >> needed to take an explicit lock on a dummy table we created just >> to control access to a sequence -- sequences don't follow normal >> transactional semantics.) > > Hmm, is that something we should do something about? Can you give > an example of that? Sequences behave in a non-transactional way for good reason; I certainly wouldn't advocate changing that. This came up in some fairly specialized code in our replication system. Existing PostgreSQL features were fine for dealing with it, although the ability to take out a lock on a sequence (just as one would on a table) would have been convenient (as it would have allowed us to avoid using a dummy table). The more important issue is probably around the docs making clear that the serializable transaction isolation level doesn't make certain things beyond DML serializable. > Not in 9.1, except in the docs if we don't mention that already, > but in the future... I thought it was mentioned in the docs, but in a scan through the Concurrency Control chapter I'm not seeing it. Basically, SSI is only going to work with objects and statements which are currently using MVCC snapshots for snapshot isolation; anything else behaves exactly as it did in 9.0 at the serializable isolation level. I'm not quite sure where this should be mentioned. Ideas? -Kevin
On Jul 11, 2011, at 11:55 AM, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> I find these responses to be a bit off point. > > The OP is basically looking for what Florian tried to implement. > This is perhaps a *bit* off point, but arguably not more than > pointing someone who is requesting planner hints in another > direction. And someone thought the issues were related here: > > http://archives.postgresql.org/pgsql-hackers/2010-12/msg01792.php > > ;-) Well, fair enough. I thought of the connection between this request and Florian's work, too. I would very much like to supportwhat he proposed, but it doesn't appear viable without a heapam rewrite, or maybe a lock manager rewrite. However,I think that's a somewhat separate question from whether we need to forbid SFU on the outer side of a join. Tom's question seems to me to be right on target: what semantics do our competitors assign to this construct? And do theybroadly agree with each other? >> Evidently our competition does not agree > > Neither on this nor on planner hints. ;-) Well, we are a pretty smart group of people. But I don't think we should completely ignore what other people are doing, onany topic. ...Robert
Florian Pflug <fgp@phlo.org> wrote: > Yeah MS-SQL really isn't the idea target for comparison here. You > can override pretty much any lock that MS-SQL takes with a > stronger or weaker one from what I've seen. I wouldn't be at all > surprised if you could convince it to work either way by putting > some (probably rather obscure) incantations into your SQL > statements. I was thinking of some of the ALTER DATABASE SET options, like COMPATIBILITY_LEVEL or ALLOW_SNAPSHOT_ISOLATION, but you have a point about what overrides can be used at the statement level, too. -Kevin