Обсуждение: Add cassert-only checks against unlocked use of relations
Hi, There frequently have been bugs where (heap|relation|index)_open(NoLock) was used without a previous locks which in some circumstances is an easy mistake to make and which is hard to notice. The attached patch adds --use-cassert only WARNINGs against doing so: Add cassert-only checks against unlocked use of relations. Specifically relation_open(), which also covers heap/index_open(), and RelationIdGetRelation() WARN if the relation is opened without being locked. index_open() now checks whether the heap relation the index is covering is locked. To make those checks possible add StrongestLocalRelationLock() which returns the strongest locally held lock over a relation. It relies on the also added StrongestLocalLock() which searches the local locktable sequentially for matching locks. After 1) 2a781d57dcd027df32d15ee2378b84d0c4d005d1 2) http://archives.postgresql.org/message-id/1383681385.79520.YahooMailNeo%40web162902.mail.bf1.yahoo.com 3) http://archives.postgresql.org/message-id/CAEZATCVCgboHKu_%2BK0nakrXW-AFEz_18icE0oWEQHsM-OepWhw%40mail.gmail.com HEAD doesn't generate warnings anymore. I think Kevin will commit something akin to 2), but 3) is an actual open bug, so that patch will need to get applied beforehand. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
Andres Freund <andres@2ndquadrant.com> writes: > There frequently have been bugs where (heap|relation|index)_open(NoLock) > was used without a previous locks which in some circumstances is an easy > mistake to make and which is hard to notice. > The attached patch adds --use-cassert only WARNINGs against doing so: While I agree that there seems to be a problem here, I'm not convinced that this is the solution. The implication of a heap_open(NoLock) is that the programmer believes that some previous action must have taken a lock on the relation; if he's wrong, then the causal link that he thought existed doesn't really. But this patch is not checking for a causal link; it'll be fooled just as easily as the programmer is by a happenstance (that is, unrelated) previous lock on the relation. What's more, it entirely fails to check whether the previous lock is really strong enough for what we're going to do. I also find it unduly expensive to search the whole lock hashtable on every relation open. That's going to be a O(N^2) cost for a transaction touching N relations, and that doesn't sound acceptable, not even for assert-only code. If we're sufficiently worried by this type of bug, ISTM we'd be better off just disallowing heap_open(NoLock). At the time we invented that, every lock request went to shared memory; but now that we have the local lock table, re-locking just requires a local hash lookup followed by incrementing a local counter. That's probably pretty cheap --- certainly a lot cheaper than what you've got here. regards, tom lane
On 2013-11-05 16:25:53 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > There frequently have been bugs where (heap|relation|index)_open(NoLock) > > was used without a previous locks which in some circumstances is an easy > > mistake to make and which is hard to notice. > > The attached patch adds --use-cassert only WARNINGs against doing so: > > While I agree that there seems to be a problem here, I'm not convinced > that this is the solution. The implication of a heap_open(NoLock) is that > the programmer believes that some previous action must have taken a lock > on the relation; if he's wrong, then the causal link that he thought > existed doesn't really. But this patch is not checking for a causal link; > it'll be fooled just as easily as the programmer is by a happenstance > (that is, unrelated) previous lock on the relation. What's more, it > entirely fails to check whether the previous lock is really strong enough > for what we're going to do. Sure. But it already found several bugs as evidenced by the referenced thread, so it seems to be helpful enough. > I also find it unduly expensive to search the whole lock hashtable on > every relation open. That's going to be a O(N^2) cost for a transaction > touching N relations, and that doesn't sound acceptable, not even for > assert-only code. We could relatively easily optimize that to a constant factor by just iterating over the possible lockmodes. Should only take a couple more lines. I'd be really, really surprised if it even comes close to the overhead of AtEOXact_Buffers() though. Which is not to say it's a bad idea to make this check more effective though ;) > If we're sufficiently worried by this type of bug, ISTM we'd be better off > just disallowing heap_open(NoLock). At the time we invented that, every > lock request went to shared memory; but now that we have the local lock > table, re-locking just requires a local hash lookup followed by > incrementing a local counter. That's probably pretty cheap --- certainly > a lot cheaper than what you've got here. Hm. That only works though if we're using the same lockmode as before - often enough the *_open(NoLock) checks would use a weaker locklevel than the previous lock. So I think the cost of doing so would probably be noticeable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-11-05 16:25:53 -0500, Tom Lane wrote: >> If we're sufficiently worried by this type of bug, ISTM we'd be better off >> just disallowing heap_open(NoLock). At the time we invented that, every >> lock request went to shared memory; but now that we have the local lock >> table, re-locking just requires a local hash lookup followed by >> incrementing a local counter. That's probably pretty cheap --- certainly >> a lot cheaper than what you've got here. > Hm. That only works though if we're using the same lockmode as before - > often enough the *_open(NoLock) checks would use a weaker locklevel than > the previous lock. So I think the cost of doing so would probably be > noticeable. If you're not using the same lockmode as before, it's probably a bug anyway. As I said already, the entire NoLock coding technique is dependent on having a very clear idea of which previous lock-taking you're riding on the coattails of. Why wouldn't you duplicate that lock level, if we say you can't use NoLock anymore? regards, tom lane
On 2013-11-05 16:45:49 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2013-11-05 16:25:53 -0500, Tom Lane wrote: > >> If we're sufficiently worried by this type of bug, ISTM we'd be better off > >> just disallowing heap_open(NoLock). At the time we invented that, every > >> lock request went to shared memory; but now that we have the local lock > >> table, re-locking just requires a local hash lookup followed by > >> incrementing a local counter. That's probably pretty cheap --- certainly > >> a lot cheaper than what you've got here. > > > Hm. That only works though if we're using the same lockmode as before - > > often enough the *_open(NoLock) checks would use a weaker locklevel than > > the previous lock. So I think the cost of doing so would probably be > > noticeable. > > If you're not using the same lockmode as before, it's probably a bug anyway. > As I said already, the entire NoLock coding technique is dependent on > having a very clear idea of which previous lock-taking you're riding > on the coattails of. Why wouldn't you duplicate that lock level, > if we say you can't use NoLock anymore? Hm. That doesn't really seem to match my reading of the code. There's quite some code around that relies the fact that the parser has taken an appropriately strong lock already. E.g. the parser chooses the lockmode in addRangeTableEntry() - I don't think we want to add duplicate isLockedRefname() calls everywhere. Other users of that relation will likely just use AccessShareLock since that's sufficient for their own use. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2013-11-05 22:35:41 +0100, Andres Freund wrote: > We could relatively easily optimize that to a constant factor by just > iterating over the possible lockmodes. Should only take a couple more > lines. On that note, any chance you remember why you increased MAX_LOCKMODE by 2 to 10 back in 2001 although AccessExclusiveLock is 8? The relevant commit is E4fe42dfbc3bafa0ea615239d716a6b37d67da253 . Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On that note, any chance you remember why you increased MAX_LOCKMODE by > 2 to 10 back in 2001 although AccessExclusiveLock is 8? The relevant > commit is 4fe42dfbc3bafa0ea615239d716a6b37d67da253 . Probably because it seemed like a round number, which 9 wasn't ... keep in mind that this data structure is nominally intended to support other lock semantics besides what LockConflicts[] defines. (BTW, it's a conceptual error to imagine that the numerical values of the lock mode codes define a strict "strength" ordering, which is another reason I don't care for your patch.) regards, tom lane
On 2013-11-05 17:19:21 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On that note, any chance you remember why you increased MAX_LOCKMODE by > > 2 to 10 back in 2001 although AccessExclusiveLock is 8? The relevant > > commit is 4fe42dfbc3bafa0ea615239d716a6b37d67da253 . > > Probably because it seemed like a round number, which 9 wasn't ... > keep in mind that this data structure is nominally intended to support > other lock semantics besides what LockConflicts[] defines. Hm. Given that there are Assert()s for MAX_LOCKMODE around, that adding a new method isn't possible without editing lock.c and that we use it to in shared memory structures I am not sure I see the point of that slop. Anyway, it was just a point of curiosity. > (BTW, > it's a conceptual error to imagine that the numerical values of the > lock mode codes define a strict "strength" ordering, which is another > reason I don't care for your patch.) Well, while I don't thing it has too much practical bearing, we could just check for *any* lock already held instead, that's all we need for the added checks. I thought it might be more useful to get the strongest lock rather than any lock for other potential checks, but if that's a contentious point... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services