Обсуждение: bug in fast-path locking
On Sun, Apr 8, 2012 at 12:43 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>> Indeed, the unpatched GIT version crashes if you enter
>>  =#lock TABLE pgbench_accounts ;
>> the second time in session 2 after the first one failed. Also,
>> manually spelling it out:
>>
>> Session 1:
>>
>> $ psql
>> psql (9.2devel)
>> Type "help" for help.
>>
>> zozo=# begin;
>> BEGIN
>> zozo=# lock table pgbench_accounts;
>> LOCK TABLE
>> zozo=#
>>
>> Session 2:
>>
>> zozo=# begin;
>> BEGIN
>> zozo=# savepoint a;
>> SAVEPOINT
>> zozo=# lock table pgbench_accounts;
>> ERROR:  canceling statement due to statement timeout
>> zozo=# rollback to a;
>> ROLLBACK
>> zozo=# savepoint b;
>> SAVEPOINT
>> zozo=# lock table pgbench_accounts;
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>>
>> Server log after the second lock table:
>>
>> TRAP: FailedAssertion("!(locallock->holdsStrongLockCount == 0)", File:
>> "lock.c", Line: 749)
>> LOG:  server process (PID 12978) was terminated by signal 6: Aborted
>
>
> Robert, the Assert triggering with the above procedure
> is in your "fast path" locking code with current GIT.
Yes, that sure looks like a bug.  It seems that if the top-level
transaction is aborting, then LockReleaseAll() is called and
everything gets cleaned up properly; or if a subtransaction is
aborting after the lock is fully granted, then the locks held by the
subtransaction are released one at a time using LockRelease(), but if
the subtransaction is aborted *during the lock wait* then we only do
LockWaitCancel(), which doesn't clean up the LOCALLOCK.  Before the
fast-lock patch, that didn't really matter, but now it does, because
that LOCALLOCK is tracking the fact that we're holding onto a shared
resource - the strong lock count.  So I think that LockWaitCancel()
needs some kind of adjustment, but I haven't figured out exactly what
it is yet.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
			
		On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Robert, the Assert triggering with the above procedure >> is in your "fast path" locking code with current GIT. > > Yes, that sure looks like a bug. It seems that if the top-level > transaction is aborting, then LockReleaseAll() is called and > everything gets cleaned up properly; or if a subtransaction is > aborting after the lock is fully granted, then the locks held by the > subtransaction are released one at a time using LockRelease(), but if > the subtransaction is aborted *during the lock wait* then we only do > LockWaitCancel(), which doesn't clean up the LOCALLOCK. Before the > fast-lock patch, that didn't really matter, but now it does, because > that LOCALLOCK is tracking the fact that we're holding onto a shared > resource - the strong lock count. So I think that LockWaitCancel() > needs some kind of adjustment, but I haven't figured out exactly what > it is yet. I looked at this more. The above analysis is basically correct, but the problem goes a bit beyond an error in LockWaitCancel(). We could also crap out with an error before getting as far as LockWaitCancel() and have the same problem. I think that a correct statement of the problem is this: from the time we bump the strong lock count, up until the time we're done acquiring the lock (or give up on acquiring it), we need to have an error-cleanup hook in place that will unbump the strong lock count if we error out. Once we're done updating the shared and local lock tables, the special handling ceases to be needed, because any subsequent lock release will go through LockRelease() or LockReleaseAll(), which will do the appropriate clenaup. The attached patch is an attempt at implementing that; any reviews appreciated. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Robert Haas <robertmhaas@gmail.com> writes:
> I looked at this more.  The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel().  We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem.  I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out.   Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> clenaup.
Haven't looked at the code, but maybe it'd be better to not bump the
strong lock count in the first place until the final step of updating
the lock tables?
        regards, tom lane
			
		On Mon, Apr 9, 2012 at 1:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I looked at this more. The above analysis is basically correct, but >> the problem goes a bit beyond an error in LockWaitCancel(). We could >> also crap out with an error before getting as far as LockWaitCancel() >> and have the same problem. I think that a correct statement of the >> problem is this: from the time we bump the strong lock count, up until >> the time we're done acquiring the lock (or give up on acquiring it), >> we need to have an error-cleanup hook in place that will unbump the >> strong lock count if we error out. Once we're done updating the >> shared and local lock tables, the special handling ceases to be >> needed, because any subsequent lock release will go through >> LockRelease() or LockReleaseAll(), which will do the appropriate >> clenaup. > > Haven't looked at the code, but maybe it'd be better to not bump the > strong lock count in the first place until the final step of updating > the lock tables? Well, unfortunately, that would break the entire mechanism. The idea is that we bump the strong lock count first. That prevents anyone from taking any more fast-path locks on the target relation. Then, we go through and find any existing fast-path locks that have already been taken, and turn them into regular locks. Finally, we resolve the actual lock request and either grant the lock or block, depending on whether conflicts exist. So there's some necessary separation between the action of bumping the strong lock count and updating the lock tables; the entire mechanism relies on being able to do non-trivial processing in between. I thought that I had nailed down the error exit cases in the original patch, but this test case, and some code reading with fresh eyes, shows that I didn't do half so good a job as I had thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 9, 2012 at 1:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Haven't looked at the code, but maybe it'd be better to not bump the
>> strong lock count in the first place until the final step of updating
>> the lock tables?
> Well, unfortunately, that would break the entire mechanism.  The idea
> is that we bump the strong lock count first.  That prevents anyone
> from taking any more fast-path locks on the target relation.  Then, we
> go through and find any existing fast-path locks that have already
> been taken, and turn them into regular locks.  Finally, we resolve the
> actual lock request and either grant the lock or block, depending on
> whether conflicts exist.
OK.  (Is that explained somewhere in the comments?  I confess I've not
paid any attention to this patch up to now.)  I wonder though whether
you actually need a *count*.  What if it were just a flag saying "do not
take any fast path locks on this object", and once set it didn't get
unset until there were no locks left at all on that object?  In
particular, it's not clear from what you're saying here why it's okay
to let the value revert once you've changed some of the FP locks to
regular locks.
        regards, tom lane
			
		On Mon, Apr 9, 2012 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Apr 9, 2012 at 1:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Haven't looked at the code, but maybe it'd be better to not bump the >>> strong lock count in the first place until the final step of updating >>> the lock tables? > >> Well, unfortunately, that would break the entire mechanism. The idea >> is that we bump the strong lock count first. That prevents anyone >> from taking any more fast-path locks on the target relation. Then, we >> go through and find any existing fast-path locks that have already >> been taken, and turn them into regular locks. Finally, we resolve the >> actual lock request and either grant the lock or block, depending on >> whether conflicts exist. > > OK. (Is that explained somewhere in the comments? I confess I've not > paid any attention to this patch up to now.) There's a new section in src/backend/storage/lmgr/README on Fast Path Locking, plus comments at various places in the code. It's certainly possible I've missed something that should be updated, but I did my best. > I wonder though whether > you actually need a *count*. What if it were just a flag saying "do not > take any fast path locks on this object", and once set it didn't get > unset until there were no locks left at all on that object? I think if you read the above-referenced section of the README you'll be deconfused. The short version is that we divide up the space of lockable objects into 1024 partitions and the strong lock counts are actually a count of all locks in the partition. It is therefore theoretically possible for locking to get slower on table A because somebody's got an AccessExclusiveLock on table B, if the low-order 10 bits of the locktag hashcodes happen to collide. In such a case, all locks on both relations would be forced out of the fast path until the AccessExclusiveLock was released. If it so happens that table A is getting pounded with something that looks a lot like pgbench -S -c 32 -j 32 on a system with more than a couple of cores, the user will be sad. I judge that real-world occurrences of this problem will be quite rare, since most people have adequate reasons for long-lived strong table locks anyway, and 1024 partitions seemed like enough to keep most people from suffering too badly. I don't see any way to eliminate the theoretical possibility of this while still having the basic mechanism work, either, though we could certainly crank up the partition count. > In > particular, it's not clear from what you're saying here why it's okay > to let the value revert once you've changed some of the FP locks to > regular locks. It's always safe to convert a fast-path lock to a regular lock; it just costs you some performance. The idea is that everything that exists as a fast-path lock is something that's certain not to have any lock conflicts. As soon as we discover that a particular lock might be involved in a lock conflict, we have to turn it into a "real" lock.So if backends 1, 2, and 3 take fast-path locks onA (to SELECT from it, for example) and then backend 4 wants an AccessExclusiveLock, it will pull the locks from those backends out of the fast-path mechanism and make regular lock entries for them before checking for lock conflicts. Then, it will discover that there are in fact conflicts and go to sleep. When those backends go to release their locks, they will notice that their locks have been moved to the main lock table and will release them there, eventually waking up backend 4 to go do his thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2012-04-09 at 16:11 -0400, Robert Haas wrote: > > I wonder though whether > > you actually need a *count*. What if it were just a flag saying "do not > > take any fast path locks on this object", and once set it didn't get > > unset until there were no locks left at all on that object? > > I think if you read the above-referenced section of the README you'll > be deconfused. My understanding: The basic reason for the count is that we need to preserve the information that a strong lock is being acquired between the time it's put in FastPathStrongRelationLocks and the time it actually acquires the lock in the lock manager. By definition, the lock manager doesn't know about it yet (so we can't use a simple rule like "there are no locks on the object so we can unset the flag"). Therefore, the backend must indicate whether it's in this code path or not; meaning that it needs to do something on the error path (in this case, decrement the count). That was the source of this bug. There may be a way around this problem, but nothing occurs to me right now. Regards,Jeff Davis PS: Oops, I missed this bug in the review, too. PPS: In the README, FastPathStrongRelationLocks is referred to as FastPathStrongLocks. Worth a quick update for easier symbol searching.
On 4/9/12 12:32 PM, Robert Haas wrote: > I looked at this more. The above analysis is basically correct, but > the problem goes a bit beyond an error in LockWaitCancel(). We could > also crap out with an error before getting as far as LockWaitCancel() > and have the same problem. I think that a correct statement of the > problem is this: from the time we bump the strong lock count, up until > the time we're done acquiring the lock (or give up on acquiring it), > we need to have an error-cleanup hook in place that will unbump the > strong lock count if we error out. Once we're done updating the > shared and local lock tables, the special handling ceases to be > needed, because any subsequent lock release will go through > LockRelease() or LockReleaseAll(), which will do the appropriate > clenaup. > > The attached patch is an attempt at implementing that; any reviews appreciated. Dumb question... should operations in the various StrongLock functions take place in a critical section? Or is that alreadyensure outside of these functions? -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Mon, 2012-04-09 at 17:42 -0500, Jim Nasby wrote: > Dumb question... should operations in the various StrongLock functions > take place in a critical section? Or is that already ensure outside of > these functions? Do you mean CRITICAL_SECTION() in the postgres sense (that is, avoid error paths by making all ERRORs into PANICs and preventing interrupts); or the sense described here: http://en.wikipedia.org/wiki/Critical_section ? If you mean in the postgres sense, you'd have to hold the critical section open from the time you incremented the strong lock count all the way until you decremented it (which is normally at the time the lock is released); which is a cure worse than the disease. Regards,Jeff Davis
On Mon, 2012-04-09 at 13:32 -0400, Robert Haas wrote:
> I looked at this more.  The above analysis is basically correct, but
> the problem goes a bit beyond an error in LockWaitCancel().  We could
> also crap out with an error before getting as far as LockWaitCancel()
> and have the same problem.  I think that a correct statement of the
> problem is this: from the time we bump the strong lock count, up until
> the time we're done acquiring the lock (or give up on acquiring it),
> we need to have an error-cleanup hook in place that will unbump the
> strong lock count if we error out.   Once we're done updating the
> shared and local lock tables, the special handling ceases to be
> needed, because any subsequent lock release will go through
> LockRelease() or LockReleaseAll(), which will do the appropriate
> clenaup.
> 
> The attached patch is an attempt at implementing that; any reviews appreciated.
> 
This path doesn't have an AbortStrongLockAcquire:
 if (!(proclock->holdMask & LOCKBIT_ON(lockmode))) {   ...   elog(ERROR,...)
but other similar paths do:
 if (!proclock) {   AbortStrongLockAcquire();
I don't think it's necessary outside of LockErrorCleanup(), right?
I think there could be some more asserts. There are three sites that
decrement FastPathStrongRelationLocks, but in two of them
StrongLockInProgress should always be NULL.
Other than that, it looks good to me.
Regards,Jeff Davis
			
		On Mon, 2012-04-09 at 22:47 -0700, Jeff Davis wrote:
> but other similar paths do:
> 
>   if (!proclock)
>   {
>     AbortStrongLockAcquire();
> 
> I don't think it's necessary outside of LockErrorCleanup(), right?
I take that back, it's necessary for the dontwait case, too.
Regards,Jeff Davis
			
		2012-04-09 19:32 keltezéssel, Robert Haas írta: > On Sun, Apr 8, 2012 at 9:37 PM, Robert Haas<robertmhaas@gmail.com> wrote: >>> Robert, the Assert triggering with the above procedure >>> is in your "fast path" locking code with current GIT. >> Yes, that sure looks like a bug. It seems that if the top-level >> transaction is aborting, then LockReleaseAll() is called and >> everything gets cleaned up properly; or if a subtransaction is >> aborting after the lock is fully granted, then the locks held by the >> subtransaction are released one at a time using LockRelease(), but if >> the subtransaction is aborted *during the lock wait* then we only do >> LockWaitCancel(), which doesn't clean up the LOCALLOCK. Before the >> fast-lock patch, that didn't really matter, but now it does, because >> that LOCALLOCK is tracking the fact that we're holding onto a shared >> resource - the strong lock count. So I think that LockWaitCancel() >> needs some kind of adjustment, but I haven't figured out exactly what >> it is yet. > I looked at this more. The above analysis is basically correct, but > the problem goes a bit beyond an error in LockWaitCancel(). We could > also crap out with an error before getting as far as LockWaitCancel() > and have the same problem. I think that a correct statement of the > problem is this: from the time we bump the strong lock count, up until > the time we're done acquiring the lock (or give up on acquiring it), > we need to have an error-cleanup hook in place that will unbump the > strong lock count if we error out. Once we're done updating the > shared and local lock tables, the special handling ceases to be > needed, because any subsequent lock release will go through > LockRelease() or LockReleaseAll(), which will do the appropriate > clenaup. > > The attached patch is an attempt at implementing that; any reviews appreciated. This patch indeed fixes the scenario discovered by Cousin Marc. Reading this patch also made me realize that my lock_timeout patch needs adjusting, i.e. needs an AbortStrongLockAcquire() call if waiting for a lock timed out. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig& Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
On 4/9/12 6:12 PM, Jeff Davis wrote: > On Mon, 2012-04-09 at 17:42 -0500, Jim Nasby wrote: >> Dumb question... should operations in the various StrongLock functions >> take place in a critical section? Or is that already ensure outside of >> these functions? > > Do you mean CRITICAL_SECTION() in the postgres sense (that is, avoid > error paths by making all ERRORs into PANICs and preventing interrupts); > or the sense described here: Postgres sense. I thought there was concern about multiple people trying to increment or decrement the count at the sametime, and if that was the case perhaps there was an issue with it not being in a CRITICAL_SECTION as well. But I couldcertainly be wrong about this. :) And yes, we'd definitely not want to be in a CRITICAL_SECTION for the duration of the operation... -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net