Discussion: Anyone seen this kind of lock pileup?
All,
Having an interesting issue on one 8.4 database. Due to poor
application design, the application is requesting 8-15 exclusive
(update) locks on the same row on parallel connections pretty much
simultaneously (i.e. < 50ms apart).
What's odd about this is that the resulting "lock pileup" takes a
mysterious 2-3.5 seconds to clear, despite the fact that none of the
connections are *doing* anything during that time, nor are there
deadlock errors. In theory at least, the locks should clear out in
reverse order in less than a second; none of the individual statements
takes more than 10ms to execute.
Has anyone else seen something like this? Any idea what causes it?
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
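The contention pattern described above can be modeled with a quick simulation — Python threads standing in for backends, a `threading.Lock` standing in for the row's exclusive lock. The connection count and per-statement time are taken from the report; everything else is a stand-in. It shows why the "under a second" expectation is reasonable: 12 waiters at 10ms each should clear in roughly 120ms.

```python
import threading
import time

NUM_CONNECTIONS = 12   # the report says 8-15 parallel connections
HOLD_SECONDS = 0.010   # no individual statement takes more than 10ms

row_lock = threading.Lock()  # stand-in for the row's exclusive lock


def backend():
    # Each "backend" queues on the same row lock, holds it for the
    # statement duration, then releases it to the next waiter.
    with row_lock:
        time.sleep(HOLD_SECONDS)


def run():
    threads = [threading.Thread(target=backend) for _ in range(NUM_CONNECTIONS)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    # Well under a second -- nothing like the observed 2-3.5s.
    print(f"pileup cleared in {run():.3f}s")
```

With these numbers the queue drains in a fraction of a second, which is what makes the observed 2-3.5 seconds look mysterious.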
Josh Berkus <josh@agliodbs.com> writes:
> Having an interesting issue on one 8.4 database. Due to poor
> application design, the application is requesting 8-15 exclusive
> (update) locks on the same row on parallel connections pretty much
> simultaneously (i.e. < 50ms apart).
> What's odd about this is that the resulting "lock pileup" takes a
> mysterious 2-3.5 seconds to clear, despite the fact that none of the
> connections are *doing* anything during that time, nor are there
> deadlock errors. In theory at least, the locks should clear out in
> reverse order in less than a second; none of the individual statements
> takes more than 10ms to execute.
Hmm ... can you extract a test case? Or at least strace the backends
involved?
regards, tom lane
> Hmm ... can you extract a test case? Or at least strace the backends
> involved?
No, and no. Strace was the first thing I thought of, but I'd have to
somehow catch one of these backends in the 3 seconds it's locked. Not
really feasible.
It might be possible to construct a test case, depending on how much the
user wants to spend on the problem. I'd estimate that a test case would
take 8-12 hours of my time to get working, given the level of activity
and concurrency required.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
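One way to narrow the strace window without catching a backend by luck: poll `pg_locks` for ungranted lock requests and map each waiter to the pid holding the same lock, then strace just those pids. The sketch below assumes rows fetched as `(pid, relation, page, tuple, granted)` from something like `SELECT pid, relation, page, tuple, granted FROM pg_locks WHERE locktype = 'tuple'`; the exact set of rows pg_locks exposes for tuple locks on 8.4 is an assumption, and the matching function itself is just generic bookkeeping.

```python
def waiters_by_holder(rows):
    # rows: iterable of (pid, relation, page, tuple, granted) as fetched
    # from pg_locks. Returns {holder_pid: [waiting_pids]} so the pids
    # worth stracing can be picked out while the pileup is still live.
    holders = {}   # (relation, page, tuple) -> pid holding the lock
    waiting = []   # (key, pid) for ungranted requests
    for pid, relation, page, tup, granted in rows:
        key = (relation, page, tup)
        if granted:
            holders[key] = pid
        else:
            waiting.append((key, pid))
    blocked = {}
    for key, pid in waiting:
        if key in holders:
            blocked.setdefault(holders[key], []).append(pid)
    return blocked
```

Run in a tight polling loop, this would give a list of pids to hand to strace the moment a queue forms, rather than trying to guess which backend is stuck.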
On 11/17/10 18:37, Josh Berkus wrote:
> All,
>
> Having an interesting issue on one 8.4 database. Due to poor application
> design, the application is requesting 8-15 exclusive (update) locks on
> the same row on parallel connections pretty much simultaneously (i.e. <
> 50ms apart).
>
> What's odd about this is that the resulting "lock pileup" takes a
> mysterious 2-3.5 seconds to clear, despite the fact that none of the
> connections are *doing* anything during that time, nor are there
> deadlock errors. In theory at least, the locks should clear out in
> reverse order in less than a second; none of the individual statements
> takes more than 10ms to execute.
Just a random guess: a timeout-driven livelock? (Of course, only if there
is some timeout-and-retry protocol going on and the timeout intervals are
non-randomized.)
>> What's odd about this is that the resulting "lock pileup" takes a
>> mysterious 2-3.5 seconds to clear, despite the fact that none of the
>> connections are *doing* anything during that time, nor are there
>> deadlock errors. In theory at least, the locks should clear out in
>> reverse order in less than a second; none of the individual statements
>> takes more than 10ms to execute.
Ok, I've collected more data. Looks like the case I was examining was
idiosyncratic; most of these lock pile-ups involve 400 or more waiting
locks spread across around 20 different backends. Given this, taking 3
seconds to sort that all out doesn't seem that unreasonable.
Presumably there's a poll cycle of some sort for waiting statements?
Anyway, the obvious answer is for the user to fix their application.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> Ok, I've collected more data. Looks like the case I was examining was
> idiosyncratic; most of these lock pile-ups involve 400 or more locks
> waiting held by around 20 different backends. Given this, taking 3
> seconds to sort that all out doesn't seem that unreasonable.
> Presumably there's a poll cycle of some sort for waiting statements?
No ... but if the lock requests were mutually exclusive, I could believe
it taking 3 seconds for all of the waiting backends to get their turn
with the lock, do whatever they were gonna do, commit, and release the
lock to the next guy.
> Anyway, the obvious answer is for the user to fix their application.
Probably.
regards, tom lane
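The arithmetic behind that explanation checks out against the revised numbers: mutually exclusive locks serialize, so the time for the pileup to clear is roughly the sum of the individual hold times. A quick sanity check, using the 400 waiting locks from the thread and an assumed ~8ms hold per statement (the report only bounds it at 10ms):

```python
def clear_time(waiting_locks, hold_ms):
    # Exclusive locks serialize: each waiter must wait for every lock
    # ahead of it, so total clear time is the sum of the hold times.
    return waiting_locks * hold_ms / 1000.0


if __name__ == "__main__":
    # 400 locks at ~8ms each lands squarely in the observed 2-3.5s range,
    # while the original 12-connection case would clear in ~0.12s.
    print(f"{clear_time(400, 8):.2f}s vs {clear_time(12, 10):.2f}s")
```

So no poll cycle is needed to explain the delay; pure FIFO handoff of an exclusive lock accounts for the 2-3.5 seconds once the queue is hundreds deep.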