Discussion: Anyone seen this kind of lock pileup?
All,

Having an interesting issue on one 8.4 database. Due to poor application design, the application is requesting 8-15 exclusive (update) locks on the same row on parallel connections pretty much simultaneously (i.e. < 50ms apart).

What's odd about this is that the resulting "lock pileup" takes a mysterious 2-3.5 seconds to clear, despite the fact that none of the connections are *doing* anything during that time, nor are there deadlock errors. In theory at least, the locks should clear out in reverse order in less than a second; none of the individual statements takes more than 10ms to execute.

Has anyone else seen something like this? Any idea what causes it?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> Having an interesting issue on one 8.4 database. Due to poor
> application design, the application is requesting 8-15 exclusive
> (update) locks on the same row on parallel connections pretty much
> simultaneously (i.e. < 50ms apart).

> What's odd about this is that the resulting "lock pileup" takes a
> mysterious 2-3.5 seconds to clear, despite the fact that none of the
> connections are *doing* anything during that time, nor are there
> deadlock errors. In theory at least, the locks should clear out in
> reverse order in less than a second; none of the individual statements
> takes more than 10ms to execute.

Hmm ... can you extract a test case? Or at least strace the backends involved?

			regards, tom lane
> Hmm ... can you extract a test case? Or at least strace the backends
> involved?

No, and no. Strace was the first thing I thought of, but I'd have to somehow catch one of these backends in the 3 seconds it's locked. Not really feasible.

It might be possible to construct a test case, depending on how much the user wants to spend on the problem. I'd estimate that a test case would take 8-12 hours of my time to get working, given the level of activity and concurrency required.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On 11/17/10 18:37, Josh Berkus wrote:
> All,
>
> Having an interesting issue on one 8.4 database. Due to poor application
> design, the application is requesting 8-15 exclusive (update) locks on
> the same row on parallel connections pretty much simultaneously (i.e. <
> 50ms apart).
>
> What's odd about this is that the resulting "lock pileup" takes a
> mysterious 2-3.5 seconds to clear, despite the fact that none of the
> connections are *doing* anything during that time, nor are there
> deadlock errors. In theory at least, the locks should clear out in
> reverse order in less than a second; none of the individual statements
> takes more than 10ms to execute.

Just a random guess: a timeout-supported livelock? (Of course, only if there is some timeout-and-retry protocol going on and the timeout intervals are non-randomized.)
>> What's odd about this is that the resulting "lock pileup" takes a
>> mysterious 2-3.5 seconds to clear, despite the fact that none of the
>> connections are *doing* anything during that time, nor are there
>> deadlock errors. In theory at least, the locks should clear out in
>> reverse order in less than a second; none of the individual statements
>> takes more than 10ms to execute.

Ok, I've collected more data. It looks like the case I was examining was idiosyncratic; most of these lock pile-ups involve 400 or more waiting lock requests spread across around 20 different backends. Given that, taking 3 seconds to sort it all out doesn't seem that unreasonable. Presumably there's a poll cycle of some sort for waiting statements?

Anyway, the obvious answer is for the user to fix their application.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> Ok, I've collected more data. Looks like the case I was examining was
> idiosyncratic; most of these lock pile-ups involve 400 or more locks
> waiting held by around 20 different backends. Given this, taking 3
> seconds to sort that all out doesn't seem that unreasonable.
> Presumably there's a poll cycle of some sort for waiting statements?

No ... but if the lock requests were mutually exclusive, I could believe it taking 3 seconds for all of the waiting backends to get their turn with the lock, do whatever they were gonna do, commit, and release the lock to the next guy.

> Anyway, the obvious answer is for the user to fix their application.

Probably.

			regards, tom lane
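[Editor's note: the serialization arithmetic above is easy to sanity-check with a toy simulation. This is not PostgreSQL internals — just a single Python `threading.Lock` standing in for the row's exclusive lock, and `simulate_pileup` is a made-up helper name. The point is that even when each critical section is fast, the queue drains serially, so total clear time grows linearly with the number of waiters: 400 waiters each holding the lock ~7.5 ms serialize to roughly 3 seconds, matching the observed pile-up duration.]

```python
# Toy simulation of a lock pile-up: N workers contend for one mutually
# exclusive lock; each holds it briefly, like a short UPDATE transaction.
# Total drain time is at least n_waiters * hold_seconds, however fast
# each individual "statement" is.
import threading
import time


def simulate_pileup(n_waiters: int, hold_seconds: float) -> float:
    """Return wall-clock time for all waiters to acquire and release the lock."""
    lock = threading.Lock()
    threads = []

    def worker():
        with lock:                    # exclusive, like a row-level update lock
            time.sleep(hold_seconds)  # work done while holding the lock

    start = time.monotonic()
    for _ in range(n_waiters):
        t = threading.Thread(target=worker)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    # Scaled down for a quick run: 50 waiters x 2 ms >= 0.1 s of serialized
    # lock-holding. At 400 waiters x 7.5 ms, the same arithmetic gives ~3 s.
    elapsed = simulate_pileup(n_waiters=50, hold_seconds=0.002)
    print(f"{elapsed:.3f}s to drain 50 waiters")
```

The simulation says nothing about *why* this particular application piles up, of course; it only shows that a multi-second drain is the expected cost of hundreds of mutually exclusive lock requests taking turns.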