Обсуждение: proposal: lock_time for pg_stat_database
Hi all,
some time ago, I proposed a lock time measurement related to query. A main issue was a method, how to show this information. Today proposal is little bit simpler, but still useful. We can show a total lock time per database in pg_stat_database statistics. High number can be signal about lock issues.On 1/16/15 11:00 AM, Pavel Stehule wrote: > Hi all, > > some time ago, I proposed a lock time measurement related to query. A main issue was a method, how to show this information.Today proposal is little bit simpler, but still useful. We can show a total lock time per database in pg_stat_databasestatistics. High number can be signal about lock issues. Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle allcases; we'd still need a "lock_time_other" somewhere). Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way toknow if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sensebut we might be able to count how often we fail initial acquisition or something. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
 
On 1/16/15 11:00 AM, Pavel Stehule wrote:Hi all,
some time ago, I proposed a lock time measurement related to query. A main issue was a method, how to show this information. Today proposal is little bit simpler, but still useful. We can show a total lock time per database in pg_stat_database statistics. High number can be signal about lock issues.
Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).
it can use a current existing stats mechanisms 
I afraid so isn't possible to assign waiting time to table - because it depends on order
Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
now, when I am thinking about it, lock_time is not good name - maybe "waiting lock time" (lock time should not be interesting, waiting is interesting) - it can be divided to some more categories - in GoodData we use Heavyweight, pages, and others categories. 
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/16/15 11:35 AM, Pavel Stehule wrote: > > > 2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>: > > On 1/16/15 11:00 AM, Pavel Stehule wrote: > > Hi all, > > some time ago, I proposed a lock time measurement related to query. A main issue was a method, how to show thisinformation. Today proposal is little bit simpler, but still useful. We can show a total lock time per database in pg_stat_databasestatistics. High number can be signal about lock issues. > > > Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handleall cases; we'd still need a "lock_time_other" somewhere). > > > > it can use a current existing stats mechanisms > > I afraid so isn't possible to assign waiting time to table - because it depends on order Huh? Order of what? > Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some wayto know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't makesense but we might be able to count how often we fail initial acquisition or something. > > > now, when I am thinking about it, lock_time is not good name - maybe "waiting lock time" (lock time should not be interesting,waiting is interesting) - it can be divided to some more categories - in GoodData we use Heavyweight, pages,and others categories. So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here.Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
 
 
On 1/16/15 11:35 AM, Pavel Stehule wrote:
2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/16/15 11:00 AM, Pavel Stehule wrote:
Hi all,
some time ago, I proposed a lock time measurement related to query. A main issue was a method, how to show this information. Today proposal is little bit simpler, but still useful. We can show a total lock time per database in pg_stat_database statistics. High number can be signal about lock issues.
Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).
it can use a current existing stats mechanisms
I afraid so isn't possible to assign waiting time to table - because it depends on order
Huh? Order of what?
when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is locked for t2 -- but if t2 < t1 then t2 is not important -- so what I have to cont as lock time for T1 and T2?
DDL statements are exception - there is almost simple mapping between relations and lock time reason.
Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
now, when I am thinking about it, lock_time is not good name - maybe "waiting lock time" (lock time should not be interesting, waiting is interesting) - it can be divided to some more categories - in GoodData we use Heavyweight, pages, and others categories.
So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.
For these global statistics I see as important a common total waiting time for locks - we can use a more detailed granularity but I am not sure, if a common statistics are best tool.
My motivations is - look to statistics -- and I can see ... lot of rollbacks -- issue, lot of deadlocks -- issue, lot of waiting time -- issue too. It is tool for people without possibility to use dtrace and similar tools and for everyday usage - simple check if locks are not a issue (or if locking is stable).
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:On 1/16/15 11:35 AM, Pavel Stehule wrote:
2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/16/15 11:00 AM, Pavel Stehule wrote:
Hi all,
some time ago, I proposed a lock time measurement related to query. A main issue was a method, how to show this information. Today proposal is little bit simpler, but still useful. We can show a total lock time per database in pg_stat_database statistics. High number can be signal about lock issues.
Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).
it can use a current existing stats mechanisms
I afraid so isn't possible to assign waiting time to table - because it depends on order
Huh? Order of what?when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is locked for t2 -- but if t2 < t1 then t2 is not important -- so what I have to cont as lock time for T1 and T2?DDL statements are exception - there is almost simple mapping between relations and lock time reason.Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
now, when I am thinking about it, lock_time is not good name - maybe "waiting lock time" (lock time should not be interesting, waiting is interesting) - it can be divided to some more categories - in GoodData we use Heavyweight, pages, and others categories.
So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.For these global statistics I see as important a common total waiting time for locks - we can use a more detailed granularity but I am not sure, if a common statistics are best tool.My motivations is - look to statistics -- and I can see ... lot of rollbacks -- issue, lot of deadlocks -- issue, lot of waiting time -- issue too. It is tool for people without possibility to use dtrace and similar tools and for everyday usage - simple check if locks are not a issue (or if locking is stable).
and this proposal has sense only for heavyweight locks - because others locks are everywhere 
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/16/15 12:30 PM, Pavel Stehule wrote: > > > 2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>: > > > > 2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>: > > On 1/16/15 11:35 AM, Pavel Stehule wrote: > > > > 2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com> <mailto:Jim.Nasby@bluetreble.__com<mailto:Jim.Nasby@bluetreble.com>>>: > > On 1/16/15 11:00 AM, Pavel Stehule wrote: > > Hi all, > > some time ago, I proposed a lock time measurement related to query. A main issue was a method, howto show this information. Today proposal is little bit simpler, but still useful. We can show a total lock time per databasein pg_stat_database statistics. High number can be signal about lock issues. > > > Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize thatwon't handle all cases; we'd still need a "lock_time_other" somewhere). > > > > it can use a current existing stats mechanisms > > I afraid so isn't possible to assign waiting time to table - because it depends on order > > > Huh? Order of what? > > > when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is locked for t2 -- but if t2 < t1 then t2 is notimportant -- so what I have to cont as lock time for T1 and T2? If that select is waiting on a lock on t2, then it's waiting on that lock on that table. It doesn't matter who else has thelock. > Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we hadwas some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probablywouldn't make sense but we might be able to count how often we fail initial acquisition or something. > > > now, when I am thinking about it, lock_time is not good name - maybe "waiting lock time" (lock time shouldnot be interesting, waiting is interesting) - it can be divided to some more categories - in GoodData we use Heavyweight,pages, and others categories. > > > So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggestneed here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT___START() that doesn't depend on dtrace. > > > For these global statistics I see as important a common total waiting time for locks - we can use a more detailed granularitybut I am not sure, if a common statistics are best tool. Locks may be global, but what you're waiting for a lock on certainly isn't. It's almost always a lock either on a table ora row in a table. Of course this does mean you can't just blindly report that you're blocked on some XID; that doesn'ttell anyone anything. > My motivations is - look to statistics -- and I can see ... lot of rollbacks -- issue, lot of deadlocks -- issue, lotof waiting time -- issue too. It is tool for people without possibility to use dtrace and similar tools and for everydayusage - simple check if locks are not a issue (or if locking is stable). Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just about as useful. Or just turn on lock logging. If you really want to add it at the database level I'm not opposed (so long as it leaves the door open for more granularlocking later), but I can't really get excited about it either. > and this proposal has sense only for heavyweight locks - because others locks are everywhere So what if they're everywhere? Right now if you're spending a lot of time waiting for LWLocks you have no way to know what'sgoing on unless you happen to have dtrace. Obviously we're not going to something like issue a stats update every timewe attempt to acquire an LWLock, but that doesn't mean we can't keep some counters on the locks and periodically reportthat. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-16 20:33 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
 
On 1/16/15 12:30 PM, Pavel Stehule wrote:
2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>:
2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/16/15 11:35 AM, Pavel Stehule wrote:
2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com> <mailto:Jim.Nasby@bluetreble.__com <mailto:Jim.Nasby@bluetreble.com>>>:
On 1/16/15 11:00 AM, Pavel Stehule wrote:
Hi all,
some time ago, I proposed a lock time measurement related to query. A main issue was a method, how to show this information. Today proposal is little bit simpler, but still useful. We can show a total lock time per database in pg_stat_database statistics. High number can be signal about lock issues.
Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).
it can use a current existing stats mechanisms
I afraid so isn't possible to assign waiting time to table - because it depends on order
Huh? Order of what?
when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is locked for t2 -- but if t2 < t1 then t2 is not important -- so what I have to cont as lock time for T1 and T2?
If that select is waiting on a lock on t2, then it's waiting on that lock on that table. It doesn't matter who else has the lock.Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
now, when I am thinking about it, lock_time is not good name - maybe "waiting lock time" (lock time should not be interesting, waiting is interesting) - it can be divided to some more categories - in GoodData we use Heavyweight, pages, and others categories.
So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT___START() that doesn't depend on dtrace.
For these global statistics I see as important a common total waiting time for locks - we can use a more detailed granularity but I am not sure, if a common statistics are best tool.
Locks may be global, but what you're waiting for a lock on certainly isn't. It's almost always a lock either on a table or a row in a table. Of course this does mean you can't just blindly report that you're blocked on some XID; that doesn't tell anyone anything.My motivations is - look to statistics -- and I can see ... lot of rollbacks -- issue, lot of deadlocks -- issue, lot of waiting time -- issue too. It is tool for people without possibility to use dtrace and similar tools and for everyday usage - simple check if locks are not a issue (or if locking is stable).
Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just about as useful. Or just turn on lock logging.
If you really want to add it at the database level I'm not opposed (so long as it leaves the door open for more granular locking later), but I can't really get excited about it either.and this proposal has sense only for heavyweight locks - because others locks are everywhere
So what if they're everywhere? Right now if you're spending a lot of time waiting for LWLocks you have no way to know what's going on unless you happen to have dtrace. Obviously we're not going to something like issue a stats update every time we attempt to acquire an LWLock, but that doesn't mean we can't keep some counters on the locks and periodically report that.
I have a plan to update statistics when all necessary keys are acquired - so it is once per statement - it is similar press on stats system like now.
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi
			
		2015-01-16 20:33 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
 
On 1/16/15 12:30 PM, Pavel Stehule wrote:
2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>:
2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
On 1/16/15 11:35 AM, Pavel Stehule wrote:
2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com> <mailto:Jim.Nasby@bluetreble.__com <mailto:Jim.Nasby@bluetreble.com>>>:
On 1/16/15 11:00 AM, Pavel Stehule wrote:
Hi all,
some time ago, I proposed a lock time measurement related to query. A main issue was a method, how to show this information. Today proposal is little bit simpler, but still useful. We can show a total lock time per database in pg_stat_database statistics. High number can be signal about lock issues.
Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).
it can use a current existing stats mechanisms
I afraid so isn't possible to assign waiting time to table - because it depends on order
Huh? Order of what?
when you have a SELECT FROM T1, T2 and T1 is locked for t1, and T2 is locked for t2 -- but if t2 < t1 then t2 is not important -- so what I have to cont as lock time for T1 and T2?
If that select is waiting on a lock on t2, then it's waiting on that lock on that table. It doesn't matter who else has the lock.Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
now, when I am thinking about it, lock_time is not good name - maybe "waiting lock time" (lock time should not be interesting, waiting is interesting) - it can be divided to some more categories - in GoodData we use Heavyweight, pages, and others categories.
So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT___START() that doesn't depend on dtrace.
For these global statistics I see as important a common total waiting time for locks - we can use a more detailed granularity but I am not sure, if a common statistics are best tool.
Locks may be global, but what you're waiting for a lock on certainly isn't. It's almost always a lock either on a table or a row in a table. Of course this does mean you can't just blindly report that you're blocked on some XID; that doesn't tell anyone anything.My motivations is - look to statistics -- and I can see ... lot of rollbacks -- issue, lot of deadlocks -- issue, lot of waiting time -- issue too. It is tool for people without possibility to use dtrace and similar tools and for everyday usage - simple check if locks are not a issue (or if locking is stable).
Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just about as useful. Or just turn on lock logging.
If you really want to add it at the database level I'm not opposed (so long as it leaves the door open for more granular locking later), but I can't really get excited about it either.and this proposal has sense only for heavyweight locks - because others locks are everywhere
So what if they're everywhere? Right now if you're spending a lot of time waiting for LWLocks you have no way to know what's going on unless you happen to have dtrace. Obviously we're not going to something like issue a stats update every time we attempt to acquire an LWLock, but that doesn't mean we can't keep some counters on the locks and periodically report that.
I was wrong - probably is possible to attach lock waiting time per table
Regards
Pavel
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com