Discussion: proposal: lock_time for pg_stat_database


proposal: lock_time for pg_stat_database

From
Pavel Stehule
Date:
Hi all,

Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.
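
As a rough sketch, monitoring could then be a plain query against the view; the lock_time column here is the proposed addition, not an existing one:

    SELECT datname, lock_time
    FROM pg_stat_database
    ORDER BY lock_time DESC;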

Comments, ideas, notices?

Regards

Pavel

Re: proposal: lock_time for pg_stat_database

From
Jim Nasby
Date:
On 1/16/15 11:00 AM, Pavel Stehule wrote:
> Hi all,
>
> Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.
 

Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).
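
For example, a hypothetical per-table counter might be read like this (the lock_wait_time column is imaginary, shown only to illustrate the shape):

    SELECT relname, lock_wait_time
    FROM pg_stat_user_tables
    ORDER BY lock_wait_time DESC
    LIMIT 10;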
 

Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: proposal: lock_time for pg_stat_database

From
Pavel Stehule
Date:


2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/16/15 11:00 AM, Pavel Stehule wrote:
Hi all,

Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.

Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).


It can use the existing stats mechanisms.

I'm afraid it isn't possible to assign waiting time to a table, because it depends on order.
 

Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.

Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (lock time itself is not interesting; waiting is interesting). It can be divided into some more categories - in GoodData we use heavyweight, page, and other categories.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: proposal: lock_time for pg_stat_database

From
Jim Nasby
Date:
On 1/16/15 11:35 AM, Pavel Stehule wrote:
>
>
> 2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
>
>     On 1/16/15 11:00 AM, Pavel Stehule wrote:
>
>         Hi all,
>
>         Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.
 
>
>
>     Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).
 
>
>
>
> It can use the existing stats mechanisms.
>
> I'm afraid it isn't possible to assign waiting time to a table, because it depends on order.

Huh? Order of what?

>     Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
 
>
>
> Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (lock time itself is not interesting; waiting is interesting). It can be divided into some more categories - in GoodData we use heavyweight, page, and other categories.
 

So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: proposal: lock_time for pg_stat_database

From
Pavel Stehule
Date:


2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/16/15 11:35 AM, Pavel Stehule wrote:


2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:

    On 1/16/15 11:00 AM, Pavel Stehule wrote:

        Hi all,

        Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.


    Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).



It can use the existing stats mechanisms.

I'm afraid it isn't possible to assign waiting time to a table, because it depends on order.

Huh? Order of what?

When you have a SELECT FROM T1, T2 and T1 is locked for time t1, and T2 is locked for time t2 -- but if t2 < t1 then t2 is not important -- so what do I have to count as lock time for T1 and T2?

DDL statements are an exception - there is an almost direct mapping between relations and the reason for the lock time.
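
For illustration, a made-up two-session scenario of that ordering problem:

    -- Session A:
    BEGIN;
    LOCK TABLE t1;            -- held for a long time
    LOCK TABLE t2;            -- both released at COMMIT

    -- Session B:
    SELECT * FROM t1, t2;
    -- B waits on t1 first; by the time t1 is released, t2 is free as
    -- well, so the wait "caused by" t2 is hidden inside the wait on t1
    -- and cannot be cleanly attributed per table.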
 

    Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.


Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (lock time itself is not interesting; waiting is interesting). It can be divided into some more categories - in GoodData we use heavyweight, page, and other categories.

So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.

For these global statistics I see a common total waiting time for locks as the important thing - we could use more detailed granularity, but I am not sure whether common statistics are the best tool for that.

My motivation is: I look at the statistics -- and I can see ... a lot of rollbacks -- an issue; a lot of deadlocks -- an issue; a lot of waiting time -- an issue too. It is a tool for people without the possibility to use dtrace and similar tools, and for everyday usage - a simple check that locks are not an issue (or that locking is stable).
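
A sketch of such an everyday check (xact_rollback and deadlocks are existing pg_stat_database columns; lock_time is the proposed one):

    SELECT datname,
           xact_rollback,   -- many rollbacks: an issue
           deadlocks,       -- many deadlocks: an issue
           lock_time        -- proposed: much waiting time is an issue too
    FROM pg_stat_database
    WHERE datname = current_database();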
 

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: proposal: lock_time for pg_stat_database

From
Pavel Stehule
Date:


2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/16/15 11:35 AM, Pavel Stehule wrote:


2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:

    On 1/16/15 11:00 AM, Pavel Stehule wrote:

        Hi all,

        Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.


    Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).



It can use the existing stats mechanisms.

I'm afraid it isn't possible to assign waiting time to a table, because it depends on order.

Huh? Order of what?

When you have a SELECT FROM T1, T2 and T1 is locked for time t1, and T2 is locked for time t2 -- but if t2 < t1 then t2 is not important -- so what do I have to count as lock time for T1 and T2?

DDL statements are an exception - there is an almost direct mapping between relations and the reason for the lock time.
 

    Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.


Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (lock time itself is not interesting; waiting is interesting). It can be divided into some more categories - in GoodData we use heavyweight, page, and other categories.

So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.

For these global statistics I see a common total waiting time for locks as the important thing - we could use more detailed granularity, but I am not sure whether common statistics are the best tool for that.

My motivation is: I look at the statistics -- and I can see ... a lot of rollbacks -- an issue; a lot of deadlocks -- an issue; a lot of waiting time -- an issue too. It is a tool for people without the possibility to use dtrace and similar tools, and for everyday usage - a simple check that locks are not an issue (or that locking is stable).

And this proposal makes sense only for heavyweight locks - because other locks are everywhere.
 

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: proposal: lock_time for pg_stat_database

From
Jim Nasby
Date:
On 1/16/15 12:30 PM, Pavel Stehule wrote:
>
>
> 2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>:
>
>
>
>     2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
>
>         On 1/16/15 11:35 AM, Pavel Stehule wrote:
>
>
>
>             2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:
 
>
>                  On 1/16/15 11:00 AM, Pavel Stehule wrote:
>
>                      Hi all,
>
>                      Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.
 
>
>
>                  Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).
 
>
>
>
>             It can use the existing stats mechanisms.
>
>             I'm afraid it isn't possible to assign waiting time to a table, because it depends on order.
>
>
>         Huh? Order of what?
>
>
>     When you have a SELECT FROM T1, T2 and T1 is locked for time t1, and T2 is locked for time t2 -- but if t2 < t1 then t2 is not important -- so what do I have to count as lock time for T1 and T2?
 

If that select is waiting on a lock on t2, then it's waiting on that lock on that table. It doesn't matter who else has the lock.
 

>                  Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
 
>
>
>             Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (lock time itself is not interesting; waiting is interesting). It can be divided into some more categories - in GoodData we use heavyweight, page, and other categories.
 
>
>
>         So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.
 
>
>
>     For these global statistics I see a common total waiting time for locks as the important thing - we could use more detailed granularity, but I am not sure whether common statistics are the best tool for that.
 

Locks may be global, but what you're waiting for a lock on certainly isn't. It's almost always a lock either on a table or a row in a table. Of course this does mean you can't just blindly report that you're blocked on some XID; that doesn't tell anyone anything.
 

>     My motivation is: I look at the statistics -- and I can see ... a lot of rollbacks -- an issue; a lot of deadlocks -- an issue; a lot of waiting time -- an issue too. It is a tool for people without the possibility to use dtrace and similar tools, and for everyday usage - a simple check that locks are not an issue (or that locking is stable).
 

Meh. SELECT sum(now() - state_change) FROM pg_stat_activity WHERE waiting is just about as useful. Or just turn on lock logging.
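
Spelled out, using the waiting flag and state_change timestamp that pg_stat_activity already has (summing the time since each waiting backend's last state change approximates the total current lock wait):

    SELECT sum(now() - state_change) AS current_lock_wait
    FROM pg_stat_activity
    WHERE waiting;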

If you really want to add it at the database level I'm not opposed (so long as it leaves the door open for more granular locking later), but I can't really get excited about it either.
 

> And this proposal makes sense only for heavyweight locks - because other locks are everywhere.

So what if they're everywhere? Right now if you're spending a lot of time waiting for LWLocks you have no way to know what's going on unless you happen to have dtrace. Obviously we're not going to do something like issue a stats update every time we attempt to acquire an LWLock, but that doesn't mean we can't keep some counters on the locks and periodically report that.
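
For example, something shaped like this hypothetical view (neither the view nor its columns exist; it only shows what periodic counter reporting could look like):

    SELECT lwlock_name,
           acquire_count,   -- attempts to take the lock
           block_count      -- times the initial acquisition failed
    FROM pg_stat_lwlocks
    ORDER BY block_count DESC;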
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: proposal: lock_time for pg_stat_database

From
Pavel Stehule
Date:


2015-01-16 20:33 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/16/15 12:30 PM, Pavel Stehule wrote:


2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>:



    2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:

        On 1/16/15 11:35 AM, Pavel Stehule wrote:



            2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:

                 On 1/16/15 11:00 AM, Pavel Stehule wrote:

                     Hi all,

                     Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.


                 Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).



            It can use the existing stats mechanisms.

            I'm afraid it isn't possible to assign waiting time to a table, because it depends on order.


        Huh? Order of what?


    When you have a SELECT FROM T1, T2 and T1 is locked for time t1, and T2 is locked for time t2 -- but if t2 < t1 then t2 is not important -- so what do I have to count as lock time for T1 and T2?

If that select is waiting on a lock on t2, then it's waiting on that lock on that table. It doesn't matter who else has the lock.

                 Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.


            Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (lock time itself is not interesting; waiting is interesting). It can be divided into some more categories - in GoodData we use heavyweight, page, and other categories.


        So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.


    For these global statistics I see a common total waiting time for locks as the important thing - we could use more detailed granularity, but I am not sure whether common statistics are the best tool for that.

Locks may be global, but what you're waiting for a lock on certainly isn't. It's almost always a lock either on a table or a row in a table. Of course this does mean you can't just blindly report that you're blocked on some XID; that doesn't tell anyone anything.

    My motivation is: I look at the statistics -- and I can see ... a lot of rollbacks -- an issue; a lot of deadlocks -- an issue; a lot of waiting time -- an issue too. It is a tool for people without the possibility to use dtrace and similar tools, and for everyday usage - a simple check that locks are not an issue (or that locking is stable).

Meh. SELECT sum(now() - state_change) FROM pg_stat_activity WHERE waiting is just about as useful. Or just turn on lock logging.

If you really want to add it at the database level I'm not opposed (so long as it leaves the door open for more granular locking later), but I can't really get excited about it either.

And this proposal makes sense only for heavyweight locks - because other locks are everywhere.

So what if they're everywhere? Right now if you're spending a lot of time waiting for LWLocks you have no way to know what's going on unless you happen to have dtrace. Obviously we're not going to do something like issue a stats update every time we attempt to acquire an LWLock, but that doesn't mean we can't keep some counters on the locks and periodically report that.

I have a plan to update the statistics once all necessary locks are acquired - so it is once per statement - which puts a similar pressure on the stats system as now.

Pavel
 

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: proposal: lock_time for pg_stat_database

From
Pavel Stehule
Date:
Hi

2015-01-16 20:33 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 1/16/15 12:30 PM, Pavel Stehule wrote:


2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com <mailto:pavel.stehule@gmail.com>>:



    2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:

        On 1/16/15 11:35 AM, Pavel Stehule wrote:



            2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>>:

                 On 1/16/15 11:00 AM, Pavel Stehule wrote:

                     Hi all,

                     Some time ago I proposed measuring lock time related to queries. The main issue was the method of how to show this information. Today's proposal is a little bit simpler, but still useful: we can show a total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.


                 Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).



            It can use the existing stats mechanisms.

            I'm afraid it isn't possible to assign waiting time to a table, because it depends on order.


        Huh? Order of what?


    When you have a SELECT FROM T1, T2 and T1 is locked for time t1, and T2 is locked for time t2 -- but if t2 < t1 then t2 is not important -- so what do I have to count as lock time for T1 and T2?

If that select is waiting on a lock on t2, then it's waiting on that lock on that table. It doesn't matter who else has the lock.

                 Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.


            Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (lock time itself is not interesting; waiting is interesting). It can be divided into some more categories - in GoodData we use heavyweight, page, and other categories.


        So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.


    For these global statistics I see a common total waiting time for locks as the important thing - we could use more detailed granularity, but I am not sure whether common statistics are the best tool for that.

Locks may be global, but what you're waiting for a lock on certainly isn't. It's almost always a lock either on a table or a row in a table. Of course this does mean you can't just blindly report that you're blocked on some XID; that doesn't tell anyone anything.

    My motivation is: I look at the statistics -- and I can see ... a lot of rollbacks -- an issue; a lot of deadlocks -- an issue; a lot of waiting time -- an issue too. It is a tool for people without the possibility to use dtrace and similar tools, and for everyday usage - a simple check that locks are not an issue (or that locking is stable).

Meh. SELECT sum(now() - state_change) FROM pg_stat_activity WHERE waiting is just about as useful. Or just turn on lock logging.

If you really want to add it at the database level I'm not opposed (so long as it leaves the door open for more granular locking later), but I can't really get excited about it either.

And this proposal makes sense only for heavyweight locks - because other locks are everywhere.

So what if they're everywhere? Right now if you're spending a lot of time waiting for LWLocks you have no way to know what's going on unless you happen to have dtrace. Obviously we're not going to do something like issue a stats update every time we attempt to acquire an LWLock, but that doesn't mean we can't keep some counters on the locks and periodically report that.


I was wrong - it probably is possible to attach lock waiting time per table.

Regards

Pavel
 

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com