Discussion: [HACKERS] pg_stat_activity.waiting_start
Hi hackers, We already have xact_start, query_start and backend_start to get the timestamptz for when different things happened. I would like to propose adding a fourth such column, "waiting_start", which would tell how long a backend has been waiting. The column would be NULL when waiting=FALSE. While it's trivial to write a script that just polls pg_stat_activity every second and keeps track of when a backend started waiting by just checking for any new waiting=TRUE rows, it would be more convenient to just get the information from pg_stat_activity directly. The use-case would be e.g. monitoring tools where you want to know how long queries have been waiting. -- Joel Jacobson
Actually, "wait_start" is a better name to match the others ("wait_event_type" and "wait_event"). On Sat, Dec 24, 2016 at 8:20 AM, Joel Jacobson <joel@trustly.com> wrote: > Hi hackers, > > We already have xact_start, query_start and backend_start > to get the timestamptz for when different things happened. > > I would like to propose adding a fourth such column, "waiting_start", > which would tell how long time a backend has been waiting. > > The column would be NULL when waiting=FALSE. > > While it's trivial to write a script that just polls pg_stat_activity > every second and keeps tack of when a backend started > waiting by just checking for any new waiting=TRUE rows, > it would be more convenient to just get the information from > pg_stat_activity directly. > > The use-case would be e.g. monitoring tools > when you want to know how long time queries are waiting. > > -- > Joel Jacobson -- Joel Jacobson Mobile: +46703603801 Trustly.com | Newsroom | LinkedIn | Twitter
Joel Jacobson <joel@trustly.com> writes: > We already have xact_start, query_start and backend_start > to get the timestamptz for when different things happened. > I would like to propose adding a fourth such column, "waiting_start", > which would tell how long time a backend has been waiting. The difficulty with that is it'd require a gettimeofday() call for every wait start. Even on platforms where those are relatively cheap, the overhead would be nasty --- and on some platforms, it'd be astonishingly bad. We sweated quite a lot to get the overhead of pg_stat_activity wait monitoring down to the point where it would be tolerable for non-heavyweight locks, but I'm afraid this would push it back into the not-tolerable range. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Joel Jacobson <joel@trustly.com> writes: > > We already have xact_start, query_start and backend_start > > to get the timestamptz for when different things happened. > > > I would like to propose adding a fourth such column, "waiting_start", > > which would tell how long time a backend has been waiting. > > The difficulty with that is it'd require a gettimeofday() call for > every wait start. Even on platforms where those are relatively cheap, > the overhead would be nasty --- and on some platforms, it'd be > astonishingly bad. We sweated quite a lot to get the overhead of > pg_stat_activity wait monitoring down to the point where it would be > tolerable for non-heavyweight locks, but I'm afraid this would push > it back into the not-tolerable range. Could we handle this like log_lock_waits..? Thanks! Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> The difficulty with that is it'd require a gettimeofday() call for >> every wait start. Even on platforms where those are relatively cheap, >> the overhead would be nasty --- and on some platforms, it'd be >> astonishingly bad. We sweated quite a lot to get the overhead of >> pg_stat_activity wait monitoring down to the point where it would be >> tolerable for non-heavyweight locks, but I'm afraid this would push >> it back into the not-tolerable range. > Could we handle this like log_lock_waits..? Well, that only applies to heavyweight locks, which do a gettimeofday anyway in order to schedule the deadlock-check timeout. If you were willing to populate this new column only for heavyweight locks, maybe it could be done for minimal overhead. But that would be backsliding quite a lot compared to what we just did to extend pg_stat_activity's coverage of lock types. regards, tom lane
On Sat, Dec 24, 2016 at 9:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I would like to propose adding a fourth such column, "waiting_start", >> which would tell how long time a backend has been waiting. > > The difficulty with that is it'd require a gettimeofday() call for > every wait start. Even on platforms where those are relatively cheap, > the overhead would be nasty --- and on some platforms, it'd be > astonishingly bad. We sweated quite a lot to get the overhead of > pg_stat_activity wait monitoring down to the point where it would be > tolerable for non-heavyweight locks, but I'm afraid this would push > it back into the not-tolerable range. I don't think we need the microsecond resolution provided by gettimeofday() via GetCurrentTimestamp() It would be enough to know which second the waiting started, so we could use time(). gettimeofday() takes 42 cycles. time() only takes 3 cycles. [1] [1] http://stackoverflow.com/questions/6498972/faster-equivalent-of-gettimeofday
On Sat, Dec 24, 2016 at 9:56 AM, Joel Jacobson <joel@trustly.com> wrote: >> The difficulty with that is it'd require a gettimeofday() call for >> every wait start. Even on platforms where those are relatively cheap, I just realized how this can be optimized. We only need to set wait_start for every new waiting period, not for every wait start, i.e. not for every call to pgstat_report_wait_start(): Example: In pgstat_report_wait_start(): if (proc->wait_start == 0) proc->wait_start = (pg_time_t) time(NULL); And then in pgstat_report_wait_end(): proc->wait_start = 0; This means we only need to call time() or gettimeofday() once per waiting period. -- Joel Jacobson
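A minimal stand-alone sketch of that once-per-waiting-period idea, with made-up names (WaitSlot, report_wait_start/report_wait_end) standing in for the real PGPROC field and pgstat_report_wait_* internals:

#include <time.h>

typedef struct WaitSlot
{
    time_t  wait_start;     /* 0 means "not currently waiting" */
} WaitSlot;

static inline void
report_wait_start(WaitSlot *slot)
{
    /* Read the clock only when a new waiting period begins. */
    if (slot->wait_start == 0)
        slot->wait_start = time(NULL);
}

static inline void
report_wait_end(WaitSlot *slot)
{
    /* Clearing the stamp ends the waiting period. */
    slot->wait_start = 0;
}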
Attached is a patch implementing the seconds-resolution wait_start, but presented as a timestamptz to the user, just like the other *_start fields:
commit c001e5c537e36d2683a7e55c7c8bfcc154de4c9d
Author: Joel Jacobson <joel@trustly.com>
Date: Sat Dec 24 13:20:09 2016 +0700
Add OUT parameter "wait_start" timestamptz to pg_stat_get_activity() and pg_catalog.pg_stat_activity
This is set to the timestamptz (with seconds resolution) at which
the process started waiting, and reset to NULL when it is no
longer waiting.

This is useful when you want to know not only what the process is
waiting for, but also how long it has been waiting. That matters
in situations where it is normal for different users/applications
to wait for some amount of time, but abnormal if they wait longer
than some threshold. When such a threshold is exceeded, monitoring
applications can alert the user or possibly cancel/terminate the
blocking processes.

Without information on how long processes have been waiting,
monitoring applications would have to either cancel/terminate a
process as soon as anything is waiting, or track waiting durations
themselves by polling on a per-process basis, which is less
user-friendly than having PostgreSQL provide the information
directly.
src/backend/catalog/system_views.sql | 3 ++-
src/backend/postmaster/pgstat.c | 1 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 7 ++++++-
src/include/catalog/pg_proc.h | 2 +-
src/include/pgstat.h | 6 +++++-
src/include/storage/proc.h | 3 +++
7 files changed, 19 insertions(+), 4 deletions(-)
On Sat, Dec 24, 2016 at 12:32 PM, Joel Jacobson <joel@trustly.com> wrote:
On Sat, Dec 24, 2016 at 9:56 AM, Joel Jacobson <joel@trustly.com> wrote:
>> The difficulty with that is it'd require a gettimeofday() call for
>> every wait start. Even on platforms where those are relatively cheap,
I just realized how this can be optimized.
We only need to set wait_start for every new waiting period,
not for every wait start, i.e. not for every call to pgstat_report_wait_start():
Example:
In pgstat_report_wait_start():
if (proc->wait_start == 0)
proc->wait_start = (pg_time_t) time(NULL);
And then in pgstat_report_wait_end():
proc->wait_start = 0;
This means we only need to call time() or gettimeofday() once per
waiting period.
--
Joel Jacobson
Attachments
Joel Jacobson <joel@trustly.com> writes:
> On Sat, Dec 24, 2016 at 9:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The difficulty with that is it'd require a gettimeofday() call for
>> every wait start.

> I don't think we need the microsecond resolution provided by
> gettimeofday() via GetCurrentTimestamp()
> It would be enough to know which second the waiting started, so we
> could use time().

For some use-cases, perhaps ...

> gettimeofday() takes 42 cycles.
> time() only takes 3 cycles. [1]

... on some platforms, perhaps, back in 2011. My own quick testing flatly contradicts that. On Linux (RHEL6) x86_64, I see gettimeofday() taking about 40ns (or circa 100 CPU cycles @ 2.4GHz), time() only a shade better at 38ns. On macOS x86_64 2.7GHz, I see gettimeofday() taking about 37ns, time() way behind at 178ns.

I think we'd need at least an order of magnitude cheaper to consider putting timing calls into spinlock or lwlock paths, and that's just not available at all, let alone portably.

This is not an easy problem. See our most recent discussion at
https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us

I'm prepared to consider an argument that wait timing might have weaker requirements than EXPLAIN ANALYZE (which certainly needs to measure short durations) but you didn't actually make that argument.

regards, tom lane

Linux:

$ gcc -Wall -O2 -o time-timing-calls time-timing-calls.c
$ time ./time-timing-calls
real 0m37.678s
user 0m37.685s
sys 0m0.002s

$ gcc -Wall -O2 -o time-timing-calls -DUSE_GT time-timing-calls.c
$ time ./time-timing-calls
real 0m39.964s
user 0m39.971s
sys 0m0.001s

macOS:

$ gcc -Wall -O2 -o time-timing-calls time-timing-calls.c
$ time ./time-timing-calls
real 2m58.683s
user 2m58.515s
sys 0m0.131s

$ gcc -Wall -O2 -o time-timing-calls -DUSE_GT time-timing-calls.c
$ time ./time-timing-calls
real 0m37.004s
user 0m36.993s
sys 0m0.006s

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int
main(int argc, char **argv)
{
    int     i;

    for (i = 0; i < 1000000000; i++)
    {
#ifdef USE_GT
        struct timeval tv;

        gettimeofday(&tv, NULL);
#else
        time_t  tv;

        tv = time(NULL);
#endif
    }
    return 0;
}
> This is not an easy problem. See our most recent discussion at
> https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us

Thanks for the small test program. I tested it on my MacBook Pro and gettimeofday() was way faster than time(). The clock_gettime() used by the patch clock_gettime_1.patch in the mailing thread from Hari Babu apparently doesn't work on OS X. Instead, I tested the OS X specific mach_absolute_time(), which was the fastest:

gcc -Wall -O2 -o time-timing-calls -DUSE_MACH_ABSOLUTE_TIME time-timing-calls.c
time ./time-timing-calls
real 0m16.806s
user 0m16.781s
sys 0m0.012s

gcc -Wall -O2 -o time-timing-calls -DUSE_GETTIMEOFDAY time-timing-calls.c
time ./time-timing-calls
real 0m35.466s
user 0m35.062s
sys 0m0.393s

Code:

#ifdef __MACH__
#include <mach/mach_time.h>
#endif

#ifdef USE_MACH_ABSOLUTE_TIME
uint64_t tv;
tv = mach_absolute_time();
#endif

> I'm prepared to consider an argument that wait timing might have weaker
> requirements than EXPLAIN ANALYZE (which certainly needs to measure short
> durations) but you didn't actually make that argument.

I can see why timing overhead is a problem in EXPLAIN ANALYZE and at other places, and that would of course be a great thing to fix. However, I'm not sure I fully understand how it can be that much of a problem in pgstat_report_wait_start()? As far as I can tell from reading the source code, pgstat_report_wait_start() is only entered when a process is waiting. Is it not likely that the time spent waiting will vastly exceed the amount of extra time for the gettimeofday() call? Is it really a typical real-life scenario that processes can be waiting extremely often for extremely short periods of time, where the timing overhead would be significant?
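The fragment above only shows the added branch. Folded into Tom's loop it could look like the sketch below; the macro names (USE_MACH_ABSOLUTE_TIME, USE_GETTIMEOFDAY in place of the original USE_GT) are guesses at what Joel's modified program used, inferred from the command lines above.

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <sys/time.h>
#ifdef __MACH__
#include <mach/mach_time.h>
#endif

int
main(int argc, char **argv)
{
    int     i;

    for (i = 0; i < 1000000000; i++)
    {
#if defined(USE_MACH_ABSOLUTE_TIME) && defined(__MACH__)
        uint64_t    tv;         /* macOS-only high-resolution tick counter */

        tv = mach_absolute_time();
#elif defined(USE_GETTIMEOFDAY)
        struct timeval tv;

        gettimeofday(&tv, NULL);
#else
        time_t  tv;             /* default: plain time(), as in the original */

        tv = time(NULL);
#endif
    }
    return 0;
}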
On December 25, 2016 1:21:43 AM GMT+01:00, Joel Jacobson <joel@trustly.com> wrote: >Is it really a typical real-life scenario that processes can be >waiting extremely often for extremely short periods of time, >where the timing overhead would be significant? Yes. Consider WAL insertion, procarray or other similar contended locks. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sun, Dec 25, 2016 at 8:01 PM, Andres Freund <andres@anarazel.de> wrote: > On December 25, 2016 1:21:43 AM GMT+01:00, Joel Jacobson <joel@trustly.com> wrote: > >>Is it really a typical real-life scenario that processes can be >>waiting extremely often for extremely short periods of time, >>where the timing overhead would be significant? > > Yes. Consider WAL insertion, procarray or other similar contended locks. Ah, I see, then I understand it has to be blazingly fast. Maybe a good tradeoff then would be to let "wait_start" represent the very first time the txn started waiting? That way gettimeofday() would only be called once per txn, and the value would be remembered, but not exposed when the txn is not waiting. If the txn is waiting/not waiting multiple times during its lifetime, the same "wait_start" value would be exposed when it's waiting, and NULL when it's not. Sounds good? As long as the documentation is clear on "wait_start" meaning when the first wait started in the txn, I think that's useful enough to improve the situation, as one could then ask a query like "select all processes that have possibly been waiting for at least 5 seconds", which you cannot do today. The best you can do today is ask a query like "select all processes that are waiting and have been running for at least 5 seconds", but during those 5 seconds of running, they might only have been waiting for the very last few milliseconds, which might not be a problem at all. If I instead know that they were waiting 5 seconds ago and are still waiting, but might have had periods in between where they were not waiting, I would say that is close enough to what I as a user want to know, and I can use that information for automatic decision-making on e.g. whether I want to terminate other blocking processes.
Joel Jacobson <joel@trustly.com> writes: > Maybe a good tradeoff then would be to let "wait_start" represent the > very first time the txn started waiting? > ... > As long as the documentation is clear on "wait_start" meaning when the > first wait started in the txn, I think that's useful enough to improve > the situation, as one could then ask a query like "select all > processes that have possibly been waiting for at least 5 seconds", > which you cannot do today. Meh. You *can* do that now: query pg_stat_activity for waiting processes, wait a couple seconds, query again, intersect the results. I think really the only useful improvement on that would be to be able to tell that the process has been blocked continuously for X seconds, and what you're proposing here won't do that. In practice, there should never be waits on LWLocks (much less spinlocks) that exceed order-of-milliseconds; if there are, either we chose the wrong lock type or the system is pretty broken in general. So maybe it's sufficient if we provide a wait start time for heavyweight locks ... though that still seems kind of ugly. (Also, I don't recall the existing code factorization there, but getting the start time into pg_stat_activity without an extra gettimeofday call might be hard. As I said, there is one being done, but I'm not sure how accessible its result is.) I did a bit more research over the weekend into the cost of gettimeofday and clock_gettime, and have some new results that I'll post into that thread shortly. But the short answer is still "they're damn expensive on some platforms, and not really cheap anywhere". regards, tom lane
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > In practice, there should never be waits on LWLocks (much less spinlocks) > that exceed order-of-milliseconds; if there are, either we chose the wrong > lock type or the system is pretty broken in general. So maybe it's > sufficient if we provide a wait start time for heavyweight locks ... > though that still seems kind of ugly. While I agree that it's a bit ugly, if the alternative is "don't have anything", then I'd argue that it's worth it. The use-case for this feature, as I see it, is for admins to be able to go look at how long something has been waiting, and for monitoring scripts which fire only every minute or more, and order-of-milliseconds differences aren't significant there. It's terribly ugly, but from a practical standpoint, we could probably make it "waiting after deadlock timeout" and just set the time when the deadlock timeout fires, and the use-case for this would be satisfied. Thanks! Stephen
On Dec 24, 2016 5:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> I think we'd need at least an order of magnitude cheaper to consider
> putting timing calls into spinlock or lwlock paths, and that's just not
> available at all, let alone portably.
For spinlocks we could conceivably just bite the bullet and use a raw rdtsc or the equivalent for other platforms. It might be pretty easy to distinguish sane numbers from numbers that result after a process reschedule and we could just discard data when that happens (or count occurrences).
That may possibly work for spinlocks but it won't work for anything heavier where process reschedules are routine.
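A rough illustration of Greg's rdtsc suggestion above (x86-only, illustrative names, and an arbitrary cutoff for deciding that a reschedule happened):

#include <stdint.h>
#include <x86intrin.h>

#define RESCHEDULE_CUTOFF_CYCLES (100u * 1000 * 1000)   /* arbitrary: treat larger deltas as reschedules */

static uint64_t spin_wait_cycles;       /* sum of believable samples */
static uint64_t spin_wait_reschedules;  /* samples discarded as probable reschedules */

static inline void
account_spin_wait(uint64_t start_tsc, uint64_t end_tsc)
{
    uint64_t    delta = end_tsc - start_tsc;

    if (delta < RESCHEDULE_CUTOFF_CYCLES)
        spin_wait_cycles += delta;      /* keep the sane measurement */
    else
        spin_wait_reschedules++;        /* just count the occurrence */
}

/* usage sketch: uint64_t t0 = __rdtsc(); ...spin...; account_spin_wait(t0, __rdtsc()); */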
On 12/27/16 11:17 AM, Greg Stark wrote: > > > On Dec 24, 2016 5:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > I think we'd need at least an order of > magnitude cheaper to consider putting timing calls into spinlock or > lwlock > paths, and that's just not available at all, let alone portably. > > > For spinlocks we could conceivably just bite the bullet and use a raw > rdtsc or the equivalent for other platforms. It might be pretty easy to > distinguish sane numbers from numbers that result after a process > reschedule and we could just discard data when that happens (or count > occurrences). > > That may possibly work for spinlocks but it won't work for anything > heavier where process reschedules are routine. Anything heavier and a ~50ns gettimeofday() call is probably not that bad anymore... I also wonder if setting an alarm is cheap enough to happen during a spinlock? Set a fairly short alarm on entry, clear it on exit. If the alarm fires, do the gettimeofday(). Add the alarm time back in and you're going to be close enough to when the wait started for any practical use of pg_stat_activity (perhaps this is what Stephen is suggesting about deadlock timeout...) -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Sat, Dec 24, 2016 at 7:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Stephen Frost <sfrost@snowman.net> writes: >> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>> The difficulty with that is it'd require a gettimeofday() call for >>> every wait start. Even on platforms where those are relatively cheap, >>> the overhead would be nasty --- and on some platforms, it'd be >>> astonishingly bad. We sweated quite a lot to get the overhead of >>> pg_stat_activity wait monitoring down to the point where it would be >>> tolerable for non-heavyweight locks, but I'm afraid this would push >>> it back into the not-tolerable range. > >> Could we handle this like log_lock_waits..? > > Well, that only applies to heavyweight locks, which do a gettimeofday > anyway in order to schedule the deadlock-check timeout. If you were > willing to populate this new column only for heavyweight locks, maybe it > could be done for minimal overhead. But that would be backsliding > quite a lot compared to what we just did to extend pg_stat_activity's > coverage of lock types. > Can we think of introducing a new GUC, trace_system_waits or something like that, which indicates that sessions will report the value of wait_start in pg_stat_activity? The default value of such a parameter could be false, which means wait_start would be shown as NULL in pg_stat_activity, and when it is enabled, wait_start would show the time as proposed in this thread. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 12/28/16 7:10 AM, Amit Kapila wrote: > Can we think of introducing new guc trace_system_waits or something > like that which will indicate that the sessions will report the value > of wait_start in pg_stat_activity? The default value of such a > parameter can be false which means wait_start will be shown as NULL in > pg_stat_activity and when it is enabled the wait_start can show the > time as proposed in this thread. In my experience the problem with those kind of settings is that they're never enabled when you actually need them. I think it'd be much better to find a way to always capture wait_starts that are over some minimum duration, where collection overhead won't matter but you still have some good info about what's going on. For pg_stat_activity I'd think that threshold would be on the order of 50-100ms, though maybe there's other places where a tighter tolerance would help. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > On 12/28/16 7:10 AM, Amit Kapila wrote: >> Can we think of introducing new guc trace_system_waits or something >> like that which will indicate that the sessions will report the value >> of wait_start in pg_stat_activity? > In my experience the problem with those kind of settings is that they're > never enabled when you actually need them. Yeah. And they especially won't be enabled in production situations, if people find out that they cause lots of overhead. > I think it'd be much better > to find a way to always capture wait_starts that are over some minimum > duration, where collection overhead won't matter but you still have some > good info about what's going on. For pg_stat_activity I'd think that > threshold would be on the order of 50-100ms, though maybe there's other > places where a tighter tolerance would help. The idea of just capturing the wait start for heavyweight locks, and not other lock types, still seems superior to any of the alternatives that have been suggested ... regards, tom lane
On 12/28/16 11:25 AM, Tom Lane wrote: > The idea of just capturing the wait start for heavyweight locks, and > not other lock types, still seems superior to any of the alternatives > that have been suggested ... Is some kind of alarm a viable option for the others? If setting the alarm is cheap, you could just set one for say 5ms when you have to wait on a lock. If setting one is too expensive perhaps one could be set at transaction start that looks at a global (and the global would be set when a wait started). Obviously that means there's inaccuracy to the true time spent waiting, but I think that's certainly fine for pg_stat_activity. Most importantly, it would mean that if something has gone horribly wrong you'd at least have some kind of relatively accurate duration to work from. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
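A rough sketch of the alarm approach Jim describes (outside PostgreSQL proper; the names wait_in_progress/wait_start and the ~5ms interval are illustrative assumptions): arm a periodic timer once, and pay for the clock read only in the handler, when a wait has actually lasted long enough for the alarm to fire.

#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t wait_in_progress = 0; /* set on wait start, cleared on wait end */
static volatile time_t wait_start = 0;             /* stamped lazily by the handler */

static void
wait_alarm_handler(int signo)
{
    /* Only stamp a start time if a wait is still in progress and unstamped. */
    if (wait_in_progress && wait_start == 0)
        wait_start = time(NULL);    /* time() is async-signal-safe per POSIX */
}

static void
arm_wait_alarm(long interval_usec)
{
    struct itimerval it;

    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = interval_usec;    /* first expiry, e.g. 5000 = ~5ms */
    it.it_interval.tv_usec = interval_usec; /* then fire periodically */

    signal(SIGALRM, wait_alarm_handler);
    setitimer(ITIMER_REAL, &it, NULL);
}

The reported start time is then off by at most one interval, which is the "add the alarm time back in" inaccuracy Jim mentions above.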
Jim Nasby <Jim.Nasby@bluetreble.com> writes: > On 12/28/16 11:25 AM, Tom Lane wrote: >> The idea of just capturing the wait start for heavyweight locks, and >> not other lock types, still seems superior to any of the alternatives >> that have been suggested ... > Is some kind of alarm a viable option for the others? If setting the > alarm is cheap, Setting an alarm is certain to require a gettimeofday and/or a kernel call. It is far from cheap. regards, tom lane
On Wed, Dec 28, 2016 at 10:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jim Nasby <Jim.Nasby@BlueTreble.com> writes: >> On 12/28/16 7:10 AM, Amit Kapila wrote: >>> Can we think of introducing new guc trace_system_waits or something >>> like that which will indicate that the sessions will report the value >>> of wait_start in pg_stat_activity? > >> In my experience the problem with those kind of settings is that they're >> never enabled when you actually need them. > I agree with that, but I think it might be better than nothing, especially for LWLock waits. > Yeah. And they especially won't be enabled in production situations, > if people find out that they cause lots of overhead. > What kind of overhead are we expecting here? I think it probably depends on workload, but it is quite possible that in most workloads it is in the single digits. I agree that in an ideal situation we should not allow even 1% overhead, however, if people are interested in doing that level of detailed monitoring at the expense of some performance hit, then why not give them the option to do so. >> I think it'd be much better >> to find a way to always capture wait_starts that are over some minimum >> duration, where collection overhead won't matter but you still have some >> good info about what's going on. For pg_stat_activity I'd think that >> threshold would be on the order of 50-100ms, though maybe there's other >> places where a tighter tolerance would help. > > The idea of just capturing the wait start for heavyweight locks, and > not other lock types, still seems superior to any of the alternatives > that have been suggested ... > If we want to capture waits without any system level parameter, then that seems a better option and maybe, later on, we can capture I/O waits as well. However, I feel the proposed parameter name wait_start could confuse users, considering we have wait_event and wait_event_type for generic waits. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 28, 2016 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jim Nasby <Jim.Nasby@bluetreble.com> writes: >> On 12/28/16 11:25 AM, Tom Lane wrote: >>> The idea of just capturing the wait start for heavyweight locks, and >>> not other lock types, still seems superior to any of the alternatives >>> that have been suggested ... > >> Is some kind of alarm a viable option for the others? If setting the >> alarm is cheap, > > Setting an alarm is certain to require a gettimeofday and/or a kernel > call. It is far from cheap. If one were OK with a really-really-rough value for when the wait started, you could have a slot for a timestamp in PGPROC that is cleared by pgstat_report_wait_end() but not set by pgstat_report_wait_start(). Then you could have a background process that goes through and sets it for all processes advertising a wait event every second, or every 10 seconds, or every half second, or whatever you want. Of course this doesn't eliminate the overhead but it pushes it into the background which, if there are idle CPUs, is almost as good. I previously proposed something similar to this as a way of profiling waits and I believe you weren't very supportive of it, but I still think these kinds of ideas have some legs. We can neither take the approach that timestamp overhead is irrelevant nor the position that timestamps are expensive so let's not have any. Some third way has to be found. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
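A sketch of the background-stamping idea Robert describes (the struct and field names here are assumptions, not the actual PGPROC layout): the sampler reads the clock once per pass and fills in a start time for every backend currently advertising a wait, while the backend itself only ever clears the field.

#include <stdint.h>
#include <time.h>
#include <unistd.h>

typedef struct ProcSlot
{
    uint32_t wait_event_info;   /* nonzero while the backend advertises a wait */
    time_t   wait_start;        /* 0 until the background process stamps it */
} ProcSlot;

static void
stamp_wait_starts(ProcSlot *slots, int nslots, unsigned interval_sec)
{
    for (;;)
    {
        time_t  now = time(NULL);   /* one clock read per pass, not per wait */
        int     i;

        for (i = 0; i < nslots; i++)
        {
            if (slots[i].wait_event_info != 0 && slots[i].wait_start == 0)
                slots[i].wait_start = now;
        }
        sleep(interval_sec);
    }
}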
On Mon, Dec 26, 2016 at 03:36:39PM -0500, Tom Lane wrote: > In practice, there should never be waits on LWLocks (much less spinlocks) > that exceed order-of-milliseconds; if there are, either we chose the wrong > lock type or the system is pretty broken in general. So maybe it's > sufficient if we provide a wait start time for heavyweight locks ... > though that still seems kind of ugly. (Also, I don't recall the existing > code factorization there, but getting the start time into pg_stat_activity > without an extra gettimeofday call might be hard. As I said, there is > one being done, but I'm not sure how accessible its result is.) Agreed. No need to add overhead for short-lived locks because the millisecond values are going to be meaningless to users. I would be happy if we could find some weasel value for non-heavyweight locks. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Thu, Jan 5, 2017 at 4:59 PM, Bruce Momjian <bruce@momjian.us> wrote: > Agreed. No need in adding overhead for short-lived locks because the > milli-second values are going to be meaningless to users. I would be > happy if we could find some weasel value for non-heavyweight locks. To avoid a NULL value for waiting_start, and since non-heavyweight locks don't exceed order-of-milliseconds waits, I think it would be acceptable to just return now() whenever something wants to know waiting_start, i.e. when something selects from pg_stat_activity. The exact value would only be within order-of-milliseconds of now() anyway, so one can argue it's not that important, as long as the documentation is clear on that point.
On Thu, Jan 5, 2017 at 06:48:17PM -1000, Joel Jacobson wrote: > On Thu, Jan 5, 2017 at 4:59 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Agreed. No need in adding overhead for short-lived locks because the > > milli-second values are going to be meaningless to users. I would be > > happy if we could find some weasel value for non-heavyweight locks. > > To avoid a NULL value for waiting_start, and thanks to non-heavyweight > locks don't exceed order-of-milliseconds, I think it would be > acceptable to just return now() whenever something wants to know > waiting_start i.e. when something selects from pg_stat_activity. > > The exact value would only be within orders-of-milliseconds away from > now() anyway, so one can argue it's not that important, as long as the > documentation is clear on that point. I don't think now() is a good value as it doesn't indicate to the user which values are real measurements and which are not. NULL is probably the best. +/-infinity is odd too. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 2017-01-06 10:43:32 -0500, Bruce Momjian wrote: > On Thu, Jan 5, 2017 at 06:48:17PM -1000, Joel Jacobson wrote: > > On Thu, Jan 5, 2017 at 4:59 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > Agreed. No need in adding overhead for short-lived locks because the > > > milli-second values are going to be meaningless to users. I would be > > > happy if we could find some weasel value for non-heavyweight locks. > > > > To avoid a NULL value for waiting_start, and thanks to non-heavyweight > > locks don't exceed order-of-milliseconds, I think it would be > > acceptable to just return now() whenever something wants to know > > waiting_start i.e. when something selects from pg_stat_activity. > > > > The exact value would only be within orders-of-milliseconds away from > > now() anyway, so one can argue it's not that important, as long as the > > documentation is clear on that point. > > I don't think now() is a good value as it doesn't indicate to the user > which values are real measurements and which are not. NULL is probably > the best. +/-infinity is odd too. Yea. If one wants to make NULL into now() it's trivial enough with a single coalesce(). Andres
On 6 January 2017 at 02:59, Bruce Momjian <bruce@momjian.us> wrote: > > Agreed. No need in adding overhead for short-lived locks because the > milli-second values are going to be meaningless to users. I would be > happy if we could find some weasel value for non-heavyweight locks. For what it's worth I don't think this is true. It may be true that we don't need precise measurements of short-lived locks, but we do need accurate measurements even if they're in the expected range. What users need to know is, in aggregate, how much of the time the database spends working on their queries goes into different states. Even if no LWLock is held for more than a few milliseconds at a time, if it turns out that 80% of the time is being spent waiting on LWLocks, then there's no point in worrying about CPU speed, for example. And knowing *which* LWLock is taking up an aggregate of 80% of time would point at either configuration changes or code changes to reduce that contention. I would actually argue the reverse of the above proposal would be more useful. What we need are counts of how often LWLocks take longer than, say, 50ms, and for shorter waits we need to know how long they took. Perhaps not precisely for individual waits, but in aggregate we need the totals to be right, so as long as the measurements are accurate that would be sufficient. So an accurate but imprecise measurement (+/- 10ms) with low overhead is better than a precise measurement with high overhead. -- greg
On Sat, Jan 7, 2017 at 01:25:08PM +0000, Greg Stark wrote: > I would actually argue the reverse of the above proposal would be more > useful. What we need are counts of how often LWLocks take longer than, > say, 50ms and for shorter waits we need to know how long. Perhaps not > precisely for individual waits but in aggregate we need the totals to > be right so as long as the measurements are accurate that would be > sufficient. So an accurate but imprecise measurement +/- 10ms with low > overhead is better than a precise measurement with high overhead. I agree those values are important, but I don't think people are going to be able to use pg_stat_activity to get them, so I don't see the point of trying to supply them there. See https://www.postgresql.org/message-id/CA+Tgmoav9Q5v5ZGT3+wP_1tQjT6TGYXrwrDcTRrWimC+ZY7RRA@mail.gmail.com for an excellent example of getting those values via polling. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Sat, Jan 7, 2017 at 3:25 AM, Greg Stark <stark@mit.edu> wrote: > What users need to know is in aggregate how much of the time the > database is spending working on their queries is going into different > states. This is a separate feature idea, but I think it's really valuable as well. Maybe something similar to pg_stat_user_functions? But instead grouping by wait_event_type, wait_event, and showing accumulated count and sum of waiting time since last stat reset, just like the other pg_stat_* views? Maybe something like this?

\d pg_stat_waiting
     View "pg_catalog.pg_stat_waiting"
     Column      |       Type       | Modifiers
-----------------+------------------+-----------
 wait_event_type | name             |
 wait_event      | name             |
 waiting_counter | bigint           |
 waiting_time    | double precision |
On 12/28/16 10:26 PM, Robert Haas wrote: > On Wed, Dec 28, 2016 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Jim Nasby <Jim.Nasby@bluetreble.com> writes: >>> On 12/28/16 11:25 AM, Tom Lane wrote: >>>> The idea of just capturing the wait start for heavyweight locks, and >>>> not other lock types, still seems superior to any of the alternatives >>>> that have been suggested ... >> >>> Is some kind of alarm a viable option for the others? If setting the >>> alarm is cheap, >> >> Setting an alarm is certain to require a gettimeofday and/or a kernel >> call. It is far from cheap. > > If one were OK with a really-really-rough value for when the wait > started, you could have a slot for a timestamp in PGPROC that is > cleared by pgstat_report_wait_end() but not set by > pgstat_report_wait_start(). Then you could have a background process > that goes through and sets it for all processes advertising a wait > event every second, or every 10 seconds, or every half second, or > whatever you want. Of course this doesn't eliminate the overhead but > it pushes it into the background which, if there are idle CPUs, is > almost as good. > > I previously proposed something similar to this as a way of profiling > waits and I believe you weren't very supportive of it, but I still > think these kinds of ideas have some legs. We can neither take the > approach that timestamp overhead is irrelevant nor the position that > timestamps are expensive so let's not have any. Some third way has to > be found. I like that idea as a compromise. I've spent a lot of time looking at systems with no process > 90% CPU and IO < 90% yet individual backends being slow and wished I had some idea of what the backend was doing that wasn't being accounted for. BTW, instead of a flag I'd consider using a counter. That would allow the background process to notice if backends were going into waits that ultimately got resolved in between scans of PGPROC. That's obviously not useful for pg_stat_activity, but it would be meaningful for anything that was trying to accumulate wait time stats. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
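Sketched in the same assumed terms as the background-stamping example above, the counter variation lets the sampler notice waits that began and ended entirely between two passes, which matters for aggregate statistics even though no start time can be shown for them.

#include <stdint.h>
#include <time.h>

typedef struct ProcWaitSlot
{
    uint64_t wait_counter;      /* bumped by the backend at every wait start */
    uint64_t last_seen_counter; /* sampler's value from its previous pass */
    int      waiting_now;       /* nonzero while a wait is in progress */
    time_t   wait_start;        /* stamped by the sampler while waiting_now is set */
} ProcWaitSlot;

static uint64_t observed_wait_starts;  /* aggregate count, including waits the sampler never saw in progress */

static void
sample_wait_slot(ProcWaitSlot *slot, time_t now)
{
    uint64_t    seen = slot->wait_counter;

    /* Any counter movement means waits started since the last pass, even if already resolved. */
    observed_wait_starts += seen - slot->last_seen_counter;
    slot->last_seen_counter = seen;

    /* Stamp an approximate start time for a wait that is still in progress. */
    if (slot->waiting_now && slot->wait_start == 0)
        slot->wait_start = now;
}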
On 1/7/17 12:41 PM, Joel Jacobson wrote: > On Sat, Jan 7, 2017 at 3:25 AM, Greg Stark <stark@mit.edu> wrote: >> What users need to know is in aggregate how much of the time the >> database is spending working on their queries is going into different >> states. > > This is a separate feature idea, but I think it's really valuable as well. > > Maybe something similar to pg_stat_user_functions? > But instead grouping by wait_event_type, wait_event, and showing > accumulated count and sum of waiting time since last stat reset, just > like the other pg_stat_* views? > > Maybe something like this? > > \d pg_stat_waiting > View "pg_catalog.pg_stat_waiting" > Column | Type | Modifiers > -----------------+------------------+----------- > wait_event_type | name | > wait_event | name | > waiting_counter | bigint | > waiting_time | double precision | Yes, I've wanted this many times in the past. If combined with Robert's idea of a background process that does the expensive time calls this could potentially provide very useful information for even very short duration locks. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)