Discussion: [HACKERS] pg_stat_activity.waiting_start

[HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
Hi hackers,

We already have xact_start, query_start and backend_start
to get the timestamptz for when different things happened.

I would like to propose adding a fourth such column, "waiting_start",
which would tell how long a backend has been waiting.

The column would be NULL when waiting=FALSE.

While it's trivial to write a script that just polls pg_stat_activity
every second and keeps track of when a backend started
waiting by just checking for any new waiting=TRUE rows,
it would be more convenient to just get the information from
pg_stat_activity directly.

The use-case would be e.g. monitoring tools,
when you want to know how long queries have been waiting.
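
For illustration, a minimal sketch of such a polling script (the
wait_tracker table is a made-up helper for this example; on 9.6+,
"wait_event IS NOT NULL" stands in for the old waiting=TRUE check):

-- Hypothetical helper table for the polling approach.
CREATE TABLE IF NOT EXISTS wait_tracker (pid int PRIMARY KEY, first_seen timestamptz);

-- Run once per second: remember when each backend was first seen waiting ...
INSERT INTO wait_tracker (pid, first_seen)
SELECT pid, now()
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
ON CONFLICT (pid) DO NOTHING;

-- ... and forget backends that are no longer waiting.
DELETE FROM wait_tracker t
WHERE NOT EXISTS (SELECT 1 FROM pg_stat_activity a
                  WHERE a.pid = t.pid AND a.wait_event IS NOT NULL);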

-- 
Joel Jacobson



Re: [HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
Actually, "wait_start" is a better name to match the others
("wait_event_type" and "wait_event").


On Sat, Dec 24, 2016 at 8:20 AM, Joel Jacobson <joel@trustly.com> wrote:
> Hi hackers,
>
> We already have xact_start, query_start and backend_start
> to get the timestamptz for when different things happened.
>
> I would like to propose adding a fourth such column, "waiting_start",
> which would tell how long a backend has been waiting.
>
> The column would be NULL when waiting=FALSE.
>
> While it's trivial to write a script that just polls pg_stat_activity
> every second and keeps track of when a backend started
> waiting by just checking for any new waiting=TRUE rows,
> it would be more convenient to just get the information from
> pg_stat_activity directly.
>
> The use-case would be e.g. monitoring tools,
> when you want to know how long queries have been waiting.
>
> --
> Joel Jacobson



-- 
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter



Re: [HACKERS] pg_stat_activity.waiting_start

From: Tom Lane
Date:
Joel Jacobson <joel@trustly.com> writes:
> We already have xact_start, query_start and backend_start
> to get the timestamptz for when different things happened.

> I would like to propose adding a fourth such column, "waiting_start",
> which would tell how long a backend has been waiting.

The difficulty with that is it'd require a gettimeofday() call for
every wait start.  Even on platforms where those are relatively cheap,
the overhead would be nasty --- and on some platforms, it'd be
astonishingly bad.  We sweated quite a lot to get the overhead of
pg_stat_activity wait monitoring down to the point where it would be
tolerable for non-heavyweight locks, but I'm afraid this would push
it back into the not-tolerable range.
        regards, tom lane



Re: [HACKERS] pg_stat_activity.waiting_start

From: Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Joel Jacobson <joel@trustly.com> writes:
> > We already have xact_start, query_start and backend_start
> > to get the timestamptz for when different things happened.
>
> > I would like to propose adding a fourth such column, "waiting_start",
> > which would tell how long a backend has been waiting.
>
> The difficulty with that is it'd require a gettimeofday() call for
> every wait start.  Even on platforms where those are relatively cheap,
> the overhead would be nasty --- and on some platforms, it'd be
> astonishingly bad.  We sweated quite a lot to get the overhead of
> pg_stat_activity wait monitoring down to the point where it would be
> tolerable for non-heavyweight locks, but I'm afraid this would push
> it back into the not-tolerable range.

Could we handle this like log_lock_waits..?

Thanks!

Stephen

Re: [HACKERS] pg_stat_activity.waiting_start

From: Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> The difficulty with that is it'd require a gettimeofday() call for
>> every wait start.  Even on platforms where those are relatively cheap,
>> the overhead would be nasty --- and on some platforms, it'd be
>> astonishingly bad.  We sweated quite a lot to get the overhead of
>> pg_stat_activity wait monitoring down to the point where it would be
>> tolerable for non-heavyweight locks, but I'm afraid this would push
>> it back into the not-tolerable range.

> Could we handle this like log_lock_waits..?

Well, that only applies to heavyweight locks, which do a gettimeofday
anyway in order to schedule the deadlock-check timeout.  If you were
willing to populate this new column only for heavyweight locks, maybe it
could be done for minimal overhead.  But that would be backsliding
quite a lot compared to what we just did to extend pg_stat_activity's
coverage of lock types.
        regards, tom lane



Re: [HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
On Sat, Dec 24, 2016 at 9:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I would like to propose adding a fourth such column, "waiting_start",
>> which would tell how long a backend has been waiting.
>
> The difficulty with that is it'd require a gettimeofday() call for
> every wait start.  Even on platforms where those are relatively cheap,
> the overhead would be nasty --- and on some platforms, it'd be
> astonishingly bad.  We sweated quite a lot to get the overhead of
> pg_stat_activity wait monitoring down to the point where it would be
> tolerable for non-heavyweight locks, but I'm afraid this would push
> it back into the not-tolerable range.

I don't think we need the microsecond resolution provided by
gettimeofday() via GetCurrentTimestamp().
It would be enough to know which second the waiting started, so we
could use time().

gettimeofday() takes 42 cycles.
time() only takes 3 cycles. [1]

[1] http://stackoverflow.com/questions/6498972/faster-equivalent-of-gettimeofday



Re: [HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
On Sat, Dec 24, 2016 at 9:56 AM, Joel Jacobson <joel@trustly.com> wrote:
>> The difficulty with that is it'd require a gettimeofday() call for
>> every wait start.  Even on platforms where those are relatively cheap,

I just realized how this can be optimized.
We only need to set wait_start for every new waiting period,
not for every wait start, i.e. not for every call to pgstat_report_wait_start():

Example:

In pgstat_report_wait_start():
if (proc->wait_start == 0)
   proc->wait_start = (pg_time_t) time(NULL);

And then in pgstat_report_wait_end():
proc->wait_start = 0;

This means we only need to call time() or gettimeofday() once per
waiting period.


-- 
Joel Jacobson



Re: [HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
Attached is a patch implementing the seconds-resolution wait_start, but presented as a timestamptz to the user, just like the other *_start fields:

commit c001e5c537e36d2683a7e55c7c8bfcc154de4c9d
Author: Joel Jacobson <joel@trustly.com>
Date:   Sat Dec 24 13:20:09 2016 +0700

    Add OUT parameter "wait_start" timestamptz to pg_stat_get_activity() and pg_catalog.pg_stat_activity
   
    This is set to the timestamptz with seconds resolution
    when the process started waiting, and reset to NULL
    when it's not waiting any longer.
   
    This is useful if you want to know not only what the process is
    waiting for, but also how long it has been waiting,
    which helps in situations where it might be
    normal for different users/applications to wait for some amount of time,
    but abnormal if they are waiting longer than some threshold.
   
    When such a threshold is exceeded, monitoring applications
    could then alert the user or possibly cancel/terminate
    the blocking processes.
   
    Without information on how long processes have been waiting,
    monitoring applications would have no option other than
    to cancel/terminate a process as soon as something is waiting,
    or to keep track of how long processes have been waiting
    by polling on a per-process basis,
    which is less user-friendly than if PostgreSQL provided
    the information directly to the user.

 src/backend/catalog/system_views.sql | 3 ++-
 src/backend/postmaster/pgstat.c      | 1 +
 src/backend/storage/lmgr/proc.c      | 1 +
 src/backend/utils/adt/pgstatfuncs.c  | 7 ++++++-
 src/include/catalog/pg_proc.h        | 2 +-
 src/include/pgstat.h                 | 6 +++++-
 src/include/storage/proc.h           | 3 +++
 7 files changed, 19 insertions(+), 4 deletions(-)
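
With the patch applied, querying the new column could look something
like this (a sketch, assuming the semantics described in the commit
message above):

-- wait_start is the column this patch proposes.
SELECT pid, wait_event_type, wait_event,
       now() - wait_start AS waiting_for
FROM pg_stat_activity
WHERE wait_start IS NOT NULL;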



On Sat, Dec 24, 2016 at 12:32 PM, Joel Jacobson <joel@trustly.com> wrote:
On Sat, Dec 24, 2016 at 9:56 AM, Joel Jacobson <joel@trustly.com> wrote:
>> The difficulty with that is it'd require a gettimeofday() call for
>> every wait start.  Even on platforms where those are relatively cheap,

I just realized how this can be optimized.
We only need to set wait_start for every new waiting period,
not for every wait start, i.e. not for every call to pgstat_report_wait_start():

Example:

In pgstat_report_wait_start():
if (proc->wait_start == 0)
   proc->wait_start = (pg_time_t) time(NULL);

And then in pgstat_report_wait_end():
proc->wait_start = 0;

This means we only need to call time() or gettimeofday() once per
waiting period.


--
Joel Jacobson



--
Joel Jacobson

Mobile: +46703603801
Attachments

Re: [HACKERS] pg_stat_activity.waiting_start

From: Tom Lane
Date:
Joel Jacobson <joel@trustly.com> writes:
> On Sat, Dec 24, 2016 at 9:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The difficulty with that is it'd require a gettimeofday() call for
>> every wait start.

> I don't think we need the microsecond resolution provided by
> gettimeofday() via GetCurrentTimestamp().
> It would be enough to know which second the waiting started, so we
> could use time().

For some use-cases, perhaps ...

> gettimeofday() takes 42 cycles.
> time() only takes 3 cycles. [1]

... on some platforms, perhaps, back in 2011.  My own quick testing flatly
contradicts that.  On Linux (RHEL6) x86_64, I see gettimeofday() taking
about 40ns (or circa 100 CPU cycles @ 2.4GHz), time() only a shade better
at 38ns.  On macOS x86_64 2.7GHz, I see gettimeofday() taking about 37ns,
time() way behind at 178ns.  I think we'd need at least an order of
magnitude cheaper to consider putting timing calls into spinlock or lwlock
paths, and that's just not available at all, let alone portably.

This is not an easy problem.  See our most recent discussion at
https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us

I'm prepared to consider an argument that wait timing might have weaker
requirements than EXPLAIN ANALYZE (which certainly needs to measure short
durations) but you didn't actually make that argument.

            regards, tom lane


Linux:

$ gcc -Wall -O2 -o time-timing-calls time-timing-calls.c 
$ time ./time-timing-calls

real    0m37.678s
user    0m37.685s
sys     0m0.002s
$ gcc -Wall -O2 -o time-timing-calls -DUSE_GT time-timing-calls.c 
$ time ./time-timing-calls

real    0m39.964s
user    0m39.971s
sys     0m0.001s

macOS:

$ gcc -Wall -O2 -o time-timing-calls time-timing-calls.c 
$ time ./time-timing-calls

real    2m58.683s
user    2m58.515s
sys     0m0.131s
$ gcc -Wall -O2 -o time-timing-calls -DUSE_GT time-timing-calls.c 
$ time ./time-timing-calls

real    0m37.004s
user    0m36.993s
sys     0m0.006s

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

/* Time one billion clock calls; build with -DUSE_GT to measure
 * gettimeofday() instead of time(). */
int main(int argc, char **argv)
{
    int i;

    for (i=0; i<1000000000; i++)
    {
#ifdef USE_GT
        struct timeval tv;
        gettimeofday(&tv, NULL);
#else
        time_t tv;
        tv = time(NULL);
#endif
    }

    return 0;
}


Re: [HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
> This is not an easy problem.  See our most recent discussion at
> https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us

Thanks for the small test program.
I tested it on my MacBook Pro and gettimeofday() was way faster than time().

The clock_gettime() used by the patch clock_gettime_1.patch in the
mailing thread from Hari Babu apparently doesn't work on OS X.

Instead, I tested the OS X specific mach_absolute_time() which was the fastest:

gcc -Wall -O2 -o time-timing-calls -DUSE_MACH_ABSOLUTE_TIME time-timing-calls.c
time ./time-timing-calls
real 0m16.806s
user 0m16.781s
sys 0m0.012s

gcc -Wall -O2 -o time-timing-calls -DUSE_GETTIMEOFDAY time-timing-calls.c
time ./time-timing-calls
real 0m35.466s
user 0m35.062s
sys 0m0.393s

Code:

#ifdef __MACH__
#include <mach/mach_time.h>
#endif
#ifdef USE_MACH_ABSOLUTE_TIME
        uint64_t tv;
        tv = mach_absolute_time();
#endif

> I'm prepared to consider an argument that wait timing might have weaker
> requirements than EXPLAIN ANALYZE (which certainly needs to measure short
> durations) but you didn't actually make that argument.

I can see why timing overload is a problem in EXPLAIN ANALYZE and at
other places,
and that would of course be a great thing to fix.

However, I'm not sure I fully understand how it can be that much of a
problem in pgstat_report_wait_start()?

As far as I can tell from reading the source code,
pgstat_report_wait_start() is only entered when a process is waiting?

Is it not likely the time spent waiting will vastly exceed the amount
of extra time for the gettimeofday() call?

Is it really a typical real-life scenario that processes can be
waiting extremely often for extremely short periods of time,
where the timing overhead would be significant?



Re: [HACKERS] pg_stat_activity.waiting_start

From: Andres Freund
Date:

On December 25, 2016 1:21:43 AM GMT+01:00, Joel Jacobson <joel@trustly.com> wrote:

>Is it really a typical real-life scenario that processes can be
>waiting extremely often for extremely short periods of time,
>where the timing overhead would be significant?

Yes. Consider WAL insertion, procarray or other similar contended locks.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
On Sun, Dec 25, 2016 at 8:01 PM, Andres Freund <andres@anarazel.de> wrote:
> On December 25, 2016 1:21:43 AM GMT+01:00, Joel Jacobson <joel@trustly.com> wrote:
>
>>Is it really a typical real-life scenario that processes can be
>>waiting extremely often for extremely short periods of time,
>>where the timing overhead would be significant?
>
> Yes. Consider WAL insertion, procarray or other similar contended locks.

Ah, I see, then I understand it has to be blazingly fast.

Maybe a good tradeoff then would be to let "wait_start" represent the
very first time the txn started waiting?
That way gettimeofday() would only be called once per txn, and the
value would be remembered, but not exposed when the txn is not
waiting.
If the txn is waiting/not waiting multiple times during its
lifetime, the same "wait_start" value would be exposed when it's
waiting, and NULL when it's not. Sounds good?

As long as the documentation is clear on "wait_start" meaning when the
first wait started in the txn, I think that's useful enough to improve
the situation, as one could then ask a query like "select all
processes that have possibly been waiting for at least 5 seconds",
which you cannot do today.
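
In SQL terms, that query could look something like this (a sketch,
assuming the proposed wait_start column with the first-wait semantics
described above):

SELECT pid, now() - wait_start AS possibly_waiting_for
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
  AND wait_start < now() - interval '5 seconds';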

The best you can do today is ask a query like "select all processes
that are waiting and have been running for at least 5 seconds", but
during those 5 seconds they have been running, they might only have
been waiting for the very last few milliseconds, which might not be a
problem at all. If I instead know they were waiting 5 seconds ago,
and are still waiting, even though they might have had periods in
between where they were not waiting, I would say that is close enough
to what I as a user want to know, and I can use that information for
automatic decision-making on e.g. whether I want to terminate other
blocking processes.



Re: [HACKERS] pg_stat_activity.waiting_start

From: Tom Lane
Date:
Joel Jacobson <joel@trustly.com> writes:
> Maybe a good tradeoff then would be to let "wait_start" represent the
> very first time the txn started waiting?
> ...
> As long as the documentation is clear on "wait_start" meaning when the
> first wait started in the txn, I think that's useful enough to improve
> the situation, as one could then ask a query like "select all
> processes that have possibly been waiting for at least 5 seconds",
> which you cannot do today.

Meh.  You *can* do that now: query pg_stat_activity for waiting processes,
wait a couple seconds, query again, intersect the results.  I think really
the only useful improvement on that would be to be able to tell that the
process has been blocked continuously for X seconds, and what you're
proposing here won't do that.
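
(For concreteness, a sketch of that two-snapshot approach; the temp
table name is illustrative:)

CREATE TEMP TABLE waiting_snap AS
    SELECT pid FROM pg_stat_activity WHERE wait_event IS NOT NULL;
SELECT pg_sleep(5);
-- Backends waiting at both sample points have "possibly waited 5 seconds".
SELECT pid FROM pg_stat_activity
WHERE wait_event IS NOT NULL
  AND pid IN (SELECT pid FROM waiting_snap);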

In practice, there should never be waits on LWLocks (much less spinlocks)
that exceed order-of-milliseconds; if there are, either we chose the wrong
lock type or the system is pretty broken in general.  So maybe it's
sufficient if we provide a wait start time for heavyweight locks ...
though that still seems kind of ugly.  (Also, I don't recall the existing
code factorization there, but getting the start time into pg_stat_activity
without an extra gettimeofday call might be hard.  As I said, there is
one being done, but I'm not sure how accessible its result is.)

I did a bit more research over the weekend into the cost of gettimeofday
and clock_gettime, and have some new results that I'll post into that
thread shortly.  But the short answer is still "they're damn expensive
on some platforms, and not really cheap anywhere".
        regards, tom lane



Re: [HACKERS] pg_stat_activity.waiting_start

From: Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> In practice, there should never be waits on LWLocks (much less spinlocks)
> that exceed order-of-milliseconds; if there are, either we chose the wrong
> lock type or the system is pretty broken in general.  So maybe it's
> sufficient if we provide a wait start time for heavyweight locks ...
> though that still seems kind of ugly.

While I agree that it's a bit ugly, if the alternative is "don't have
anything", then I'd argue that it's worth it.  The use-case for this
feature, as I see it, is for admins to be able to go look at how long
something has been waiting, and for monitoring scripts which fire only
every minute or more; order-of-milliseconds differences aren't
significant there.

It's terribly ugly, but from a practical standpoint, we could probably
make it "waiting after deadlock timeout" and just set the time when the
deadlock timeout fires, and the use-case for this would be satisfied.

Thanks!

Stephen

Re: [HACKERS] pg_stat_activity.waiting_start

From: Greg Stark
Date:


On Dec 24, 2016 5:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
I think we'd need at least an order of
magnitude cheaper to consider putting timing calls into spinlock or lwlock
paths, and that's just not available at all, let alone portably.

For spinlocks we could conceivably just bite the bullet and use a raw rdtsc or the equivalent for other platforms. It might be pretty easy to distinguish sane numbers from numbers that result after a process reschedule and we could just discard data when that happens (or count occurrences).

That may possibly work for spinlocks but it won't work for anything heavier where process reschedules are routine.

Re: [HACKERS] pg_stat_activity.waiting_start

From: Jim Nasby
Date:
On 12/27/16 11:17 AM, Greg Stark wrote:
>
>
> On Dec 24, 2016 5:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
>     I think we'd need at least an order of
>     magnitude cheaper to consider putting timing calls into spinlock or
>     lwlock
>     paths, and that's just not available at all, let alone portably.
>
>
> For spinlocks we could conceivably just bite the bullet and use a raw
> rdtsc or the equivalent for other platforms. It might be pretty easy to
> distinguish sane numbers from numbers that result after a process
> reschedule and we could just discard data when that happens (or count
> occurrences).
>
> That may possibly work for spinlocks but it won't work for anything
> heavier where process reschedules are routine.

Anything heavier and a ~50ns gettimeofday() call is probably not that 
bad anymore...

I also wonder if setting an alarm is cheap enough to happen during a 
spinlock? Set a fairly short alarm on entry, clear it on exit. If the 
alarm fires, do the gettimeofday(). Add the alarm time back in and 
you're going to be close enough to when the wait started for any 
practical use of pg_stat_activity (perhaps this is what Stephen is 
suggesting about deadlock timeout...)
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] pg_stat_activity.waiting_start

From: Amit Kapila
Date:
On Sat, Dec 24, 2016 at 7:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>>> The difficulty with that is it'd require a gettimeofday() call for
>>> every wait start.  Even on platforms where those are relatively cheap,
>>> the overhead would be nasty --- and on some platforms, it'd be
>>> astonishingly bad.  We sweated quite a lot to get the overhead of
>>> pg_stat_activity wait monitoring down to the point where it would be
>>> tolerable for non-heavyweight locks, but I'm afraid this would push
>>> it back into the not-tolerable range.
>
>> Could we handle this like log_lock_waits..?
>
> Well, that only applies to heavyweight locks, which do a gettimeofday
> anyway in order to schedule the deadlock-check timeout.  If you were
> willing to populate this new column only for heavyweight locks, maybe it
> could be done for minimal overhead.  But that would be backsliding
> quite a lot compared to what we just did to extend pg_stat_activity's
> coverage of lock types.
>

Can we think of introducing a new GUC trace_system_waits, or something
like that, which will indicate that sessions will report the value
of wait_start in pg_stat_activity?  The default value of such a
parameter can be false, which means wait_start will be shown as NULL in
pg_stat_activity, and when it is enabled wait_start can show the
time as proposed in this thread.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pg_stat_activity.waiting_start

From: Jim Nasby
Date:
On 12/28/16 7:10 AM, Amit Kapila wrote:
> Can we think of introducing a new GUC trace_system_waits, or something
> like that, which will indicate that sessions will report the value
> of wait_start in pg_stat_activity?  The default value of such a
> parameter can be false, which means wait_start will be shown as NULL in
> pg_stat_activity, and when it is enabled wait_start can show the
> time as proposed in this thread.

In my experience the problem with those kinds of settings is that they're 
never enabled when you actually need them. I think it'd be much better 
to find a way to always capture wait_starts that are over some minimum 
duration, where collection overhead won't matter but you still have some 
good info about what's going on. For pg_stat_activity I'd think that 
threshold would be on the order of 50-100ms, though maybe there are other 
places where a tighter tolerance would help.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] pg_stat_activity.waiting_start

From: Tom Lane
Date:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 12/28/16 7:10 AM, Amit Kapila wrote:
>> Can we think of introducing a new GUC trace_system_waits, or something
>> like that, which will indicate that sessions will report the value
>> of wait_start in pg_stat_activity?

> In my experience the problem with those kinds of settings is that they're 
> never enabled when you actually need them.

Yeah.  And they especially won't be enabled in production situations,
if people find out that they cause lots of overhead.

> I think it'd be much better 
> to find a way to always capture wait_starts that are over some minimum 
> duration, where collection overhead won't matter but you still have some 
> good info about what's going on. For pg_stat_activity I'd think that 
> threshold would be on the order of 50-100ms, though maybe there's other 
> places where a tighter tolerance would help.

The idea of just capturing the wait start for heavyweight locks, and
not other lock types, still seems superior to any of the alternatives
that have been suggested ...
        regards, tom lane



Re: [HACKERS] pg_stat_activity.waiting_start

From: Jim Nasby
Date:
On 12/28/16 11:25 AM, Tom Lane wrote:
> The idea of just capturing the wait start for heavyweight locks, and
> not other lock types, still seems superior to any of the alternatives
> that have been suggested ...

Is some kind of alarm a viable option for the others? If setting the 
alarm is cheap, you could just set one for say 5ms when you have to wait 
on a lock. If setting one is too expensive perhaps one could be set at 
transaction start that looks at a global (and the global would be set 
when a wait started). Obviously that means there's inaccuracy to the 
true time spent waiting, but I think that's certainly fine for 
pg_stat_activity. Most importantly, it would mean that if something has 
gone horribly wrong you'd at least have some kind of relatively accurate 
duration to work from.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] pg_stat_activity.waiting_start

From: Tom Lane
Date:
Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> On 12/28/16 11:25 AM, Tom Lane wrote:
>> The idea of just capturing the wait start for heavyweight locks, and
>> not other lock types, still seems superior to any of the alternatives
>> that have been suggested ...

> Is some kind of alarm a viable option for the others? If setting the 
> alarm is cheap,

Setting an alarm is certain to require a gettimeofday and/or a kernel
call.  It is far from cheap.
        regards, tom lane



Re: [HACKERS] pg_stat_activity.waiting_start

From: Amit Kapila
Date:
On Wed, Dec 28, 2016 at 10:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> On 12/28/16 7:10 AM, Amit Kapila wrote:
>>> Can we think of introducing a new GUC trace_system_waits, or something
>>> like that, which will indicate that sessions will report the value
>>> of wait_start in pg_stat_activity?
>
>> In my experience the problem with those kinds of settings is that they're
>> never enabled when you actually need them.
>

I agree with that, but I think it might be better than nothing,
especially for LWLock waits.

> Yeah.  And they especially won't be enabled in production situations,
> if people find out that they cause lots of overhead.
>

What kind of overhead are we expecting here?  I think it probably
depends on the workload, but it is quite possible that in most workloads
it is in the single digits.  I agree that in an ideal situation we should
not allow even 1% overhead; however, if people are interested in this
level of detailed monitoring at the expense of some performance hit,
then why not give them the option to do so.

>> I think it'd be much better
>> to find a way to always capture wait_starts that are over some minimum
>> duration, where collection overhead won't matter but you still have some
>> good info about what's going on. For pg_stat_activity I'd think that
>> threshold would be on the order of 50-100ms, though maybe there's other
>> places where a tighter tolerance would help.
>
> The idea of just capturing the wait start for heavyweight locks, and
> not other lock types, still seems superior to any of the alternatives
> that have been suggested ...
>

If we want to capture waits without any system-level parameter, then
that seems the better option, and maybe, later on, we can capture I/O
waits as well. However, I feel the proposed name wait_start
can confuse users, considering we have wait_event and wait_event_type
for generic waits.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pg_stat_activity.waiting_start

From: Robert Haas
Date:
On Wed, Dec 28, 2016 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jim Nasby <Jim.Nasby@bluetreble.com> writes:
>> On 12/28/16 11:25 AM, Tom Lane wrote:
>>> The idea of just capturing the wait start for heavyweight locks, and
>>> not other lock types, still seems superior to any of the alternatives
>>> that have been suggested ...
>
>> Is some kind of alarm a viable option for the others? If setting the
>> alarm is cheap,
>
> Setting an alarm is certain to require a gettimeofday and/or a kernel
> call.  It is far from cheap.

If one were OK with a really-really-rough value for when the wait
started, you could have a slot for a timestamp in PGPROC that is
cleared by pgstat_report_wait_end() but not set by
pgstat_report_wait_start().  Then you could have a background process
that goes through and sets it for all processes advertising a wait
event every second, or every 10 seconds, or every half second, or
whatever you want.  Of course this doesn't eliminate the overhead but
it pushes it into the background which, if there are idle CPUs, is
almost as good.

I previously proposed something similar to this as a way of profiling
waits and I believe you weren't very supportive of it, but I still
think these kinds of ideas have some legs.  We can neither take the
approach that timestamp overhead is irrelevant nor the position that
timestamps are expensive so let's not have any.  Some third way has to
be found.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_stat_activity.waiting_start

From: Bruce Momjian
Date:
On Mon, Dec 26, 2016 at 03:36:39PM -0500, Tom Lane wrote:
> In practice, there should never be waits on LWLocks (much less spinlocks)
> that exceed order-of-milliseconds; if there are, either we chose the wrong
> lock type or the system is pretty broken in general.  So maybe it's
> sufficient if we provide a wait start time for heavyweight locks ...
> though that still seems kind of ugly.  (Also, I don't recall the existing
> code factorization there, but getting the start time into pg_stat_activity
> without an extra gettimeofday call might be hard.  As I said, there is
> one being done, but I'm not sure how accessible its result is.)

Agreed.  No need to add overhead for short-lived locks because the
millisecond values are going to be meaningless to users. I would be
happy if we could find some weasel value for non-heavyweight locks.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
On Thu, Jan 5, 2017 at 4:59 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Agreed.  No need to add overhead for short-lived locks because the
> millisecond values are going to be meaningless to users. I would be
> happy if we could find some weasel value for non-heavyweight locks.

To avoid a NULL value for waiting_start, and since non-heavyweight
lock waits don't exceed order-of-milliseconds, I think it would be
acceptable to just return now() whenever something wants to know
waiting_start, i.e. when something selects from pg_stat_activity.

The exact value would only be within orders-of-milliseconds away from
now() anyway, so one can argue it's not that important, as long as the
documentation is clear on that point.



Re: [HACKERS] pg_stat_activity.waiting_start

From: Bruce Momjian
Date:
On Thu, Jan  5, 2017 at 06:48:17PM -1000, Joel Jacobson wrote:
> On Thu, Jan 5, 2017 at 4:59 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Agreed.  No need to add overhead for short-lived locks because the
> > millisecond values are going to be meaningless to users. I would be
> > happy if we could find some weasel value for non-heavyweight locks.
> 
> To avoid a NULL value for waiting_start, and since non-heavyweight
> lock waits don't exceed order-of-milliseconds, I think it would be
> acceptable to just return now() whenever something wants to know
> waiting_start, i.e. when something selects from pg_stat_activity.
> 
> The exact value would only be within orders-of-milliseconds away from
> now() anyway, so one can argue it's not that important, as long as the
> documentation is clear on that point.

I don't think now() is a good value as it doesn't indicate to the user
which values are real measurements and which are not.  NULL is probably
the best.  +/-infinity is odd too.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] pg_stat_activity.waiting_start

From: Andres Freund
Date:
On 2017-01-06 10:43:32 -0500, Bruce Momjian wrote:
> On Thu, Jan  5, 2017 at 06:48:17PM -1000, Joel Jacobson wrote:
> > On Thu, Jan 5, 2017 at 4:59 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > Agreed.  No need to add overhead for short-lived locks because the
> > > millisecond values are going to be meaningless to users. I would be
> > > happy if we could find some weasel value for non-heavyweight locks.
> > 
> > To avoid a NULL value for waiting_start, and since non-heavyweight
> > lock waits don't exceed order-of-milliseconds, I think it would be
> > acceptable to just return now() whenever something wants to know
> > waiting_start, i.e. when something selects from pg_stat_activity.
> > 
> > The exact value would only be within orders-of-milliseconds away from
> > now() anyway, so one can argue it's not that important, as long as the
> > documentation is clear on that point.
> 
> I don't think now() is a good value as it doesn't indicate to the user
> which values are real measurements and which are not.  NULL is probably
> the best.  +/-infinity is odd too.

Yea. If one wants to make NULL into now() it's trivial enough with a
single coalesce().
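
E.g. (assuming the proposed column existed):

SELECT pid, coalesce(wait_start, now()) AS wait_start
FROM pg_stat_activity;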

Andres



Re: [HACKERS] pg_stat_activity.waiting_start

От
Greg Stark
Дата:
On 6 January 2017 at 02:59, Bruce Momjian <bruce@momjian.us> wrote:
>
> Agreed.  No need to add overhead for short-lived locks because the
> millisecond values are going to be meaningless to users. I would be
> happy if we could find some weasel value for non-heavyweight locks.

For what it's worth I don't think this is true. It may be true that we
don't need precise measurements of short-lived locks but we do need
accurate measurements even if they're in the expected range.

What users need to know is in aggregate how much of the time the
database is spending working on their queries is going into different
states. Even if no LWLock is held for more than a few milliseconds at
a time, if it turns out that 80% of the time is being spent waiting
on LWLocks then there's no point in worrying about CPU speed, for
example. And knowing *which* LWLock is taking up an aggregate of 80%
of time would point at either configuration changes or code changes to
reduce that contention.

I would actually argue the reverse of the above proposal would be more
useful. What we need are counts of how often LWLocks take longer than,
say, 50ms and for shorter waits we need to know how long. Perhaps not
precisely for individual waits but in aggregate we need the totals to
be right so as long as the measurements are accurate that would be
sufficient. So an accurate but imprecise measurement +/- 10ms with low
overhead is better than a precise measurement with high overhead.

-- 
greg



Re: [HACKERS] pg_stat_activity.waiting_start

From: Bruce Momjian
Date:
On Sat, Jan  7, 2017 at 01:25:08PM +0000, Greg Stark wrote:
> I would actually argue the reverse of the above proposal would be more
> useful. What we need are counts of how often LWLocks take longer than,
> say, 50ms and for shorter waits we need to know how long. Perhaps not
> precisely for individual waits but in aggregate we need the totals to
> be right so as long as the measurements are accurate that would be
> sufficient. So an accurate but imprecise measurement +/- 10ms with low
> overhead is better than a precise measurement with high overhead.

I agree those values are important, but I don't think people are going
to be able to use pg_stat_activity to get them, so I don't see the point
of trying to supply them there.

See
https://www.postgresql.org/message-id/CA+Tgmoav9Q5v5ZGT3+wP_1tQjT6TGYXrwrDcTRrWimC+ZY7RRA@mail.gmail.com
for an excellent example of getting those values via polling.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] pg_stat_activity.waiting_start

From: Joel Jacobson
Date:
On Sat, Jan 7, 2017 at 3:25 AM, Greg Stark <stark@mit.edu> wrote:
> What users need to know is in aggregate how much of the time the
> database is spending working on their queries is going into different
> states.

This is a separate feature idea, but I think it's really valuable as well.

Maybe something similar to pg_stat_user_functions?
But instead grouping by wait_event_type, wait_event, and showing
accumulated count and sum of waiting time since last stat reset, just
like the other pg_stat_* views?

Maybe something like this?

\d pg_stat_waiting
 View "pg_catalog.pg_stat_waiting"
   Column        |       Type       | Modifiers
-----------------+------------------+-----------
 wait_event_type | name             |
 wait_event      | name             |
 waiting_counter | bigint           |
 waiting_time    | double precision |
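
With such a view, aggregate questions become easy to ask, e.g.
(hypothetical, assuming waiting_time accumulates seconds):

-- pg_stat_waiting does not exist; this assumes the sketch above.
SELECT wait_event_type, wait_event,
       waiting_time / nullif(waiting_counter, 0) AS avg_wait_seconds
FROM pg_stat_waiting
ORDER BY waiting_time DESC
LIMIT 10;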
 



Re: [HACKERS] pg_stat_activity.waiting_start

From: Jim Nasby
Date:
On 12/28/16 10:26 PM, Robert Haas wrote:
> On Wed, Dec 28, 2016 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jim Nasby <Jim.Nasby@bluetreble.com> writes:
>>> On 12/28/16 11:25 AM, Tom Lane wrote:
>>>> The idea of just capturing the wait start for heavyweight locks, and
>>>> not other lock types, still seems superior to any of the alternatives
>>>> that have been suggested ...
>>
>>> Is some kind of alarm a viable option for the others? If setting the
>>> alarm is cheap,
>>
>> Setting an alarm is certain to require a gettimeofday and/or a kernel
>> call.  It is far from cheap.
>
> If one were OK with a really-really-rough value for when the wait
> started, you could have a slot for a timestamp in PGPROC that is
> cleared by pgstat_report_wait_end() but not set by
> pgstat_report_wait_start().  Then you could have a background process
> that goes through and sets it for all processes advertising a wait
> event every second, or every 10 seconds, or every half second, or
> whatever you want.  Of course this doesn't eliminate the overhead but
> it pushes it into the background which, if there are idle CPUs, is
> almost as good.
>
> I previously proposed something similar to this as a way of profiling
> waits and I believe you weren't very supportive of it, but I still
> think these kinds of ideas have some legs.  We can neither take the
> approach that timestamp overhead is irrelevant nor the position that
> timestamps are expensive so let's not have any.  Some third way has to
> be found.

I like that idea as a compromise. I've spent a lot of time looking at 
systems with no process > 90% CPU and I/O < 90%, yet individual backends 
being slow, and wished I had some idea of what the backend was doing that 
wasn't being accounted for.

BTW, instead of a flag I'd consider using a counter. That would allow 
the background process to notice if backends were going into waits that 
ultimately got resolved in between scans of PGPROC. That's obviously not 
useful for pg_stat_activity, but it would be meaningful for anything 
that was trying to accumulate wait time stats.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] pg_stat_activity.waiting_start

From: Jim Nasby
Date:
On 1/7/17 12:41 PM, Joel Jacobson wrote:
> On Sat, Jan 7, 2017 at 3:25 AM, Greg Stark <stark@mit.edu> wrote:
>> What users need to know is in aggregate how much of the time the
>> database is spending working on their queries is going into different
>> states.
>
> This is a separate feature idea, but I think it's really valuable as well.
>
> Maybe something similar to pg_stat_user_functions?
> But instead grouping by wait_event_type, wait_event, and showing
> accumulated count and sum of waiting time since last stat reset, just
> like the other pg_stat_* views?
>
> Maybe something like this?
>
> \d pg_stat_waiting
>  View "pg_catalog.pg_stat_waiting"
>    Column        |       Type       | Modifiers
> -----------------+------------------+-----------
>  wait_event_type | name             |
>  wait_event      | name             |
>  waiting_counter | bigint           |
>  waiting_time    | double precision |

Yes, I've wanted this many times in the past. If combined with Robert's 
idea of a background process that does the expensive time calls this 
could potentially provide very useful information for even very short 
duration locks.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)