On 12/28/16 10:26 PM, Robert Haas wrote:
> On Wed, Dec 28, 2016 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jim Nasby <Jim.Nasby@bluetreble.com> writes:
>>> On 12/28/16 11:25 AM, Tom Lane wrote:
>>>> The idea of just capturing the wait start for heavyweight locks, and
>>>> not other lock types, still seems superior to any of the alternatives
>>>> that have been suggested ...
>>
>>> Is some kind of alarm a viable option for the others? If setting the
>>> alarm is cheap,
>>
>> Setting an alarm is certain to require a gettimeofday and/or a kernel
>> call. It is far from cheap.
>
> If one were OK with a really-really-rough value for when the wait
> started, you could have a slot for a timestamp in PGPROC that is
> cleared by pgstat_report_wait_end() but not set by
> pgstat_report_wait_start(). Then you could have a background process
> that goes through and sets it for all processes advertising a wait
> event every second, or every 10 seconds, or every half second, or
> whatever you want. Of course this doesn't eliminate the overhead but
> it pushes it into the background which, if there are idle CPUs, is
> almost as good.
>
> I previously proposed something similar to this as a way of profiling
> waits and I believe you weren't very supportive of it, but I still
> think these kinds of ideas have some legs. We can neither take the
> approach that timestamp overhead is irrelevant nor the position that
> timestamps are expensive so let's not have any. Some third way has to
> be found.
I like that idea as a compromise. I've spent a lot of time looking at
systems with no process > 90% CPU and IO < 90% yet individual backends
being slow and wished I had some idea of what the backend was doing that
wasn't being accounted for.
BTW, instead of a flag I'd consider using a counter. That would allow
the background process to notice if backends were going into waits that
ultimately got resolved in between scans of PGPROC. That's obviously not
useful for pg_stat_activity, but it would be meaningful for anything
that was trying to accumulate wait time stats.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)