Re: RFC: replace pg_stat_activity.waiting with something more descriptive

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Дата
Msg-id CAPpHfdtdF8LyR0zBA_tzAwYq00GFZyVbh_XfFAABRQQ=MBnxAA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: RFC: replace pg_stat_activity.waiting with something more descriptive  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: RFC: replace pg_stat_activity.waiting with something more descriptive  (Amit Kapila <amit.kapila16@gmail.com>)
Re: RFC: replace pg_stat_activity.waiting with something more descriptive  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Fri, Jun 26, 2015 at 6:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jun 25, 2015 at 9:23 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 6/22/15 1:37 PM, Robert Haas wrote:
>> Currently, the only time we report a process as waiting is when it is
>> waiting for a heavyweight lock.  I'd like to make that somewhat more
>> fine-grained, by reporting the type of heavyweight lock it's awaiting
>> (relation, relation extension, transaction, etc.).  Also, I'd like to
>> report when we're waiting for a lwlock, and report either the specific
>> fixed lwlock for which we are waiting, or else the type of lock (lock
>> manager lock, buffer content lock, etc.) for locks of which there is
>> more than one.  I'm less sure about this next part, but I think we
>> might also want to report ourselves as waiting when we are doing an OS
>> read or an OS write, because it's pretty common for people to think
>> that a PostgreSQL bug is to blame when in fact it's the operating
>> system that isn't servicing our I/O requests very quickly.
>
> Could that also cover waiting on network?

Possibly.  My approach requires that the number of wait states be kept
relatively small, ideally fitting in a single byte.  And it also
requires that we insert pgstat_report_waiting() calls around the thing
that is notionally blocking.  So, if there are a small number of
places in the code where we do network I/O, we could stick those calls
around those places, and this would work just fine.  But if a foreign
data wrapper, or any other piece of code, does network I/O - or any
other blocking operation - without calling pgstat_report_waiting(), we
just won't know about it.

Idea of fitting wait information into single byte and avoid both locking and atomic operations is attractive.
But how long we can go with it?
Could DBA make some conclusion by single querying of pg_stat_activity or double querying?
In order to make a conclusion about system load one have to run daemon or background worker which is continuously sampling current wait events.
Sampling current wait event with high rate also gives some overhead to the system as well as locking or atomic operations.
Checking if backend is stuck isn't easy as well. If you don't expose how long last wait event continues it's hard to distinguish getting stuck on particular lock and high concurrency on that lock type.

I can propose following:

1) Expose more information about current lock to user. For instance, having duration of current wait event, user can determine if backend is getting stuck on particular event without sampling. 
2) Accumulate per backend statistics about each wait event type: number of occurrences and total duration. With this statistics user can identify system bottlenecks again without sampling.

Number #2 will be provided as a separate patch.
Number #1 require different concurrency model. ldus will extract it from "waits monitoring" patch shortly.
 
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: [PATCH] Generalized JSON output functions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: polymorphic types - enforce casting to most common type automatically