Обсуждение: proposal: add 'waiting for replication' to pg_stat_activity.state

Поиск
Список
Период
Сортировка

proposal: add 'waiting for replication' to pg_stat_activity.state

От
Julian Schauder
Дата:
Hello Hackers,


I recently analyzed an incident where a major lag in synchronous replication
blocked a number of synchronous backends. I found myself looking at backends
that, according to pg_stat_activity, were neither waiting nor idle but yet they
didn't finish their work.

As it turns out, the major waiting loop for syncrep updates the processtitle,
but is silent within postgres and stat_activity. It seems misleading that
commited but waiting backends are 'active' although there is little done apart
from waiting.

> # select pid, waiting, state, substr(query,1,6) from pg_stat_activity ;
>   pid  | waiting | state  | substr
> -------+---------+--------+--------
>  26294 | f       | active | END;
>  26318 | f       | active | create
>  26323 | f       | active | insert
>  26336 | f       | active | insert
(output of waiting statements [vanilla])

While 'active' is technically correct for a backend that is commited but waiting
for replication in terms of 'not beeing available for new tasks', it also
implies that a backend is dealing with the issue at hand. The remote host
however is out of our clusters control, hence all signs should be pointing to
the standby-host.


I suggest adding a new state to pg_stat_activity.state for backends that are
waiting for their synchronous commit to be flushed on the remote host.
I chose 'waiting for synchronous replication' for now.

One should refrain from the waiting flag at this point as there is no waiting
done on internal processes. Instead the backend waits for factors beyond our
clusters control to change.


> # select pid, waiting, state, substr(query,1,6) from pg_stat_activity ;
>  pid  | waiting |                state                | substr
> ------+---------+-------------------------------------+--------
>  3360 | f       | waiting for synchronous replication | END;
>  3465 | f       | waiting for synchronous replication | create
>  3477 | f       | waiting for synchronous replication | insert
>  3489 | f       | waiting for synchronous replication | insert
(output of waiting statements [patched])


patch attached


regards,

Julian Schauder

Вложения

Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Craig Ringer
Дата:
On 3 December 2015 at 04:22, Julian Schauder <julian.schauder@credativ.de> wrote:
 
I suggest adding a new state to pg_stat_activity.state for backends that are
waiting for their synchronous commit to be flushed on the remote host.


Excellent idea. Anything that improves management and visibility into what the system is doing like this is really valuable. 

I notice that you don't set the 'waiting' flag.  'waiting' is presently documented as:

       <entry>True if this backend is currently waiting on a lock</entry>

... but I'm inclined to just widen its definition and set it here, since we most certainly are waiting, and the column isn't named 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring queries people have since they generally do an inner join on pg_locks anyway.

There are no test changes in this patch, but that's reasonable because we don't currently have a way to automate tests of sync rep.

I've attached a patch with minor wording/formatting changes, but I think I'd like to change 'waiting' to true as well. Opinions?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Craig Ringer
Дата:
On 3 December 2015 at 04:22, Julian Schauder <julian.schauder@credativ.de> wrote:

I suggest adding a new state to pg_stat_activity.state for backends that are
waiting for their synchronous commit to be flushed on the remote host.
I chose 'waiting for synchronous replication' for now.


I've added this to the next CF:



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Peter Eisentraut
Дата:
On 12/2/15 7:00 PM, Craig Ringer wrote:
> I notice that you don't set the 'waiting' flag.  'waiting' is presently
> documented as:
> 
>        <entry>True if this backend is currently waiting on a lock</entry>
> 
> ... but I'm inclined to just widen its definition and set it here, since
> we most certainly are waiting, and the column isn't named
> 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
> queries people have since they generally do an inner join on pg_locks
> anyway.

I'm not so sure about that assumption.




Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Craig Ringer
Дата:
On 3 December 2015 at 09:32, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/2/15 7:00 PM, Craig Ringer wrote:
> I notice that you don't set the 'waiting' flag.  'waiting' is presently
> documented as:
>
>        <entry>True if this backend is currently waiting on a lock</entry>
>
> ... but I'm inclined to just widen its definition and set it here, since
> we most certainly are waiting, and the column isn't named
> 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
> queries people have since they generally do an inner join on pg_locks
> anyway.

I'm not so sure about that assumption.

Even if it's an outer join, the worst that'll happen is that they'll get entries with nulls in pg_locks. I don't think it's worth worrying about too much.

We could always mitigate it by adding a pg_lock_status view to the system catalogs with a decent canned query over pg_stat_activity and pg_locks, so people can stop copying & pasting from the wiki or using buggy homebrew queries ;)

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Amit Kapila
Дата:
On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> On 3 December 2015 at 09:32, Peter Eisentraut <peter_e@gmx.net> wrote:
>>
>> On 12/2/15 7:00 PM, Craig Ringer wrote:
>> > I notice that you don't set the 'waiting' flag.  'waiting' is presently
>> > documented as:
>> >
>> >        <entry>True if this backend is currently waiting on a lock</entry>
>> >
>> > ... but I'm inclined to just widen its definition and set it here, since
>> > we most certainly are waiting, and the column isn't named
>> > 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
>> > queries people have since they generally do an inner join on pg_locks
>> > anyway.
>>
>> I'm not so sure about that assumption.
>
>
> Even if it's an outer join, the worst that'll happen is that they'll get entries with nulls in pg_locks. I don't think it's worth worrying about too much.
>

That can be one way of dealing with it and another is that we
keep the current column as it is and add another view related
wait stats, anyway we need something like that for other purposes
like lwlock waits etc.  Basically something on lines what we
is being discussed in thread [1]

Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Craig Ringer
Дата:


On 3 December 2015 at 22:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>
> On 3 December 2015 at 09:32, Peter Eisentraut <peter_e@gmx.net> wrote:
>>
>> On 12/2/15 7:00 PM, Craig Ringer wrote:
>> > I notice that you don't set the 'waiting' flag.  'waiting' is presently
>> > documented as:
>> >
>> >        <entry>True if this backend is currently waiting on a lock</entry>
>> >
>> > ... but I'm inclined to just widen its definition and set it here, since
>> > we most certainly are waiting, and the column isn't named
>> > 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
>> > queries people have since they generally do an inner join on pg_locks
>> > anyway.
>>
>> I'm not so sure about that assumption.
>
>
> Even if it's an outer join, the worst that'll happen is that they'll get entries with nulls in pg_locks. I don't think it's worth worrying about too much.
>

That can be one way of dealing with it and another is that we
keep the current column as it is and add another view related
wait stats, anyway we need something like that for other purposes
like lwlock waits etc.  Basically something on lines what we
is being discussed in thread [1]



Good point. I'm not sure this shouldn't set 'waiting' anyway, though.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Andres Freund
Дата:
On December 4, 2015 9:48:55 AM GMT+01:00, Craig Ringer <craig@2ndquadrant.com> wrote:
>On 3 December 2015 at 22:58, Amit Kapila <amit.kapila16@gmail.com>
>wrote:
>
>> On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer <craig@2ndquadrant.com>
>> wrote:

>http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=LwAJ+w@mail.gmail.com
>>
>Good point. I'm not sure this shouldn't set 'waiting' anyway, though.

No. Waiting corresponds to pg locks -setting it to true for other things would be even more confusing...

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.



Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Alvaro Herrera
Дата:
Andres Freund wrote:
> On December 4, 2015 9:48:55 AM GMT+01:00, Craig Ringer <craig@2ndquadrant.com> wrote:
> >On 3 December 2015 at 22:58, Amit Kapila <amit.kapila16@gmail.com>
> >wrote:
> >
> >> On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer <craig@2ndquadrant.com>
> >> wrote:
> 
> >http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=LwAJ+w@mail.gmail.com
> >>
> >Good point. I'm not sure this shouldn't set 'waiting' anyway, though.
> 
> No. Waiting corresponds to pg locks -setting it to true for other things would be even more confusing...

Robert's opinion elsewhere is relevant for this patch, and contradicts
Andres' opinion here:
http://www.postgresql.org/message-id/CA+Tgmoaj+EPoO9qobvFH18F5kJg2tAEXhQ8_-VayWHUr-ZATMw@mail.gmail.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Thomas Munro
Дата:
On Thu, Dec 3, 2015 at 1:00 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 3 December 2015 at 04:22, Julian Schauder <julian.schauder@credativ.de>
> wrote:
>
>>
>> I suggest adding a new state to pg_stat_activity.state for backends that
>> are
>>
>> waiting for their synchronous commit to be flushed on the remote host.
>>
>
> Excellent idea. Anything that improves management and visibility into what
> the system is doing like this is really valuable.
>
> I notice that you don't set the 'waiting' flag.  'waiting' is presently
> documented as:
>
>        <entry>True if this backend is currently waiting on a lock</entry>
>
> ... but I'm inclined to just widen its definition and set it here, since we
> most certainly are waiting, and the column isn't named 'waiting_on_a_lock'.
> It shouldn't upset various canned lock monitoring queries people have since
> they generally do an inner join on pg_locks anyway.
>
> There are no test changes in this patch, but that's reasonable because we
> don't currently have a way to automate tests of sync rep.
>
> I've attached a patch with minor wording/formatting changes, but I think I'd
> like to change 'waiting' to true as well. Opinions?

A couple of minor things I noticed:

+           <literal>waiting for synchronous replication</>: The
backend is waiting for its transaction to be flushed on a synchronous
standby.

I wouldn't say "flushed" here.  If you set synchronous_commit =
remote_write, then it's not actually waiting for it to be flushed (and
there may eventually be other levels too).  Maybe just "The backend is
waiting for a response from a synchronous standby"?

+#include <pgstat.h>

I think this should use "" instead of <> and should come after
#include "miscadmin.h".

Thinking of other patches in flight, I think I'd want the proposed
N-sync standbys feature to be able to explain in more detail what it's
waiting for (and likewise my causal reads proposal which does that via
the process title), though I realise that the details of how/where to
do that are changing in the "replace pg_stat_activity.waiting" thread
which this patch is waiting on.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: proposal: add 'waiting for replication' to pg_stat_activity.state

От
Alvaro Herrera
Дата:
Thomas Munro wrote:

> Thinking of other patches in flight, I think I'd want the proposed
> N-sync standbys feature to be able to explain in more detail what it's
> waiting for (and likewise my causal reads proposal which does that via
> the process title), though I realise that the details of how/where to
> do that are changing in the "replace pg_stat_activity.waiting" thread
> which this patch is waiting on.

Right, interesting thought.  I think that for the time being we should
close this one as returned with feedback, since there's no point in
keeping it open in commitfest.  I encourage the author (and everyone
else) to look at the other patch and help there, which is going to
ultimately achieve the outcome desired with this patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services