Thread: equivalent to "replication_timeout" on standby server

equivalent to "replication_timeout" on standby server

From
Samba
Date:
Hi all,
The PostgreSQL manual says that "replication_timeout" is used to:
"Terminate replication connections that are inactive longer than the specified number of milliseconds. This is useful for the primary server to detect a standby crash or network outage."

Is there a similar configuration parameter that lets the WAL receiver processes on the standby servers terminate idle connections?

It would be very useful (for monitoring purposes) if the termination of such an idle connection, on either the master or the standby server, were logged with an appropriate message.

Could someone explain whether this is possible with postgres-9.1.1?

Thanks and Regards,
Samba

Re: equivalent to "replication_timeout" on standby server

From
Fujii Masao
Date:
On Thu, Nov 3, 2011 at 12:25 AM, Samba <saasira@gmail.com> wrote:
> The postgres manual explains the "replication_timeout" to be used to
>
> "Terminate replication connections that are inactive longer than the
> specified number of milliseconds. This is useful for the primary server to
> detect a standby crash or network outage"
>
> Is there a similar configuration parameter that helps the WAL receiver
> processes to terminate the idle connections on the standby servers?

No.

But setting the libpq keepalive parameters in primary_conninfo might be useful
for detecting, from the standby server, that the connection has been terminated.
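
A minimal sketch of what such a setting could look like, written here as a small Python helper that assembles the primary_conninfo value. The keepalives, keepalives_idle, keepalives_interval and keepalives_count names are libpq connection parameters; the host name, user name and the numeric values are placeholders chosen for illustration only, and the detection-time figure is a rough estimate for an idle connection, not a guarantee.

```python
def build_primary_conninfo(host, port=5432, user="replicator",
                           idle=60, interval=10, count=5):
    """Assemble a primary_conninfo value with TCP keepalives enabled.

    host, user and the timing values are illustrative assumptions.
    """
    params = {
        "host": host,
        "port": port,
        "user": user,
        "keepalives": 1,                  # turn TCP keepalives on
        "keepalives_idle": idle,          # idle seconds before the first probe
        "keepalives_interval": interval,  # seconds between unanswered probes
        "keepalives_count": count,        # unanswered probes before giving up
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


def approx_detection_seconds(idle, interval, count):
    """Rough time for an idle, dead connection to be noticed."""
    return idle + interval * count


conninfo = build_primary_conninfo("master.example.com")
print(f"primary_conninfo = '{conninfo}'")
print("approx. detection time (s):", approx_detection_seconds(60, 10, 5))
```

The printed primary_conninfo line is the kind of entry that would go into recovery.conf on the standby; smaller values make the standby notice a dead connection sooner at the cost of more probe traffic.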

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: equivalent to "replication_timeout" on standby server

From
Samba
Date:
Thanks, Fujii, for that hint...

I searched the internet for that trick, and it looks like we can make the standby close its connection to the master much earlier than it otherwise would; that is good enough for me for now.

But there still seem to be two problem areas that could be improved over time...
  • Although both the master (with replication_timeout) and the slave (with the TCP timeout option in the primary_conninfo parameter) close the connection quickly (based on the TCP idle connection timeout), as of now they do not log such information. It would be really helpful if such disconnects were logged with appropriate severity, so that the problem can be identified early and the patterns and history of such issues can be tracked.

  • Presently, neither the master nor the standby server attempts to resume streaming replication when they see each other again after a prolonged disconnect. It would be better if the master, the slave, or both made periodic checks to find out whether the other is reachable and resumed replication (if possible, or else logged a message that a full sync may be required).

Thanks and Regards,
Samba

Re: equivalent to "replication_timeout" on standby server

From
Fujii Masao
Date:
On Fri, Nov 4, 2011 at 10:58 PM, Samba <saasira@gmail.com> wrote:
> although both master(with replication_timeout)  and slave (with tcp timeout
> option in primary_conninfo parameter) closes the connection in quick time
> (based on tcp idle connection  timeout), as of now they do not log such
> information. It would be really helpful if such disconnects are logged with
> appropriate severity so that the problem can identified early and help in
> keeping track of patterns and history of such issues.

Oh, really? Unless I'm missing something, when replication timeout happens,
the following log message would be logged in the master:

    terminating walsender process due to replication timeout

OTOH, something like the following would be logged in the standby:

    could not receive data from WAL stream......
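
For monitoring, these two messages can be searched for in the server logs. The following is only a sketch: the message texts are the ones quoted above, while the labels, the function name and the sample lines are made up for illustration, and real log lines will carry whatever prefix log_line_prefix is configured to produce.

```python
import re

# Message texts as quoted in this thread; the labels are hypothetical.
PATTERNS = {
    "master_replication_timeout": re.compile(
        r"terminating walsender process due to replication timeout"),
    "standby_receive_failure": re.compile(
        r"could not receive data from WAL stream"),
}


def find_replication_disconnects(lines):
    """Return (label, line) pairs for lines matching a known message."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
    return hits


# Hand-written sample input, not real server output.
sample = [
    "terminating walsender process due to replication timeout\n",
    "some unrelated log line\n",
    "could not receive data from WAL stream\n",
]

for label, line in find_replication_disconnects(sample):
    print(f"{label}: {line}")
```

In practice the same function could be fed the lines of the master's and the standby's log files, and the hits forwarded to whatever alerting is in use.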

> Presently, neither master nor standby server attempts to resume streaming
> replication when they happen to see each other after some prolonged
> disconnect. It would be better if either master or slave or both the servers
> makes periodic checks to find if the other is reachable and resume the
> replication( if possible, or else log the message that a full sync may be
> required).

The standby periodically tries to reconnect to the master after it detects
that the replication connection has been terminated. So even after a prolonged
disconnect, replication can resume automatically.
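
The general shape of such a periodic reconnect can be sketched as follows. This is an illustration of the idea only, not PostgreSQL's actual code: the connect callable, the retry interval and the simulated failures are all assumptions made for the example.

```python
import time


def reconnect_with_retry(connect, retry_interval=5.0, max_attempts=None,
                         sleep=time.sleep):
    """Call connect() until it succeeds, pausing between failed attempts.

    Returns (connection, attempts). With max_attempts=None it retries
    forever, which is the behaviour described for the standby.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect(), attempt
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(retry_interval)


def make_flaky_connect(failures):
    """Hypothetical stand-in for a master that is unreachable at first."""
    remaining = {"count": failures}

    def connect():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("master unreachable")
        return "streaming"

    return connect


# Two failed attempts, then success; interval set to 0 to keep the demo fast.
result, attempts = reconnect_with_retry(make_flaky_connect(2), retry_interval=0.0)
print(result, attempts)  # prints: streaming 3
```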

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center