Discussion: Fast Primary shutdown only after wal_sender_timeout


Fast Primary shutdown only after wal_sender_timeout

From: Michael Banck
Date: 28 October 2016 12:40:24 GMT+02:00
Hi,

I'm doing some failover tests on a 2-node streaming replication cluster,
and shutting down the primary with 'pg_ctl -m fast' results in a timeout
of 50-60 seconds; pg_ctl returns only after the latter message:

<71804----2016-10-28 10:01:37.833 CEST-5808e5a4.1187c-transid:0>LOG:  database system is shut down
<62866-replicator-[unbekannt]-10.1.181.30(39609)-2016-10-28 10:02:27.963 CEST-581305b9.f592-transid:0>LOG:  terminating walsender process due to replication timeout
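As a quick check on the 50-60 second figure, the gap between the two LOG lines above can be computed from their timestamps. This is plain arithmetic on the logged values and assumes both lines were written with the same clock and time zone (CEST).

```python
from datetime import datetime

# Timestamps copied from the two LOG lines (same day, both in CEST)
FMT = "%Y-%m-%d %H:%M:%S.%f"
shut_down = datetime.strptime("2016-10-28 10:01:37.833", FMT)
walsender_exit = datetime.strptime("2016-10-28 10:02:27.963", FMT)

# Seconds between "database system is shut down" and the walsender exiting
gap = (walsender_exit - shut_down).total_seconds()
print(f"{gap:.3f} s")  # 50.130 s
```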

If I set wal_sender_timeout (it has been commented out so far, i.e. set
to 60 seconds) to something smaller like 10 seconds, I get a 10 second
delay. There are no users logged into either primary or standby, nor is
there any other activity. The hot_standby_feedback parameter is set to
'on'.

I would assume that the replication connection is shut down along with
the backends, but this does not seem to be the case. Is this expected?

This is on 9.5.4, self-compiled.


Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




Re: Fast Primary shutdown only after wal_sender_timeout

From: Jehan-Guillaume de Rorthais
Date:

On 28 October 2016 12:40:24 GMT+02:00, Michael Banck <michael.banck@credativ.de> wrote:
>Hi,
>
>I'm doing some failover tests on a 2-node streaming replication cluster,
>and shutting down the primary with 'pg_ctl -m fast' results in a timeout
>of 50-60 seconds; pg_ctl returns only after the latter message:
>
><71804----2016-10-28 10:01:37.833 CEST-5808e5a4.1187c-transid:0>LOG:  database system is shut down
><62866-replicator-[unbekannt]-10.1.181.30(39609)-2016-10-28 10:02:27.963 CEST-581305b9.f592-transid:0>LOG:  terminating walsender process due to replication timeout
>
>If I set wal_sender_timeout (it has been commented out so far, i.e. set
>to 60 seconds) to something smaller like 10 seconds, I get a 10 second
>delay. There are no users logged into either primary or standby, nor is
>there any other activity. The hot_standby_feedback parameter is set to
>'on'.
>
>I would assume that the replication connection is shut down along with
>the backends, but this does not seem to be the case. Is this expected?

Yes, in a normal situation. But the master first ensures that everything has been replicated to the connected standby
before shutting down the connections.

If it hits wal_sender_timeout, maybe you have a badly disconnected standby that the master has not detected? Maybe a
secondary IP address moved away from the master before its shutdown?
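The behaviour described above can be sketched as a small Python model. It is a simplified illustration, not PostgreSQL's actual walsender code: the function name walsender_shutdown_delay and its parameters are hypothetical, and the model assumes the walsender either receives an acknowledgement of the final WAL from a reachable standby or waits until wal_sender_timeout has passed since the standby's last reply.

```python
def walsender_shutdown_delay(shutdown_at: float,
                             last_reply_at: float,
                             standby_reachable: bool,
                             wal_sender_timeout: float = 60.0,
                             round_trip: float = 0.01) -> float:
    """Seconds from the shutdown request until the walsender exits.

    Hypothetical, simplified model (not PostgreSQL source): the walsender
    stays until the standby confirms the final WAL it was sent, or until
    wal_sender_timeout has passed since the standby's last reply.
    """
    if standby_reachable:
        # A healthy standby acknowledges the final WAL after about one round trip.
        return round_trip
    # An unreachable standby never replies again, so the timeout fires
    # wal_sender_timeout after the last reply that did arrive.
    return max(0.0, last_reply_at + wal_sender_timeout - shutdown_at)


# Hypothetical timeline: the standby's last reply arrived 10 s before shutdown.
print(walsender_shutdown_delay(100.0, 90.0, standby_reachable=False))  # 50.0
print(walsender_shutdown_delay(100.0, 90.0, standby_reachable=True))   # 0.01
```

Assuming the standby stopped answering at about the time of the shutdown, its last status message (sent at least every wal_receiver_status_interval, 10 s by default) would be at most about 10 s old, so the timeout branch of this model lands between roughly 50 and 60 s with the default wal_sender_timeout of 60 s, in line with the delay reported in the original message.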
>
>This is on 9.5.4, self-compiled.
>
>
>Michael

/ioguix