Thread: Behavior difference for walsender and walreceiver for n/w breakdown case

From: Amit Kapila
Date:

I have observed that currently, in case there is a network break between master and standby, the walsender process gets terminated immediately, whereas walreceiver detects the breakage only after a long time.
The main reason I can see is the replication_timeout configuration parameter. walsender checks replication_timeout: if there is no communication from the other side within replication_timeout, it treats that as a condition to terminate the walsender.
However, there is no such mechanism in walreceiver. It fails only in the send() socket call made from XLogWalRcvSendReply(), and only after that call has been made many times. Presumably send() keeps accumulating data in the socket's internal buffer until the buffer is full, even though the other side's recv has not received the data.
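
To illustrate the buffering effect described above, here is a minimal Python sketch (not PostgreSQL code). It assumes a local socket pair as a stand-in for the replication connection and a peer that simply never calls recv. The function name count_buffered_sends and its parameters are hypothetical. A real network break over TCP also involves retransmission timers, which this sketch does not model.

```python
import socket


def count_buffered_sends(chunk_size=1024, max_calls=1_000_000):
    """Count how many send() calls succeed while the peer never reads.

    Hypothetical helper: a local socket pair stands in for the
    standby-to-master connection; the receiving end never calls recv().
    """
    sender, receiver = socket.socketpair()
    sender.setblocking(False)  # report a full buffer instead of blocking
    payload = b"x" * chunk_size
    calls = 0
    total = 0
    try:
        while calls < max_calls:
            try:
                total += sender.send(payload)
            except BlockingIOError:
                # The kernel buffer is full; only now does send() stop succeeding.
                break
            calls += 1
    finally:
        sender.close()
        receiver.close()
    return calls, total


calls, total = count_buffered_sends()
print(f"{calls} send() calls succeeded, {total} bytes buffered, none read by the peer")
```

Every successful call here only means the data was accepted into a local buffer, not that the other side received it, which is consistent with the delay observed in walreceiver.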

Shouldn't there be a mechanism in walreceiver so that it can detect a n/w failure sooner?
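
One possible shape for such a mechanism is a receive-side inactivity timeout, analogous to what replication_timeout does for walsender. The Python sketch below is only an illustration under that assumption and not a proposal for the actual walreceiver code. The name peer_is_alive and the timeout values are hypothetical.

```python
import select
import socket


def peer_is_alive(sock, timeout_seconds):
    """Return True if data arrives on sock within timeout_seconds.

    Hypothetical check: a receiver that hears nothing from its peer for
    the whole timeout can treat the connection as broken.
    """
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    return bool(readable)


# A local socket pair stands in for the master/standby connection.
master_side, standby_side = socket.socketpair()

# Nothing has been sent yet, so the standby side times out.
silent_result = peer_is_alive(standby_side, 0.2)

# After the master side sends something, the standby side sees activity.
master_side.send(b"keepalive")
active_result = peer_is_alive(standby_side, 0.2)

master_side.close()
standby_side.close()
print(silent_result, active_result)
```

A check like this only works if the sending side is guaranteed to transmit something within the timeout even when idle; otherwise a quiet but healthy connection would be reported as broken.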


Basic steps to observe the above behavior:
1. Both master and standby machines are connected normally.
2. Then use the command: ifconfig ip down; to bring down the network cards of the master and standby.
Observation:
The master detects that the connection is abnormal, but the standby does not, and it shows the channel as connected for a long time.

With Regards,

Amit Kapila