Re: BUG #7534: walreceiver takes long time to detect n/w breakdown

From Amit Kapila
Subject Re: BUG #7534: walreceiver takes long time to detect n/w breakdown
Msg-id 003a01cd9164$4e199930$ea4ccb90$@kapila@huawei.com
In response to Re: BUG #7534: walreceiver takes long time to detect n/w breakdown  (Magnus Hagander <magnus@hagander.net>)
List pgsql-bugs
On Wednesday, September 12, 2012 10:12 PM Magnus Hagander wrote:
> On Wed, Sep 12, 2012 at 1:54 PM,  <amit.kapila@huawei.com> wrote:
>> The following bug has been logged on the website:
>
>> Bug reference:      7534
>> Logged by:          Amit Kapila
>> Email address:      amit.kapila@huawei.com
>> PostgreSQL version: 9.2.0
>> Operating system:   Suse 10
>> Description:
>
>> 1. Both the master and standby machines are connected normally.
>> 2. Then run the command: ifconfig ip down; to bring down the network
>> cards of the master and the standby.
>
>> Observation
>> The master detects the broken connection, but the standby does not and
>> keeps showing the connection as established for a long time.

> The master will detect it quicker, because it will get an error when
> it tries to send something.

> But the standby should detect it either when sending the feedback
> message (what's your wal_receiver_status_interval set to?) or when
> the kernel does (have you configured the tcp keepalive on the slave
> somehow?)
  wal_receiver_status_interval is 10s (we have not changed it; this is the
  default).
  We have tried TCP keepalive as well. It might not be able to detect the
  breakdown, because the receiver keeps trying to send the receiver status
  anyway.
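
  For reference, enabling keepalive on a client socket can be sketched as
  below. This is only an illustration, not walreceiver code: the helper name
  and the numeric values are assumptions, and the per-socket tuning options
  are guarded because not every platform provides them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive; the numeric defaults are illustrative only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform dependent, so each one is
    # applied only if this Python build exposes it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```
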
  The send socket call made from XLogWalRcvSendReply() fails only after it
  has been called many times. Internally, send() probably keeps succeeding
  until the socket's internal buffer is full; the data keeps accumulating
  even though the other side has not received it.
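
  The buffering behaviour can be seen in a small sketch. It uses a local
  socket pair as a stand-in for the replication connection (an assumption
  made so the example is self-contained), and the function name is
  hypothetical. The peer never calls recv(), yet send() keeps reporting
  success until the kernel buffers are full.

```python
import socket


def fill_send_buffer(chunk=b"x" * 4096):
    """Send to a peer that never reads; return the number of bytes accepted."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    accepted = 0
    try:
        while True:
            # A successful send() only means the kernel queued the bytes,
            # not that the other side has received them.
            accepted += writer.send(chunk)
    except BlockingIOError:
        # The buffers are full; a blocking socket would stall here instead.
        pass
    finally:
        writer.close()
        reader.close()
    return accepted
```

  The exact number of bytes accepted depends on the platform's buffer sizes,
  but it is greater than zero even though nothing was ever read.
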
  Also, walsender detects the failure because of the replication_timeout
  parameter, not because of a send failure.
  So in my opinion, the foolproof solution would be to have a mechanism in
  walreceiver similar to walsender's replication_timeout.
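
  A minimal sketch of such a receiver-side timeout follows. It is not
  PostgreSQL code: the names receive_or_timeout and ReceiverTimeout are
  hypothetical, and the idea is only that the receiver gives up when nothing
  arrives from the sender within the configured interval, instead of waiting
  for send() to fail.

```python
import select
import socket
import time


class ReceiverTimeout(Exception):
    """Raised when nothing arrives from the sender within the timeout."""


def receive_or_timeout(sock, timeout_s, clock=time.monotonic):
    """Wait for data on sock; give up once timeout_s passes with no traffic."""
    deadline = clock() + timeout_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise ReceiverTimeout(f"no data from sender for {timeout_s}s")
        # Wake up as soon as data is readable, or when the deadline passes.
        readable, _, _ = select.select([sock], [], [], remaining)
        if readable:
            data = sock.recv(8192)
            if not data:
                raise ConnectionError("sender closed the connection")
            return data
```

  A caller would treat ReceiverTimeout as a broken connection and reconnect.
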

> Oh, and what do you actually mean by "long time"?
  15-20 mins.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
