Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep

From Aidan Van Dyk
Subject Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
Date
Msg-id AANLkTimrKs7TyVaZDb-7v0mio29T63b4SVAxE5+tDqka@mail.gmail.com
In response to Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Mon, Dec 20, 2010 at 3:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> OK. How about keepalive-like parameters and behaviors?
>
>    replication_keepalives_idle
>    replication_keepalives_interval
>    replication_keepalives_count
>
> The master sends the keepalive packet if replication_keepalives_idle
> elapsed after receiving the last ACK packet including the receive/
> fsync/replay LSNs from the standby. OTOH, the standby sends the
> ACK packet back to the master as soon as receiving the keepalive
> packet.
>
> If the master could not receive the ACK packet for
> replication_keepalives_interval, it repeats sending the keepalive
> packet and receiving the ACK replication_keepalives_count -1
> times. If no ACK packet has finally arrived, the master thinks the
> standby has been dead.

I thought we were using a single TCP session per standby/slave?  So
adding another "KEEPALIVE" into the local buffer side of the TCP
stream isn't going to help a "stuck" one arrive earlier.

You really only have a few situations:

1) Network problems.  Stuffing more stuff into the local buffers isn't
going to help get packets from the remote that it would like to send
(I say "like to send" because network problems could be in either or
both directions; the remote may or may not have seen our keepalive
request).

2) The remote is getting them, and is swamped.  It's not going to get
to processing our 2nd keepalive any sooner than our 1st.

If a walreceiver reads a "keepalive" request, just declare that it
must reply immediately.  Then the master config can trust that a
keepalive should be replied to pretty quickly if the network is ok.  TCP
will make it get there "eventually" if it's a bad network and the
admins have set it to be very network tolerant.
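
To make the idea concrete, here is a minimal sketch of the master-side
bookkeeping under that rule: one keepalive after a period of silence, one
reply deadline, and no retry count. The class name, both settings, and the
returned strings are hypothetical, chosen only for illustration; this is not
actual walsender code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepaliveMonitor:
    """Tracks one standby connection on the master side (illustrative only)."""
    idle_before_probe: float   # hypothetical setting: silence before we probe
    reply_timeout: float       # hypothetical setting: how long to wait for the reply
    last_ack_at: float = 0.0
    probe_sent_at: Optional[float] = None

    def on_ack(self, now: float) -> None:
        # Any ACK from the standby proves it is alive and clears a pending probe.
        self.last_ack_at = now
        self.probe_sent_at = None

    def tick(self, now: float) -> str:
        if self.probe_sent_at is not None:
            # A probe is already in the TCP stream; sending another one would
            # only queue behind it, so we just wait for the deadline.
            if now - self.probe_sent_at >= self.reply_timeout:
                return "dead"
            return "waiting"
        if now - self.last_ack_at >= self.idle_before_probe:
            self.probe_sent_at = now
            return "send_keepalive"
        return "ok"
```

With a 10 second idle period and a 5 second reply deadline, a standby that
goes quiet is probed once and declared dead 5 seconds after the probe. An
admin who wants to tolerate a bad network simply raises the reply deadline.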

The ACK might report that the slave is hopelessly behind on
fsyncing/applying its WAL, but that's good too.  At least then the
ACK comes back, and the master knows the slave is still churning away
on the last batch of WAL, and can decide if it wants to think the
slave is too far behind and boot it out.
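
A rough sketch of that decision, assuming the ACK carries the
receive/fsync/replay positions mentioned upthread and that positions can be
compared as plain byte offsets. The names StandbyAck, should_disconnect and
max_lag_bytes are made up for this example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandbyAck:
    """WAL positions reported by the standby, as byte offsets (assumed layout)."""
    received: int
    fsynced: int
    replayed: int


def should_disconnect(master_pos: int, ack: StandbyAck, max_lag_bytes: int) -> bool:
    # The ACK arrived, so the standby is alive; the only question left is
    # whether its replay position trails the master by more than we tolerate.
    return master_pos - ack.replayed > max_lag_bytes
```

The point is that liveness (did an ACK come back?) and lag (how far behind is
it?) become two separate checks, each with its own knob.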


--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

