Re: Synch Rep for CommitFest 2009-07

From: Fujii Masao
Subject: Re: Synch Rep for CommitFest 2009-07
Msg-id: 3f0b79eb0907170215n5765442bw4ea99b031199ba5b@mail.gmail.com
In reply to: Re: Synch Rep for CommitFest 2009-07 (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List: pgsql-hackers

Hi,

On Thu, Jul 16, 2009 at 6:00 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> The archive should not normally contain partial XLOG files, only if you
> manually copy one there after primary has crashed. So I don't think
> that's something we need to support.

You are right. And if the last valid record is in the middle of the
restored file (e.g. because of an XLOG_SWITCH record), <begin> should
point to the head of the next file.
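
To illustrate the rule above, here is a minimal sketch of how <begin> could
be rounded up to the head of the next file. It is not taken from the patch:
it assumes WAL positions can be modeled as flat byte offsets and that
segments are 16 MB (the PostgreSQL default), and the name
next_segment_start is made up for this example.

```python
# Assumption: positions are flat byte offsets and segments are 16 MB.
# Real XLogRecPtr handling in the server is more involved than this.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def next_segment_start(last_valid_end: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Return the position where streaming should begin.

    If the last valid record ends in the middle of a segment (for example
    after an XLOG_SWITCH record), the rest of that segment is skipped and
    the result is the first byte of the next segment.
    """
    offset = last_valid_end % seg_size
    if offset == 0:
        return last_valid_end  # already on a segment boundary
    return last_valid_end - offset + seg_size
```

A position that is already on a segment boundary is returned unchanged, so
the function can be applied unconditionally.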

> Hmm. You only need the timeline history file if the base backup was
> taken in an earlier timeline. That situation would only arise if you
> (manually) take a base backup, restore to a server (which creates a new
> timeline), and then create a slave against that server. At least in the
> 1st phase, I think we can assume that the standby has access to the same
> archive, and will find the history file from there. If not, throw an
> error. We can add more bells and whistles later.

Okay, I'll set the history file problem aside for possible later consideration.

> As the patch stands, new walsender connections are refused when one is
> active already. What if the walsender connection is in a zombie state?
> For example, it's trying to send WAL to the slave, but the network
> connection is down, and the packets are going to a black hole. It will
> take a while for the TCP layer to declare the connection dead, and close
> the socket. During that time, you can't connect a new slave to the
> master, or the same slave using a better network connection.
>
> The most robust way to fix that is to support multiple walsenders. The
> zombie walsender can take its time to die, while the new walsender
> serves the new connection. You could tweak SO_TIMEOUTs and stuff, but
> even then the standby process could be in some weird hung state.
>
> And of course, when we get around to add support for multiple slaves,
> we'll have to do that anyway. Better get it right to begin with.

Thanks for the detailed description! I was thinking that a new GUC,
replication_timeout, and some keepalive parameters would be enough to
deal with such trouble. But I agree that supporting multiple walsenders
is the better solution, so I'll work on that.
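
For reference, a rough sketch of what "keepalive parameters" mean at the
socket level, and why they only bound the detection time rather than remove
it. This is not code from the patch; the function names and default values
are assumptions for illustration. TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT are platform-specific, so they are applied only where the
platform exposes them, and keepalive probes apply to an otherwise idle
connection.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive and, where supported, tune its timing.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        opt = getattr(socket, name, None)  # missing on some platforms
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


def worst_case_detection_seconds(idle, interval, count):
    """Approximate time for an idle dead peer to be noticed."""
    return idle + interval * count
```

With the illustrative defaults above, a dead idle peer is noticed after
roughly 110 seconds. During that window a single-walsender design would
still refuse new connections, which is the argument for allowing several
walsenders.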

> Even in synchronous replication, a backend should only have to wait when
> it commits. You would only see the difference with very large
> transactions that write more WAL than fits in wal_buffers, though, like
> data loading.

That's right.
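
A toy model of "wait only at commit", as a sketch rather than the patch's
actual design: WAL is shipped in the background, and a backend blocks only
when it commits, until the standby has acknowledged WAL up to the commit
position. The class and method names (CommitWaiter, ack, wait_for_commit)
are hypothetical, and positions are plain integers.

```python
import threading


class CommitWaiter:
    """Tracks how far the standby has acknowledged WAL (toy model)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._acked = 0  # highest position acknowledged by the standby

    def ack(self, position):
        """Called when the standby reports WAL received up to position."""
        with self._cond:
            if position > self._acked:
                self._acked = position
                self._cond.notify_all()

    def wait_for_commit(self, commit_position, timeout=None):
        """Block until commit_position is acknowledged.

        Returns True if acknowledged, False if the timeout expired first.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: self._acked >= commit_position, timeout
            )
```

In this model, writing WAL in the middle of a transaction never calls
wait_for_commit, so nothing blocks before commit.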

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

