Re: Synch Rep for CommitFest 2009-07
From | Fujii Masao |
---|---|
Subject | Re: Synch Rep for CommitFest 2009-07 |
Date | |
Msg-id | 3f0b79eb0907170215n5765442bw4ea99b031199ba5b@mail.gmail.com |
In reply to | Re: Synch Rep for CommitFest 2009-07 (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
List | pgsql-hackers |
Hi,

On Thu, Jul 16, 2009 at 6:00 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> The archive should not normally contain partial XLOG files, only if you
> manually copy one there after primary has crashed. So I don't think
> that's something we need to support.

You are right. And, if the last valid record ends in the middle of the
restored file (e.g. because of an XLOG_SWITCH record), <begin> should
point to the head of the next file.

> Hmm. You only need the timeline history file if the base backup was
> taken in an earlier timeline. That situation would only arise if you
> (manually) take a base backup, restore to a server (which creates a new
> timeline), and then create a slave against that server. At least in the
> 1st phase, I think we can assume that the standby has access to the same
> archive, and will find the history file from there. If not, throw an
> error. We can add more bells and whistles later.

Okay, I'll set aside the history file problem for later consideration.

> As the patch stands, new walsender connections are refused when one is
> active already. What if the walsender connection is in a zombie state?
> For example, it's trying to send WAL to the slave, but the network
> connection is down, and the packets are going to a black hole. It will
> take a while for the TCP layer to declare the connection dead, and close
> the socket. During that time, you can't connect a new slave to the
> master, or the same slave using a better network connection.
>
> The most robust way to fix that is to support multiple walsenders. The
> zombie walsender can take its time to die, while the new walsender
> serves the new connection. You could tweak SO_TIMEOUTs and stuff, but
> even then the standby process could be in some weird hung state.
>
> And of course, when we get around to add support for multiple slaves,
> we'll have to do that anyway. Better get it right to begin with.

Thanks for the detailed description!

I was thinking that a new GUC replication_timeout and some keepalive
parameters would be enough to cope with such trouble. But I agree that
supporting multiple walsenders is the better solution, so I'll work on
that.

> Even in synchronous replication, a backend should only have to wait when
> it commits. You would only see the difference with very large
> transactions that write more WAL than fits in wal_buffers, though, like
> data loading.

That's right.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
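As an aside on the keepalive parameters mentioned above: this is not code from the patch, but a minimal sketch (assuming a Linux TCP stack, where the `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` socket options exist) of how a connection can be tuned so that a dead peer is detected in bounded time, instead of waiting for the kernel's default, hours-long TCP timeout that lets a zombie walsender linger:

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Ask the kernel to probe an idle connection and drop it if dead.

    A silent peer is declared dead after roughly idle + interval * count
    seconds.  The TCP_KEEP* options are Linux-specific; portable code
    would need per-platform handling.  Values are illustrative only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s, idle=30, interval=5, count=4)
# With these (hypothetical) settings, ~30 + 5*4 = 50 seconds of silence
# would close the socket, freeing the single walsender slot without an
# application-level replication_timeout.
```

Even with such tuning, as noted in the quoted mail, the standby process itself can hang in a state keepalives cannot detect, which is why multiple walsenders remain the more robust fix.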