Re: Streaming Replication patch for CommitFest 2009-09
От | Fujii Masao |
---|---|
Тема | Re: Streaming Replication patch for CommitFest 2009-09 |
Дата | |
Msg-id | 3f0b79eb0909240120t6a7d56b3gfc7119af8bcc287b@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Streaming Replication patch for CommitFest 2009-09 (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Ответы |
Re: Streaming Replication patch for CommitFest 2009-09
(Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Re: Streaming Replication patch for CommitFest 2009-09 (Fujii Masao <masao.fujii@gmail.com>) |
Список | pgsql-hackers |
Hi, Sorry for the delay. On Mon, Sep 21, 2009 at 4:51 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Having gone through the patch now in more detail, I think it's in pretty > good shape. I'm happy with the overall design, except that I haven't > been able to make up my mind if walreceiver should indeed be a > stand-alone program as discussed, or a postmaster child process as in > the patch you submitted. Putting that question aside for a moment, > here's some minor things, in no particular order: Thanks for the comments. > - The async API in PQgetXLogData is quite different from the other > commands. It's close to the API from PQgetCopyData(), but doesn't return > a malloc'd buffer like PQgetCopyData does. I presume that's to optimize > away the extra memcpy step? Yes. This is for preventing extra memcpy. > I don't think that's really necessary, I > don't recall any complaints about that in PQgetCopyData(), and if it > does become an issue, it could be optimized away by mallocing the buffer > first and reading directly to that. OK. I'll change PQgetXLogData() to return a malloc'd buffer, and will remove PQmarkConsumed(). > - Can we avoid sprinkling XLogStreamingAllowed() calls to places where > we check if WAL-logging is required (nbtsort.c, copy.c etc.). I think we > need a new macro to encapsulate (XLogArchivingActive() || > XLogStreamingAllowed()). Yes. I'll introduce a new macro XLogIsNeeded() which encapsulates (XLogArchivingActive() || XLogStreamingAllowed()). > - Is O_DIRECT ever a good idea in walreceiver? If it's really direct and > doesn't get cached, the startup process will need to read from disk. Good point. I agree that O_DIRECT is useless if walreceiver works with the startup process. It might be useful if only stand-alone walreceiver program is executed in the standby. > - Can we replace read/write_conninfo with just a long-enough field in > shared mem? Would be simpler. (this is moot if we go with the > stand-alone walreceiver program and pass it as a command-line argument) Yes, if we can decide the length of conninfo. Since I could not decide that, I used read/write_conninfo to tell walreceiver the conninfo. Is the fixed size 1024B enough for conninfo? > - walreceiver shouldn't die on connection error, just to be restarted by > startup process. Can we add error handling a la bgwriter and have a > retry loop within walreceiver? (again, if we go with a stand-alone > walreceiver program, it's probably better to have startup process > responsible to restart walreceiver, as it is now) Error handling a la bgwriter? You mean that PG_exception_stack should be set up to handle an ERROR exception? Anyway, I'll change walreceiver to retry connecting to the primary after an error occurs in PQstartXLogStreaming()/PQgetXLogData()/ PQputXLogRecPtr(). Should we set an upper limit of the number of the retries? > - pq_wait in backend waits until you can read or write at least 1 byte. > There is no guarantee that you can send or read the whole message > without blocking. We'd have to put the socket in non-blocking mode for > that. I'm not sure what the implications of this are. Umm... AFAIK, poll and select guarantee that at least the subsequent recv will not be blocked. If there is only 1 byte available in the buffer, recv would read that 1 byte and return immediately. I'm not sure if send will get stuck even after poll is passed. In my environment (RHEL5), send seems not to be blocked. > - we should include system_identifier somewhere in the replication > startup handshake. Otherwise you can connect to server from a different > system and have logs shipped, if they happen to be roughly at the same > point in WAL. Replay will almost certainly fail, but we should error > earlier. Agreed. I'll do that. > - I know I said we should have just asynchronous replication at first, > but looking ahead, how would you do synchronous? As the previous patch did, I'm going to make walsender read the latest XLOG from wal_buffers, introduce the signaling between a backend and walsender, and keep a backend waiting until the specified XLOG has been written or fsynced in the standby. > What kind of signaling > is needed between walreceiver and startup process for that? I was thinking that the synchronization mode which a client waits until XLOG has been applied is not necessary right now, so no signaling is also not required between those processes yet. But, HS requires this capability? > - 'replication' shouldn't be a real database. Agreed. I'll remove that. > I found the paging logic in walsender confusing, and didn't like the > idea that walsender needs to set the XLOGSTREAM_END_SEG flag. Surely > walreceiver knows how to split the WAL into files without such a flag. I > reworked that logic, I think it's easier to understand now. I kept the > support for the flag in libpq and the protocol for now, but it should be > removed too, or repurposed to indicate that pg_switch_xlog() was done in > the master. I've pushed that to 'replication-orig' branch in my git > repository, attached is the same as a diff against your SR_0914.patch. > > I need a break from this patch, so I'll take a closer look at Simon's > hot standby now. Meanwhile, can you work on the above items and submit a > new version, please? Yeah, sure. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
В списке pgsql-hackers по дате отправления:
Следующее
От: ningДата:
Сообщение: "BEGIN TRANSACTION" and "START TRANSACTION": different error handling