Re: pg_basebackup may fail to send feedbacks.
От | Kyotaro HORIGUCHI |
---|---|
Тема | Re: pg_basebackup may fail to send feedbacks. |
Дата | |
Msg-id | 20150310.172922.141408613.horiguchi.kyotaro@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: pg_basebackup may fail to send feedbacks. (Fujii Masao <masao.fujii@gmail.com>) |
Ответы |
Re: pg_basebackup may fail to send feedbacks.
(Fujii Masao <masao.fujii@gmail.com>)
|
Список | pgsql-hackers |
Hi, the attached is the v5 patch. - Do feGetCurrentTimestamp() only when necessary. - Rebased to current master At Mon, 2 Mar 2015 20:21:36 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwG1tJHpG03oZgwoKxt5wYD5v4S3HuTgSx7RotBhHnjwJw@mail.gmail.com> > On Tue, Feb 24, 2015 at 6:44 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > Hello, the attached is the v4 patch that checks feedback timing > > every WAL segments boundary. .. > > I said that checking whether to send feedback every boundary > > between WAL segments seemed too long but after some rethinking, I > > changed my mind. > > > > - The most large possible delay source in the busy-receive loop > > is fsyncing at closing WAL segment file just written, this can > > take several seconds. Freezing longer than the timeout > > interval could not be saved and is not worth saving anyway. > > > > - 16M bytes-disk-writes intervals between gettimeofday() seems > > to be gentle enough even on platforms where gettimeofday() is > > rather heavy. > > Sounds reasonable to me. > > So we don't need to address the problem in walreceiver side so proactively > because it tries to send the feedback every after flushing the WAL records. > IOW, the problem you observed is less likely to happen. Right? > > + now = feGetCurrentTimestamp(); > + if (standby_message_timeout > 0 && Surely it would hardly happen, especially on ordinary configuration. However, the continuous receiving of the replication stream is a quite normal behavior even if hardly happens. So the receiver should guarantee the feedbacks to be sent by logic as long as it is working normally, as long as the code for the special case won't be too large and won't take too long time:). The current walreceiver doesn't look trying to send feedbacks after flushing every wal records. HandleCopyStream will restlessly process the records in a gapless replication stream, sending feedback only when keepalive packet arrives. It is the fact that the response to the keepalive would help greatly but it is not what the keepalives are for. It is intended to be used to confirm if a silent receiver is still alive. Even with this fix, the case that one write or flush takes longer time than the feedback interval cannot be saved, but it would be ok since it should be regarded as disorder. > Minor comment: should feGetCurrentTimestamp() be called after the check of > standby_message_timeout > 0, to avoid unnecessary calls of that? Ah, you're right. I'll fixed it. > ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len, > XLogRecPtr *blockpos, uint32 timeline, > char *basedir, stream_stop_callback stream_stop, > - char *partial_suffix, bool mark_done) > + char *partial_suffix, bool mark_done, > + int standby_message_timeout, int64 *last_status) > > Maybe it's time to refactor this ugly coding (i.e., currently many arguments > need to be given to each functions. Looks ugly)... I'm increasing the ugliness:( XLog stuff seems to need to share many states widely to work. But the parameter list of the function looks to be bearable to this extent, to me:). regards, -- Kyotaro Horiguchi NTT Open Source Software Center
В списке pgsql-hackers по дате отправления: