Re: pg_basebackup may fail to send feedbacks.
От | Kyotaro HORIGUCHI |
---|---|
Тема | Re: pg_basebackup may fail to send feedbacks. |
Дата | |
Msg-id | 20150218.173441.13047480.horiguchi.kyotaro@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: pg_basebackup may fail to send feedbacks. (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Ответы |
Re: pg_basebackup may fail to send feedbacks.
|
Список | pgsql-hackers |
Hello, this is the last patch for pg_basebackup/pg_receivexlog on master (9.5). Preor versions don't have this issue. 4. basebackup_reply_fix_mst_v2.patch receivelog.c patch applyable on master. This is based on the same design with walrcv_reply_fix_91_v2.patch in the aspect of gettimeofday(). regards, At Tue, 17 Feb 2015 19:45:19 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20150217.194519.58137941.horiguchi.kyotaro@lab.ntt.co.jp> > Thank you for the comment. Three new patches are attached. I > forgot to give a revision number on the previous patch, but I > think this is the 2nd version. > > 1. walrcv_reply_fix_94_v2.patch > Walreceiver patch applyable on master/REL9_4_STBLE/REL9_3_STABLE > > 2. walrcv_reply_fix_92_v2.patch > Walreceiver patch applyable on REL9_2_STABLE > > 3. walrcv_reply_fix_91_v2.patch > Walreceiver patch applyable on REL9_1_STABLE > > > At Sat, 14 Feb 2015 03:01:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwHZ_4yWyokA+5ks9s_698adjH8+0haMCSC9WYFh37GY7g@mail.gmail.com> > > On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > >> Introduce the maximum number of records that we can receive and > > >> process between feedbacks? For example, we can change > > >> XLogWalRcvSendHSFeedback() so that it's called per at least > > >> 8 records. I'm not sure what number is good, though... > > > > > > At the beginning of the "while(len > 0){if (len > 0){" block, > > > last_recv_timestamp is always kept up to date (by using > > > gettimeotda():). So we can use the variable instead of > > > gettimeofday() in my previous proposal. > > > > Yes, but something like last_recv_timestamp doesn't exist in > > REL9_1_STABLE. So you don't think that your proposed fix > > should be back-patched to all supported versions. Right? > > Back to 9.3 has the loop with the same structure. 9.2 is in a bit > differenct shape but looks to have the same issue. 9.1 and 9.0 > has the same staps with 9.2 but 9.0 doesn't has wal sender > timeout and 9.1 does not have a variable having current time. > > 9.3 and later: the previous patch works, but revised as your > comment. > > 9.2: The time of the last reply is stored in the file-scope > variable reply_message, and the current time is stored in > walrcv->lastMsgReceiptTime. The timeout is determined using > theses variables. > > 9.1: walrcv doesn't have lastMsgReceiptTime. The current time > should be acquired explicitly in the innermost loop. I tried > calling gettimeofday() once per several loops. The skip count > is determined by anticipated worst duration to process a wal > record and wal_receiver_status_interval. However, I doubt the > necessity to do so.. The value of the worst duration is simply > a random guess. > > 9.0: No point to do this kind of fix. > > > > > The start time of the timeout could be last_recv_timestamp at the > > > beginning of the while loop, since it is a bit earlier than the > > > actual time at the point. > > > > Sounds strange to me. I think that last_recv_timestamp should be > > compared with the time when the last feedback message was sent, > > e.g., maybe you can expose sendTime in XLogWalRcvSendReplay() > > as global variable, and use it to compare with last_recv_timestamp. > > You're right to do exactly right thing, but, as you mentioned, > exposing sendTime is requied to do so. The solution I proposed is > the second-best assuming that wal_sender_timeout is recommended > to be longer enough (several times longer) than > wal_receiver_status_interval. If no one minds to expose sendTime, > it is the best solution. The attached patch does it. > > # The added comment in the patch was wrong, though. > > > When we get out of the WAL receive loop due to the timeout check > > which your patch added, XLogWalRcvFlush() is always executed. > > I don't think that current WAL file needs to be flushed at that time. > > Thank you for pointing it out. Run XLogWalRcvSendReply(force = > true) immediately instead of breaking from the loop. > > > > The another solution would be using > > > RegisterTimeout/enable_timeout_after and it seemed to be work for > > > me. It can throw out the time counting stuff in the loop we are > > > talking about and that of XLogWalRcvSendHSFeedback and > > > XLogWalRcvSendReply, but it might be a bit too large for the > > > gain. > > > > Yes, sounds overkill. > > However, gettimeofday() for each recv is also a overkill for most > cases. I'll post the patches for receivelog.c based on the patch > for 9.1 wal receiver. -- Kyotaro Horiguchi NTT Open Source Software Center
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Michael PaquierДата:
Сообщение: Re: Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]