Обсуждение: pg_basebackup may fail to send feedbacks.

Поиск
Список
Период
Сортировка

pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Hello, I received an report that pg_basebackup with "-X stream"
always exits with the following error.

> pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.

The walsender had been terminated by replication timeout a bit
before the above message.

> LOG:  terminating walsender process due to replication timeout

====

I digged into this and found that a accumulation of delay in
receiving xlog stream can cause such an error. This hardly occurs
on an ordinary environment but this would be caused by temporary
(or permanent) heavy load on the receiver-side machine. A virtual
machine environment could raises the chance, I suppose.

In HandleCopyStream(), the feedbacks are sent only after breaks
of the xlog stream, so continuous flow of xlog stream prevents it
from being sent with expected intervals. walsender sends
keepalive message for such a case, but it arrives with a long
delay being caught by the congestion of the stream, but this is
not a problem because the keepalive is intended to be sent while
idle.

I think that the status feedback should be sent whenever
standby_message_timeout has elapsed just after (or before) an
incoming message is processed. The seemingly most straightforward
way to fix this is breaking the innner-loop (while(r != 0)) if
the time to feedback comes as the attached patch #1
(0001-Make-sure-to-send-...).

What do you thing about this?

regards,


=====
- How to reproduce the situation.

As mentioned before, this hardly occurs on ordinary
environment. But simulating the heavy load by inserting a delay
in HandleCopyStream() effectively let the error occur.

With the attached patch #2(insert_interval.diff), the following
operation let us see the situation. The walsender reports timeout
although the stream processing is delayed but running steadily.

postgresql.conf:
> wal_level = hot_standby
> max_wal_senders = 2
> wal_sender_timeout = 20s  # * 2 of default of standby_message_timeout

Terminal1$ psql postgres
postgres=# CREATE TABLE t1 (a text);
postgres=# ALTER TABLE t1 ALTER COLUMN a SET STORAGE EXTERNAL;
postgres=# INSERT INTO t1 (SELECT repeat('x', 10000) FROM generate_series(0, 99999)); -- about 1GB
postgres=# ^D

Terminal1$ pgbench -i postgres
Terminal1$ pgbench -T 600 -h localhost postgres

Terminal2$ pg_basebackupo -r 32k -X stream -D data -h localhost

... in about a couple of minutes on my environment..

Terminal1:
> LOG:  terminating walsender process due to replication timeout

... after another couple of minutes.

Terminal2:
> pg_basebackup: could not receive data from WAL stream: server closed...

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
I'm very sorry for confused report. The problem found in 9.4.0
and the diagnosis was mistakenly done on master.

9.4.0 has no problem of feedback delay caused by slow xlog
receiving on pg_basebackup mentioned in the previous mail. But
the current master still has this problem.

regards,

At Mon, 02 Feb 2015 13:48:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20150202.134834.100672846.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello, I received an report that pg_basebackup with "-X stream"
> always exits with the following error.
> 
> > pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
> 
> The walsender had been terminated by replication timeout a bit
> before the above message.
> 
> > LOG:  terminating walsender process due to replication timeout
> 
> ====
> 
> I digged into this and found that a accumulation of delay in
> receiving xlog stream can cause such an error. This hardly occurs
> on an ordinary environment but this would be caused by temporary
> (or permanent) heavy load on the receiver-side machine. A virtual
> machine environment could raises the chance, I suppose.
> 
> In HandleCopyStream(), the feedbacks are sent only after breaks
> of the xlog stream, so continuous flow of xlog stream prevents it
> from being sent with expected intervals. walsender sends
> keepalive message for such a case, but it arrives with a long
> delay being caught by the congestion of the stream, but this is
> not a problem because the keepalive is intended to be sent while
> idle.
> 
> I think that the status feedback should be sent whenever
> standby_message_timeout has elapsed just after (or before) an
> incoming message is processed. The seemingly most straightforward
> way to fix this is breaking the innner-loop (while(r != 0)) if
> the time to feedback comes as the attached patch #1
> (0001-Make-sure-to-send-...).
> 
> What do you thing about this?
> 
> regards,
> 
> 
> =====
> - How to reproduce the situation.
> 
> As mentioned before, this hardly occurs on ordinary
> environment. But simulating the heavy load by inserting a delay
> in HandleCopyStream() effectively let the error occur.
> 
> With the attached patch #2(insert_interval.diff), the following
> operation let us see the situation. The walsender reports timeout
> although the stream processing is delayed but running steadily.
> 
> postgresql.conf:
> > wal_level = hot_standby
> > max_wal_senders = 2
> > wal_sender_timeout = 20s  # * 2 of default of standby_message_timeout
> 
> Terminal1$ psql postgres
> postgres=# CREATE TABLE t1 (a text);
> postgres=# ALTER TABLE t1 ALTER COLUMN a SET STORAGE EXTERNAL;
> postgres=# INSERT INTO t1 (SELECT repeat('x', 10000) FROM generate_series(0, 99999)); -- about 1GB
> postgres=# ^D
> 
> Terminal1$ pgbench -i postgres
> Terminal1$ pgbench -T 600 -h localhost postgres
> 
> Terminal2$ pg_basebackupo -r 32k -X stream -D data -h localhost
> 
> ... in about a couple of minutes on my environment..
> 
> Terminal1:
> > LOG:  terminating walsender process due to replication timeout
> 
> ... after another couple of minutes.
> 
> Terminal2:
> > pg_basebackup: could not receive data from WAL stream: server closed...

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_basebackup may fail to send feedbacks.

От
Fujii Masao
Дата:
On Wed, Feb 4, 2015 at 4:58 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I'm very sorry for confused report. The problem found in 9.4.0
> and the diagnosis was mistakenly done on master.
>
> 9.4.0 has no problem of feedback delay caused by slow xlog
> receiving on pg_basebackup mentioned in the previous mail. But
> the current master still has this problem.

Seems walreceiver has the same problem. No?

Regards,

-- 
Fujii Masao



Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Wed, 4 Feb 2015 19:22:39 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwGudGCMnHZinkd37i+JijDkruEcrea1NCRs1MMtE3rOFQ@mail.gmail.com>
> On Wed, Feb 4, 2015 at 4:58 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > I'm very sorry for confused report. The problem found in 9.4.0
> > and the diagnosis was mistakenly done on master.
> >
> > 9.4.0 has no problem of feedback delay caused by slow xlog
> > receiving on pg_basebackup mentioned in the previous mail. But
> > the current master still has this problem.
> 
> Seems walreceiver has the same problem. No?

pg_receivexlog.c would have the same problem since it uses the
same function with pg_basebackup.c.

The correspondent of HandleCopyStream in wansender is
WalReceiverMain, and it doesn't seem to have the same kind of
loop shown below. It seems to surely send feedback per one
record.

|   r = stream_reader();
|   while (r > 0)
|   {
|      ... wal record processing stuff without sending feedback..
|      r = stream_reader();
|   }

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_basebackup may fail to send feedbacks.

От
Fujii Masao
Дата:
On Thu, Feb 5, 2015 at 10:20 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> At Wed, 4 Feb 2015 19:22:39 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwGudGCMnHZinkd37i+JijDkruEcrea1NCRs1MMtE3rOFQ@mail.gmail.com>
>> On Wed, Feb 4, 2015 at 4:58 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> > I'm very sorry for confused report. The problem found in 9.4.0
>> > and the diagnosis was mistakenly done on master.
>> >
>> > 9.4.0 has no problem of feedback delay caused by slow xlog
>> > receiving on pg_basebackup mentioned in the previous mail. But
>> > the current master still has this problem.
>>
>> Seems walreceiver has the same problem. No?
>
> pg_receivexlog.c would have the same problem since it uses the
> same function with pg_basebackup.c.
>
> The correspondent of HandleCopyStream in wansender is
> WalReceiverMain, and it doesn't seem to have the same kind of
> loop shown below. It seems to surely send feedback per one
> record.
>
> |   r = stream_reader();
> |   while (r > 0)
> |   {
> |      ... wal record processing stuff without sending feedback..
> |      r = stream_reader();
> |   }

WalReceiverMain() has the similar code as follows.
   len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);   if (len != 0)   {       for (;;)       {           if (len > 0)
     {               ....               len = walrcv_receive(0, &buf);           }   }
 

Regards,

-- 
Fujii Masao



Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Sorry, I misunderstood that.

> > At Wed, 4 Feb 2015 19:22:39 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwGudGCMnHZinkd37i+JijDkruEcrea1NCRs1MMtE3rOFQ@mail.gmail.com>
> >> On Wed, Feb 4, 2015 at 4:58 PM, Kyotaro HORIGUCHI
> >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> > I'm very sorry for confused report. The problem found in 9.4.0
> >> > and the diagnosis was mistakenly done on master.
> >> >
> >> > 9.4.0 has no problem of feedback delay caused by slow xlog
> >> > receiving on pg_basebackup mentioned in the previous mail. But
> >> > the current master still has this problem.
> >>
> >> Seems walreceiver has the same problem. No?
> >
> > pg_receivexlog.c would have the same problem since it uses the
> > same function with pg_basebackup.c.
> >
> > The correspondent of HandleCopyStream in wansender is
> > WalReceiverMain, and it doesn't seem to have the same kind of
> > loop shown below. It seems to surely send feedback per one
> > record.
> >
> > |   r = stream_reader();
> > |   while (r > 0)
> > |   {
> > |      ... wal record processing stuff without sending feedback..
> > |      r = stream_reader();
> > |   }
> 
> WalReceiverMain() has the similar code as follows.
> 
>     len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
>     if (len != 0)
>     {
>         for (;;)
>         {
>             if (len > 0)
>             {
>                 ....
>                 len = walrcv_receive(0, &buf);
>             }
>     }

The loop seems a bit different but eventually the same about this
discussion.

408>     len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
409>     if (len != 0)
410>     {
415>         for (;;)
416>         {
417>             if (len > 0)
418>             {
425>                XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
426>             }
427-438>         else {break;}
439>             len = walrcv_receive(0, &buf);
440>         }
441>     }

XLogWalRcvProcessMsg doesn't send feedback when processing
'w'=WAL record packet. So the same thing as pg_basebackup and
pg_receivexlog will occur on walsender.

Exiting the for(;;) loop by TimestampDifferenceExceeds just
before line 439 is an equivalent measure to I poposed for
receivelog.c, but calling XLogWalRcvSendHSFeedback(false) there
is seemingly simpler (but I feel a bit uncomfortable for the
latter)

Or other measures?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_basebackup may fail to send feedbacks.

От
Fujii Masao
Дата:
On Fri, Feb 6, 2015 at 3:22 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Sorry, I misunderstood that.
>
>> > At Wed, 4 Feb 2015 19:22:39 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwGudGCMnHZinkd37i+JijDkruEcrea1NCRs1MMtE3rOFQ@mail.gmail.com>
>> >> On Wed, Feb 4, 2015 at 4:58 PM, Kyotaro HORIGUCHI
>> >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> >> > I'm very sorry for confused report. The problem found in 9.4.0
>> >> > and the diagnosis was mistakenly done on master.
>> >> >
>> >> > 9.4.0 has no problem of feedback delay caused by slow xlog
>> >> > receiving on pg_basebackup mentioned in the previous mail. But
>> >> > the current master still has this problem.
>> >>
>> >> Seems walreceiver has the same problem. No?
>> >
>> > pg_receivexlog.c would have the same problem since it uses the
>> > same function with pg_basebackup.c.
>> >
>> > The correspondent of HandleCopyStream in wansender is
>> > WalReceiverMain, and it doesn't seem to have the same kind of
>> > loop shown below. It seems to surely send feedback per one
>> > record.
>> >
>> > |   r = stream_reader();
>> > |   while (r > 0)
>> > |   {
>> > |      ... wal record processing stuff without sending feedback..
>> > |      r = stream_reader();
>> > |   }
>>
>> WalReceiverMain() has the similar code as follows.
>>
>>     len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
>>     if (len != 0)
>>     {
>>         for (;;)
>>         {
>>             if (len > 0)
>>             {
>>                 ....
>>                 len = walrcv_receive(0, &buf);
>>             }
>>     }
>
> The loop seems a bit different but eventually the same about this
> discussion.
>
> 408>     len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
> 409>     if (len != 0)
> 410>     {
> 415>         for (;;)
> 416>         {
> 417>             if (len > 0)
> 418>             {
> 425>                XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
> 426>             }
> 427-438>         else {break;}
> 439>             len = walrcv_receive(0, &buf);
> 440>         }
> 441>     }
>
> XLogWalRcvProcessMsg doesn't send feedback when processing
> 'w'=WAL record packet. So the same thing as pg_basebackup and
> pg_receivexlog will occur on walsender.
>
> Exiting the for(;;) loop by TimestampDifferenceExceeds just
> before line 439 is an equivalent measure to I poposed for
> receivelog.c, but calling XLogWalRcvSendHSFeedback(false) there
> is seemingly simpler (but I feel a bit uncomfortable for the
> latter)

I'm concerned about the performance degradation by calling
getimeofday() per a record.

> Or other measures?

Introduce the maximum number of records that we can receive and
process between feedbacks? For example, we can change
XLogWalRcvSendHSFeedback() so that it's called per at least
8 records. I'm not sure what number is good, though...

Regards,

-- 
Fujii Masao



Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Hello,

The attached patch is fix for walreciever not using gettimeofday,
and fix for receivelog using it.

> > XLogWalRcvProcessMsg doesn't send feedback when processing
> > 'w'=WAL record packet. So the same thing as pg_basebackup and
> > pg_receivexlog will occur on walsender.
> >
> > Exiting the for(;;) loop by TimestampDifferenceExceeds just
> > before line 439 is an equivalent measure to I poposed for
> > receivelog.c, but calling XLogWalRcvSendHSFeedback(false) there
> > is seemingly simpler (but I feel a bit uncomfortable for the
> > latter)
> 
> I'm concerned about the performance degradation by calling
> getimeofday() per a record.
> 
> > Or other measures?
> 
> Introduce the maximum number of records that we can receive and
> process between feedbacks? For example, we can change
> XLogWalRcvSendHSFeedback() so that it's called per at least
> 8 records. I'm not sure what number is good, though...

At the beginning of the "while(len > 0){if (len > 0){" block,
last_recv_timestamp is always kept up to date (by using
gettimeotda():). So we can use the variable instead of
gettimeofday() in my previous proposal.

The start time of the timeout could be last_recv_timestamp at the
beginning of the while loop, since it is a bit earlier than the
actual time at the point. 

The another solution would be using
RegisterTimeout/enable_timeout_after and it seemed to be work for
me. It can throw out the time counting stuff in the loop we are
talking about and that of XLogWalRcvSendHSFeedback and
XLogWalRcvSendReply, but it might be a bit too large for the
gain.

Considering pg_basebackup/receivexlog, the loop in receivelog.c
does not maintain the time value within it, so I think we are
forced to use feGetCurrentTimeStamp if not using SIGALRM. The wal
reading function simply gets the data from the buffer in memory
for most calls so the gettimeofday for each iteration could slow
the process significantly. SIGALRM seems to be valuable for the
case.

Thoughts?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Hi,

15 19:48:23 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20150210.194823.219136034.horiguchi.kyotaro@lab.ntt.co.jp>
> Considering pg_basebackup/receivexlog, the loop in receivelog.c
> does not maintain the time value within it, so I think we are
> forced to use feGetCurrentTimeStamp if not using SIGALRM. The wal
> reading function simply gets the data from the buffer in memory
> for most calls so the gettimeofday for each iteration could slow
> the process significantly. SIGALRM seems to be valuable for the
> case.

As a very fancy and ugly sample, the attached patch does this,
although, I'm a bit at a loss how to make this signal things to
be more sober..

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup may fail to send feedbacks.

От
Fujii Masao
Дата:
On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> The attached patch is fix for walreciever not using gettimeofday,
> and fix for receivelog using it.
>
>> > XLogWalRcvProcessMsg doesn't send feedback when processing
>> > 'w'=WAL record packet. So the same thing as pg_basebackup and
>> > pg_receivexlog will occur on walsender.
>> >
>> > Exiting the for(;;) loop by TimestampDifferenceExceeds just
>> > before line 439 is an equivalent measure to I poposed for
>> > receivelog.c, but calling XLogWalRcvSendHSFeedback(false) there
>> > is seemingly simpler (but I feel a bit uncomfortable for the
>> > latter)
>>
>> I'm concerned about the performance degradation by calling
>> getimeofday() per a record.
>>
>> > Or other measures?
>>
>> Introduce the maximum number of records that we can receive and
>> process between feedbacks? For example, we can change
>> XLogWalRcvSendHSFeedback() so that it's called per at least
>> 8 records. I'm not sure what number is good, though...
>
> At the beginning of the "while(len > 0){if (len > 0){" block,
> last_recv_timestamp is always kept up to date (by using
> gettimeotda():). So we can use the variable instead of
> gettimeofday() in my previous proposal.

Yes, but something like last_recv_timestamp doesn't exist in
REL9_1_STABLE. So you don't think that your proposed fix
should be back-patched to all supported versions. Right?

> The start time of the timeout could be last_recv_timestamp at the
> beginning of the while loop, since it is a bit earlier than the
> actual time at the point.

Sounds strange to me. I think that last_recv_timestamp should be
compared with the time when the last feedback message was sent,
e.g., maybe you can expose sendTime in XLogWalRcvSendReplay()
as global variable, and use it to compare with last_recv_timestamp.

When we get out of the WAL receive loop due to the timeout check
which your patch added, XLogWalRcvFlush() is always executed.
I don't think that current WAL file needs to be flushed at that time.

> The another solution would be using
> RegisterTimeout/enable_timeout_after and it seemed to be work for
> me. It can throw out the time counting stuff in the loop we are
> talking about and that of XLogWalRcvSendHSFeedback and
> XLogWalRcvSendReply, but it might be a bit too large for the
> gain.

Yes, sounds overkill.

Regards,

-- 
Fujii Masao



Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Thank you for the comment. Three new patches are attached. I
forgot to give a revision number on the previous patch, but I
think this is the 2nd version.

1. walrcv_reply_fix_94_v2.patch  Walreceiver patch applyable on master/REL9_4_STBLE/REL9_3_STABLE

2. walrcv_reply_fix_92_v2.patch  Walreceiver patch applyable on REL9_2_STABLE

3. walrcv_reply_fix_91_v2.patch  Walreceiver patch applyable on REL9_1_STABLE


At Sat, 14 Feb 2015 03:01:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwHZ_4yWyokA+5ks9s_698adjH8+0haMCSC9WYFh37GY7g@mail.gmail.com>
> On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> Introduce the maximum number of records that we can receive and
> >> process between feedbacks? For example, we can change
> >> XLogWalRcvSendHSFeedback() so that it's called per at least
> >> 8 records. I'm not sure what number is good, though...
> >
> > At the beginning of the "while(len > 0){if (len > 0){" block,
> > last_recv_timestamp is always kept up to date (by using
> > gettimeotda():). So we can use the variable instead of
> > gettimeofday() in my previous proposal.
> 
> Yes, but something like last_recv_timestamp doesn't exist in
> REL9_1_STABLE. So you don't think that your proposed fix
> should be back-patched to all supported versions. Right?

Back to 9.3 has the loop with the same structure. 9.2 is in a bit
differenct shape but looks to have the same issue. 9.1 and 9.0
has the same staps with 9.2 but 9.0 doesn't has wal sender
timeout and 9.1 does not have a variable having current time.

9.3 and later: the previous patch works, but revised as your comment.

9.2: The time of the last reply is stored in the file-scope variable reply_message, and the current time is stored in
walrcv->lastMsgReceiptTime.The timeout is determined using theses variables.
 

9.1: walrcv doesn't have lastMsgReceiptTime. The current time should be acquired explicitly in the innermost loop. I
triedcalling gettimeofday() once per several loops. The skip count is determined by anticipated worst duration to
processa wal record and wal_receiver_status_interval. However, I doubt the necessity to do so.. The value of the worst
durationis simply a random guess.
 

9.0: No point to do this kind of fix.


> > The start time of the timeout could be last_recv_timestamp at the
> > beginning of the while loop, since it is a bit earlier than the
> > actual time at the point.
> 
> Sounds strange to me. I think that last_recv_timestamp should be
> compared with the time when the last feedback message was sent,
> e.g., maybe you can expose sendTime in XLogWalRcvSendReplay()
> as global variable, and use it to compare with last_recv_timestamp.

You're right to do exactly right thing, but, as you mentioned,
exposing sendTime is requied to do so. The solution I proposed is
the second-best assuming that wal_sender_timeout is recommended
to be longer enough (several times longer) than
wal_receiver_status_interval. If no one minds to expose sendTime,
it is the best solution. The attached patch does it.

# The added comment in the patch was wrong, though.

> When we get out of the WAL receive loop due to the timeout check
> which your patch added, XLogWalRcvFlush() is always executed.
> I don't think that current WAL file needs to be flushed at that time.

Thank you for pointing it out. Run XLogWalRcvSendReply(force =
true) immediately instead of breaking from the loop.

> > The another solution would be using
> > RegisterTimeout/enable_timeout_after and it seemed to be work for
> > me. It can throw out the time counting stuff in the loop we are
> > talking about and that of XLogWalRcvSendHSFeedback and
> > XLogWalRcvSendReply, but it might be a bit too large for the
> > gain.
> 
> Yes, sounds overkill.

However, gettimeofday() for each recv is also a overkill for most
cases. I'll post the patches for receivelog.c based on the patch
for 9.1 wal receiver.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Hello, this is the last patch for pg_basebackup/pg_receivexlog on
master (9.5). Preor versions don't have this issue.

4. basebackup_reply_fix_mst_v2.patch receivelog.c patch applyable on master.

This is based on the same design with
walrcv_reply_fix_91_v2.patch in the aspect of gettimeofday().

regards,

At Tue, 17 Feb 2015 19:45:19 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20150217.194519.58137941.horiguchi.kyotaro@lab.ntt.co.jp>
> Thank you for the comment. Three new patches are attached. I
> forgot to give a revision number on the previous patch, but I
> think this is the 2nd version.
> 
> 1. walrcv_reply_fix_94_v2.patch
>    Walreceiver patch applyable on master/REL9_4_STBLE/REL9_3_STABLE
> 
> 2. walrcv_reply_fix_92_v2.patch
>    Walreceiver patch applyable on REL9_2_STABLE
> 
> 3. walrcv_reply_fix_91_v2.patch
>    Walreceiver patch applyable on REL9_1_STABLE
> 
> 
> At Sat, 14 Feb 2015 03:01:22 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwHZ_4yWyokA+5ks9s_698adjH8+0haMCSC9WYFh37GY7g@mail.gmail.com>
> > On Tue, Feb 10, 2015 at 7:48 PM, Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > >> Introduce the maximum number of records that we can receive and
> > >> process between feedbacks? For example, we can change
> > >> XLogWalRcvSendHSFeedback() so that it's called per at least
> > >> 8 records. I'm not sure what number is good, though...
> > >
> > > At the beginning of the "while(len > 0){if (len > 0){" block,
> > > last_recv_timestamp is always kept up to date (by using
> > > gettimeotda():). So we can use the variable instead of
> > > gettimeofday() in my previous proposal.
> > 
> > Yes, but something like last_recv_timestamp doesn't exist in
> > REL9_1_STABLE. So you don't think that your proposed fix
> > should be back-patched to all supported versions. Right?
> 
> Back to 9.3 has the loop with the same structure. 9.2 is in a bit
> differenct shape but looks to have the same issue. 9.1 and 9.0
> has the same staps with 9.2 but 9.0 doesn't has wal sender
> timeout and 9.1 does not have a variable having current time.
> 
> 9.3 and later: the previous patch works, but revised as your
>   comment.
> 
> 9.2: The time of the last reply is stored in the file-scope
>   variable reply_message, and the current time is stored in
>   walrcv->lastMsgReceiptTime. The timeout is determined using
>   theses variables.
> 
> 9.1: walrcv doesn't have lastMsgReceiptTime. The current time
>   should be acquired explicitly in the innermost loop. I tried
>   calling gettimeofday() once per several loops. The skip count
>   is determined by anticipated worst duration to process a wal
>   record and wal_receiver_status_interval. However, I doubt the
>   necessity to do so.. The value of the worst duration is simply
>   a random guess.
> 
> 9.0: No point to do this kind of fix.
> 
> 
> > > The start time of the timeout could be last_recv_timestamp at the
> > > beginning of the while loop, since it is a bit earlier than the
> > > actual time at the point.
> > 
> > Sounds strange to me. I think that last_recv_timestamp should be
> > compared with the time when the last feedback message was sent,
> > e.g., maybe you can expose sendTime in XLogWalRcvSendReplay()
> > as global variable, and use it to compare with last_recv_timestamp.
> 
> You're right to do exactly right thing, but, as you mentioned,
> exposing sendTime is requied to do so. The solution I proposed is
> the second-best assuming that wal_sender_timeout is recommended
> to be longer enough (several times longer) than
> wal_receiver_status_interval. If no one minds to expose sendTime,
> it is the best solution. The attached patch does it.
> 
> # The added comment in the patch was wrong, though.
> 
> > When we get out of the WAL receive loop due to the timeout check
> > which your patch added, XLogWalRcvFlush() is always executed.
> > I don't think that current WAL file needs to be flushed at that time.
> 
> Thank you for pointing it out. Run XLogWalRcvSendReply(force =
> true) immediately instead of breaking from the loop.
> 
> > > The another solution would be using
> > > RegisterTimeout/enable_timeout_after and it seemed to be work for
> > > me. It can throw out the time counting stuff in the loop we are
> > > talking about and that of XLogWalRcvSendHSFeedback and
> > > XLogWalRcvSendReply, but it might be a bit too large for the
> > > gain.
> > 
> > Yes, sounds overkill.
> 
> However, gettimeofday() for each recv is also a overkill for most
> cases. I'll post the patches for receivelog.c based on the patch
> for 9.1 wal receiver.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup may fail to send feedbacks.

От
Fujii Masao
Дата:
On Wed, Feb 18, 2015 at 5:34 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, this is the last patch for pg_basebackup/pg_receivexlog on
> master (9.5). Preor versions don't have this issue.
>
> 4. basebackup_reply_fix_mst_v2.patch
>   receivelog.c patch applyable on master.
>
> This is based on the same design with
> walrcv_reply_fix_91_v2.patch in the aspect of gettimeofday().

Thanks for updating the patches! But I'm still not sure if the idea depending
on the frequent calls of gettimeofday() for each WAL receive is good or not.
Some users may complain about the performance impact by such frequent calls
and we may want to get rid of them from walreceiver loop in the future.
If we adopt your idea now, I'm afraid that it would tie our hands in that case.

How much impact can such frequent calls of gettimeofday() have on replication
performance? If it's not negligible, probably we should remove them at first
and find out another idea to fix the problem you pointed. ISTM that it's not so
difficult to remove them. Thought? Do you have any numbers which can prove
that such frequent gettimeofday() has only ignorable impact on the performance?

Regards,

-- 
Fujii Masao



Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Thu, 19 Feb 2015 19:22:21 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwGLFLaFrCYcuikkVefNaoEL448TLSJ9oPsvb17v3foZHA@mail.gmail.com>
> On Wed, Feb 18, 2015 at 5:34 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Hello, this is the last patch for pg_basebackup/pg_receivexlog on
> > master (9.5). Preor versions don't have this issue.
> >
> > 4. basebackup_reply_fix_mst_v2.patch
> >   receivelog.c patch applyable on master.
> >
> > This is based on the same design with
> > walrcv_reply_fix_91_v2.patch in the aspect of gettimeofday().
> 
> Thanks for updating the patches! But I'm still not sure if the idea depending
> on the frequent calls of gettimeofday() for each WAL receive is good or not.

Neither do I. Nowadays, linux on AMD64/x64 environment has no
problem even if gettimeofday() called frequently, but Windows
seems to have a problem and I don't know about other platforms.

One possible timing source is LSN.

>  if ((blockpos - last_blockpos) / BLKSZ > 0)
>  {
>      now = feGetCurrentTimestamp();
>      if (feTimestampDifferenceExceeds(last_status, now,
..
>          if (!sendFeedback(conn, blockpos, now, false))
>      }
>  }
>  
>  last_blockpos = blockpos;

But once per PAGESZ can easily be more frequent than once per 10
records and XLOG_SEG_SIZE seems too big.  However I don't see any
bases to determine the frequency between them nor other than the
time itself.

SIGALRM seems to me to be more preferable to keep the main jobe
as fast as possible than introducing a code with no reasonable
basis.


> Some users may complain about the performance impact by such frequent calls
> and we may want to get rid of them from walreceiver loop in the future.
> If we adopt your idea now, I'm afraid that it would tie our hands in that case.
> 
> How much impact can such frequent calls of gettimeofday() have on replication
> performance? If it's not negligible, probably we should remove them at first
> and find out another idea to fix the problem you pointed. ISTM that it's not so
> difficult to remove them. Thought? Do you have any numbers which can prove
> that such frequent gettimeofday() has only ignorable impact on the performance?

The attached patch is 'the more sober' version of SIGLARM patch.

I'll search for the another way after this.

regards,

diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 8caedff..c55af83 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -16,6 +16,8 @@#include <sys/stat.h>#include <unistd.h>
+#include <signal.h>
+#include <sys/time.h>/* local includes */#include "receivelog.h"
@@ -33,6 +35,12 @@ static XLogRecPtr lastFlushPosition = InvalidXLogRecPtr;static bool still_sending = true;        /*
feedbackstill needs to be sent? */
 
+static bool standby_message_timeout_active = false;   /* timeout is active?  */
+static bool standby_message_timeout_expired = false;  /* timeout expired?  */
+
+static void SigAlrmHandler(int s);
+static int set_standby_message_timeout(long duration);
+static int cancel_standby_message_timeout(void);static PGresult *HandleCopyStream(PGconn *conn, XLogRecPtr startpos,
             uint32 timeline, char *basedir,               stream_stop_callback stream_stop, int
standby_message_timeout,
@@ -60,6 +68,14 @@ static long CalculateCopyStreamSleeptime(int64 now, int standby_message_timeout,static bool
ReadEndOfStreamingResult(PGresult*res, XLogRecPtr *startpos,                         uint32 *timeline);
 
+
+static void
+SigAlrmHandler(int s)
+{
+    standby_message_timeout_expired = true;
+}
+
+static boolmark_file_as_archived(const char *basedir, const char *fname){
@@ -802,6 +818,65 @@ ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos, uint32 *timeline)}/*
+ * set_standby_message_timeout: Set standby message timeout.
+ *
+ * duration is the timeout duration in milliseconds.
+ *
+ * Returns 0 on success and -1 on error.
+ */
+static int
+set_standby_message_timeout(long duration)
+{
+    struct itimerval itv;
+
+    /* negative duration means no standby_message_timeout. */
+    if (duration < 0) return 0;
+
+    MemSet(&itv, 0, sizeof(itv));
+    itv.it_value.tv_sec = duration / 1000;
+    itv.it_value.tv_usec = (duration % 1000) * 1000;
+    standby_message_timeout_active = true;
+    standby_message_timeout_expired = false;
+    if (setitimer(ITIMER_REAL, &itv, NULL) != 0)
+    {
+        fprintf(stderr, "could not set timer: %m");
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * cancel_standby_message_timeout: cancel standby message timeout if active.
+ *
+ * Returns 0 on success and -1 on error.
+ */
+static int
+cancel_standby_message_timeout(void)
+{
+    struct itimerval itv;
+
+    if (!standby_message_timeout_active)
+        return;
+
+    standby_message_timeout_active = false;
+
+    if (standby_message_timeout_expired)
+    {
+        standby_message_timeout_expired = false;
+        return 0;
+    }
+    MemSet(&itv, 0, sizeof(itv));
+    if (setitimer(ITIMER_REAL, &itv, NULL) != 0)
+    {
+        fprintf(stderr, "could not set timer: %m");
+        return -1;
+    }
+
+    return 0;
+}
+
+/* * The main loop of ReceiveXlogStream. Handles the COPY stream after * initiating streaming with the START_STREAMING
command.*
 
@@ -818,6 +893,15 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,    char       *copybuf =
NULL;   int64        last_status = -1;    XLogRecPtr    blockpos = startpos;
 
+    sighandler_t    oldhandler;
+
+    oldhandler = pqsignal(SIGALRM, SigAlrmHandler);
+
+    if (oldhandler != SigAlrmHandler)
+    {
+        /* SIGALRM should not be used here  */
+        Assert(oldhandler == SIG_DFL);
+    }    still_sending = true;
@@ -878,23 +962,9 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,         */        sleeptime =
CalculateCopyStreamSleeptime(now,standby_message_timeout,
last_status);
-        r = CopyStreamReceive(conn, sleeptime, ©buf);
-        while (r != 0)
+        while (r > 0)        {
-            if (r == -1)
-                goto error;
-            if (r == -2)
-            {
-                PGresult    *res = HandleEndOfCopyStream(conn, copybuf, blockpos,
-                                                         basedir, partial_suffix,
-                                                         stoppos, mark_done);
-                if (res == NULL)
-                    goto error;
-                else
-                    return res;
-            }
-            /* Check the message type. */            if (copybuf[0] == 'k')            {
@@ -925,14 +995,62 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,            }            /*
+              If standby_message_timeout has expired, send feed back and then
+             * set timeout again.
+             */
+            if (standby_message_timeout_expired)
+            {
+                if (!sendFeedback(conn, blockpos, now, false))
+                    goto error;
+                last_status = feGetCurrentTimestamp();
+
+                /* Set next timeout */
+                sleeptime = CalculateCopyStreamSleeptime(now, standby_message_timeout,
+                                                         last_status);
+                if (set_standby_message_timeout(sleeptime) < 0)
+                    goto error;
+            }
+
+            /*             * Process the received data, and any subsequent data we             * can read without
blocking.            */            r = CopyStreamReceive(conn, 0, ©buf);
 
+
+            /* If continuous input has come, set timeout for standby
+             * message. This fires a bit later than exact desired time but it
+             * would be earlier enough because standby_message_timeout is
+             * ususally set enough smaller than wal_sender_timeout.
+             */
+            if (r > 0 && !standby_message_timeout_active)
+            {
+                if (set_standby_message_timeout(sleeptime) < 0)
+                    goto error;
+
+            }        }
-    }
+        if (cancel_standby_message_timeout() < 0)
+            goto error;
+        if (r == -1)
+            goto error;
+        if (r == -2)
+        {
+            PGresult    *res = HandleEndOfCopyStream(conn, copybuf, blockpos,
+                                                     basedir, partial_suffix,
+                                                     stoppos, mark_done);
+            if (res == NULL)
+                goto error;
+            else
+                return res;
+        }
+
+    }
+    error:
+    cancel_standby_message_timeout();
+    pqsignal(SIGALRM, oldhandler);
+    if (copybuf != NULL)        PQfreemem(copybuf);    return NULL;

Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Hello, the attached is the v4 patch that checks feedback timing
every WAL segments boundary.

At Fri, 20 Feb 2015 17:29:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20150220.172914.241732690.horiguchi.kyotaro@lab.ntt.co.jp>
> > Some users may complain about the performance impact by such
> > frequent calls and we may want to get rid of them from
> > walreceiver loop in the future.  If we adopt your idea now,
> > I'm afraid that it would tie our hands in that case.
> > 
> > How much impact can such frequent calls of gettimeofday()
> > have on replication performance? If it's not negligible,
> > probably we should remove them at first and find out another
> > idea to fix the problem you pointed. ISTM that it's not so
> > difficult to remove them. Thought? Do you have any numbers
> > which can prove that such frequent gettimeofday() has only
> > ignorable impact on the performance?
> 
> The attached patch is 'the more sober' version of SIGLARM patch.

I said that checking whether to send feedback every boundary
between WAL segments seemed too long but after some rethinking, I
changed my mind.
- The most large possible delay source in the busy-receive loop  is fsyncing at closing WAL segment file just written,
thiscan  take several seconds.  Freezing longer than the timeout  interval could not be saved and is not worth saving
anyway.
- 16M bytes-disk-writes intervals between gettimeofday() seems  to be gentle enough even on platforms where
gettimeofday()is  rather heavy.
 


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup may fail to send feedbacks.

От
Fujii Masao
Дата:
On Tue, Feb 24, 2015 at 6:44 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello, the attached is the v4 patch that checks feedback timing
> every WAL segments boundary.
>
> At Fri, 20 Feb 2015 17:29:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote
in<20150220.172914.241732690.horiguchi.kyotaro@lab.ntt.co.jp>
 
>> > Some users may complain about the performance impact by such
>> > frequent calls and we may want to get rid of them from
>> > walreceiver loop in the future.  If we adopt your idea now,
>> > I'm afraid that it would tie our hands in that case.
>> >
>> > How much impact can such frequent calls of gettimeofday()
>> > have on replication performance? If it's not negligible,
>> > probably we should remove them at first and find out another
>> > idea to fix the problem you pointed. ISTM that it's not so
>> > difficult to remove them. Thought? Do you have any numbers
>> > which can prove that such frequent gettimeofday() has only
>> > ignorable impact on the performance?
>>
>> The attached patch is 'the more sober' version of SIGLARM patch.
>
> I said that checking whether to send feedback every boundary
> between WAL segments seemed too long but after some rethinking, I
> changed my mind.
>
>  - The most large possible delay source in the busy-receive loop
>    is fsyncing at closing WAL segment file just written, this can
>    take several seconds.  Freezing longer than the timeout
>    interval could not be saved and is not worth saving anyway.
>
>  - 16M bytes-disk-writes intervals between gettimeofday() seems
>    to be gentle enough even on platforms where gettimeofday() is
>    rather heavy.

Sounds reasonable to me.

So we don't need to address the problem in walreceiver side so proactively
because it tries to send the feedback every after flushing the WAL records.
IOW, the problem you observed is less likely to happen. Right?

+            now = feGetCurrentTimestamp();
+            if (standby_message_timeout > 0 &&

Minor comment: should feGetCurrentTimestamp() be called after the check of
standby_message_timeout > 0, to avoid unnecessary calls of that?
ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,                   XLogRecPtr *blockpos, uint32 timeline,
           char *basedir, stream_stop_callback stream_stop,
 
-                   char *partial_suffix, bool mark_done)
+                   char *partial_suffix, bool mark_done,
+                   int standby_message_timeout, int64 *last_status)

Maybe it's time to refactor this ugly coding (i.e., currently many arguments
need to be given to each functions. Looks ugly)...

Regards,

-- 
Fujii Masao



Re: pg_basebackup may fail to send feedbacks.

От
Kyotaro HORIGUCHI
Дата:
Hi, the attached is the v5 patch.

- Do feGetCurrentTimestamp() only when necessary.
- Rebased to current master


At Mon, 2 Mar 2015 20:21:36 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwG1tJHpG03oZgwoKxt5wYD5v4S3HuTgSx7RotBhHnjwJw@mail.gmail.com>
> On Tue, Feb 24, 2015 at 6:44 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Hello, the attached is the v4 patch that checks feedback timing
> > every WAL segments boundary.
..
> > I said that checking whether to send feedback every boundary
> > between WAL segments seemed too long but after some rethinking, I
> > changed my mind.
> >
> >  - The most large possible delay source in the busy-receive loop
> >    is fsyncing at closing WAL segment file just written, this can
> >    take several seconds.  Freezing longer than the timeout
> >    interval could not be saved and is not worth saving anyway.
> >
> >  - 16M bytes-disk-writes intervals between gettimeofday() seems
> >    to be gentle enough even on platforms where gettimeofday() is
> >    rather heavy.
> 
> Sounds reasonable to me.
> 
> So we don't need to address the problem in walreceiver side so proactively
> because it tries to send the feedback every after flushing the WAL records.
> IOW, the problem you observed is less likely to happen. Right?
> 
> +            now = feGetCurrentTimestamp();
> +            if (standby_message_timeout > 0 &&

Surely it would hardly happen, especially on ordinary
configuration.

However, the continuous receiving of the replication stream is a
quite normal behavior even if hardly happens.  So the receiver
should guarantee the feedbacks to be sent by logic as long as it
is working normally, as long as the code for the special case
won't be too large and won't take too long time:).

The current walreceiver doesn't look trying to send feedbacks
after flushing every wal records. HandleCopyStream will
restlessly process the records in a gapless replication stream,
sending feedback only when keepalive packet arrives. It is the
fact that the response to the keepalive would help greatly but it
is not what the keepalives are for. It is intended to be used to
confirm if a silent receiver is still alive.

Even with this fix, the case that one write or flush takes longer
time than the feedback interval cannot be saved, but it would be
ok since it should be regarded as disorder.


> Minor comment: should feGetCurrentTimestamp() be called after the check of
> standby_message_timeout > 0, to avoid unnecessary calls of that?

Ah, you're right. I'll fixed it.

>  ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
>                     XLogRecPtr *blockpos, uint32 timeline,
>                     char *basedir, stream_stop_callback stream_stop,
> -                   char *partial_suffix, bool mark_done)
> +                   char *partial_suffix, bool mark_done,
> +                   int standby_message_timeout, int64 *last_status)
> 
> Maybe it's time to refactor this ugly coding (i.e., currently many arguments
> need to be given to each functions. Looks ugly)...

I'm increasing the ugliness:(

XLog stuff seems to need to share many states widely to work. But
the parameter list of the function looks to be bearable to this
extent, to me:).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: pg_basebackup may fail to send feedbacks.

От
Fujii Masao
Дата:
On Tue, Mar 10, 2015 at 5:29 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hi, the attached is the v5 patch.
>
> - Do feGetCurrentTimestamp() only when necessary.
> - Rebased to current master
>
>
> At Mon, 2 Mar 2015 20:21:36 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwG1tJHpG03oZgwoKxt5wYD5v4S3HuTgSx7RotBhHnjwJw@mail.gmail.com>
>> On Tue, Feb 24, 2015 at 6:44 PM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> > Hello, the attached is the v4 patch that checks feedback timing
>> > every WAL segments boundary.
> ..
>> > I said that checking whether to send feedback every boundary
>> > between WAL segments seemed too long but after some rethinking, I
>> > changed my mind.
>> >
>> >  - The most large possible delay source in the busy-receive loop
>> >    is fsyncing at closing WAL segment file just written, this can
>> >    take several seconds.  Freezing longer than the timeout
>> >    interval could not be saved and is not worth saving anyway.
>> >
>> >  - 16M bytes-disk-writes intervals between gettimeofday() seems
>> >    to be gentle enough even on platforms where gettimeofday() is
>> >    rather heavy.
>>
>> Sounds reasonable to me.
>>
>> So we don't need to address the problem in walreceiver side so proactively
>> because it tries to send the feedback every after flushing the WAL records.
>> IOW, the problem you observed is less likely to happen. Right?
>>
>> +            now = feGetCurrentTimestamp();
>> +            if (standby_message_timeout > 0 &&
>
> Surely it would hardly happen, especially on ordinary
> configuration.
>
> However, the continuous receiving of the replication stream is a
> quite normal behavior even if hardly happens.  So the receiver
> should guarantee the feedbacks to be sent by logic as long as it
> is working normally, as long as the code for the special case
> won't be too large and won't take too long time:).
>
> The current walreceiver doesn't look trying to send feedbacks
> after flushing every wal records. HandleCopyStream will
> restlessly process the records in a gapless replication stream,
> sending feedback only when keepalive packet arrives. It is the
> fact that the response to the keepalive would help greatly but it
> is not what the keepalives are for. It is intended to be used to
> confirm if a silent receiver is still alive.
>
> Even with this fix, the case that one write or flush takes longer
> time than the feedback interval cannot be saved, but it would be
> ok since it should be regarded as disorder.
>
>
>> Minor comment: should feGetCurrentTimestamp() be called after the check of
>> standby_message_timeout > 0, to avoid unnecessary calls of that?
>
> Ah, you're right. I'll fixed it.

Even if the timeout has not elapsed yet, if synchronous mode is true, we should
send feedback here?

Regards,

-- 
Fujii Masao