Fujii-san, thank you for your comments.
>The cause of this problem is that the checkpointer's sleep time is calculated
>from both checkpoint_timeout and archive_timeout during normal running,
>but calculated only from checkpoint_timeout during recovery. So Daisuke-san's
>patch tries to change that so that it's calculated from both of them even
>during recovery. No?
Yes, that is exactly right.
>last_xlog_switch_time is not updated during recovery. So "elapsed_secs" can be
>large and cur_timeout can be negative. Isn't this problematic?
Yes... My patch was missing this.
How about using the original archive_timeout value for calculating cur_timeout during recovery?
  if (XLogArchiveTimeout > 0 && !RecoveryInProgress())
  {
      elapsed_secs = now - last_xlog_switch_time;
      if (elapsed_secs >= XLogArchiveTimeout)
          continue;    /* no sleep for us ... */
      cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
  }
+ else if (XLogArchiveTimeout > 0)
+     cur_timeout = Min(cur_timeout, XLogArchiveTimeout);
During recovery, cur_timeout is not exact because elapsed_secs is not used.
However, once recovery completes, WAL archiving will start by the time the next archive_timeout is reached.
I think that is enough to solve this problem.
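
To make the behavior of the two branches easier to check, here is a minimal standalone sketch of the calculation. It is only a model, not the checkpointer code: model_cur_timeout, NO_SLEEP, min_int and the parameter names are hypothetical, checkpoint_secs_left stands in for the value already derived from checkpoint_timeout, and in_recovery stands in for RecoveryInProgress().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker for the "continue; no sleep" case in the real loop. */
#define NO_SLEEP (-1)

static int
min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical model of the proposed sleep-time calculation.
 *
 * checkpoint_secs_left: seconds until the next timed checkpoint
 * archive_timeout:      archive_timeout setting in seconds (0 = disabled)
 * elapsed_secs:         now - last_xlog_switch_time (may be stale in recovery)
 * in_recovery:          stand-in for RecoveryInProgress()
 */
int
model_cur_timeout(int checkpoint_secs_left, int archive_timeout,
                  int elapsed_secs, bool in_recovery)
{
    int cur_timeout = checkpoint_secs_left;

    assert(cur_timeout >= 0);

    if (archive_timeout > 0 && !in_recovery)
    {
        /* Normal running: sleep only until the next forced segment switch. */
        if (elapsed_secs >= archive_timeout)
            return NO_SLEEP;
        cur_timeout = min_int(cur_timeout, archive_timeout - elapsed_secs);
    }
    else if (archive_timeout > 0)
    {
        /*
         * Recovery: elapsed_secs is ignored because last_xlog_switch_time
         * is not updated, so the sleep is simply capped at archive_timeout
         * and can never become negative.
         */
        cur_timeout = min_int(cur_timeout, archive_timeout);
    }

    return cur_timeout;
}
```

For example, with 300 seconds left until the next checkpoint and archive_timeout = 60, the model returns 60 during recovery even if elapsed_secs is very large, and returns 40 during normal running when elapsed_secs is 20.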
>As another approach, what about waking the checkpointer up at the end of
>recovery like we already do for walsenders?
If the above solution is not good, I will consider this approach.
Regards,
Daisuke, Higuchi