On Tue, Aug 16, 2011 at 9:55 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> When I tested the PITR on git master with max_wal_senders > 0,
> I found that the following inappropriate log message was always
> output even though cascading replication is not in progress. Attached
> patch fixes this problem.
>
> LOG: terminating all walsender processes to force cascaded
> standby(s) to update timeline and reconnect
>
> When making the patch, I found another problem with cascading
> replication: when promoting a cascading standby, postmaster sends
> SIGUSR2 to any cascading walsenders to kill them. But there is a
> corner case where such a walsender fails to receive SIGUSR2 and
> survives a standby promotion unexpectedly. This happens when
> postmaster sends SIGUSR2 before the walsender marks itself as
> a WAL sender, because postmaster sends SIGUSR2 to only the
> processes marked as a WAL sender.
>
> To avoid the corner-case, I changed walsender so that it checks
> whether recovery is in progress or not again after marking itself
> as a WAL sender. If recovery is not in progress even though the
> walsender is a cascading one, it does the same thing as the SIGUSR2
> signal handler does, and then exits later. The attached patch also
> includes this fix.
Looks like valid problems and appropriate fixes to me. Will commit.
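
For anyone following the thread, here is a minimal single-process sketch of the race described above and of the re-check that closes it. It is not taken from the patch: every name in it (SketchWalSender, sketch_promote, sketch_mark_and_recheck, sketch_survives_promotion) is hypothetical, and the real code uses shared memory and signals rather than plain fields.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "is the server still in recovery?" */
static bool recovery_in_progress = true;

/* Hypothetical model of one walsender; not PostgreSQL's actual structure. */
typedef struct
{
    bool marked_as_walsender;   /* postmaster signals only marked processes */
    bool cascading;             /* started while the server was in recovery */
    bool stop_requested;        /* what the SIGUSR2 handler would set */
} SketchWalSender;

/* Postmaster side: promotion ends recovery and signals marked walsenders only. */
static void
sketch_promote(SketchWalSender *ws)
{
    recovery_in_progress = false;
    if (ws->marked_as_walsender)
        ws->stop_requested = true;  /* models delivery of SIGUSR2 */
}

/* Walsender side: mark itself, then optionally re-check recovery state. */
static void
sketch_mark_and_recheck(SketchWalSender *ws, bool recheck)
{
    ws->marked_as_walsender = true;

    /* The fix: a cascading walsender that finds recovery already over
     * does what the SIGUSR2 handler would have done. */
    if (recheck && ws->cascading && !recovery_in_progress)
        ws->stop_requested = true;
}

/* Replays the bad ordering: promotion happens before the walsender is marked.
 * Returns true if the walsender survives the promotion. */
bool
sketch_survives_promotion(bool recheck)
{
    SketchWalSender ws = { false, true, false };

    recovery_in_progress = true;
    sketch_promote(&ws);                    /* signal is missed: not marked yet */
    sketch_mark_and_recheck(&ws, recheck);  /* walsender marks itself afterwards */
    return !ws.stop_requested;
}
```

In this model, sketch_survives_promotion(false) reports that the walsender survives, which is the corner case, while sketch_survives_promotion(true) reports that it is asked to stop.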
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services