On Fri, Apr 1, 2011 at 4:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think this problem is harmless in practice since it doesn't happen
>> too often. But
>> that can happen...
>>
>> The simple fix is to change ServerLoop() so that it periodically calls
>> PostmasterStateMachine() while shutdown is running.
>
> One idea I had was to have a backend that changes state from regular
> backend to walsender kick the postmaster in some way - for example by
> writing to a socket the other end of which the postmaster is holding
> open. Florian suggested that might be useful anyway as a means of
> detecting when the postmaster has gone belly-up, so maybe we could
> kill two birds with one stone. That seems like too much rejiggering
> to do this late in the release cycle, though. But I don't think the
> idea of calling PostmasterStateMachine() periodically is very
> appealing either - that's a significant change in how that code is
> being used now, and even if it doesn't break anything else, it'll
> allow for hangs of up to 60 seconds, which doesn't sound exciting
> either.
>
> The root of this problem in some sense is that we don't distinguish
> between regular backends and backends that haven't yet decided whether
> they are regular backends or walsenders. But even if we created such
> a distinction it won't fix the problem unless the postmaster somehow
> gets notified of the state change. And if we have that, then we're
> back to not needing to distinguish.
>
> Anyone have a good idea?
Another simple fix is to make walsender send SIGUSR1 to postmaster
so that it calls PostmasterStateMachine() in sigusr1_handler(), when it
marks itself as walsender. The attached patch does this. Thought?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center