On 2013-02-01 08:55:24 -0500, Peter Eisentraut wrote:
> On 1/31/13 5:42 PM, MauMau wrote:
> > Thank you for sharing your experience. So you also considered making
> > postmaster SIGKILL children like me, didn't you? I bet most people
> > who encounter this problem would feel like that.
> >
> > It is definitely pg_ctl who needs to be prepared, not the users. It may
> > not be easy to find out postgres processes to SIGKILL if multiple
> > instances are running on the same host. Just doing "pkill postgres"
> > will unexpectedly terminate postgres of other instances.
>
> In my case, it was one backend process segfaulting, and then some other
> backend processes didn't respond to the subsequent SIGQUIT sent out by
> the postmaster. So pg_ctl didn't have any part in it.
>
> We ended up addressing that by installing a nagios event handler that
> checked for this situation and cleaned it up.
>
> > I would like to make a patch that changes SIGQUIT to SIGKILL when
> > postmaster terminates children. Any other better ideas?
>
> That was my idea back then, but there were some concerns about it.
>
> I found an old patch that I had prepared for this, which I have
> attached. YMMV.
> +static void
> +quickdie_alarm_handler(SIGNAL_ARGS)
> +{
> + /*
> + * We got here if ereport() was blocking, so don't go there again
> + * except when really asked for.
> + */
> + elog(DEBUG5, "quickdie aborted by alarm");
> +

It's probably not wise to enter elog.c again; that path might allocate
memory, and we wouldn't be any wiser. Unfortunately, there's not much
besides a write(2) to stderr that can safely be done...

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services