Re: stats collector dies in current

From: Tom Lane
Subject: Re: stats collector dies in current
Msg-id 19363.1092543548@sss.pgh.pa.us
In response to: Re: stats collector dies in current  (Jan Wieck <JanWieck@Yahoo.com>)
Responses: Re: stats collector dies in current  (Oliver Jowett <oliver@opencloud.com>)
List: pgsql-hackers

Jan Wieck <JanWieck@Yahoo.com> writes:
> In that context, is SIGTSTP similar to SIGSTOP in that it cannot be 
> caught or ignored?

Possibly.  I've reproduced the problem here on an RHL 8 system
(2.4.18 kernel) and I think it's a kernel bug.  Points:

1. AFAICS, the only case where the stats buffer process will exit(1)
without logging a prior message is where it's gotten SIGCHLD.  So,
hypothesis: it is the collector process (grandchild process) that
is dying.

2. Experiment one: try to strace the collector process to see what
it's doing.  Result: failure goes away!!!

3. Experiment two: try to strace the buffer process.  Result: indeed
it's getting SIGCHLD (in fact it seems to get it before SIGTSTP
arrives).

So at the very least we've got a Heisenbug, but my opinion is we are
seeing broken kernel behavior.

The only difference in signal handling that I can see from 7.4 is that
the collector process explicitly executes pqsignal calls to re-establish
all the signal handlers it should have inherited from its parent.
I suspect (but haven't tested) that removing that supposedly redundant
code would make the failure go away again.

The handler re-establishment was put in because it is needed for the
EXEC_BACKEND case, but possibly we could make it #ifndef EXEC_BACKEND
to work around this problem.
        regards, tom lane

