Re: Core dump

From: Tom Lane
Subject: Re: Core dump
Msg-id: 27214.971381455@sss.pgh.pa.us
In reply to: Core dump (Dan Moschuk <dan@freebsd.org>)
Responses: Re: Core dump
List: pgsql-hackers

Dan Moschuk <dan@freebsd.org> writes:
> Sparc solaris 2.7 with postgres 7.0.2
> It seems to be reproducible, the server crashes on us at a rate of about
> every few hours.

That's a very bizarre backtrace.  Why the multiple levels of recursive
entry to the quickdie() signal handler?  I wonder if you aren't looking
at some kind of Solaris bug --- perhaps it's not able to cope with a
signal handler turning around and issuing new kernel calls.

The core file you are looking at is probably *not* from the original
failure, whatever that is.  The sequence is probably

1. Some backend crashes for unknown reason, dumping core.

2. Postmaster observes messy death of a child, decides that mass suicide
   followed by restart is called for.  Postmaster sends SIGUSR1 to all
   remaining backends to make them commit hara-kiri.

3. One or more other backends crash trying to obey postmaster's command.
   The corefile left for you to examine comes from whichever crashed last.
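
To make the sequence concrete, here is a rough sketch of steps 1 and 2 in plain POSIX C.  It is not the postmaster's code, only an illustration: the names spawn_idle_child, died_messily and simulate_crash_and_cleanup are made up, the "backends" are idle children, and SIGKILL stands in for the unknown original crash.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_CHILDREN 3          /* hypothetical number of backends */

/* Fork a stand-in "backend" that idles until a signal arrives. */
static pid_t spawn_idle_child(void)
{
    pid_t       pid = fork();

    if (pid == 0)
    {
        sigset_t    none;

        /* Make sure SIGUSR1 is unblocked and has its default action. */
        sigemptyset(&none);
        sigprocmask(SIG_SETMASK, &none, NULL);
        signal(SIGUSR1, SIG_DFL);
        for (;;)
            pause();
    }
    return pid;                 /* -1 if fork() failed */
}

/* "Messy death": killed by a signal, or exited with nonzero status. */
static int died_messily(int status)
{
    return WIFSIGNALED(status) ||
           (WIFEXITED(status) && WEXITSTATUS(status) != 0);
}

/*
 * One child dies badly; the parent notices and signals the rest.
 * Returns how many survivors were sent SIGUSR1, or -1 on setup failure.
 */
static int simulate_crash_and_cleanup(void)
{
    pid_t       pids[NUM_CHILDREN];
    int         status;
    int         i;
    int         signaled = 0;

    for (i = 0; i < NUM_CHILDREN; i++)
    {
        pids[i] = spawn_idle_child();
        if (pids[i] < 0)
        {
            /* fork() failed: clean up the children already started. */
            while (--i >= 0)
            {
                kill(pids[i], SIGKILL);
                waitpid(pids[i], NULL, 0);
            }
            return -1;
        }
    }

    /* Step 1: SIGKILL stands in for the crash (and leaves no core file). */
    kill(pids[0], SIGKILL);
    waitpid(pids[0], &status, 0);
    assert(died_messily(status));

    /* Step 2: tell every remaining child to quit, then reap it. */
    for (i = 1; i < NUM_CHILDREN; i++)
        if (kill(pids[i], SIGUSR1) == 0)
            signaled++;
    for (i = 1; i < NUM_CHILDREN; i++)
        waitpid(pids[i], NULL, 0);

    return signaled;
}
```

Calling simulate_crash_and_cleanup() returns the number of survivors that were signaled.  In this sketch the survivors die cleanly from the default SIGUSR1 action; step 3 is the case where they run a handler instead and crash inside it.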

So there are at least two problems here, but we only have evidence of
the second one.

Since the problem is fairly reproducible, I'd suggest you temporarily
dike out the elog(NOTICE) call in quickdie() (in
src/backend/tcop/postgres.c), which will probably allow the backends
to honor SIGUSR1 without dumping core.  Then you have a shot at seeing
the core from the original failure.
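
For illustration only, a handler reduced in that way might look like the sketch below.  This is not the text of quickdie(); the handler name quiet_quickdie, the exit code, and the test wrapper are assumptions made up for the example.  The point is that the handler formats and prints nothing and simply calls _exit(), which POSIX lists as async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define QUIET_EXIT_CODE 2       /* hypothetical; not taken from postgres.c */

/*
 * Stand-in for a quickdie()-style handler with the message call removed:
 * no formatting, no output, just leave.
 */
static void quiet_quickdie(int signo)
{
    (void) signo;
    _exit(QUIET_EXIT_CODE);
}

/*
 * Fork a child that installs the handler and signals itself.
 * Returns the child's exit code, or -1 if it did not exit normally.
 */
static int run_child_with_quiet_handler(void)
{
    struct sigaction sa;
    sigset_t    unblock;
    int         status;
    pid_t       pid;

    /* The exit code must differ from the fall-through _exit(0) below. */
    assert(QUIET_EXIT_CODE != 0);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = quiet_quickdie;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);

        /* Make sure the signal is deliverable in the child. */
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGUSR1);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);

        raise(SIGUSR1);         /* handler runs and never returns */
        _exit(0);               /* reached only if the handler did not run */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If run_child_with_quiet_handler() returns QUIET_EXIT_CODE, the child left through the handler without dumping core, which is the behavior the temporary change is meant to produce.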

Assuming that this works (i.e., you find a core that's not got anything
to do with quickdie()), I'd suggest an inquiry to Sun about whether
their signal handler logic hasn't got a problem with write() issued
from inside a signal handler.  Meanwhile let us know what the new
backtrace shows.
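
A standalone test case along the following lines might be useful to attach to such an inquiry.  It is only a sketch and does not reproduce the reported crash; the names on_usr1 and write_from_handler_works are invented for the example.  It installs a SIGUSR1 handler that calls write() (which POSIX lists as async-signal-safe) on a pipe, raises the signal, and checks that the bytes arrive intact.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t report_fd = -1;    /* write end of the pipe */
static const char msg[] = "handler ran\n";

/* Handler that issues a kernel call: a single write() to the pipe. */
static void on_usr1(int signo)
{
    ssize_t     n;

    (void) signo;
    n = write(report_fd, msg, sizeof(msg) - 1);
    (void) n;
}

/*
 * Returns 1 if the handler's message arrives intact, 0 if it does not,
 * and -1 if the test could not be set up.
 */
static int write_from_handler_works(void)
{
    int         fds[2];
    char        buf[sizeof(msg)];
    struct sigaction sa;
    struct sigaction old;
    sigset_t    unblock;
    ssize_t     got;
    int         ok;

    /* Small enough that the pipe write is atomic and cannot block. */
    assert(sizeof(msg) - 1 <= PIPE_BUF);

    if (pipe(fds) != 0)
        return -1;
    report_fd = fds[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Make sure the signal is deliverable before raising it. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    /* In a single-threaded program the handler runs before raise() returns. */
    raise(SIGUSR1);

    got = read(fds[0], buf, sizeof(msg) - 1);
    ok = (got == (ssize_t) (sizeof(msg) - 1)) &&
         memcmp(buf, msg, sizeof(msg) - 1) == 0;

    sigaction(SIGUSR1, &old, NULL);
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

On a platform that handles this correctly, write_from_handler_works() returns 1.  A crash or a 0 here would be a much smaller thing to report than a full backend core.
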
        regards, tom lane

