Re: Core dump

From: Dan Moschuk
Subject: Re: Core dump
Msg-id: 20001012182442.A3861@spirit.jaded.net
In reply to: Re: Core dump  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Core dump  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

| Still, it's a mighty peculiar backtrace.

Indeed.

| After looking at postmaster.c, I see that the postmaster will issue
| SIGUSR1 to all remaining backends *each* time it sees a child exit
| with nonzero status.  And it just so happens that quickdie() chooses
| to exit with exit(1) not exit(0).  So a new theory is
| 
| 1. Some backend crashes.
| 
| 2. Postmaster issues SIGUSR1 to all remaining backends.
| 
| 3. As each backend gives up the ghost, postmaster gets another wait()
|    response and issues another SIGUSR1 to the ones that are left.
| 
| 4. Last remaining backend has been SIGUSR1'd enough times to overrun
|    stack memory, leading to coredump.
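
To put a number on steps 2 through 4, here is a toy model in C. It is only a sketch under assumptions: the function name is hypothetical, backends are assumed to exit one at a time, and the postmaster is assumed to signal every survivor after each nonzero exit. It is not taken from postmaster.c.

```c
/*
 * Hypothetical model, not postmaster code: count how many SIGUSR1s the
 * last surviving backend receives if the postmaster re-signals every
 * remaining backend each time one child exits with nonzero status.
 */
static int
signals_to_last_backend(int nbackends)
{
    int alive = nbackends;      /* backends still running */
    int seen_by_last = 0;       /* SIGUSR1s delivered to the final survivor */

    while (alive > 1)
    {
        alive--;                /* one backend exits with nonzero status */
        seen_by_last++;         /* postmaster signals all that remain */
    }
    return seen_by_last;
}
```

Under this model the last backend of N is signaled N-1 times, so the depth of nested quickdie() calls would grow with the number of connections.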

This theory might make a little more sense with the explanation below.

| I'm not too enamored of this theory because it doesn't explain the
| perfect repeatability shown in your backtrace.  It seems unlikely that
| each recursive quickdie() call would get just as far as elog's write()
| and no farther before the postmaster is able to issue another signal.
| Still, it's a possibility.

Well, when this happens the machine is _heavily_ loaded.  It could be that
the write()s are just taking longer than they should, giving the backend
enough time to be hit by another SIGUSR1.  The load may also explain why so
many SIGUSR1s are being sent, as a heavily loaded machine tends not to clean
up its children as fast as expected.

| We should probably tweak the postmaster to be less enthusiastic about
| signaling its children repeatedly.

Perhaps have postgres ignore SIGUSR1 after it has already received one?

Regards,
-Dan
-- 
Man is a rational animal who always loses his temper when he is called
upon to act in accordance with the dictates of reason.               -- Oscar Wilde

