Re: [ADMIN] Streaming Replication Server Crash

From Craig Ringer
Subject Re: [ADMIN] Streaming Replication Server Crash
Msg-id 50862525.5060904@ringerc.id.au
In reply to Re: [ADMIN] Streaming Replication Server Crash  (Tom Lane <tgl@sss.pgh.pa.us>)
Replies Re: [ADMIN] Streaming Replication Server Crash  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On 10/22/2012 08:52 PM, Tom Lane wrote:
> Craig Ringer <ringerc@ringerc.id.au> writes:
>> On 10/19/2012 04:40 PM, raghu ram wrote:
>>> 2012-10-19 12:26:46 IST [1338]: [18-1] user=,db= LOG:  server process
>>> (PID 15565) was terminated by signal 10
>
>> That's odd. SIGUSR1 (signal 10) shouldn't terminate PostgreSQL.
>
>> Was the server intentionally sent SIGUSR1 by an admin? Do you know what
>> triggered the signal?
>
> SIGUSR1 is used for all sorts of internal cross-process signaling
> purposes.  There's no need to hypothesize any external force sending
> it; if somebody had broken a PG process's signal handling setup for
> SIGUSR1, a crash of this sort could be expected in short order.
>
> But having said that, are we sure 10 is SIGUSR1 on the OP's platform?
> AFAIK, that signal number is not at all compatible across different
> flavors of Unix.  (I see SIGUSR1 is 30 on OS X for instance.)

Gah. I incorrectly thought that POSIX specified signal *numbers*, not
just names. That does not appear to actually be the case. Thanks.
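
For anyone who wants to check the mapping on their own machine, here is a
minimal sketch using Python's standard signal module. The helper names
signal_name and signal_number are mine, purely for illustration; the
output depends entirely on the platform it runs on, so I make no claim
about what it prints on the OP's system.

```python
import signal


def signal_name(number):
    """Return the symbolic name this platform gives a signal number, or None."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known to this platform.
        return None


def signal_number(name):
    """Return the number this platform gives a signal name, or None if undefined."""
    sig = getattr(signal, name, None)
    return int(sig) if sig is not None else None


if __name__ == "__main__":
    # The log line reported "signal 10"; see what that means locally.
    print("signal 10 is", signal_name(10))
    for name in ("SIGUSR1", "SIGBUS", "SIGSEGV"):
        print(name, "=", signal_number(name))
```

Running this on the affected host (or one with the same OS) shows which
name actually belongs to signal 10 there.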

A bit of searching suggests that on Solaris/SunOS, signal 10 is SIGBUS:

http://www.s-gms.ms.edus.si/cgi-bin/man-cgi?signal+3HEAD
http://docs.oracle.com/cd/E23824_01/html/821-1464/signal-3head.html

... which tends to suggest an entirely different interpretation than
"someone broke a signal handler":

https://blogs.oracle.com/peteh/entry/sigbus_versus_sigsegv_according_to

such as:

- Bad mmap()ed read
- alignment error
- hardware fault

so it's not very different from a segfault, in that it can be caused
by errors in hardware, the OS, or applications.

Raghu, did PostgreSQL dump a core file? If it didn't, you might want to
enable core dumps in future. If it did dump a core, attaching a debugger
to the core file might tell you where it crashed, possibly offering some
more information to diagnose the issue. I'm not familiar enough with
Solaris to offer detailed advice on that, especially as you haven't
mentioned your Solaris version, how you installed Pg, etc. This may be
of some use:


http://stackoverflow.com/questions/6403803/how-to-get-backtrace-function-line-number-on-solaris
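
As a rough first check on whether core dumps are enabled at all, the
sketch below reads the core file size limit of the current process via
Python's standard resource module. The helper names core_limits and
describe_limit are hypothetical. It assumes it is run from the same
environment that starts the postmaster, since the limit is inherited
from there, and it does not cover any Solaris-specific core file
administration settings.

```python
import resource


def core_limits():
    """Return the (soft, hard) core file size limits of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe_limit(value):
    """Render a core size limit in a human-readable form."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    if value == 0:
        return "0 (core dumps disabled)"
    return f"{value} bytes"


if __name__ == "__main__":
    soft, hard = core_limits()
    print("soft limit:", describe_limit(soft))
    print("hard limit:", describe_limit(hard))
```

A soft limit of 0 would explain a missing core file; raising it has to
happen before the server is started.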

--
Craig Ringer

