Re: Chasing "signal 11" issues

From Scott Marlowe
Subject Re: Chasing "signal 11" issues
Msg-id 1143732927.26940.11.camel@state.g2switchworks.com
In reply to Chasing "signal 11" issues  ("Tass Chapman" <tasseh.postgres@gmail.com>)
List pgsql-general
On Thu, 2006-03-30 at 07:02, Tass Chapman wrote:
> Since Monday I have been seeing "terminated by signal 11" messages in
> my 7.4.6 + Slon 1.0.5 system, but only on the master.
>
> I've done a dumpall, initdb and restore, which reduced the frequency
> but I still get them 6-8 times a day.
>
> After turning up logging it seemed to die when calling a very small
> table (2 rows, 4 columns, 8 char text strings), but manually selecting
> caused no issues, so I then took a hit and shutdown the system and
> swapped out the RAM (from earlier list suggestions).
>
> This seemed to work until 7 hours later, when the problem
> reappeared, at a higher frequency too.
>
> It is ONLY occurring on the master, not on any of the leaf (replicated)
> nodes, and seems to be triggered by a few different systems connecting
> (so no common code base)

As mentioned earlier, this tends to be caused by hardware.  Note that it
can be caused by buggy software or corrupted binaries as well.

It is possible that the binaries you're running have become corrupted
in some small way.  You might want to run md5sum across all the binaries
(postgresql, slony, etc.) on the bad machine and on a good one and
compare the results.
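
A minimal sketch of that comparison, written in Python instead of calling
md5sum directly.  The helper names (md5_manifest, differing_files) are made
up for illustration, and it assumes you can get at both install trees (for
example by copying the suspect machine's bin directory next to a known-good
one), so treat it as a starting point and not a finished tool.

```python
import hashlib
from pathlib import Path


def md5_manifest(root):
    """Map each regular file under root (by relative path) to its MD5 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.md5()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large binaries are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def differing_files(good, bad):
    """Return sorted relative paths that are missing on one side or hash differently."""
    return sorted(p for p in good.keys() | bad.keys() if good.get(p) != bad.get(p))
```

Build one manifest per machine with md5_manifest() and pass both to
differing_files(); an empty list means every file matched.  Binaries built
with different options or versions will differ legitimately, so this is only
meaningful between machines that are supposed to have identical installs.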

If the problem is in the hardware, and I think it is, it could be
anywhere: a bad drive, RAID controller, RAID cache, SCSI interface, CPU,
memory, and so on.  So memtest86 might find the problem if it's the
mainboard / CPU / memory, but if it's an I/O problem, it won't.

The most common failures are mechanical in nature.  I've had machines
that were crashing, and all I had to do was reseat the CPU or memory or
heat sink and suddenly it was running fine.

However, you need to switch over to your failover machine immediately.
Running your main database on what is most likely faulty hardware is a
recipe for corruption of your database.
