Thread: SIGSEGV happens over once a day

SIGSEGV happens over once a day

From
Richard Yen
Date:
Hi all,

I'm experiencing signal 11 (segmentation fault) failures on the
master node of a 3-node Slony-I cluster.  In the past week, we've
averaged a little more than one segfault per day (11 times in the
past 10 days, including today).  Any ideas what's going on?

Would anyone know how to track this issue?
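
One low-effort way to start tracking this is to count the crashes per day from the server log. The following is only a minimal sketch: it assumes each log line begins with a YYYY-MM-DD timestamp and that the postmaster reports the crash as "server process (PID n) was terminated by signal 11". The function name and the log path passed on the command line are hypothetical, so adjust both patterns to whatever your log actually contains.

```python
import re
import sys
from collections import Counter

# Assumption: every log line starts with a "YYYY-MM-DD" timestamp.
DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})")
# Assumption: the postmaster reports a crashed backend with this wording.
CRASH_RE = re.compile(r"server process \(PID \d+\) was terminated by signal 11")


def count_segfaults_per_day(lines):
    """Return a Counter mapping a date string to the number of signal 11 crashes."""
    per_day = Counter()
    for line in lines:
        if not CRASH_RE.search(line):
            continue
        date = DATE_RE.match(line)
        # Lines without a leading timestamp are grouped under "unknown".
        per_day[date.group(1) if date else "unknown"] += 1
    return per_day


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_segv.py <path-to-postgres-log>
    with open(sys.argv[1], errors="replace") as log:
        for day, crashes in sorted(count_segfaults_per_day(log).items()):
            print(day, crashes)
```

If the counts cluster around a particular time of day, that may point to a specific job or query rather than to hardware.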

Don't know if attaching log output might help, but it's very similar
to the following (the responses to those threads didn't help us,
though):
http://archives.postgresql.org/pgsql-general/2004-06/msg01204.php
http://www.thescripts.com/forum/thread422225.html

Here's the machine where postgres is faulting:
db1 (Dell 6650):
master Slony-I node
PostgreSQL version: 7.4.6
OS: Debian Linux 3.1
CPU: Xeon 4 X 2.5GHz
RAM: 8 GB
DISK:
      / 4 x 18 GB drive: raid 10
      /db/data/base 12 x 36 GB: raid 10
      /db/data/pg_xlog 2 x 73 GB: raid 1

The other two machines don't die, but they're set up pretty much the
same way.  The only difference is that db2 is running 8.1.3.

So what seems odd to me is that db1 and db3 are pretty much identical
(db3 has a 1.40GHz Xeon instead of a 2.5GHz, and some RAM
differences), yet postgres dies all the time on db1, but has yet to
die on db2 or db3, so I'm guessing maybe it's triggered by an
UPDATE/INSERT/etc. on the master?

Everything was running fine until last Tuesday, when this happened.
We've created no new stored procedures, made no changes, or anything
of the sort.

We've rebooted the db1 machine, but to no avail.  I guess the next
thing is swapping the RAM chips with other machines?  Any other
suggestions?

Let me know if you need other info...

Any help would be greatly appreciated!
--Richard

Re: SIGSEGV happens over once a day

From
Tom Lane
Date:
Richard Yen <richyen@iparadigms.com> writes:
> Here's the machine where postgres is faulting:
> PostgreSQL version: 7.4.6

Well, the very first recommendation you're going to get is to run
something newer.  7.4 is up to 7.4.12 (.13 next week) and we don't
make those patch releases just to fill spare time.

            regards, tom lane

Re: SIGSEGV happens over once a day

From
Tomasz Ostrowski
Date:
On Thu, 11 May 2006, Richard Yen wrote:

> I'm experiencing signal 11 (segmentation fault) failures on the
> master node of a 3-node Slony-I cluster.  In the past week, we've
> averaged a little more than one segfault per day (11 times in the
> past 10 days, including today).  Any ideas what's going on?

Looks like a hardware error: possibly a failed fan, or the
processor/RAM/HDD overheating. Try running "memtest86" and "cpuburn"
for several hours.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                      Winnie the Pooh