URGENT: Database keeps crashing - suspect damaged RAM

From Markus Wollny
Subject URGENT: Database keeps crashing - suspect damaged RAM
Date
Msg-id 2266D0630E43BB4290742247C8910575014CE340@dozer.computec.de
Replies Re: URGENT: Database keeps crashing - suspect damaged RAM  (nconway@klamath.dyndns.org (Neil Conway))
Re: URGENT: Database keeps crashing - suspect damaged RAM  (John Gray <jgray@azuli.co.uk>)
Re: URGENT: Database keeps crashing - suspect damaged RAM  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hello!

I just installed PostgreSQL 7.2.1 on SuSE 7.3, 4xPIIIXEON 550MHz, 2GB
RAM, 5x18GB SCSI RAID. The OS was freshly installed, after that I
compiled and installed PostgreSQL from source (./configure
--prefix=/opt/pgsql/ --with-perl --enable-odbc --enable-locale
--enable-syslog). I copied the settings in postgresql.conf etc. from an
identical machine running the identical platform. Then I imported a
database into the new installation. The import seems to have been
successful; I didn't get any errors during it. A subsequent vacuum
analyze finished without anything out of the ordinary.
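
Since the settings were copied from another machine, one quick sanity check is to compare the two postgresql.conf files setting by setting. The following is only a minimal sketch: the helper names (parse_settings, diff_settings) and the sample values are hypothetical and not taken from either machine, and the parser is deliberately naive (it treats any '#' as the start of a comment).

```python
def parse_settings(text):
    """Parse 'name = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        # Naive comment handling: a '#' inside a quoted value is not supported.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip().lower()] = value.strip().strip("'")
    return settings


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for every setting that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Made-up sample contents, for illustration only.
OLD_CONF = """\
shared_buffers = 16384
sort_mem = 8192        # in KB
#max_connections = 32
"""
NEW_CONF = """\
shared_buffers = 16384
sort_mem = 4096
"""

if __name__ == "__main__":
    differences = diff_settings(parse_settings(OLD_CONF), parse_settings(NEW_CONF))
    for name, (old, new) in differences.items():
        print(f"{name}: {old!r} on the old machine, {new!r} on the new one")
```

In practice the two strings would be read from the actual files on each machine; an empty result would support the assumption that the configurations really are identical.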

Just a few minutes after this vacuum analyze, the database crashed for
the first time. It keeps crashing every now and then - every one or two
minutes.

What puzzles me is the fact that this very same machine was running
Oracle 8i on Win2k more or less flawlessly just up to a few hours before
- more or less meaning that we never really noticed anything much out of
the ordinary. There might have been some minor issues after a
RAM-upgrade from 1 GB to 2 GB just a week ago, but looking back it's
hard to say if that could be due to bad RAM or just some bad code which
we've sorted out (or disposed of) by now. As the machine is already
running Linux and PostgreSQL it's quite impossible to prove my suspicion
by going back to Oracle and having a closer look.

What I'd like to know is if I need to look any further than RAM - shall
I just chuck the new modules out of the machine? Or is there some other
issue that could cause this behaviour? I am quite sure that I didn't do
anything wrong during installation, configuration and import and the
same application code is running without errors on a different machine
at this very moment. I don't like the "record with zero length" and
"Cannot allocate memory"-bits in the logfile at all, let alone the "was
terminated by signal 9"-thingy.

So: Is it bad RAM? How can I make sure? What else could it be?
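
For reference, two of the messages quoted above are not PostgreSQL-specific wording. "Cannot allocate memory" is the text that glibc-based Linux systems give for the ENOMEM error code, and signal 9 is SIGKILL, which a process cannot catch or ignore. The small sketch below only shows how to look these up; the function names are made up for illustration, the exact message text depends on the C library, and the lookup by itself does not settle whether the RAM is faulty.

```python
import errno
import os
import signal


def describe_errno(code):
    """Return the symbolic name and the system's message for an errno value."""
    return f"{errno.errorcode.get(code, '?')}: {os.strerror(code)}"


def describe_signal(number):
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(number).name


if __name__ == "__main__":
    # The message text comes from the C library and may differ between systems.
    print(describe_errno(errno.ENOMEM))
    print(describe_signal(9))
```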

Here's a small excerpt from the logfile:

2002-08-06 17:31:38 [17063]  DEBUG:  Pages 0: Changed 0, Empty 0; Tup 0:
Vac 0, Keep 0, UnUsed 0.
        Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
2002-08-06 17:36:23 [17296]  DEBUG:  _mdfd_blind_getseg: couldn't open
/var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
2002-08-06 17:36:24 [17296]  FATAL 2:  cannot write block 13387 of
16596/16671 blind: Cannot allocate memory
2002-08-06 17:36:24 [16530]  DEBUG:  server process (pid 17296) exited
with exit code 2
2002-08-06 17:36:24 [16530]  DEBUG:  terminating any other active server
processes
2002-08-06 17:36:24 [17081]  NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
[...]
2002-08-06 17:36:24 [16530]  DEBUG:  all server processes terminated;
reinitializing shared memory and semaphores
2002-08-06 17:36:24 [17298]  DEBUG:  database system was interrupted at
2002-08-06 17:31:21 CEST
2002-08-06 17:36:24 [17298]  DEBUG:  checkpoint record is at 0/325D7C78
2002-08-06 17:36:24 [17298]  DEBUG:  redo record is at 0/325D7C78; undo
record is at 0/0; shutdown FALSE
2002-08-06 17:36:24 [17298]  DEBUG:  next transaction id: 2270; next
oid: 901292
2002-08-06 17:36:24 [17298]  DEBUG:  database system was not properly
shut down; automatic recovery in progress
2002-08-06 17:36:24 [17298]  DEBUG:  redo starts at 0/325D7CB8
2002-08-06 17:36:25 [17298]  DEBUG:  ReadRecord: record with zero length
at 0/326E16C4
2002-08-06 17:36:25 [17298]  DEBUG:  redo done at 0/326E16A0
2002-08-06 17:36:30 [17298]  DEBUG:  database system is ready
2002-08-06 17:40:53 [16530]  DEBUG:  connection startup failed (fork
failure): Cannot allocate memory
2002-08-06 17:52:50 [16530]  DEBUG:  connection startup failed (fork
failure): Cannot allocate memory
2002-08-06 17:52:54 [16530]  DEBUG:  server process (pid 18237) was
terminated by signal 9
2002-08-06 17:52:54 [16530]  DEBUG:  terminating any other active server
processes
2002-08-06 17:52:54 [18234]  NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
[...]
2002-08-06 17:52:57 [18253]  FATAL 1:  The database system is in
recovery mode
2002-08-06 17:52:57 [18255]  FATAL 1:  The database system is in
recovery mode
2002-08-06 17:52:57 [18254]  FATAL 1:  The database system is in
recovery mode
2002-08-06 17:52:57 [18235]  NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-08-06 17:52:57 [18256]  FATAL 1:  The database system is in
recovery mode
2002-08-06 17:52:57 [18257]  FATAL 1:  The database system is in
recovery mode
2002-08-06 17:52:57 [18258]  FATAL 1:  The database system is in
recovery mode
2002-08-06 17:52:57 [16530]  DEBUG:  all server processes terminated;
reinitializing shared memory and semaphores
2002-08-06 17:52:57 [18260]  FATAL 1:  The database system is starting
up
2002-08-06 17:52:57 [18259]  DEBUG:  database system was interrupted at
2002-08-06 17:51:38 CEST
2002-08-06 17:52:57 [18259]  DEBUG:  checkpoint record is at 0/32991848
2002-08-06 17:52:57 [18259]  DEBUG:  redo record is at 0/3297F4D8; undo
record is at 0/0; shutdown FALSE
2002-08-06 17:52:57 [18259]  DEBUG:  next transaction id: 3704; next
oid: 909484
2002-08-06 17:52:57 [18259]  DEBUG:  database system was not properly
shut down; automatic recovery in progress
2002-08-06 17:52:57 [18259]  DEBUG:  redo starts at 0/3297F4D8
2002-08-06 17:52:57 [18261]  FATAL 1:  The database system is starting
up
2002-08-06 17:52:58 [18259]  DEBUG:  ReadRecord: record with zero length
at 0/32BF0278
2002-08-06 17:52:58 [18259]  DEBUG:  redo done at 0/32BF0254
2002-08-06 17:52:59 [18262]  FATAL 1:  The database system is starting
up
2002-08-06 17:53:00 [18259]  DEBUG:  database system is ready
2002-08-06 17:54:24 [16530]  DEBUG:  connection startup failed (fork
failure): Cannot allocate memory
2002-08-06 17:54:31 [16530]  DEBUG:  server process (pid 18283) was
terminated by signal 9
2002-08-06 17:54:31 [16530]  DEBUG:  terminating any other active server
processes
2002-08-06 17:54:31 [18275]  NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
[...]
2002-08-06 17:54:32 [16530]  DEBUG:  all server processes terminated;
reinitializing shared memory and semaphores
2002-08-06 17:54:32 [18296]  DEBUG:  database system was interrupted at
2002-08-06 17:53:00 CEST
2002-08-06 17:54:32 [18296]  DEBUG:  checkpoint record is at 0/32BF0278
2002-08-06 17:54:32 [18296]  DEBUG:  redo record is at 0/32BF0278; undo
record is at 0/0; shutdown TRUE
2002-08-06 17:54:32 [18296]  DEBUG:  next transaction id: 4456; next
oid: 909484
2002-08-06 17:54:32 [18296]  DEBUG:  database system was not properly
shut down; automatic recovery in progress
2002-08-06 17:54:32 [18296]  DEBUG:  redo starts at 0/32BF02B8
2002-08-06 17:54:32 [18296]  DEBUG:  ReadRecord: record with zero length
at 0/32F0B3C0
2002-08-06 17:54:32 [18296]  DEBUG:  redo done at 0/32F0B39C
2002-08-06 17:54:34 [18297]  FATAL 1:  The database system is starting
up
2002-08-06 17:54:34 [18298]  FATAL 1:  The database system is starting
up
2002-08-06 17:54:34 [18299]  FATAL 1:  The database system is starting
up
2002-08-06 17:54:34 [18300]  FATAL 1:  The database system is starting
up
2002-08-06 17:54:34 [18296]  DEBUG:  database system is ready
2002-08-06 17:57:35 [16530]  DEBUG:  connection startup failed (fork
failure): Cannot allocate memory
2002-08-06 17:57:54 [16530]  DEBUG:  server process (pid 18366) was
terminated by signal 9
2002-08-06 17:57:54 [16530]  DEBUG:  terminating any other active server
processes
2002-08-06 17:57:54 [18368]  NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-08-06 17:57:56 [18409]  DEBUG:  ReadRecord: record with zero length
at 0/3338749C
2002-08-06 17:57:58 [18425]  FATAL 1:  The database system is starting
up
2002-08-06 17:57:58 [18409]  DEBUG:  database system is ready
2002-08-06 17:58:53 [18432]  NOTICE:  RelationBuildDesc: can't open
idx_bm_user_id: Cannot allocate memory
2002-08-06 17:59:00 [18443]  FATAL 1:  cannot open pg_attribute: Cannot
allocate memory
2002-08-06 17:59:01 [16530]  DEBUG:  connection startup failed (fork
failure): Cannot allocate memory
2002-08-06 17:59:01 [16530]  DEBUG:  server process (pid 18436) was
terminated by signal 9
2002-08-06 17:59:01 [16530]  DEBUG:  terminating any other active server
processes
2002-08-06 17:59:03 [18510]  DEBUG:  ReadRecord: record with zero length
at 0/336E9970
2002-08-06 18:00:15 [16530]  DEBUG:  connection startup failed (fork
failure): Cannot allocate memory
2002-08-06 18:00:17 [18589]  DEBUG:  ReadRecord: record with zero length
at 0/33A7C194
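
To get an overview of how often each kind of failure appears in a log like the one above, the entries can be tallied with a short script. This is a rough sketch, not a PostgreSQL tool: the labels in PATTERNS and the function names are hypothetical, and the sample is a handful of lines copied from the excerpt. It assumes every record starts with a timestamp and a bracketed pid, and that any other line is a wrapped continuation of the previous record.

```python
import re
from collections import Counter

# A record starts with "YYYY-MM-DD HH:MM:SS [pid]"; other lines are treated
# as continuations, because long lines were wrapped in the mail.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[\d+\]")

# Hypothetical labels mapped to message fragments of interest.
PATTERNS = {
    "enomem": "Cannot allocate memory",
    "sigkill": "terminated by signal 9",
    "fork_failure": "fork failure",
    "recovery_done": "database system is ready",
}


def join_records(lines):
    """Merge wrapped continuation lines back into single log records."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line):
            records.append(line)
        elif records and line.strip() and line.strip() != "[...]":
            records[-1] += " " + line.strip()
    return records


def tally(lines):
    """Count how many records contain each message fragment."""
    counts = Counter()
    for record in join_records(lines):
        for label, needle in PATTERNS.items():
            if needle in record:
                counts[label] += 1
    return counts


SAMPLE = """\
2002-08-06 17:36:23 [17296]  DEBUG:  _mdfd_blind_getseg: couldn't open
/var/lib/pgsql/data/base/base/16596/16671: Cannot allocate memory
2002-08-06 17:36:30 [17298]  DEBUG:  database system is ready
2002-08-06 17:40:53 [16530]  DEBUG:  connection startup failed (fork
failure): Cannot allocate memory
2002-08-06 17:52:54 [16530]  DEBUG:  server process (pid 18237) was
terminated by signal 9
""".splitlines()

if __name__ == "__main__":
    for label, count in sorted(tally(SAMPLE).items()):
        print(label, count)
```

Joining the wrapped lines first matters here: "fork" and "failure" sit on different lines in the excerpt, so a line-by-line search would miss them.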

Thank you for your kind assistance!

Regards,

    Markus Wollny
