Re: The database system is in recovery mode

From Andrew Sullivan
Subject Re: The database system is in recovery mode
Date
Msg-id 20030502141444.GC13419@libertyrms.info
In reply to The database system is in recovery mode  (Trevor Astrope <astrope@e-corp.net>)
List pgsql-admin
On Thu, May 01, 2003 at 06:24:03PM -0400, Trevor Astrope wrote:
>  Could this be the linux kernel randomly killing processes under heavy
> load issue?

Not from the look of things.  See below.

> System is postgresql 7.2.1 on redhat 7.2. Here's the logs:

You should really upgrade at least to 7.2.4 (no dump required).
7.2.1 has some nasty bugs.

> 2003-05-01 16:54:08 DEBUG:  server process (pid 2599) was
> terminated by signal 11
                       ^^

That's not signal 9, so it's not the kernel.  Sig 11 is SIGSEGV on
Linux, which probably means some sort of memory problem.  Are you
using ECC RAM for your database?  You should.  In any case, the first
thing I'd do is run memtest86 on it.
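
For reference, the check above (read the signal number from the log line and see which signal it is) can be sketched in Python. This is only an illustration: the function names are made up for the example, and the regular expression assumes the 7.2-style message quoted in this thread.

```python
import re
import signal

# The log line quoted above, joined back onto one line.
LOG_LINE = ("2003-05-01 16:54:08 DEBUG:  server process (pid 2599) was "
            "terminated by signal 11")


def terminating_signal(line):
    """Return (pid, signal_number) from a 'terminated by signal' line, else None."""
    m = re.search(
        r"server process \(pid (\d+)\) was\s+terminated by signal (\d+)", line
    )
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def signal_name(signum):
    """Map a signal number to its name on the machine running this script."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "unknown"


pid, signum = terminating_signal(LOG_LINE)
# On Linux this prints: 2599 11 SIGSEGV
print(pid, signum, signal_name(signum))
```

Signal 9 maps to SIGKILL the same way. Numbers are platform-specific, so run this on the same kind of system that produced the log.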


> 2003-05-01 16:54:08 DEBUG:  terminating any other active server processes
> 2003-05-01 16:54:08 NOTICE:  Message from PostgreSQL backend:
>         The Postmaster has informed me that some other backend
>         died abnormally and possibly corrupted shared memory.
>         I have rolled back the current transaction and am
>         going to terminate your database system connection and exit.
>         Please reconnect to the database system and repeat your query.
>
> After a bunch of these, the database goes in recovery mode:

That's what it's supposed to do.  It's what WAL buys you.

> I presume this is rerunning the WAL? Is the message serious...could there
> be database corruption or just lost transactions?

Neither, assuming you have good hardware and you're using fsync.  WAL
is there precisely to make the system crash safe.  (Of course, if
it's sitting on an ext2 partition and the system goes down hard, you
have a different batch of problems.  But WAL+fsync protects you from
postmaster crashes, and machine crashes if your filesystem is
crash-safe.)
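
The idea behind WAL+fsync can be shown with a toy sketch in Python. This is not how PostgreSQL implements it; the class `TinyStore`, its JSON-lines log format, and the file name are all invented for illustration. Each change is appended to a log and fsynced before it is applied in memory, and on startup the log is replayed, discarding a half-written final record.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key/value store: log the change, fsync, then apply it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()                    # "recovery mode": rebuild from the log
        self.wal = open(wal_path, "ab")

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "rb") as f:
            raw = f.read()
        good = 0                          # bytes covered by complete records
        for line in raw.splitlines(keepends=True):
            if not line.endswith(b"\n"):
                break                     # torn record from a crash mid-write
            try:
                rec = json.loads(line)
            except ValueError:
                break                     # unreadable record; stop replaying
            self.data[rec["key"]] = rec["value"]
            good += len(line)
        if good != len(raw):
            # Drop the incomplete tail so later appends start on a clean line.
            with open(self.wal_path, "r+b") as f:
                f.truncate(good)

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}) + "\n"
        self.wal.write(record.encode("utf-8"))
        self.wal.flush()
        os.fsync(self.wal.fileno())       # durable on disk before we report success
        self.data[key] = value


# Simulated crash: the in-memory dict is thrown away, only the log survives.
path = os.path.join(tempfile.mkdtemp(), "toy.wal")
store = TinyStore(path)
store.put("a", 1)
store.put("b", 2)
store.wal.close()
del store

recovered = TinyStore(path)
print(recovered.data)
recovered.wal.close()
```

As the message says, this only protects against a machine crash if the filesystem underneath honours the fsync and is itself crash-safe.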

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

