Re: Backend core dump, Please help, Urgent!

From Tom Lane
Subject Re: Backend core dump, Please help, Urgent!
Msg-id 16289.945207576@sss.pgh.pa.us
Responses Re: [HACKERS] Re: Backend core dump, Please help, Urgent!  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
[ I'm redirecting this to pg-hackers since it doesn't look like an
interfaces problem ... ]

Matthew Hagerty <matthew@venux.net> writes:
> The app is written in PHP3-3.0.12 compiled as an Apache-1.3.6 module.  The
> OS is FreeBSD-3.1-Release with GCC-2.7.2.1 and a PostgreSQL-6.5.1 backend.

You should probably update to 6.5.3 for starters.  I'm not all that
hopeful that any of the bugfixes in 6.5.3 will fix this, but it'd be
pretty silly not to try it before investing a lot of work running down
the problem.

> The app went online on August 30, 1999 and has run without incident until
> yesterday.  At about 10am Dec, 13th, 1999 one of the programmers noticed
> that none of the forum messages would come up.  I went to the console of
> the server and saw this message about 10 or 15 times:

> Dec 13 10:35:56 redbox /kernel: pid 13856 (postgres), uid 1002: exited on
> signal 11 (core dumped)

> A ps -xa revealed about 15 or so postgres processes!  I did not think
> postgres made any child processes?!?!  So I stopped the web server and
> killed the main postgres process which seemed to kill all the other
> postgres processes.  I then tried to restart postgres and got an error
> message that was something like:

> IpcSemaphore??? - Key=54321234 Max

You could probably have recovered from this with "ipcclean" instead of a
reboot; it sounds like the postmaster failed to release the shared
semaphores before exiting.  Which it should have, unless maybe you used
kill -9 on it...

> At 9:36am on the 14th it happened again.  Again I was unable to recover the
> data and had to rebuild the data directory.  I did not delete the data
> directory this time, I just moved it to another directory so I would have
> it.  I also have the core dumps.  The only file I had to delete was the
> pg_log in the data directory.  What is this file?  It had grown to 700Meg
> in under 24 hours!!  Also, the core dump for the main app grew from 2.7Meg
> to over 80Meg while I was trying to dump the data.

Sure sounds like a corrupted-data problem.  Can you use gdb on the
corefiles to get a backtrace of what they were doing?
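
A minimal sketch of pulling such a backtrace out of a core file, assuming
gdb is installed and supports the -batch and -ex options. The function
name backtrace_core and the paths in the usage line are hypothetical
placeholders, not anything from the original report.

```shell
#!/bin/sh
# Print a stack backtrace from a backend core file using gdb in batch mode.
# backtrace_core is a hypothetical helper name used only for illustration.
backtrace_core() {
    bin=$1    # the postgres executable that produced the core
    core=$2   # the core file left behind by the crashed backend
    if [ ! -r "$bin" ] || [ ! -r "$core" ]; then
        echo "usage: backtrace_core <postgres-binary> <core-file>" >&2
        return 1
    fi
    # -batch makes gdb exit after running its commands;
    # -ex "bt" asks it to print the backtrace of the crashed process.
    gdb -batch -ex "bt" "$bin" "$core"
}
```

It would be called as, for example,
backtrace_core /path/to/bin/postgres /path/to/postgres.core
with the real locations substituted. The backtrace is far more readable
if the backend was compiled with debugging symbols (-g); without them gdb
may show only addresses or bare function names.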

> My biggest hang-up is why all of a sudden?

Good question.  We'll probably know the answer when we find the problem.
        regards, tom lane

