Backend core dump, Please help, Urgent!

From Matthew Hagerty
Subject Backend core dump, Please help, Urgent!
Msg-id 4.1.19991214174335.0400ba00@mail.venux.net
List pgsql-hackers
Greetings,

I think Tom Lane forwarded this over from [INTERFACES] (thanks Tom!), but I
thought I should post it since it is my problem and not Tom's.

Original post as follows:
-------------------------

If anyone could help me figure out what is going on with my PostgreSQL
backend I would greatly appreciate it!!  I'll try to be brief and to the point.

I work for a small company and we created an online app for another small
company that has about 300 members who access the site.  I think the record
for simultaneous logins is about 15, so the load is not really that great.
There are about 3000 to 5000 records added per month.

The app is written in PHP3-3.0.12 compiled as an Apache-1.3.6 module.  The
OS is FreeBSD-3.1-Release with GCC-2.7.2.1 and a PostgreSQL-6.5.1 backend.
I start the postgres process at startup like this:

su postgres -c "/usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data -i > /usr/local/pgsql/postgres.log 2>&1 &"

The server is an Intel R440LX motherboard with two P2/333 CPUs, 128 MB of
ECC memory, and three 4.5 GB WD SCSI drives.

The primary database and main app code were designed and written in-house,
however we do use a PHP3 program called Phorum to implement a message forum
for the users.  The main app database and the phorum database are two
separate databases.

The app went online on August 30, 1999 and has run without incident until
yesterday.  At about 10am on Dec 13th, 1999, one of the programmers noticed
that none of the forum messages would come up.  I went to the console of
the server and saw this message about 10 or 15 times:

Dec 13 10:35:56 redbox /kernel: pid 13856 (postgres), uid 1002: exited on signal 11 (core dumped)
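
To pin down when the crashes started and how often they repeat, a count like the following could be run against the system log. This is only a sketch: `count_segv` is a hypothetical helper name, and /var/log/messages is an assumption about where the kernel messages end up on this machine.

```sh
# count_segv (hypothetical helper): print how many lines in the given log
# file record a postgres process exiting on signal 11 with a core dump.
count_segv() {
    grep -c '(postgres).*exited on signal 11 (core dumped)' "$1"
}

# Possible use; the log path is an assumption about this machine:
#   count_segv /var/log/messages
```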

A ps -xa revealed about 15 or so postgres processes!  I did not think
postgres made any child processes?!?!  So I stopped the web server and
killed the main postgres process which seemed to kill all the other
postgres processes.  I then tried to restart postgres and got an error
message that was something like:

IpcSemaphore??? - Key=54321234 Max

I could kick myself for not recording the exact message.  Something to do
with shared memory I think.  Nevertheless, postgres was not going to
start back up and I did not know what the error was telling me, so I had to
reboot (uptime said 143 days).

When the system came back up postgres started and I tried to check if there
was a post to the phorum database that may have caused the core dump.  I
executed 2 queries and then tried to query the main app database from
another terminal.  The main app queries were not executing, so I did a ps
-xa to see what processes were running and there were exactly 2 core dumped
sig 11 postgres processes!!  So I did another query on the phorum database
and got a 3rd core dumped process!

At this point I killed all the postgres processes, restarted postgres and
tried to do a dump on the main app database.  pg_dump gave an error similar
to this (I kick myself again):

Tuple 0:0 invalid, can't dump.

So, pg_dump was not going to give me a backup to that point, so I stopped
postgres and issued:

# rm -r data
# initdb
# createdb ipa
# createdb phorum

Then I used the previous day's backup for the main app, and just created
the table structure for the phorum database since we do not back up that
data.  I restarted postgres and the web server and all seemed fine... until
today.

At 9:36am on the 14th it happened again.  Again I was unable to recover the
data and had to rebuild the data directory.  I did not delete the data
directory this time, I just moved it to another directory so I would have
it.  I also have the core dumps.  The only file I had to delete was the
pg_log in the data directory.  What is this file?  It had grown to 700Meg
in under 24 hours!!  Also, the core dump for the main app grew from 2.7Meg
to over 80Meg while I was trying to dump the data.

My biggest hang-up is why all of a sudden?  We literally did not change
anything!  The system was working fine since August.  And now, after
creating new databases, it does it again in less than 24 hours!  Also, is
there some reason why the log file created by postgres does not timestamp
its entries?
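
Until the log has timestamps of its own, one workaround might be to pipe the postmaster output through a small filter. A minimal sketch, assuming a POSIX shell; `stamp_lines` is a hypothetical name, not something PostgreSQL provides.

```sh
# stamp_lines (hypothetical helper): copy stdin to stdout, prefixing each
# line with the local date and time at which it was read.
stamp_lines() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Possible use (paths taken from the startup command earlier in this post):
#   /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data -i 2>&1 \
#       | stamp_lines >> /usr/local/pgsql/postgres.log
```

The timestamp records when the filter read the line, not when the backend wrote it, so it is only approximate under buffering.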

I will provide any table structures, core files, server logs, etc. if
needed.  Anything that might give me an idea as to what is going on.

Thank you,
Matthew


Matthew Hagerty
Venux Technology Group
matthew@venux.net
616.458.9800 

