Thread: Postmaster crashes after upgrade to 8.1.4!

Postmaster crashes after upgrade to 8.1.4!

From
CG
Date:
Upgrading from 8.1.3 to 8.1.4, I compiled with the same configure flags,
installed to a separate directory, shut down 8.1.3, copied the "data" directory
over to the new 8.1.4 directory (cp -Rp), set my symlinks so that
/usr/local/pgsql points to the new 8.1.4 directory, and fired it up.  I ran
some queries, inserts, updates, deletes, etc. Everything looked good! I removed
the old data directory, since leaving it there would have left us strapped for
space, and I went to bed. Everything ran fine until 4 AM, when the database dump
runs. Since then we've been extremely unstable... We run, we crash, we run, we
crash... Here's a snippet of what I've been seeing:

2006-05-25 08:29:26.665 EDT   LOG:  all server processes terminated;
reinitializing
2006-05-25 08:29:26.677 EDT   LOG:  database system was interrupted at
2006-05-25 08:28:55 EDT
2006-05-25 08:29:26.678 EDT   LOG:  checkpoint record is at 28/3C0A35D8
2006-05-25 08:29:26.678 EDT   LOG:  redo record is at 28/3C0A35D8; undo record
is at 0/0; shutdown TRUE
2006-05-25 08:29:26.678 EDT   LOG:  next transaction ID: 204190433; next OID:
186871674
2006-05-25 08:29:26.678 EDT   LOG:  next MultiXactId: 1; next MultiXactOffset:
0
2006-05-25 08:29:26.678 EDT   LOG:  database system was not properly shut down;
automatic recovery in progress
2006-05-25 08:29:26.688 EDT   LOG:  redo starts at 28/3C0A3628
2006-05-25 08:29:26.702 EDT   LOG:  unexpected pageaddr 28/28102000 in log file
40, segment 60, offset 1056768
2006-05-25 08:29:26.702 EDT   LOG:  redo done at 28/3C0FFDF8
2006-05-25 08:29:27.097 EDT myuser mydata 192.168.167.4(54695)FATAL:  the
database system is starting up
2006-05-25 08:29:27.303 EDT   LOG:  database system is ready
2006-05-25 08:29:27.303 EDT   LOG:  transaction ID wrap limit is 1073799886,
limited by database "postgres"
2006-05-25 08:30:34.139 EDT   LOG:  autovacuum: processing database "mydata"
2006-05-25 08:30:50.076 EDT   LOG:  server process (PID 32140) was terminated
by signal 11
2006-05-25 08:30:50.076 EDT   LOG:  terminating any other active server
processes
2006-05-25 08:30:50.076 EDT myuser mydata 10.0.1.1(4135)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.076 EDT myuser mydata 10.0.1.1(4135)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.076 EDT myuser mydata 10.0.1.1(4135)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.077 EDT myuser mydata 10.0.0.12(2990)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.077 EDT myuser mydata 10.0.0.12(2990)DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2006-05-25 08:30:50.077 EDT myuser mydata 10.0.0.12(2990)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.078 EDT myuser mydata 192.168.167.4(54696)WARNING:
terminating connection because of crash of another server process
2006-05-25 08:30:50.078 EDT myuser mydata 192.168.167.4(54696)DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2006-05-25 08:30:50.078 EDT myuser mydata 192.168.167.4(54696)HINT:  In a
moment you should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.080 EDT myuser mydata 10.0.2.1(4474)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.080 EDT myuser mydata 10.0.2.1(4474)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.080 EDT myuser mydata 10.0.2.1(4474)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.081 EDT myuser mydata 10.0.2.1(4473)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.081 EDT myuser mydata 10.0.2.1(4473)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.081 EDT myuser mydata 10.0.2.1(4473)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.081 EDT myuser mydata 10.0.2.1(4459)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.081 EDT myuser mydata 10.0.2.1(4459)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.081 EDT myuser mydata 10.0.2.1(4459)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.084 EDT myuser mydata 10.0.1.1(4128)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.084 EDT myuser mydata 10.0.1.1(4128)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.084 EDT myuser mydata 10.0.1.1(4128)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.085 EDT myuser mydata 10.0.1.1(4129)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.085 EDT myuser mydata 10.0.1.1(4129)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.085 EDT myuser mydata 10.0.1.1(4129)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.085 EDT myuser mydata 10.0.2.1(4472)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.085 EDT myuser mydata 10.0.2.1(4472)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.085 EDT myuser mydata 10.0.2.1(4472)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.086 EDT myuser mydata 10.0.1.1(4130)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.086 EDT myuser mydata 10.0.1.1(4130)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.086 EDT myuser mydata 10.0.1.1(4130)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.087 EDT myuser mydata 10.0.2.1(4471)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.087 EDT myuser mydata 10.0.2.1(4471)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.087 EDT myuser mydata 10.0.2.1(4471)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.087 EDT myuser mydata 10.0.1.1(4131)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.087 EDT myuser mydata 10.0.1.1(4131)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.087 EDT myuser mydata 10.0.1.1(4131)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.088 EDT myuser mydata 10.0.1.1(4138)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.088 EDT myuser mydata 10.0.1.1(4138)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.088 EDT myuser mydata 10.0.1.1(4138)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.088 EDT myuser mydata 10.0.1.1(4136)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.088 EDT myuser mydata 10.0.1.1(4136)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.088 EDT myuser mydata 10.0.1.1(4136)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.088 EDT myuser mydata 10.0.1.1(4132)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.088 EDT myuser mydata 10.0.1.1(4132)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.089 EDT myuser mydata 10.0.1.1(4132)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.089 EDT myuser mydata 10.0.1.1(4133)WARNING:  terminating
connection because of crash of another server process
2006-05-25 08:30:50.089 EDT myuser mydata 10.0.1.1(4133)DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly corrupted
shared memory.
2006-05-25 08:30:50.089 EDT myuser mydata 10.0.1.1(4133)HINT:  In a moment you
should be able to reconnect to the database and repeat your command.
2006-05-25 08:30:50.091 EDT   LOG:  all server processes terminated;
reinitializing
2006-05-25 08:30:50.103 EDT   LOG:  database system was interrupted at
2006-05-25 08:29:27 EDT
2006-05-25 08:30:50.103 EDT   LOG:  checkpoint record is at 28/3C101AE8
2006-05-25 08:30:50.103 EDT   LOG:  redo record is at 28/3C101AE8; undo record
is at 0/0; shutdown TRUE
2006-05-25 08:30:50.103 EDT   LOG:  next transaction ID: 204190698; next OID:
186879866
2006-05-25 08:30:50.103 EDT   LOG:  next MultiXactId: 1; next MultiXactOffset:
0
2006-05-25 08:30:50.103 EDT   LOG:  database system was not properly shut down;
automatic recovery in progress
2006-05-25 08:30:50.114 EDT   LOG:  redo starts at 28/3C101B38
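
For triage, the crash events in a log like the one above can be pulled out mechanically. This is a minimal sketch, assuming the server's log file keeps each entry on a single line (the wrapping above comes from the mail client) and uses the timestamp prefix shown here. `crashed_backends` is a hypothetical helper name, not a PostgreSQL tool. Signal 11 is SIGSEGV on Linux and most other Unix systems.

```shell
#!/bin/sh
# Sketch only: assumes one log entry per line and a "YYYY-MM-DD HH:MM:SS.mmm" prefix.

# crashed_backends LOGFILE
#   Prints "date time PID signal" for every backend the postmaster
#   reports as terminated by a signal.
crashed_backends() {
    sed -n 's/^\([0-9-]* [0-9:.]*\) .*server process (PID \([0-9]*\)) was terminated by signal \([0-9]*\).*$/\1 \2 \3/p' "$1"
}
```

Running it against the server log gives one line per crash, which makes it easier to see how often the backends die and whether the signal is always the same.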


What could be wrong? I am panicking. Please advise!

CG
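
The copy-and-relink steps described in this message can be sketched as follows. This is an illustration under stated assumptions, not a recommended procedure: the directory layout is hypothetical, `stage_upgrade` is a made-up helper name, and the old server must already be shut down before it runs. Unlike the steps described above, the sketch leaves the old tree in place so that a rollback stays possible.

```shell
#!/bin/sh
# Sketch only: all paths are hypothetical. Run only while the old server is stopped.

# stage_upgrade OLD_PREFIX NEW_PREFIX LINK
#   Copies OLD_PREFIX/data into NEW_PREFIX/data, then repoints LINK at NEW_PREFIX.
#   The old prefix and its data directory are left untouched.
stage_upgrade() {
    old_prefix=$1
    new_prefix=$2
    link=$3

    # Refuse to overwrite a data directory that already exists in the new tree.
    if [ -e "$new_prefix/data" ]; then
        echo "refusing: $new_prefix/data already exists" >&2
        return 1
    fi

    # -R recurses; -p preserves ownership, modes and timestamps.
    cp -Rp "$old_prefix/data" "$new_prefix/data" || return 1

    # -n replaces the symlink itself instead of creating a link inside its target.
    ln -sfn "$new_prefix" "$link"
}
```

A call might look like `stage_upgrade /usr/local/pgsql-8.1.3 /usr/local/pgsql-8.1.4 /usr/local/pgsql`, where the two versioned directory names are assumptions. Only /usr/local/pgsql appears in the original message.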

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: Postmaster crashes after upgrade to 8.1.4!

From
Tom Lane
Date:
CG <cgg007@yahoo.com> writes:
> 2006-05-25 08:30:50.076 EDT   LOG:  server process (PID 32140) was terminated
> by signal 11

That should be leaving a core dump file (if not, restart the postmaster
under "ulimit -c unlimited").  Get a stack trace with gdb to get some
more info about what's going on.

            regards, tom lane
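
The two steps suggested above can be sketched roughly as follows, assuming a POSIX shell and gdb on the PATH. The helper names are hypothetical, and so are the `PGBIN` and `PGDATA` locations in the commented usage. The resource limit is inherited by child processes, so it has to be raised in the same shell that then starts the postmaster. Where the core file lands and what it is called depend on the operating system; it is often written to the crashed process's working directory.

```shell
#!/bin/sh
# Sketch only: helper names, PGBIN, PGDATA and the core file path are placeholders.

# Try to lift the core file size limit for this shell and its children,
# then report the limit that is actually in effect.
enable_core_dumps() {
    ulimit -c unlimited 2>/dev/null ||
        echo "could not raise the core limit (hard limit too low?)" >&2
    echo "core file size limit: $(ulimit -c)"
}

# backtrace_core EXECUTABLE COREFILE
#   Prints a stack trace non-interactively and exits.
backtrace_core() {
    gdb -batch -ex bt "$1" "$2"
}

# Typical sequence (commented out because the paths are assumptions):
#   enable_core_dumps
#   "$PGBIN/pg_ctl" -D "$PGDATA" restart
#   ... wait for the next signal 11 ...
#   backtrace_core "$PGBIN/postgres" /path/to/corefile
```

The backtrace is most useful when the server was built with debugging symbols; without them gdb may only show addresses.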

Re: Postmaster crashes after upgrade to 8.1.4!

From
CG
Date:
I didn't find a core dump.

Perhaps I'm looking in the wrong spot or for the wrong file. The file should be
called "core.32140", correct? ... I did a "find / -name core*" ... that found
nothing useful.




Re: Postmaster crashes after upgrade to 8.1.4!

From
Bill Moran
Date:
CG <cgg007@yahoo.com> wrote:

> I didn't find a core dump.
>
> Perhaps I'm looking in the wrong spot or for the wrong file. The file should be
> called "core.32140", correct? ... I did a "find / -name core*" ... that found
> nothing useful.

find / -name '*core*' would be more reliable.  FreeBSD, for example, makes
coredumps in the format {processname}.core.
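
One caveat about the unquoted pattern: in `find / -name core*` the shell may expand `core*` against the current directory before find ever sees it, so the quoting matters. The sketch below is a narrower alternative to `'*core*'` that matches only a few common naming conventions (`core`, `core.PID`, `NAME.core`). `find_core_files` is a hypothetical helper, and the pattern list is an assumption rather than an exhaustive one. On Linux, /proc/sys/kernel/core_pattern shows the naming pattern the kernel uses.

```shell
#!/bin/sh
# Sketch only: the three name patterns are assumptions about common conventions.

# find_core_files DIR
#   Lists regular files under DIR named "core", "core.<something>" or "<something>.core".
#   The patterns are quoted so the shell passes them to find unexpanded.
find_core_files() {
    find "$1" -type f \( -name 'core' -o -name 'core.*' -o -name '*.core' \) 2>/dev/null
}
```

Pointing it at the data directory first (for example `find_core_files /usr/local/pgsql/data`) is usually quicker than scanning from `/`.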

--
Bill Moran

Also, I can kill you with my brain.

    River Tam


Re: Postmaster crashes after upgrade to 8.1.4!

From
CG
Date:
Okay, there was no core dump to be found.

I had to revert to 8.1.3, which seems to be running fine. I am /extremely/
thankful that there was no data corruption.

I took a 24-hour-old dump file of the database it was crashing on, restored it
to a similar AMD64 box (SunFire x2100 instead of SunFire x4100) running 8.1.4,
and tried to crash it the same way the other one was crashing. No joy. It seems
to run. I'll leave it running and try to put a decent load on the box to get it
to crash.

Since I would have to down the production database to get a working copy, I
won't be able to copy the offending data directory over to the test
installation until my next maint window rolls around in a few weeks. That, or
we have another outage of some type which would give me the ability to down the
database and copy the tree over.

I wish I could've done more analysis while the server was crippled. I'll keep
trying.

CG
