Thread: postgres server crashes unexpectedly

postgres server crashes unexpectedly

From
"Chadwick Horn"
Date:
hi all,
 
we've had a db stop working for some reason and have searched through google to its ends to no avail. we have reindexed the database and system database and tried to determine other things that would keep it from working... again, to no avail.
 
this just started occurring about 2 weeks ago and below is a recent snippet of error log that we have.  any idea what would/could be causing these crashes? any response is much appreciated as this is not a time-critical issue for us.
 
 
thanks!
 
chadwick
 
error log:
 
 
LOG:  redo is not required
LOG:  database system is ready
PANIC:  corrupted item pointer: offset = 0, size = 0
LOG:  autovacuum process (PID 3037) was terminated by signal 6
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2008-03-18 02:55:30 PDT
LOG:  checkpoint record is at 25/6068AE30
LOG:  redo record is at 25/6068AE30; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/414366728; next OID: 240102
LOG:  next MultiXactId: 1132; next MultiXactOffset: 2431
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 25/6068AE80
LOG:  redo is not required
LOG:  database system is ready
PANIC:  corrupted item pointer: offset = 0, size = 0
LOG:  autovacuum process (PID 3045) was terminated by signal 6
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2008-03-18 02:56:31 PDT
LOG:  checkpoint record is at 25/6068AE80
LOG:  redo record is at 25/6068AE80; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/414366728; next OID: 240102
LOG:  next MultiXactId: 1132; next MultiXactOffset: 2431
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 25/6068AED0
LOG:  redo is not required
LOG:  database system is ready
PANIC:  corrupted item pointer: offset = 0, size = 0
LOG:  autovacuum process (PID 3072) was terminated by signal 6
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2008-03-18 02:57:32 PDT
LOG:  checkpoint record is at 25/6068AED0
LOG:  redo record is at 25/6068AED0; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/414366728; next OID: 240102
LOG:  next MultiXactId: 1132; next MultiXactOffset: 2431
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 25/6068AF20
LOG:  redo is not required
LOG:  database system is ready
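
A crash loop like the one in this log can be summarized by pulling out the PANIC messages, the processes that were killed, and the restart timestamps. The following is only a minimal Python sketch, not part of the original post: the function name `summarize_crashes` and the regular expressions are assumptions based solely on the log lines shown above, and the timezone suffix is ignored.

```python
import re
from datetime import datetime

# Patterns modeled on the log lines in this post (assumed, not exhaustive).
PANIC_RE = re.compile(r"^PANIC:\s+(.*)$")
KILLED_RE = re.compile(
    r"^LOG:\s+(\w+) process \(PID (\d+)\) was terminated by signal (\d+)"
)
INTERRUPTED_RE = re.compile(
    r"^LOG:\s+database system was interrupted at "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def summarize_crashes(log_text):
    """Collect PANIC messages, killed processes, and gaps between restarts."""
    panics, victims, times = [], [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = PANIC_RE.match(line)
        if m:
            panics.append(m.group(1))
            continue
        m = KILLED_RE.match(line)
        if m:
            # (process kind, PID, signal number)
            victims.append((m.group(1), int(m.group(2)), int(m.group(3))))
            continue
        m = INTERRUPTED_RE.match(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    # Seconds between consecutive "was interrupted at" timestamps.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {"panics": panics, "victims": victims, "gaps_seconds": gaps}


SAMPLE = """\
PANIC:  corrupted item pointer: offset = 0, size = 0
LOG:  autovacuum process (PID 3037) was terminated by signal 6
LOG:  database system was interrupted at 2008-03-18 02:55:30 PDT
PANIC:  corrupted item pointer: offset = 0, size = 0
LOG:  autovacuum process (PID 3045) was terminated by signal 6
LOG:  database system was interrupted at 2008-03-18 02:56:31 PDT
"""

if __name__ == "__main__":
    print(summarize_crashes(SAMPLE))
```

Run against the excerpt above, every victim is an autovacuum process killed by signal 6, and the restarts are about a minute apart. That spacing would be consistent with autovacuum retrying the same damaged object on each cycle, though the post does not state the configured interval.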
 
 
 
 
 

Re: postgres server crashes unexpectedly

From
Tom Lane
Date:
"Chadwick Horn" <chadhorn@gmail.com> writes:
> PANIC:  corrupted item pointer: offset = 0, size = 0
> LOG:  autovacuum process (PID 3037) was terminated by signal 6

Hmm ... the only instances of that error text are in PageIndexTupleDelete
and PageIndexMultiDelete, so we can fairly safely say that you have a
partially zeroed-out page in some index somewhere.  If that's the only
damage then you're in luck: you can recover by reindexing.

What I'd do is turn off autovacuum and instead do a manual VACUUM
VERBOSE to see where it crashes; then you could just reindex the one
problem table instead of the whole database.

You ought to look into why this happened, too.  Since you've provided
precisely 0 context about PG version or platform, it's hard to speculate
about that ...
        regards, tom lane
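
To illustrate the "see where it crashes" step: once the manual VACUUM VERBOSE output has been captured, the table being processed at the time of the PANIC can be picked out mechanically. The Python below is a rough sketch under stated assumptions, not a tool from this thread. The helper name `locate_crash` is hypothetical, the patterns are based only on the `INFO:` lines that appear in the output posted later in this thread, and the sample text is a trimmed excerpt of that output.

```python
import re

# Patterns assumed from the VACUUM VERBOSE lines quoted in this thread.
VACUUMING_RE = re.compile(r'^INFO:\s+vacuuming "([^"]+)"')
INDEX_DONE_RE = re.compile(r'^INFO:\s+index "([^"]+)" now contains')


def locate_crash(vacuum_output):
    """Return (table, indexes_finished) at the first PANIC, or None if no PANIC.

    indexes_finished lists the indexes of that table that VACUUM reported on
    before the crash; the damaged index is presumably a later one.
    """
    table = None
    finished = []
    for raw in vacuum_output.splitlines():
        line = raw.strip()
        m = VACUUMING_RE.match(line)
        if m:
            table = m.group(1)
            finished = []  # a new table starts; reset its index list
            continue
        m = INDEX_DONE_RE.match(line)
        if m:
            finished.append(m.group(1))
            continue
        if line.startswith("PANIC:"):
            return table, finished
    return None


SAMPLE = """\
INFO:  vacuuming "public.story_member"
INFO:  "story_member": found 603570 removable, 9903 nonremovable row versions in 43011 pages
INFO:  index "fkx_story__story_member" now contains 9903 row versions in 17736 pages
DETAIL:  64 index row versions were removed.
PANIC:  corrupted item pointer: offset = 0, size = 0
"""

if __name__ == "__main__":
    print(locate_crash(SAMPLE))
```

The table name it returns is the one to pass to REINDEX TABLE. The indexes it lists are the ones that completed, so they are the less likely culprits.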


Re: postgres server crashes unexpectedly

From
"Chadwick Horn"
Date:
Hi there,

Sorry about the lack of information on the system. We're running fedora (not 
for sure what version though) core (whitebox).

I did as you said and this is the result:

DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  "grp_member": moved 0 row versions, truncated 4 to 4 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "public.story_member"
INFO:  "story_member": found 603570 removable, 9903 nonremovable row versions in 43011 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 44 to 44 bytes long.
There were 6139208 unused item pointers.
Total free space (including removable row versions) is 323999824 bytes.
42732 pages are or will become empty, including 0 at the end of the table.
42958 pages containing 323999400 free bytes are potential move destinations.
CPU 0.52s/0.18u sec elapsed 5.91 sec.
INFO:  index "fkx_story__story_member" now contains 9903 row versions in 17736 pages
DETAIL:  64 index row versions were removed.
15219 index pages have been deleted, 15219 are currently reusable.
CPU 0.29s/0.06u sec elapsed 26.88 sec.
PANIC:  corrupted item pointer: offset = 0, size = 0
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 
The connection to the server was lost. Attempting reset: WARNING: 
terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.
Failed.
!>
!>




I keep getting this error:

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 
The connection to the server was lost. Attempting reset: Succeeded.


What could be doing this? It just started out of the blue... I reindexed the 
index it mentioned and it seems to error out more...



-Chadwick



----- Original Message ----- 
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Chadwick Horn" <chadhorn@gmail.com>
Cc: <pgsql-sql@postgresql.org>
Sent: Monday, March 17, 2008 7:32 PM
Subject: Re: [SQL] postgres server crashes unexpectedly


> "Chadwick Horn" <chadhorn@gmail.com> writes:
>> PANIC:  corrupted item pointer: offset = 0, size = 0
>> LOG:  autovacuum process (PID 3037) was terminated by signal 6
>
> Hmm ... the only instances of that error text are in PageIndexTupleDelete
> and PageIndexMultiDelete, so we can fairly safely say that you have a
> partially zeroed-out page in some index somewhere.  If that's the only
> damage then you're in luck: you can recover by reindexing.
>
> What I'd do is turn off autovacuum and instead do a manual VACUUM
> VERBOSE to see where it crashes; then you could just reindex the one
> problem table instead of the whole database.
>
> You ought to look into why this happened, too.  Since you've provided
> precisely 0 context about PG version or platform, it's hard to speculate
> about that ...
>
> regards, tom lane 



Re: postgres server crashes unexpectedly

From
Joshua Kramer
Date:
On Tue, 18 Mar 2008, Chadwick Horn wrote:

> Sorry about the lack of information on the system. We're running fedora (not 
> for sure what version though) core (whitebox).

This may not matter in the least bit, but have you tried running the DB on 
a real RHEL, or CentOS box?  The kernel and libs on such a box would most 
likely be more stable than those on Fedora-based boxen...

Cheers,
-Josh



Re: postgres server crashes unexpectedly

From
"Chadwick Horn"
Date:
In all honesty, we're fairly "trapped" on the box we have due to the depths 
of corporate approvals required to get something new online. I would, most 
def, prefer to be on anything BUT this...


----- Original Message ----- 
From: "Joshua Kramer" <josh@globalherald.net>
To: "Chadwick Horn" <chadhorn@gmail.com>
Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; <pgsql-sql@postgresql.org>
Sent: Tuesday, March 18, 2008 8:37 AM
Subject: Re: [SQL] postgres server crashes unexpectedly


>
> On Tue, 18 Mar 2008, Chadwick Horn wrote:
>
>> Sorry about the lack of information on the system. We're running fedora 
>> (not for sure what version though) core (whitebox).
>
> This may not matter in the least bit, but have you tried running the DB on 
> a real RHEL, or CentOS box?  The kernel and libs on such a box would most 
> likely be more stable than those on Fedora-based boxen...
>
> Cheers,
> -Josh
> 



Re: postgres server crashes unexpectedly

From
Tom Lane
Date:
"Chadwick Horn" <chadhorn@gmail.com> writes:
> I keep getting this error:

> Attempting reset: WARNING: terminating connection because of crash of another server process

It looks to me like psql is managing to start a new connection before
the postmaster notices the crash of the prior backend and tells
everybody to get out of town.  Which is odd, but maybe not too
implausible if your kernel is set up to favor interactive processes over
background --- it'd likely think psql is interactive and the postmaster
isn't.

> What could be doing this? It just started out of the blue... I reindexed the 
> index it mentioned and it seems to error out more...

If you reindexed only the last-mentioned index, then you reindexed the
wrong thing; it presumably died on the next index of story_member.
I'd reindex the whole table rather than guess which that is.

You should also consider the not-zero probability that you have more
than one corrupted index.  Keep reindexing tables until you can get
through a database-wide VACUUM.
        regards, tom lane
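
The "keep reindexing until a database-wide VACUUM gets through" advice amounts to a simple loop. Here is a hedged Python sketch of that loop only. Both callables are hypothetical stand-ins supplied by the caller: `run_vacuum` would run a database-wide VACUUM VERBOSE (reconnecting after any crash) and return the table it died on, or None on success, and `reindex_table` would issue REINDEX TABLE for that table. Nothing here talks to a real server.

```python
def reindex_until_vacuum_passes(run_vacuum, reindex_table, max_rounds=20):
    """Alternate VACUUM and REINDEX TABLE until VACUUM completes.

    run_vacuum()        -> name of the table being vacuumed when the server
                           crashed, or None if VACUUM finished (hypothetical).
    reindex_table(name) -> rebuilds every index on that table (hypothetical).
    Returns the list of tables that were reindexed, in order.
    """
    repaired = []
    for _ in range(max_rounds):
        failing = run_vacuum()
        if failing is None:
            return repaired  # VACUUM made it through the whole database
        if failing in repaired:
            # Reindexing did not help, so the damage may not be index-only.
            raise RuntimeError(f"still crashing on {failing} after reindex")
        reindex_table(failing)
        repaired.append(failing)
    raise RuntimeError("gave up: too many rounds")


if __name__ == "__main__":
    # Simulated server state; "public.other_table" is a made-up second victim.
    damaged = ["public.story_member", "public.other_table"]

    def fake_vacuum():
        return damaged[0] if damaged else None

    def fake_reindex(table):
        damaged.remove(table)

    print(reindex_until_vacuum_passes(fake_vacuum, fake_reindex))
```

The check for a table that fails again after being reindexed matters: if that happens, the "only index damage" assumption no longer holds and the loop should stop rather than spin.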


Re: postgres server crashes unexpectedly

From
"Chadwick Horn"
Date:
> "Chadwick Horn" <chadhorn@gmail.com> writes:
>> I keep getting this error:
>
>> Attempting reset: WARNING: terminating connection because of crash of 
>> another server process
>
> It looks to me like psql is managing to start a new connection before
> the postmaster notices the crash of the prior backend and tells
> everybody to get out of town.  Which is odd, but maybe not too
> implausible if your kernel is set up to favor interactive processes over
> background --- it'd likely think psql is interactive and the postmaster
> isn't.

Is there a way to disable this or to make both interactive and/or 
background?

>
>> What could be doing this? It just started out of the blue... I reindexed 
>> the
>> index it mentioned and it seems to error out more...
>
> If you reindexed only the last-mentioned index, then you reindexed the
> wrong thing; it presumably died on the next index of story_member.
> I'd reindex the whole table rather than guess which that is.
>
> You should also consider the not-zero probability that you have more
> than one corrupted index.  Keep reindexing tables until you can get
> through a database-wide VACUUM.

I have VACUUM'd it until its fibers are coming out. It seems to crash at 
various places (which, most likely, would be resolved if question #1 above 
is possible) and shows no consistency. The error logs provide even fewer 
clues than the verbose output.


-chadwick



Re: postgres server crashes unexpectedly

From
Colin Wetherbee
Date:
Chadwick Horn wrote:
>> It looks to me like psql is managing to start a new connection 
>> before the postmaster notices the crash of the prior backend and
>>  tells everybody to get out of town.  Which is odd, but maybe not
>>  too implausible if your kernel is set up to favor interactive 
>> processes over background --- it'd likely think psql is 
>> interactive and the postmaster isn't.
> 
> Is there a way to disable this or to make both interactive and/or 
> background?

I'm not sure how applications tell the kernel whether they are
interactive or background (or even if they do, at all), but you can
set the kernel's preference for this in the kernel configuration.

If you're not comfortable recompiling a new kernel, though, then
you're out of luck.

At any rate, you should look more thoroughly for problems with your
database before blaming the kernel for something.

Colin