Thread: database corruption

database corruption

From
Michael Guerin
Date:
Hi,

    We recently switched over from Solaris to Linux and we've been
experiencing a couple of database corruption problems.  We're using the
2.6 kernel and an XFS file system, in case there are any known problems
with this setup.

It's my understanding that these errors are typically caused by hardware
problems, so we're in the process of checking the disk.  However, are
there other things that I should look at to find out what's going on?  The
table that was corrupted earlier this week was dropped and recreated,
but it was corrupted again sometime last night.  The table is pretty
active: about 10GB are inserted and deleted every day.

-Michael


Here's what VACUUM comes up with:

INFO:  vacuuming "public.tbltimeseries"
WARNING:  relation "tbltimeseries" TID 696892/1: OID is invalid
WARNING:  relation "tbltimeseries" TID 696892/2: OID is invalid
WARNING:  relation "tbltimeseries" TID 696892/3: OID is invalid
WARNING:  relation "tbltimeseries" TID 696892/4: OID is invalid
WARNING:  relation "tbltimeseries" TID 696892/5: OID is invalid
WARNING:  relation "tbltimeseries" TID 696892/6: OID is invalid
WARNING:  relation "tbltimeseries" TID 696893/1: OID is invalid
WARNING:  relation "tbltimeseries" TID 696893/2: OID is invalid
WARNING:  relation "tbltimeseries" TID 696893/3: OID is invalid
WARNING:  relation "tbltimeseries" TID 696893/4: OID is invalid
WARNING:  relation "tbltimeseries" TID 696893/5: OID is invalid
WARNING:  relation "tbltimeseries" TID 696893/6: OID is invalid
WARNING:  relation "tbltimeseries" TID 696894/1: OID is invalid
WARNING:  relation "tbltimeseries" TID 696894/2: OID is invalid
WARNING:  relation "tbltimeseries" TID 696894/3: OID is invalid
WARNING:  relation "tbltimeseries" TID 696894/4: OID is invalid
WARNING:  relation "tbltimeseries" TID 696895/1: OID is invalid
WARNING:  relation "tbltimeseries" TID 696895/2: OID is invalid
WARNING:  relation "tbltimeseries" TID 696895/3: OID is invalid
WARNING:  relation "tbltimeseries" TID 696895/4: OID is invalid
WARNING:  relation "tbltimeseries" TID 696895/5: OID is invalid
WARNING:  relation "tbltimeseries" TID 696895/6: OID is invalid
WARNING:  relation "tbltimeseries" TID 1124824/1: OID is invalid
WARNING:  relation "tbltimeseries" TID 1124824/2: OID is invalid
WARNING:  relation "tbltimeseries" TID 1124824/3: OID is invalid
WARNING:  relation "tbltimeseries" TID 1124824/4: OID is invalid
WARNING:  relation "tbltimeseries" TID 1124824/5: OID is invalid
INFO:  index "idx_timeseries2" now contains 27066056 row versions in 108521 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 3.30s/1.06u sec elapsed 17.28 sec.
INFO:  "tbltimeseries": found 0 removable, 27065818 nonremovable row versions in 1643991 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 86 unused item pointers.
0 pages are entirely empty.
CPU 25.41s/4.88u sec elapsed 272.64 sec.
INFO:  vacuuming "pg_toast.pg_toast_1436963967"
ERROR:  invalid page header in block 1890530 of relation "pg_toast_1436963967"

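When checking the disk, it can help to know where the reported blocks
live.  Each TID in the warnings above is a block number and an item
number within that block, and the final ERROR names a block directly.
The sketch below is only an illustration: it assumes a default
PostgreSQL build (8 kB blocks, 1 GB segment files named <relfilenode>,
<relfilenode>.1, and so on), and the helper names locate() and
summarize() are made up for this example.

```python
import re
from collections import Counter

# Assumptions (defaults of a stock PostgreSQL build, not taken from the thread):
BLOCK_SIZE = 8192             # bytes per heap page
BLOCKS_PER_SEGMENT = 131072   # 1 GB segment file / 8 kB blocks

# Matches lines such as:
#   WARNING:  relation "tbltimeseries" TID 696892/1: OID is invalid
TID_RE = re.compile(r'relation "(?P<rel>[^"]+)" TID (?P<block>\d+)/(?P<item>\d+):')


def locate(block):
    """Return (segment_number, byte_offset_within_segment) for a block number."""
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment * BLOCK_SIZE


def summarize(log_text):
    """Count how many TID warnings were reported for each heap block."""
    counts = Counter()
    for match in TID_RE.finditer(log_text):
        counts[int(match.group("block"))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        'WARNING:  relation "tbltimeseries" TID 696892/1: OID is invalid\n'
        'WARNING:  relation "tbltimeseries" TID 696892/2: OID is invalid\n'
    )
    for block, warnings in sorted(summarize(sample).items()):
        segment, offset = locate(block)
        print(f"block {block}: {warnings} warnings, segment {segment}, offset {offset}")
    # prints: block 696892: 2 warnings, segment 5, offset 340230144
```

If the damaged blocks cluster in a few adjacent pages, as 696892-696895
do here, that is worth mentioning when asking about file system or
controller problems.
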
Re: database corruption

From
"Scott Marlowe"
Date:
On Fri, 2004-06-11 at 14:16, Michael Guerin wrote:
> Hi,
>
>     We recently switched over from Solaris to Linux and we've been
> experiencing a couple of database corruption problems.  We're using the
> 2.6 kernel and an XFS file system, in case there are any known problems
> with this setup.
>
> It's my understanding that these errors are typically caused by hardware
> problems, so we're in the process of checking the disk.  However, are
> there other things that I should look at to find out what's going on?  The
> table that was corrupted earlier this week was dropped and recreated,
> but it was corrupted again sometime last night.  The table is pretty
> active: about 10GB are inserted and deleted every day.

Be sure and test your memory too.  And if you have a hardware RAID
controller, sometimes they can be slightly broken and cause random
corruption.
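
One rough way to exercise the storage path is to write checksummed
blocks to a scratch file, sync, and read them back.  The sketch below is
only an illustration under stated assumptions: write_and_verify() is a
made-up helper, the 8 kB block size simply mirrors PostgreSQL's default
page size, and a read-back shortly after writing may be served from the
page cache rather than the disk, so a clean result proves little unless
the file is much larger than RAM.  It does not replace a dedicated
memory tester or the controller vendor's diagnostics.

```python
import hashlib
import os
import tempfile


def write_and_verify(path, n_blocks=256, block_size=8192):
    """Write random blocks to `path`, fsync, re-read, and return bad block indexes."""
    digests = []
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            block = os.urandom(block_size)
            digests.append(hashlib.sha256(block).digest())
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    bad = []
    with open(path, "rb") as f:
        for index, expected in enumerate(digests):
            if hashlib.sha256(f.read(block_size)).digest() != expected:
                bad.append(index)
    return bad


if __name__ == "__main__":
    # Hypothetical usage: point the directory at the file system under suspicion.
    with tempfile.TemporaryDirectory() as scratch:
        mismatches = write_and_verify(os.path.join(scratch, "probe.bin"))
        print("mismatched blocks:", mismatches)
```

An empty list means every block read back as written; any index in the
list is a block whose contents changed between the write and the read.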


Re: database corruption

From
Michael Guerin
Date:
Scott Marlowe wrote:

>Be sure and test your memory too.  And if you have a hardware RAID
>controller, sometimes they can be slightly broken and cause random
>corruption.
Switching from XFS to Reiserfs seems to have resolved our problems.

Re: database corruption

From
"A Palmblad"
Date:
----- Original Message -----
From: "Michael Guerin" <guerin@rentec.com>
To: <pgsql-novice@postgresql.org>
Sent: Thursday, June 24, 2004 12:33 PM
Subject: Re: [NOVICE] database corruption


> Switching from XFS to Reiserfs seems to have resolved our problems.
>

That sounds like a problem we had: we were running off an XFS partition
and had table corruption trouble (invalid page headers) on a large table.
We were running kernel 2.6 on AMD64.  After the problem occurred a couple
of times, we upgraded our RAID controller firmware and driver, checked
our RAM, and switched over to the JFS file system.  We haven't had trouble
since.  We changed several things at once when fixing our problem, but I'm
wondering whether anyone else is having trouble with XFS, and whether that
might be the root cause.

-Adam