Обсуждение: vacuumdb: PANIC: corrupted item pointer

Поиск
Список
Период
Сортировка

vacuumdb: PANIC: corrupted item pointer

От
Alain Peyrat
Дата:

Hello,

System: Red Hat Linux 4 64bits running postgres-7.4.16 (production)

Initial problem:

# pg_dump -O dbname -Ft -f /tmp/database.tar
pg_dump: query to get table columns failed: ERROR:  invalid memory alloc request size 9000688640

After some research, it seems to be related to a corruption of the database. Running a vacum crashes the db:

-bash-3.00$ vacuumdb -z -a
vacuumdb: vacuuming database "template1"
vacuumdb: vacuuming of database "template1" failed: PANIC:  corrupted item pointer: offset = 3336, size = 20
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
-bash-3.00$

Can someone help me to track down the problem ?

Can I recover the datas (last backup failed) ?

Best regards,

Alain.

Re: vacuumdb: PANIC: corrupted item pointer

От
Richard Huxton
Дата:
Alain Peyrat wrote:
>   Hello,
>
>   System: Red Hat Linux 4 64bits running postgres-7.4.16 (production)
>
>   Initial problem:
>
>   # pg_dump -O dbname -Ft -f /tmp/database.tar
> pg_dump: query to get table columns failed: ERROR:  invalid memory alloc
> request size 9000688640
>
>   After some research, it seems to be related to a corruption of the
> database. Running a vacum crashes the db:
>
> -bash-3.00$ vacuumdb -z -a
> vacuumdb: vacuuming database "template1"
> vacuumdb: vacuuming of database "template1" failed: PANIC:  corrupted
> item pointer: offset = 3336, size = 20

It would be nice if it's just template1 that is damaged, but I'm not
sure that's the case.

1. Have you had crashes or other hardware problems recently?
2. Can you vacuum the other databases?
3. Can you just dump the schema for your selected database with
--schema-only? (your pg_dump seemed to fail fetching column details)
4. Can you dump individual tables with --table=? (it might just be one
table that's damaged)

http://www.postgresql.org/docs/7.4/static/app-pgdump.html

>   Can someone help me to track down the problem ?
>
>   Can I recover the datas (last backup failed) ?

How much can be recovered will depend on what corruption has occurred.
If it's just a damaged index, then reindexing will fix it.
If it's a damaged system catalogue we might be able to fix the catalogue
then read the data.
If data files are damaged we'll need to locate the damage and work
around it. You will probably lose data on any damaged pages.

--
   Richard Huxton
   Archonet Ltd

Re: vacuumdb: PANIC: corrupted item pointer

От
Tom Lane
Дата:
Richard Huxton <dev@archonet.com> writes:
> Alain Peyrat wrote:
>> Initial problem:
>>
>> # pg_dump -O dbname -Ft -f /tmp/database.tar
>> pg_dump: query to get table columns failed: ERROR:  invalid memory alloc
>> request size 9000688640
>>
>> After some research, it seems to be related to a corruption of the
>> database. Running a vacum crashes the db:
>>
>> -bash-3.00$ vacuumdb -z -a
>> vacuumdb: vacuuming database "template1"
>> vacuumdb: vacuuming of database "template1" failed: PANIC:  corrupted
>> item pointer: offset = 3336, size = 20

> It would be nice if it's just template1 that is damaged, but I'm not
> sure that's the case.

This looks pretty bad, because the above already proves corruption in two
different databases --- there is something wrong somewhere in template1,
and something else wrong somewhere in "dbname" (unless that is actually
template1).  It seems likely that there's been widespread damage from
some hardware or filesystem-level misfortune.

FWIW, a look in the source code shows that the 'corrupted item pointer'
message comes only from PageIndexTupleDelete, so that indicates a
damaged index which should be fixable by reindexing.  But the other one
looks more like heap damage, since it's apparently trying to copy a
damaged variable-width field and I don't think index entries get copied
anywhere in a normal query.  The query pg_dump is complaining about
looks at pg_attribute and pg_type entries for a particular table ---
hmm, I notice that it's not fetching any variable-width fields, so it's
not entirely clear where the error is coming from.  If you are really
lucky it could be that the corruption is actually in pg_statistic,
and it's failing when the planner tries to estimate row counts for the
query.  That would be really lucky because that's all discardable data.
Try "DELETE FROM pg_statistic" as superuser and see if you can dump
then.

> 1. Have you had crashes or other hardware problems recently?

Indeed.

            regards, tom lane

Re: vacuumdb: PANIC: corrupted item pointer

От
Richard Huxton
Дата:
Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
>> Alain Peyrat wrote:
>>> Initial problem:
>>>
>>> # pg_dump -O dbname -Ft -f /tmp/database.tar
>>> pg_dump: query to get table columns failed: ERROR:  invalid memory alloc
>>> request size 9000688640
>>>
>>> After some research, it seems to be related to a corruption of the
>>> database. Running a vacum crashes the db:
>>>
>>> -bash-3.00$ vacuumdb -z -a
>>> vacuumdb: vacuuming database "template1"
>>> vacuumdb: vacuuming of database "template1" failed: PANIC:  corrupted
>>> item pointer: offset = 3336, size = 20

>> It would be nice if it's just template1 that is damaged, but I'm not
>> sure that's the case.
>
> This looks pretty bad, because the above already proves corruption in two
> different databases --- there is something wrong somewhere in template1,
> and something else wrong somewhere in "dbname" (unless that is actually
> template1).  It seems likely that there's been widespread damage from
> some hardware or filesystem-level misfortune.
>
> FWIW, a look in the source code shows that the 'corrupted item pointer'
> message comes only from PageIndexTupleDelete, so that indicates a
> damaged index which should be fixable by reindexing.

Tom - could it be damage to a shared system-catalogue, and template1
just the first DB to be vacuumed? It just strikes me as odd that an
index on template1 would be corrupted - assuming it's your typical empty
template1 and is just being connected to during DB creation etc.

Unless, of course, you're looking at genuine on-disk corruption of
unused files/inodes in which it could be anything. Ick :-(

--
   Richard Huxton
   Archonet Ltd

Re: vacuumdb: PANIC: corrupted item pointer

От
Tom Lane
Дата:
Richard Huxton <dev@archonet.com> writes:
> Tom Lane wrote:
>> FWIW, a look in the source code shows that the 'corrupted item pointer'
>> message comes only from PageIndexTupleDelete, so that indicates a
>> damaged index which should be fixable by reindexing.

> Tom - could it be damage to a shared system-catalogue, and template1
> just the first DB to be vacuumed?

Possible, but in any case the other error indicates an independent
problem.

> It just strikes me as odd that an
> index on template1 would be corrupted - assuming it's your typical empty
> template1 and is just being connected to during DB creation etc.

Yeah, I was wondering about that too.  PageIndexTupleDelete wouldn't
even get called unless there was something to delete, which indicates
a table that's been changed.  Is the OP in the habit of doing real work
in template1?

            regards, tom lane

Re: vacuumdb: PANIC: corrupted item pointer

От
AlJeux
Дата:
Richard Huxton a écrit :
> Alain wrote:
>>   Hello,
>>
>>   System: Red Hat Linux 4 64bits running postgres-7.4.16 (production)
>>
>>   Initial problem:
>>
>>   # pg_dump -O dbname -Ft -f /tmp/database.tar
>> pg_dump: query to get table columns failed: ERROR:  invalid memory
>> alloc request size 9000688640
>>
>>   After some research, it seems to be related to a corruption of the
>> database. Running a vacum crashes the db:
>>
>> -bash-3.00$ vacuumdb -z -a
>> vacuumdb: vacuuming database "template1"
>> vacuumdb: vacuuming of database "template1" failed: PANIC:  corrupted
>> item pointer: offset = 3336, size = 20
>
> It would be nice if it's just template1 that is damaged, but I'm not
> sure that's the case.

First, thank you Richard and Tom for helping me but I finally decided to
restore the last backup so now, the problem is no longer critical.

Anyway, being able to rescue some datas may be interesting.
The strange point is that the database was running quite well for simple
queries (nothing really visible but no backup possible).

> 1. Have you had crashes or other hardware problems recently?

No crash but we changed our server (<= seems the cause).

First try was using a file system copy to reduce downtime as it was two
same 7.4.x version but the result was not working (maybe related to
architecture change 32bits => 64 bits) so I finally dropped the db and
performed an dump/restore. I think, the db was a mix of 32/64 bits files.

> 2. Can you vacuum the other databases?

No.

> 3. Can you just dump the schema for your selected database with
> --schema-only? (your pg_dump seemed to fail fetching column details)

No.

> 4. Can you dump individual tables with --table=? (it might just be one
> table that's damaged)

No.

>
> http://www.postgresql.org/docs/7.4/static/app-pgdump.html
>
>>   Can someone help me to track down the problem ?
>>
>>   Can I recover the datas (last backup failed) ?
>
> How much can be recovered will depend on what corruption has occurred.
> If it's just a damaged index, then reindexing will fix it.
> If it's a damaged system catalogue we might be able to fix the catalogue
> then read the data.
> If data files are damaged we'll need to locate the damage and work
> around it. You will probably lose data on any damaged pages.
>


Re: vacuumdb: PANIC: corrupted item pointer

От
Richard Huxton
Дата:
AlJeux wrote:
> Richard Huxton a écrit :
>> 1. Have you had crashes or other hardware problems recently?
>
> No crash but we changed our server (<= seems the cause).
>
> First try was using a file system copy to reduce downtime as it was two
> same 7.4.x version but the result was not working (maybe related to
> architecture change 32bits => 64 bits) so I finally dropped the db and
> performed an dump/restore. I think, the db was a mix of 32/64 bits files.

Ah! That'll do it. You will get problems with a filesystem copy if
1. The database is running.
2. The architectures aren't identical
3. The installation options are different (a simple date option
difference can cause problems)

I'm surprised you didn't get more problems frankly.

--
   Richard Huxton
   Archonet Ltd

Re: vacuumdb: PANIC: corrupted item pointer

От
Tom Lane
Дата:
Richard Huxton <dev@archonet.com> writes:
>> First try was using a file system copy to reduce downtime as it was two
>> same 7.4.x version but the result was not working (maybe related to
>> architecture change 32bits => 64 bits) so I finally dropped the db and
>> performed an dump/restore. I think, the db was a mix of 32/64 bits files.

> Ah! That'll do it. You will get problems with a filesystem copy if
> 1. The database is running.
> 2. The architectures aren't identical
> 3. The installation options are different (a simple date option
> difference can cause problems)

PG 8.1 and later record MAXALIGN in pg_control.h, and will refuse to
start up if there is a 32/64 bit compatibility problem.  But 7.4 has
no such defense.  It does check for --enable-integer-datetimes
compatibility, though.

            regards, tom lane