Discussion: vacuumdb: PANIC: corrupted item pointer
Hello,
System: Red Hat Linux 4, 64-bit, running postgres-7.4.16 (production)
Initial problem:
# pg_dump -O dbname -Ft -f /tmp/database.tar
pg_dump: query to get table columns failed: ERROR: invalid memory alloc request size 9000688640
After some research, it seems to be related to database corruption.
Running a vacuum crashes the db:
-bash-3.00$ vacuumdb -z -a
vacuumdb: vacuuming database "template1"
vacuumdb: vacuuming of database "template1" failed: PANIC: corrupted item pointer: offset = 3336, size = 20
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
-bash-3.00$
Can someone help me track down the problem?
Can I recover the data? (The last backup failed.)
Best regards,
Alain.
Alain Peyrat wrote:
> Hello,
>
> System: Red Hat Linux 4, 64-bit, running postgres-7.4.16 (production)
>
> Initial problem:
>
> # pg_dump -O dbname -Ft -f /tmp/database.tar
> pg_dump: query to get table columns failed: ERROR: invalid memory alloc
> request size 9000688640
>
> After some research, it seems to be related to database corruption.
> Running a vacuum crashes the db:
>
> -bash-3.00$ vacuumdb -z -a
> vacuumdb: vacuuming database "template1"
> vacuumdb: vacuuming of database "template1" failed: PANIC: corrupted
> item pointer: offset = 3336, size = 20

It would be nice if it's just template1 that is damaged, but I'm not
sure that's the case.

1. Have you had crashes or other hardware problems recently?
2. Can you vacuum the other databases?
3. Can you dump just the schema for your selected database with
   --schema-only? (Your pg_dump seemed to fail fetching column details.)
4. Can you dump individual tables with --table=? (It might be just one
   table that's damaged.)

http://www.postgresql.org/docs/7.4/static/app-pgdump.html

> Can someone help me track down the problem?
>
> Can I recover the data? (The last backup failed.)

How much can be recovered will depend on what corruption has occurred.
If it's just a damaged index, then reindexing will fix it.
If it's a damaged system catalogue, we might be able to fix the
catalogue and then read the data.
If data files are damaged, we'll need to locate the damage and work
around it. You will probably lose data on any damaged pages.

-- 
Richard Huxton
Archonet Ltd
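[A minimal session for working through Richard's checklist might look like the following; "dbname" and the table names are placeholders for the actual database:

```shell
# 2. Vacuum each database individually to see which ones are damaged
vacuumdb -z dbname

# 3. Schema-only dump: skips table data, so heap corruption may not be hit
pg_dump --schema-only -O dbname -f /tmp/schema.sql

# 4. Dump table by table to isolate the damaged one(s)
for t in table1 table2 table3; do
    pg_dump -O --table=$t dbname -f /tmp/$t.sql || echo "FAILED: $t"
done
```

Any table that dumps cleanly this way can be kept as a partial rescue even if the full dump keeps failing.]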
Richard Huxton <dev@archonet.com> writes:
> Alain Peyrat wrote:
>> Initial problem:
>>
>> # pg_dump -O dbname -Ft -f /tmp/database.tar
>> pg_dump: query to get table columns failed: ERROR: invalid memory alloc
>> request size 9000688640
>>
>> After some research, it seems to be related to database corruption.
>> Running a vacuum crashes the db:
>>
>> -bash-3.00$ vacuumdb -z -a
>> vacuumdb: vacuuming database "template1"
>> vacuumdb: vacuuming of database "template1" failed: PANIC: corrupted
>> item pointer: offset = 3336, size = 20

> It would be nice if it's just template1 that is damaged, but I'm not
> sure that's the case.

This looks pretty bad, because the above already proves corruption in two
different databases --- there is something wrong somewhere in template1,
and something else wrong somewhere in "dbname" (unless that is actually
template1). It seems likely that there's been widespread damage from
some hardware or filesystem-level misfortune.

FWIW, a look in the source code shows that the 'corrupted item pointer'
message comes only from PageIndexTupleDelete, so that indicates a
damaged index, which should be fixable by reindexing. But the other one
looks more like heap damage, since it's apparently trying to copy a
damaged variable-width field, and I don't think index entries get copied
anywhere in a normal query.

The query pg_dump is complaining about looks at pg_attribute and pg_type
entries for a particular table --- hmm, I notice that it's not fetching
any variable-width fields, so it's not entirely clear where the error is
coming from. If you are really lucky, it could be that the corruption is
actually in pg_statistic, and it's failing when the planner tries to
estimate row counts for the query. That would be really lucky, because
that's all discardable data. Try "DELETE FROM pg_statistic" as superuser
and see if you can dump then.

> 1. Have you had crashes or other hardware problems recently?

Indeed.

			regards, tom lane
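[If pg_statistic is indeed the culprit, the recovery Tom describes would look roughly like this; the contents of pg_statistic are safe to discard because ANALYZE regenerates them:

```shell
# As superuser: throw away the (possibly corrupt) planner statistics
psql -U postgres dbname -c "DELETE FROM pg_statistic;"

# Retry the dump that previously failed
pg_dump -O dbname -Ft -f /tmp/database.tar

# Once the dump succeeds, re-ANALYZE to repopulate the statistics
vacuumdb -z dbname
```

Worst case, the DELETE simply doesn't help; it cannot make things worse, since the data is discardable.]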
Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
>> Alain Peyrat wrote:
>>> Initial problem:
>>>
>>> # pg_dump -O dbname -Ft -f /tmp/database.tar
>>> pg_dump: query to get table columns failed: ERROR: invalid memory alloc
>>> request size 9000688640
>>>
>>> After some research, it seems to be related to database corruption.
>>> Running a vacuum crashes the db:
>>>
>>> -bash-3.00$ vacuumdb -z -a
>>> vacuumdb: vacuuming database "template1"
>>> vacuumdb: vacuuming of database "template1" failed: PANIC: corrupted
>>> item pointer: offset = 3336, size = 20

>> It would be nice if it's just template1 that is damaged, but I'm not
>> sure that's the case.

> This looks pretty bad, because the above already proves corruption in two
> different databases --- there is something wrong somewhere in template1,
> and something else wrong somewhere in "dbname" (unless that is actually
> template1). It seems likely that there's been widespread damage from
> some hardware or filesystem-level misfortune.
>
> FWIW, a look in the source code shows that the 'corrupted item pointer'
> message comes only from PageIndexTupleDelete, so that indicates a
> damaged index which should be fixable by reindexing.

Tom - could it be damage to a shared system catalogue, with template1
just the first DB to be vacuumed? It just strikes me as odd that an
index on template1 would be corrupted - assuming it's your typical empty
template1 and is just being connected to during DB creation etc.

Unless, of course, you're looking at genuine on-disk corruption of
unused files/inodes, in which case it could be anything. Ick :-(

-- 
Richard Huxton
Archonet Ltd
Richard Huxton <dev@archonet.com> writes:
> Tom Lane wrote:
>> FWIW, a look in the source code shows that the 'corrupted item pointer'
>> message comes only from PageIndexTupleDelete, so that indicates a
>> damaged index which should be fixable by reindexing.

> Tom - could it be damage to a shared system catalogue, with template1
> just the first DB to be vacuumed?

Possible, but in any case the other error indicates an independent
problem.

> It just strikes me as odd that an
> index on template1 would be corrupted - assuming it's your typical empty
> template1 and is just being connected to during DB creation etc.

Yeah, I was wondering about that too. PageIndexTupleDelete wouldn't
even get called unless there was something to delete, which indicates a
table that's been changed. Is the OP in the habit of doing real work in
template1?

			regards, tom lane
Richard Huxton wrote:
> Alain wrote:
>> Hello,
>>
>> System: Red Hat Linux 4, 64-bit, running postgres-7.4.16 (production)
>>
>> Initial problem:
>>
>> # pg_dump -O dbname -Ft -f /tmp/database.tar
>> pg_dump: query to get table columns failed: ERROR: invalid memory
>> alloc request size 9000688640
>>
>> After some research, it seems to be related to database corruption.
>> Running a vacuum crashes the db:
>>
>> -bash-3.00$ vacuumdb -z -a
>> vacuumdb: vacuuming database "template1"
>> vacuumdb: vacuuming of database "template1" failed: PANIC: corrupted
>> item pointer: offset = 3336, size = 20

> It would be nice if it's just template1 that is damaged, but I'm not
> sure that's the case.

First, thank you Richard and Tom for helping me, but I finally decided
to restore the last backup, so the problem is no longer critical.
Still, being able to rescue some data may be interesting. The strange
point is that the database was running quite well for simple queries
(nothing visibly wrong, but no backup possible).

> 1. Have you had crashes or other hardware problems recently?

No crash, but we changed our server (<= seems to be the cause).

The first try was a file-system copy, to reduce downtime, since both
machines ran the same 7.4.x version, but the result did not work (maybe
related to the architecture change, 32-bit => 64-bit), so I finally
dropped the db and performed a dump/restore. I think the db was a mix
of 32-bit and 64-bit files.

> 2. Can you vacuum the other databases?

No.

> 3. Can you dump just the schema for your selected database with
> --schema-only? (Your pg_dump seemed to fail fetching column details.)

No.

> 4. Can you dump individual tables with --table=? (It might be just one
> table that's damaged.)

No.

> http://www.postgresql.org/docs/7.4/static/app-pgdump.html
>
>> Can someone help me track down the problem?
>>
>> Can I recover the data? (The last backup failed.)
>
> How much can be recovered will depend on what corruption has occurred.
> If it's just a damaged index, then reindexing will fix it.
> If it's a damaged system catalogue, we might be able to fix the
> catalogue and then read the data.
> If data files are damaged, we'll need to locate the damage and work
> around it. You will probably lose data on any damaged pages.
AlJeux wrote:
> Richard Huxton wrote:
>> 1. Have you had crashes or other hardware problems recently?
>
> No crash, but we changed our server (<= seems to be the cause).
>
> The first try was a file-system copy, to reduce downtime, since both
> machines ran the same 7.4.x version, but the result did not work
> (maybe related to the architecture change, 32-bit => 64-bit), so I
> finally dropped the db and performed a dump/restore. I think the db
> was a mix of 32-bit and 64-bit files.

Ah! That'll do it. You will get problems with a filesystem copy if:

1. The database is running.
2. The architectures aren't identical.
3. The installation options are different (a simple date-option
   difference can cause problems).

Frankly, I'm surprised you didn't get more problems.

-- 
Richard Huxton
Archonet Ltd
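[For the record, the portable way to move a cluster between 32-bit and 64-bit machines is a logical dump and restore rather than a file-level copy; a sketch, with host names as placeholders:

```shell
# On the old (32-bit) server, with applications stopped:
# take a logical dump of the whole cluster
pg_dumpall -U postgres > /tmp/cluster.sql

# On the new (64-bit) server: restore into a freshly initdb'd cluster
psql -U postgres -f /tmp/cluster.sql template1
```

To shorten the downtime window, the two steps can also be piped directly over the network (`pg_dumpall -h oldhost | psql -h newhost template1`), since the text dump is architecture-independent.]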
Richard Huxton <dev@archonet.com> writes:
>> The first try was a file-system copy, to reduce downtime, since both
>> machines ran the same 7.4.x version, but the result did not work
>> (maybe related to the architecture change, 32-bit => 64-bit), so I
>> finally dropped the db and performed a dump/restore. I think the db
>> was a mix of 32-bit and 64-bit files.

> Ah! That'll do it. You will get problems with a filesystem copy if:
> 1. The database is running.
> 2. The architectures aren't identical.
> 3. The installation options are different (a simple date-option
> difference can cause problems).

PG 8.1 and later record MAXALIGN in pg_control, and will refuse to
start up if there is a 32/64-bit compatibility problem. But 7.4 has no
such defense. It does check for --enable-integer-datetimes
compatibility, though.

			regards, tom lane
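[On 8.1 and later, the recorded alignment can be inspected with pg_controldata; a mismatch between this value and what the server binary was built with is what triggers the startup refusal. A sketch, with an illustrative data directory path and illustrative output values:

```shell
pg_controldata /var/lib/pgsql/data
# Among the output, look for lines such as:
#   Maximum data alignment:               8
#   Date/time type storage:               64-bit integers
```

A 32-bit build would typically show a maximum data alignment of 4, which is why heap and catalog files copied across such a boundary are unreadable.]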