Обсуждение: Data corruption zero a file - help!!

Поиск
Список
Период
Сортировка

Data corruption zero a file - help!!

От
Noel Faux
Дата:
Hi all,

I posted this on the novice mailing list and as yet had no response, hopefully someone here can help.

While we where trying to do a vacuum / pg_dump we encountered the following error:

postgres@db:~$ pg_dumpall -d > dump.pg
pg_dump: dumpClasses(): SQL command failed
pg_dump: Error message from server: ERROR:  invalid page header in block
9022921 of relation "gap"
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
pg_dumpall: pg_dump failed on database "monashprotein", exiting

Now after doing some searches I managed to work out that the data corruption starts at 902292.137
using this sql:
SELECT * FROM gap WHERE ctid = '(902292,$x)'
Where $x I changed from 1-150.

as mentioned on this post:http://archives.postgresql.org/pgsql-general/2005-11/msg01148.php

Following this post it seems all we need to do is re-zero from this point on. However we're not sure which file to do this in.

I've worked out the database/relation files are
$PGDATA/37958/111685332.* with the max * being 101.

Any help locating which file we need to do the re-zero thing would be really appreciated.

Cheers
Noel

Вложения

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Wed, Mar 01, 2006 at 04:12:53PM +1100, Noel Faux wrote:
> Now after doing some searches I managed to work out that the data
> corruption starts at 902292.137
> using this sql:
> SELECT * FROM gap WHERE ctid = '(902292,$x)'
> Where $x I changed from 1-150.
>
> as mentioned on this
> post:http://archives.postgresql.org/pgsql-general/2005-11/msg01148.php
>
> Following this post it seems all we need to do is re-zero from this
> point on. However we're not sure which file to do this in.

This earlier message in that thread should be helpful:

http://archives.postgresql.org/pgsql-general/2005-11/msg01141.php

> I've worked out the database/relation files are
> $PGDATA/37958/111685332.* with the max * being 101.

Is your table really over 100G?  Anyway, if the block size is 8192
then 902292 sould be in the .6 file.  If you can spare the time
then you might run the dd and od commands that Tom Lane mentions
in the above message and post the output.  I think the command would
be

dd bs=8k skip=115860 count=1 if=$PGDATA/37958/111685332.6 | od -x

See Tom's message for how I arrived at .6 and 115860 (and verify
the math yourself).

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Tue, Feb 28, 2006 at 10:54:48PM -0700, Michael Fuhr wrote:
> Is your table really over 100G?  Anyway, if the block size is 8192
> then 902292 sould be in the .6 file.  If you can spare the time
> then you might run the dd and od commands that Tom Lane mentions
> in the above message and post the output.

pg_filedump is also handy for examining the backend's data files.

http://sources.redhat.com/rhdb/utilities.html

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Noel Faux
Дата:
Thanks for the pointers Michael!

Which config file will tell us how big the bock sizes are?

Cheers
Noel
Michael Fuhr wrote:
On Wed, Mar 01, 2006 at 04:12:53PM +1100, Noel Faux wrote: 
Now after doing some searches I managed to work out that the data 
corruption starts at 902292.137
using this sql:
SELECT * FROM gap WHERE ctid = '(902292,$x)'
Where $x I changed from 1-150.

as mentioned on this 
post:http://archives.postgresql.org/pgsql-general/2005-11/msg01148.php

Following this post it seems all we need to do is re-zero from this 
point on. However we're not sure which file to do this in.   
This earlier message in that thread should be helpful:

http://archives.postgresql.org/pgsql-general/2005-11/msg01141.php
 
I've worked out the database/relation files are
$PGDATA/37958/111685332.* with the max * being 101.   
Is your table really over 100G?  Anyway, if the block size is 8192
then 902292 sould be in the .6 file.  If you can spare the time
then you might run the dd and od commands that Tom Lane mentions
in the above message and post the output.  I think the command would
be

dd bs=8k skip=115860 count=1 if=$PGDATA/37958/111685332.6 | od -x

See Tom's message for how I arrived at .6 and 115860 (and verify
the math yourself).
 

Вложения

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Fri, Mar 03, 2006 at 09:56:40AM +1100, Noel Faux wrote:
> Which config file will tell us how big the bock sizes are?

Run the query "SHOW block_size" in the database or use pg_controldata
from the shell.  It's probably 8192; changing it is done at compile time.

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Noel Faux
Дата:
> Is your table really over 100G?
Yeap 600+ million rows.
> Anyway, if the block size is 8192
> then 902292 sould be in the .6 file.  If you can spare the time
> then you might run the dd and od commands that Tom Lane mentions
> in the above message and post the output.
Here's the output:

0000000 002a 0000 2880 def5 0010 0000 0234 0240
0000020 2000 2001 9fc8 0062 9f90 0062 9f58 0062
0000040 9f20 0062 9ee8 0062 9eb0 0062 9e78 0062
0000060 9e40 0062 9e08 0062 9dd0 0062 9d98 0062
0000100 9d60 0062 9d28 0062 9cf0 0062 9cb8 0062
0000120 9c80 0062 9c48 0062 9c10 0062 9bd8 0062
0000140 9ba0 0062 9b68 0062 9b30 0062 9af8 0062
0000160 9ac0 0062 9a88 0062 9a50 0062 9a18 0062
0000200 99e0 0062 99a8 0062 9970 0062 9938 0062
0000220 9900 0062 98c8 0062 9890 0062 9858 0062
0000240 9820 0062 97e8 0062 97b0 0062 9778 0062
0000260 9740 0062 9708 0062 96d0 0062 9698 0062
0000300 9660 0062 9628 0062 95f0 0062 95b8 0062
0000320 9580 0062 9548 0062 9510 0062 94d8 0062
0000340 94a0 0062 9468 0062 9430 0062 93f8 0062
0000360 93c0 0062 9388 0062 9350 0062 9318 0062
0000400 92e0 0062 92a8 0062 9270 0062 9238 0062
0000420 9200 0062 91c8 0062 9190 0062 9158 0062
0000440 9120 0062 90e8 0062 90b0 0062 9078 0062
0000460 9040 0062 9008 0062 8fd0 0062 8f98 0062
0000500 8f60 0062 8f28 0062 8ef0 0062 8eb8 0062
0000520 8e80 0062 8e48 0062 8e10 0062 8dd8 0062
0000540 8da0 0062 8d68 0062 8d30 0062 8cf8 0062
0000560 8cc0 0062 8c88 0062 8c50 0062 8c18 0062
0000600 8be0 0062 8ba8 0062 8b70 0062 8b38 0062
0000620 8b00 0062 8ac8 0062 8a90 0062 8a58 0062
0000640 8a20 0062 89e8 0062 89b0 0062 8978 0062
0000660 8940 0062 8908 0062 88d0 0062 8898 0062
0000700 8860 0062 8828 0062 87f0 0062 87b8 0062
0000720 8780 0062 8748 0062 8710 0062 86d8 0062
0000740 86a0 0062 8668 0062 8630 0062 85f8 0062
0000760 85c0 0062 8588 0062 8550 0062 8518 0062
0001000 84e0 0062 84a8 0062 8470 0062 8438 0062
0001020 8400 0062 83c8 0062 8390 0062 8358 0062
0001040 8320 0062 82e8 0062 82b0 0062 8278 0062
0001060 8240 0062 0000 0000 0000 0000 0000 0000
0001100 0002 0000 0000 0000 0000 0000 000d c494
0001120 0088 0005 0910 0020 0000 0000 c93c 303b
0001140 bbc9 07ed 41be 0266 0064 0000 000c 0000
0001160 0000 0000 0000 0000 0002 0000 0000 0000
0001200 0000 0000 000d c494 0087 0005 0910 0020
0001220 0000 0000 c93b 303b bbc6 07ed 41be 0266
0001240 001c 0000 0002 0000 0000 0000 0000 0000
0001260 0002 0000 0000 0000 0000 0000 000d c494
0001300 0086 0005 0910 0020 0000 0000 c93a 303b
0001320 bbc0 07ed 41bb 0266 0064 0000 0002 0000
0001340 0000 0000 0000 0000 0002 0000 0000 0000
0001360 0000 0000 000d c494 0085 0005 0910 0020
0001400 0000 0000 c939 303b bbbd 07ed 41bb 0266
0001420 0031 0000 0001 0000 0000 0000 0000 0000
0001440 0002 0000 0000 0000 0000 0000 000d c494
0001460 0084 0005 0910 0020 0000 0000 c938 303b
0001500 bbba 07ed 41ba 0266 0062 0000 0004 0000
0001520 0001 0000 0000 0000 0002 0000 0000 0000
0001540 0000 0000 000d c494 0083 0005 0910 0020
0001560 0000 0000 c937 303b bbb8 07ed 41ba 0266
0001600 0037 0000 0001 0000 0000 0000 0000 0000
0001620 0002 0000 0000 0000 0000 0000 000d c494
0001640 0082 0005 0910 0020 0000 0000 c936 303b
0001660 bbb3 07ed 41b7 0266 006a 0000 0004 0000
0001700 0000 0000 0000 0000 0002 0000 0000 0000
0001720 0000 0000 000d c494 0081 0005 0910 0020
0001740 0000 0000 c935 303b bbb2 07ed 41b7 0266
0001760 0020 0000 0001 0000 0000 0000 0000 0000
0002000 0002 0000 0000 0000 0000 0000 000d c494
0002020 0080 0005 0910 0020 0000 0000 c934 303b
0002040 bbae 07ed 41b6 0266 0062 0000 0003 0000
0002060 0000 0000 0000 0000 0002 0000 0000 0000
0002100 0000 0000 000d c494 007f 0005 0910 0020
0002120 0000 0000 c933 303b bbaa 07ed 41b3 0266
0002140 0023 0000 0002 0000 0000 0000 0000 0000
0002160 0002 0000 0000 0000 0000 0000 000d c494
0002200 007e 0005 0910 0020 0000 0000 c932 303b
0002220 bba6 07ed 41b2 0266 0064 0000 0002 0000
0002240 0000 0000 0000 0000 0002 0000 0000 0000
0002260 0000 0000 000d c494 007d 0005 0910 0020
0002300 0000 0000 c931 303b bba3 07ed 41b2 0266
0002320 0036 0000 0001 0000 0000 0000 0000 0000
0002340 0002 0000 0000 0000 0000 0000 000d c494
0002360 007c 0005 0910 0020 0000 0000 c930 303b
0002400 bba0 07ed 41b0 0266 006a 0000 000b 0000
0002420 0000 0000 0000 0000 0002 0000 0000 0000
0002440 0000 0000 000d c494 007b 0005 0910 0020
0002460 0000 0000 c92f 303b bb9d 07ed 41b0 0266
0002500 0039 0000 0001 0000 0000 0000 0000 0000
0002520 0002 0000 0000 0000 0000 0000 000d c494
0002540 007a 0005 0910 0020 0000 0000 c92e 303b
0002560 bb98 07ed 41ad 0266 006a 0000 000b 0000
0002600 0000 0000 0000 0000 0002 0000 0000 0000
0002620 0000 0000 000d c494 0079 0005 0910 0020
0002640 0000 0000 c92d 303b bb97 07ed 41ad 0266
0002660 0039 0000 0001 0000 0000 0000 0000 0000
0002700 0002 0000 0000 0000 0000 0000 000d c494
0002720 0078 0005 0910 0020 0000 0000 c92c 303b
0002740 bb94 07ed 41ac 0266 0065 0000 0001 0000
0002760 0000 0000 0000 0000 0002 0000 0000 0000
0003000 0000 0000 000d c494 0077 0005 0910 0020
0003020 0000 0000 c92b 303b bb91 07ed 41ac 0266
0003040 0039 0000 0001 0000 0000 0000 0000 0000
0003060 0002 0000 0000 0000 0000 0000 000d c494
0003100 0076 0005 0910 0020 0000 0000 c92a 303b
0003120 bb8d 07ed 41a9 0266 0065 0000 0001 0000
0003140 0000 0000 0000 0000 0002 0000 0000 0000
0003160 0000 0000 000d c494 0075 0005 0910 0020
0003200 0000 0000 c929 303b bb8b 07ed 41a9 0266
0003220 0039 0000 0001 0000 0000 0000 0000 0000
0003240 0002 0000 0000 0000 0000 0000 000d c494
0003260 0074 0005 0910 0020 0000 0000 c928 303b
0003300 bb86 07ed 41a6 0266 0065 0000 0001 0000
0003320 0000 0000 0000 0000 0002 0000 0000 0000
0003340 0000 0000 000d c494 0073 0005 0910 0020
0003360 0000 0000 c927 303b bb85 07ed 41a6 0266
0003400 0039 0000 0001 0000 0000 0000 0000 0000
0003420 0002 0000 0000 0000 0000 0000 000d c494
0003440 0072 0005 0910 0020 0000 0000 c926 303b
0003460 bb7f 07ed 41a3 0266 0065 0000 0001 0000
0003500 0000 0000 0000 0000 0002 0000 0000 0000
0003520 0000 0000 000d c494 0071 0005 0910 0020
0003540 0000 0000 c925 303b bb7d 07ed 41a3 0266
0003560 0039 0000 0001 0000 0000 0000 0000 0000
0003600 0002 0000 0000 0000 0000 0000 000d c494
0003620 0070 0005 0910 0020 0000 0000 c924 303b
0003640 bb76 07ed 41a1 0266 0068 0000 0004 0000
0003660 0000 0000 0000 0000 0002 0000 0000 0000
0003700 0000 0000 000d c494 006f 0005 0910 0020
0003720 0000 0000 c923 303b bb74 07ed 41a1 0266
0003740 0034 0000 0001 0000 0000 0000 0000 0000
0003760 0002 0000 0000 0000 0000 0000 000d c494
0004000 006e 0005 0910 0020 0000 0000 c922 303b
0004020 bb70 07ed 419d 0266 0050 0000 0001 0000
0004040 0001 0000 0000 0000 0002 0000 0000 0000
0004060 0000 0000 000d c494 006d 0005 0910 0020
0004100 0000 0000 c921 303b bb6e 07ed 419d 0266
0004120 0066 0000 0001 0000 0000 0000 0000 0000
0004140 0002 0000 0000 0000 0000 0000 000d c494
0004160 006c 0005 0910 0020 0000 0000 c920 303b
0004200 bb6a 07ed 419b 0266 0065 0000 0001 0000
0004220 0000 0000 0000 0000 0002 0000 0000 0000
0004240 0000 0000 000d c494 006b 0005 0910 0020
0004260 0000 0000 c91f 303b bb67 07ed 419b 0266
0004300 0038 0000 0001 0000 0000 0000 0000 0000
0004320 0002 0000 0000 0000 0000 0000 000d c494
0004340 006a 0005 0910 0020 0000 0000 c91e 303b
0004360 bb64 07ed 4199 0266 0069 0000 0001 0000
0004400 0001 0000 0000 0000 0002 0000 0000 0000
0004420 0000 0000 000d c494 0069 0005 0910 0020
0004440 0000 0000 c91d 303b bb61 07ed 4199 0266
0004460 0060 0000 0005 0000 0000 0000 0000 0000
0004500 0002 0000 0000 0000 0000 0000 000d c494
0004520 0068 0005 0910 0020 0000 0000 c91c 303b
0004540 bb5c 07ed 4196 0266 00f4 0000 0001 0000
0004560 0001 0000 0000 0000 0002 0000 0000 0000
0004600 0000 0000 000d c494 0067 0005 0910 0020
0004620 0000 0000 c91b 303b bb5b 07ed 4196 0266
0004640 00ce 0000 0006 0000 0001 0000 0000 0000
0004660 0002 0000 0000 0000 0000 0000 000d c494
0004700 0066 0005 0910 0020 0000 0000 c91a 303b
0004720 bb59 07ed 4196 0266 00a9 0000 0002 0000
0004740 0001 0000 0000 0000 0002 0000 0000 0000
0004760 0000 0000 000d c494 0065 0005 0910 0020
0005000 0000 0000 c919 303b bb57 07ed 4196 0266
0005020 000c 0000 0001 0000 0000 0000 0000 0000
0005040 0002 0000 0000 0000 0000 0000 000d c494
0005060 0064 0005 0910 0020 0000 0000 c918 303b
0005100 bb51 07ed 4195 0266 0065 0000 0004 0000
0005120 0000 0000 0000 0000 0002 0000 0000 0000
0005140 0000 0000 000d c494 0063 0005 0910 0020
0005160 0000 0000 c917 303b bb4f 07ed 4195 0266
0005200 0035 0000 0001 0000 0000 0000 0000 0000
0005220 0002 0000 0000 0000 0000 0000 000d c494
0005240 0062 0005 0910 0020 0000 0000 c916 303b
0005260 bb4d 07ed 4192 0266 0063 0000 0004 0000
0005300 0000 0000 0000 0000 0002 0000 0000 0000
0005320 0000 0000 000d c494 0061 0005 0910 0020
0005340 0000 0000 c915 303b bb4a 07ed 4192 0266
0005360 003a 0000 0001 0000 0000 0000 0000 0000
0005400 0002 0000 0000 0000 0000 0000 000d c494
0005420 0060 0005 0910 0020 0000 0000 c914 303b
0005440 bb45 07ed 418f 0266 0067 0000 0007 0000
0005460 0000 0000 0000 0000 0002 0000 0000 0000
0005500 0000 0000 000d c494 005f 0005 0910 0020
0005520 0000 0000 c913 303b bb43 07ed 418f 0266
0005540 0019 0000 0002 0000 0000 0000 0000 0000
0005560 0002 0000 0000 0000 0000 0000 000d c494
0005600 005e 0005 0910 0020 0000 0000 c912 303b
0005620 bb3f 07ed 418d 0266 0066 0000 0003 0000
0005640 0000 0000 0000 0000 0002 0000 0000 0000
0005660 0000 0000 000d c494 005d 0005 0910 0020
0005700 0000 0000 c911 303b bb3d 07ed 418d 0266
0005720 0039 0000 0003 0000 0000 0000 0000 0000
0005740 0002 0000 0000 0000 0000 0000 000d c494
0005760 005c 0005 0910 0020 0000 0000 c910 303b
0006000 bb39 07ed 418a 0266 0063 0000 0003 0000
0006020 0000 0000 0000 0000 0002 0000 0000 0000
0006040 0000 0000 000d c494 005b 0005 0910 0020
0006060 0000 0000 c90f 303b bb31 07ed 4188 0266
0006100 0065 0000 000a 0000 0000 0000 0000 0000
0006120 0002 0000 0000 0000 0000 0000 000d c494
0006140 005a 0005 0910 0020 0000 0000 c90e 303b
0006160 bb2e 07ed 4188 0266 0035 0000 0001 0000
0006200 0000 0000 0000 0000 0002 0000 0000 0000
0006220 0000 0000 000d c494 0059 0005 0910 0020
0006240 0000 0000 c90d 303b bb29 07ed 4186 0266
0006260 0067 0000 0001 0000 0000 0000 0000 0000
0006300 0002 0000 0000 0000 0000 0000 000d c494
0006320 0058 0005 0910 0020 0000 0000 c90c 303b
0006340 bb26 07ed 4186 0266 0037 0000 0001 0000
0006360 0000 0000 0000 0000 0002 0000 0000 0000
0006400 0000 0000 000d c494 0057 0005 0910 0020
0006420 0000 0000 c90b 303b bb22 07ed 4183 0266
0006440 0065 0000 0003 0000 0000 0000 0000 0000
0006460 0002 0000 0000 0000 0000 0000 000d c494
0006500 0056 0005 0910 0020 0000 0000 c90a 303b
0006520 bb1f 07ed 4183 0266 0039 0000 0001 0000
0006540 0000 0000 0000 0000 0002 0000 0000 0000
0006560 0000 0000 000d c494 0055 0005 0910 0020
0006600 0000 0000 c909 303b bb1c 07ed 4181 0266
0006620 0039 0000 0001 0000 0000 0000 0000 0000
0006640 0002 0000 0000 0000 0000 0000 000d c494
0006660 0054 0005 0910 0020 0000 0000 c908 303b
0006700 bb19 07ed 417e 0266 0037 0000 0001 0000
0006720 0000 0000 0000 0000 0002 0000 0000 0000
0006740 0000 0000 000d c494 0053 0005 0910 0020
0006760 0000 0000 c907 303b bb14 07ed 417c 0266
0007000 0063 0000 0003 0000 0000 0000 0000 0000
0007020 0002 0000 0000 0000 0000 0000 000d c494
0007040 0052 0005 0910 0020 0000 0000 c906 303b
0007060 bb11 07ed 417c 0266 0039 0000 0001 0000
0007100 0000 0000 0000 0000 0002 0000 0000 0000
0007120 0000 0000 000d c494 0051 0005 0910 0020
0007140 0000 0000 c905 303b bb0d 07ed 417b 0266
0007160 0064 0000 0003 0000 0000 0000 0000 0000
0007200 0002 0000 0000 0000 0000 0000 000d c494
0007220 0050 0005 0910 0020 0000 0000 c904 303b
0007240 bb0a 07ed 417b 0266 0035 0000 0001 0000
0007260 0000 0000 0000 0000 0002 0000 0000 0000
0007300 0000 0000 000d c494 004f 0005 0910 0020
0007320 0000 0000 c903 303b bb06 07ed 4178 0266
0007340 006a 0000 0004 0000 0000 0000 0000 0000
0007360 0002 0000 0000 0000 0000 0000 000d c494
0007400 004e 0005 0910 0020 0000 0000 c902 303b
0007420 bb04 07ed 4178 0266 0033 0000 0002 0000
0007440 0000 0000 0000 0000 0002 0000 0000 0000
0007460 0000 0000 000d c494 004d 0005 0910 0020
0007500 0000 0000 c901 303b bb01 07ed 4178 0266
0007520 0021 0000 0002 0000 0000 0000 0000 0000
0007540 0002 0000 0000 0000 0000 0000 000d c494
0007560 004c 0005 0910 0020 0000 0000 c900 303b
0007600 bafd 07ed 4176 0266 0036 0000 0001 0000
0007620 0000 0000 0000 0000 0002 0000 0000 0000
0007640 0000 0000 000d c494 004b 0005 0910 0020
0007660 0000 0000 c8ff 303b baf7 07ed 4174 0266
0007700 0031 0000 0001 0000 0000 0000 0000 0000
0007720 0002 0000 0000 0000 0000 0000 000d c494
0007740 004a 0005 0910 0020 0000 0000 c8fe 303b
0007760 baf5 07ed 4172 0266 0039 0000 0001 0000
0010000 0000 0000 0000 0000 0002 0000 0000 0000
0010020 0000 0000 000d c494 0049 0005 0910 0020
0010040 0000 0000 c8fd 303b baef 07ed 416f 0266
0010060 0064 0000 000c 0000 0000 0000 0000 0000
0010100 0002 0000 0000 0000 0000 0000 000d c494
0010120 0048 0005 0910 0020 0000 0000 c8fc 303b
0010140 baee 07ed 416f 0266 0018 0000 0002 0000
0010160 0000 0000 0000 0000 0002 0000 0000 0000
0010200 0000 0000 000d c494 0047 0005 0910 0020
0010220 0000 0000 c8fb 303b baea 07ed 416e 0266
0010240 0037 0000 0001 0000 0000 0000 0000 0000
0010260 0002 0000 0000 0000 0000 0000 000d c494
0010300 0046 0005 0910 0020 0000 0000 c8fa 303b
0010320 bae5 07ed 416c 0266 0055 0000 0001 0000
0010340 0001 0000 0000 0000 0002 0000 0000 0000
0010360 0000 0000 000d c494 0045 0005 0910 0020
0010400 0000 0000 c8f9 303b bae3 07ed 416c 0266
0010420 0038 0000 0001 0000 0000 0000 0000 0000
0010440 0002 0000 0000 0000 0000 0000 000d c494
0010460 0044 0005 0910 0020 0000 0000 c8f8 303b
0010500 badf 07ed 416a 0266 0039 0000 0003 0000
0010520 0000 0000 0000 0000 0002 0000 0000 0000
0010540 0000 0000 000d c494 0043 0005 0910 0020
0010560 0000 0000 c8f7 303b badb 07ed 4168 0266
0010600 0062 0000 0002 0000 0000 0000 0000 0000
0010620 0002 0000 0000 0000 0000 0000 000d c494
0010640 0042 0005 0910 0020 0000 0000 c8f6 303b
0010660 bad6 07ed 4166 0266 0036 0000 0001 0000
0010700 0000 0000 0000 0000 0002 0000 0000 0000
0010720 0000 0000 000d c494 0041 0005 0910 0020
0010740 0000 0000 c8f5 303b bad2 07ed 4164 0266
0010760 0064 0000 0002 0000 0000 0000 0000 0000
0011000 0002 0000 0000 0000 0000 0000 000d c494
0011020 0040 0005 0910 0020 0000 0000 c8f4 303b
0011040 bacf 07ed 4164 0266 001f 0000 0001 0000
0011060 0000 0000 0000 0000 0002 0000 0000 0000
0011100 0000 0000 000d c494 003f 0005 0910 0020
0011120 0000 0000 c8f3 303b baca 07ed 4162 0266
0011140 0037 0000 0001 0000 0000 0000 0000 0000
0011160 0002 0000 0000 0000 0000 0000 000d c494
0011200 003e 0005 0910 0020 0000 0000 c8f2 303b
0011220 bac5 07ed 4160 0266 0067 0000 0004 0000
0011240 0000 0000 0000 0000 0002 0000 0000 0000
0011260 0000 0000 000d c494 003d 0005 0910 0020
0011300 0000 0000 c8f1 303b bac1 07ed 415e 0266
0011320 0037 0000 0001 0000 0000 0000 0000 0000
0011340 0002 0000 0000 0000 0000 0000 000d c494
0011360 003c 0005 0910 0020 0000 0000 c8f0 303b
0011400 babc 07ed 415b 0266 0066 0000 0006 0000
0011420 0000 0000 0000 0000 0002 0000 0000 0000
0011440 0000 0000 000d c494 003b 0005 0910 0020
0011460 0000 0000 c8ef 303b baba 07ed 415b 0266
0011500 0039 0000 0001 0000 0000 0000 0000 0000
0011520 0002 0000 0000 0000 0000 0000 000d c494
0011540 003a 0005 0910 0020 0000 0000 c8ee 303b
0011560 bab7 07ed 415a 0266 0064 0000 000b 0000
0011600 0000 0000 0000 0000 0002 0000 0000 0000
0011620 0000 0000 000d c494 0039 0005 0910 0020
0011640 0000 0000 c8ed 303b bab4 07ed 415a 0266
0011660 0020 0000 0002 0000 0000 0000 0000 0000
0011700 0002 0000 0000 0000 0000 0000 000d c494
0011720 0038 0005 0910 0020 0000 0000 c8ec 303b
0011740 bab0 07ed 4158 0266 0063 0000 0003 0000
0011760 0000 0000 0000 0000 0002 0000 0000 0000
0012000 0000 0000 000d c494 0037 0005 0910 0020
0012020 0000 0000 c8eb 303b baaa 07ed 4153 0266
0012040 0062 0000 0002 0000 0000 0000 0000 0000
0012060 0002 0000 0000 0000 0000 0000 000d c494
0012100 0036 0005 0910 0020 0000 0000 c8ea 303b
0012120 baa8 07ed 4153 0266 0037 0000 0001 0000
0012140 0000 0000 0000 0000 0002 0000 0000 0000
0012160 0000 0000 000d c494 0035 0005 0910 0020
0012200 0000 0000 c8e9 303b baa2 07ed 4150 0266
0012220 0065 0000 000a 0000 0000 0000 0000 0000
0012240 0002 0000 0000 0000 0000 0000 000d c494
0012260 0034 0005 0910 0020 0000 0000 c8e8 303b
0012300 ba9f 07ed 414f 0266 0065 0000 0002 0000
0012320 0000 0000 0000 0000 0002 0000 0000 0000
0012340 0000 0000 000d c494 0033 0005 0910 0020
0012360 0000 0000 c8e7 303b ba9c 07ed 414f 0266
0012400 0034 0000 0001 0000 0000 0000 0000 0000
0012420 0002 0000 0000 0000 0000 0000 000d c494
0012440 0032 0005 0910 0020 0000 0000 c8e6 303b
0012460 ba98 07ed 414d 0266 006b 0000 0002 0000
0012500 0000 0000 0000 0000 0002 0000 0000 0000
0012520 0000 0000 000d c494 0031 0005 0910 0020
0012540 0000 0000 c8e5 303b ba95 07ed 414d 0266
0012560 0037 0000 0001 0000 0000 0000 0000 0000
0012600 0002 0000 0000 0000 0000 0000 000d c494
0012620 0030 0005 0910 0020 0000 0000 c8e4 303b
0012640 ba91 07ed 414b 0266 0066 0000 0002 0000
0012660 0001 0000 0000 0000 0002 0000 0000 0000
0012700 0000 0000 000d c494 002f 0005 0910 0020
0012720 0000 0000 c8e3 303b ba8d 07ed 4149 0266
0012740 0064 0000 0003 0000 0001 0000 0000 0000
0012760 0002 0000 0000 0000 0000 0000 000d c494
0013000 002e 0005 0910 0020 0000 0000 c8e2 303b
0013020 ba89 07ed 4148 0266 0037 0000 0001 0000
0013040 0000 0000 0000 0000 0002 0000 0000 0000
0013060 0000 0000 000d c494 002d 0005 0910 0020
0013100 0000 0000 c8e1 303b ba85 07ed 4145 0266
0013120 005e 0000 0003 0000 0001 0000 0000 0000
0013140 0002 0000 0000 0000 0000 0000 000d c494
0013160 002c 0005 0910 0020 0000 0000 c8e0 303b
0013200 ba7f 07ed 4142 0266 0068 0000 0009 0000
0013220 0000 0000 0000 0000 0002 0000 0000 0000
0013240 0000 0000 000d c494 002b 0005 0910 0020
0013260 0000 0000 c8df 303b ba7d 07ed 4142 0266
0013300 0019 0000 0002 0000 0000 0000 0000 0000
0013320 0002 0000 0000 0000 0000 0000 000d c494
0013340 002a 0005 0910 0020 0000 0000 c8de 303b
0013360 ba79 07ed 4140 0266 0062 0000 0008 0000
0013400 0000 0000 0000 0000 0002 0000 0000 0000
0013420 0000 0000 000d c494 0029 0005 0910 0020
0013440 0000 0000 c8dd 303b ba76 07ed 4140 0266
0013460 0039 0000 0001 0000 0000 0000 0000 0000
0013500 0002 0000 0000 0000 0000 0000 000d c494
0013520 0028 0005 0910 0020 0000 0000 c8dc 303b
0013540 ba72 07ed 413f 0266 0067 0000 0004 0000
0013560 0000 0000 0000 0000 0002 0000 0000 0000
0013600 0000 0000 000d c494 0027 0005 0910 0020
0013620 0000 0000 c8db 303b ba6a 07ed 4139 0266
0013640 0063 0000 0004 0000 0000 0000 0000 0000
0013660 0002 0000 0000 0000 0000 0000 000d c494
0013700 0026 0005 0910 0020 0000 0000 c8da 303b
0013720 ba69 07ed 4139 0266 0037 0000 0001 0000
0013740 0000 0000 0000 0000 0002 0000 0000 0000
0013760 0000 0000 000d c494 0025 0005 0910 0020
0014000 0000 0000 c8d9 303b ba63 07ed 4137 0266
0014020 0038 0000 0001 0000 0000 0000 0000 0000
0014040 0002 0000 0000 0000 0000 0000 000d c494
0014060 0024 0005 0910 0020 0000 0000 c8d8 303b
0014100 ba5f 07ed 4135 0266 0035 0000 0001 0000
0014120 0000 0000 0000 0000 0002 0000 0000 0000
0014140 0000 0000 000d c494 0023 0005 0910 0020
0014160 0000 0000 c8d7 303b ba5c 07ed 4132 0266
0014200 0036 0000 0001 0000 0000 0000 0000 0000
0014220 0002 0000 0000 0000 0000 0000 000d c494
0014240 0022 0005 0910 0020 0000 0000 c8d6 303b
0014260 ba56 07ed 412f 0266 0067 0000 0002 0000
0014300 0000 0000 0000 0000 0002 0000 0000 0000
0014320 0000 0000 000d c494 0021 0005 0910 0020
0014340 0000 0000 c8d5 303b ba55 07ed 412f 0266
0014360 0036 0000 0001 0000 0000 0000 0000 0000
0014400 0002 0000 0000 0000 0000 0000 000d c494
0014420 0020 0005 0910 0020 0000 0000 c8d4 303b
0014440 ba51 07ed 412c 0266 0067 0000 0002 0000
0014460 0000 0000 0000 0000 0002 0000 0000 0000
0014500 0000 0000 000d c494 001f 0005 0910 0020
0014520 0000 0000 c8d3 303b ba4e 07ed 412c 0266
0014540 0037 0000 0001 0000 0000 0000 0000 0000
0014560 0002 0000 0000 0000 0000 0000 000d c494
0014600 001e 0005 0910 0020 0000 0000 c8d2 303b
0014620 ba4b 07ed 412a 0266 0066 0000 0004 0000
0014640 0000 0000 0000 0000 0002 0000 0000 0000
0014660 0000 0000 000d c494 001d 0005 0910 0020
0014700 0000 0000 c8d1 303b ba48 07ed 412a 0266
0014720 0038 0000 0001 0000 0000 0000 0000 0000
0014740 0002 0000 0000 0000 0000 0000 000d c494
0014760 001c 0005 0910 0020 0000 0000 c8d0 303b
0015000 ba44 07ed 4127 0266 0069 0000 0003 0000
0015020 0000 0000 0000 0000 0002 0000 0000 0000
0015040 0000 0000 000d c494 001b 0005 0910 0020
0015060 0000 0000 c8cf 303b ba42 07ed 4127 0266
0015100 0039 0000 0003 0000 0000 0000 0000 0000
0015120 0002 0000 0000 0000 0000 0000 000d c494
0015140 001a 0005 0910 0020 0000 0000 c8ce 303b
0015160 ba3d 07ed 4124 0266 0068 0000 0002 0000
0015200 0000 0000 0000 0000 0002 0000 0000 0000
0015220 0000 0000 000d c494 0019 0005 0910 0020
0015240 0000 0000 c8cd 303b ba3b 07ed 4124 0266
0015260 0037 0000 0001 0000 0000 0000 0000 0000
0015300 0002 0000 0000 0000 0000 0000 000d c494
0015320 0018 0005 0910 0020 0000 0000 c8cc 303b
0015340 ba37 07ed 4122 0266 0067 0000 0001 0000
0015360 0000 0000 0000 0000 0002 0000 0000 0000
0015400 0000 0000 000d c494 0017 0005 0910 0020
0015420 0000 0000 c8cb 303b ba35 07ed 4122 0266
0015440 0037 0000 0001 0000 0000 0000 0000 0000
0015460 0002 0000 0000 0000 0000 0000 000d c494
0015500 0016 0005 0910 0020 0000 0000 c8ca 303b
0015520 ba2f 07ed 4121 0266 0039 0000 0001 0000
0015540 0000 0000 0000 0000 0002 0000 0000 0000
0015560 0000 0000 000d c494 0015 0005 0910 0020
0015600 0000 0000 c8c9 303b ba2d 07ed 411f 0266
0015620 0036 0000 0001 0000 0000 0000 0000 0000
0015640 0002 0000 0000 0000 0000 0000 000d c494
0015660 0014 0005 0910 0020 0000 0000 c8c8 303b
0015700 ba2a 07ed 411c 0266 0036 0000 0001 0000
0015720 0000 0000 0000 0000 0002 0000 0000 0000
0015740 0000 0000 000d c494 0013 0005 0910 0020
0015760 0000 0000 c8c7 303b ba26 07ed 411b 0266
0016000 0068 0000 0003 0000 0000 0000 0000 0000
0016020 0002 0000 0000 0000 0000 0000 000d c494
0016040 0012 0005 0910 0020 0000 0000 c8c6 303b
0016060 ba21 07ed 411b 0266 001e 0000 0002 0000
0016100 0000 0000 0000 0000 0002 0000 0000 0000
0016120 0000 0000 000d c494 0011 0005 0910 0020
0016140 0000 0000 c8c5 303b ba1f 07ed 4118 0266
0016160 0035 0000 0001 0000 0000 0000 0000 0000
0016200 0002 0000 0000 0000 0000 0000 000d c494
0016220 0010 0005 0910 0020 0000 0000 c8c4 303b
0016240 ba1b 07ed 4116 0266 0064 0000 0001 0000
0016260 0000 0000 0000 0000 0002 0000 0000 0000
0016300 0000 0000 000d c494 000f 0005 0910 0020
0016320 0000 0000 c8c3 303b ba19 07ed 4116 0266
0016340 0023 0000 0002 0000 0000 0000 0000 0000
0016360 0002 0000 0000 0000 0000 0000 000d c494
0016400 000e 0005 0910 0020 0000 0000 c8c2 303b
0016420 ba15 07ed 4114 0266 0063 0000 0001 0000
0016440 0000 0000 0000 0000 0002 0000 0000 0000
0016460 0000 0000 000d c494 000d 0005 0910 0020
0016500 0000 0000 c8c1 303b ba13 07ed 4114 0266
0016520 0036 0000 0001 0000 0000 0000 0000 0000
0016540 0002 0000 0000 0000 0000 0000 000d c494
0016560 000c 0005 0910 0020 0000 0000 c8c0 303b
0016600 ba0e 07ed 4112 0266 0036 0000 0001 0000
0016620 0000 0000 0000 0000 0002 0000 0000 0000
0016640 0000 0000 000d c494 000b 0005 0910 0020
0016660 0000 0000 c8bf 303b ba0b 07ed 410f 0266
0016700 0036 0000 0001 0000 0000 0000 0000 0000
0016720 0002 0000 0000 0000 0000 0000 000d c494
0016740 000a 0005 0910 0020 0000 0000 c8be 303b
0016760 ba07 07ed 410e 0266 0039 0000 0003 0000
0017000 0000 0000 0000 0000 0002 0000 0000 0000
0017020 0000 0000 000d c494 0009 0005 0910 0020
0017040 0000 0000 c8bd 303b ba03 07ed 410c 0266
0017060 000c 0000 0001 0000 0001 0000 0000 0000
0017100 0002 0000 0000 0000 0000 0000 000d c494
0017120 0008 0005 0910 0020 0000 0000 c8bc 303b
0017140 ba00 07ed 410c 0266 0037 0000 0001 0000
0017160 0000 0000 0000 0000 0002 0000 0000 0000
0017200 0000 0000 000d c494 0007 0005 0910 0020
0017220 0000 0000 c8bb 303b b9fd 07ed 4109 0266
0017240 0062 0000 0007 0000 0000 0000 0000 0000
0017260 0002 0000 0000 0000 0000 0000 000d c494
0017300 0006 0005 0910 0020 0000 0000 c8ba 303b
0017320 b9fb 07ed 4109 0266 003a 0000 0001 0000
0017340 0000 0000 0000 0000 0002 0000 0000 0000
0017360 0000 0000 000d c494 0005 0005 0910 0020
0017400 0000 0000 c8b9 303b b9f8 07ed 4107 0266
0017420 005f 0000 0001 0000 0001 0000 0000 0000
0017440 0002 0000 0000 0000 0000 0000 000d c494
0017460 0004 0005 0910 0020 0000 0000 c8b8 303b
0017500 b9f5 07ed 4104 0266 0064 0000 0005 0000
0017520 0000 0000 0000 0000 0002 0000 0000 0000
0017540 0000 0000 000d c494 0003 0005 0910 0020
0017560 0000 0000 c8b7 303b b9f3 07ed 4104 0266
0017600 0039 0000 0001 0000 0000 0000 0000 0000
0017620 0002 0000 0000 0000 0000 0000 000d c494
0017640 0002 0005 0910 0020 0000 0000 c8b6 303b
0017660 b9ee 07ed 4101 0266 0067 0000 0018 0000
0017700 0000 0000 0000 0000 0002 0000 0000 0000
0017720 0000 0000 000d c494 0001 0005 0910 0020
0017740 0000 0000 c8b5 303b b9ed 07ed 4101 0266
0017760 0035 0000 0003 0000 0000 0000 0000 0000
0020000


Being a complete novice I've got no idea what I'm looking at.
Is there anything in the above that looks out of place?
Is the next step similar to that of:
http://archives.postgresql.org/pgsql-general/2005-11/msg01150.php ?

Cheers
Noel

Вложения

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Mon, Mar 06, 2006 at 11:57:53AM +1100, Noel Faux wrote:
> >Anyway, if the block size is 8192
> >then 902292 sould be in the .6 file.  If you can spare the time
> >then you might run the dd and od commands that Tom Lane mentions
> >in the above message and post the output.
> Here's the output:

I just noticed that the error in your original message was:

> pg_dump: Error message from server: ERROR:  invalid page header in block
> 9022921 of relation "gap"

That's a different block than the one you said you found:

> Now after doing some searches I managed to work out that the data
> corruption starts at 902292.137 using this sql:
> SELECT * FROM gap WHERE ctid = '(902292,$x)'
> Where $x I changed from 1-150.

I suggested looking in the .6 file based on block 902292; if the
real bad block is 9022921 then I think it would be block 110025
in file .68 (again, check the math yourself).

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Mon, Mar 06, 2006 at 05:17:54PM +1100, Noel Faux wrote:
> dd bs=8k skip=115860 count=1
> if=/usr/local/postgresql/postgresql-7.4.8/data/base/37958/111685332.68 |
> od -x

Wrong block (115860) -- you used the number from my earlier message,
which was based on the bad block being 902292.  After noticing that
the error message said the bad block was 9022921 I corrected both
the file number and the block:

> >I suggested looking in the .6 file based on block 902292; if the
> >real bad block is 9022921 then I think it would be block 110025
> >in file .68 (again, check the math yourself).

Try skip=110025.  You can use pg_filedump to examine the block in
an easier-to-read format; since you're running 7.4 you'd need
pg_filedump 3.0.

http://sources.redhat.com/rhdb/utilities.html

Try running this command:

pg_filedump -if -R 110024 110026 /path/111685332.68

That should show the bad block (110025) and the block before and
after it.

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Tue, Mar 07, 2006 at 01:41:44PM +1100, Noel Faux wrote:
> Here is the output from the pg_filedump; is there anything which looks
> suss and where would we re-zero the data, if that's the next step:
[...]
> Block 110025 ********************************************************
> <Header> -----
> Block Offset: 0x35b92000         Offsets: Lower       0 (0x0000)
> Block: Size    0  Version   24            Upper       2 (0x0002)
> LSN:  logid      0 recoff 0x00000000      Special     0 (0x0000)
> Items:    0                   Free Space:    2
> Length (including item array): 24
>
> Error: Invalid header information.
>
>  0000: 00000000 00000000 00000000 00000200  ................
>  0010: 00001800 af459a00                    .....E..
>
> <Data> ------
> Empty block - no items listed
>
> <Special Section> -----
> Error: Invalid special section encountered.
> Error: Special section points off page. Unable to dump contents.

Looks like we've successfully identified the bad block; contrast
these header values and the hex dump with the good blocks and you
can see at a glance that this one is different.  It might be
interesting to you (but probably not to us, so don't send the output)
to see if the block's contents are recognizable, as though they
came from some unrelated file (which might suggest an OS bug).
Check your local documentation to see what od/hd/hexdump/whatever
options will give you an ASCII dump and use dd to fetch the page
and pipe it into that command.  Try this (substitute the hd command
with whatever works on your system):

dd bs=8k skip=110025 count=1 if=/path/file | hd

Even if you don't care about the block's current contents, you might
want to redirect dd's output to a file to save a copy of the block
in case you do ever want to examine it further.  And it would be
prudent to verify that the data shown by the above dd command matches
the data in the pg_filedump output before doing anything destructive.

When you're ready to zero the file, shut down the postmaster and
run a command like the following (but keep reading before doing
so):

dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero of=/path/file

Before running that command I would strongly advise reading the dd
manual page on your system to make sure the options are correct and
that you understand them.  I'd also suggest practicing on a test
table: create a table, populate it with arbitrary data, pick a page
to zero, identify the file and block, run a command like the above,
and verify that the table is intact except for the missing block.
Make *sure* you know what you're doing and that the above command
works before running it -- if you botch it you might lose a 1G file
instead of an 8K block.

In one of his messages Tom Lane suggested vacuuming the table after
zeroing the bad block to see if vacuum discovers any other bad
blocks.  During the vacuum you should see a message like this:

WARNING:  relation "foo" page 110025 is uninitialized --- fixing

If you see any other errors or warnings then please post them.

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Noel Faux
Дата:
Ok it worked but we ran into another bad block :(
vacuumdb: vacuuming of database "monashprotein" failed: ERROR:  invalid
page header in block 9022937 of relation "gap"

So the command we used was:
dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero
of=/usr/local/postgresql/postgresql-7.4.8/data/base/37958/111685332.68

I'm tried to work out the formula for finding the file (i.e. the 111685332.*) to fix and the value to seek to, but as a complete novice I'm lost, any pointers would be a great help.  We checked the block size and it's 8192.

Cheers
Noel

Michael Fuhr wrote:
On Tue, Mar 07, 2006 at 01:41:44PM +1100, Noel Faux wrote: 
Here is the output from the pg_filedump; is there anything which looks 
suss and where would we re-zero the data, if that's the next step:   
[...] 
Block 110025 ********************************************************
<Header> -----
Block Offset: 0x35b92000         Offsets: Lower       0 (0x0000)
Block: Size    0  Version   24            Upper       2 (0x0002)
LSN:  logid      0 recoff 0x00000000      Special     0 (0x0000)
Items:    0                   Free Space:    2
Length (including item array): 24

Error: Invalid header information.
0000: 00000000 00000000 00000000 00000200  ................0010: 00001800 af459a00                    .....E..       

<Data> ------
Empty block - no items listed

<Special Section> -----
Error: Invalid special section encountered.
Error: Special section points off page. Unable to dump contents.   
Looks like we've successfully identified the bad block; contrast
these header values and the hex dump with the good blocks and you
can see at a glance that this one is different.  It might be
interesting to you (but probably not to us, so don't send the output)
to see if the block's contents are recognizable, as though they
came from some unrelated file (which might suggest an OS bug).
Check your local documentation to see what od/hd/hexdump/whatever
options will give you an ASCII dump and use dd to fetch the page
and pipe it into that command.  Try this (substitute the hd command
with whatever works on your system):

dd bs=8k skip=110025 count=1 if=/path/file | hd

Even if you don't care about the block's current contents, you might
want to redirect dd's output to a file to save a copy of the block
in case you do ever want to examine it further.  And it would be
prudent to verify that the data shown by the above dd command matches
the data in the pg_filedump output before doing anything destructive.

When you're ready to zero the file, shut down the postmaster and
run a command like the following (but keep reading before doing
so):

dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero of=/path/file

Before running that command I would strongly advise reading the dd
manual page on your system to make sure the options are correct and
that you understand them.  I'd also suggest practicing on a test
table: create a table, populate it with arbitrary data, pick a page
to zero, identify the file and block, run a command like the above,
and verify that the table is intact except for the missing block.
Make *sure* you know what you're doing and that the above command
works before running it -- if you botch it you might lose a 1G file
instead of an 8K block.

In one of his messages Tom Lane suggested vacuuming the table after
zeroing the bad block to see if vacuum discovers any other bad
blocks.  During the vacuum you should see a message like this:

WARNING:  relation "foo" page 110025 is uninitialized --- fixing

If you see any other errors or warnings then please post them.
 

Вложения

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Thu, Mar 09, 2006 at 11:13:40AM +1100, Noel Faux wrote:
> Ok it worked but we ran into another bad block :(
> /vacuumdb: vacuuming of database "monashprotein" failed: ERROR:  invalid
> page header in block 9022937 of relation "gap"
> /
> So the command we used was:
> dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero
> of=/usr/local/postgresql/postgresql-7.4.8/data/base/37958/111685332.68
>
> I'm tried to work out the formula for finding the file (i.e. the
> 111685332.*) to fix and the value to seek to, but as a complete novice
> I'm lost, any pointers would be a great help.  We checked the block size
> and it's 8192.

The database files are 1G, or 131072 8k blocks.  The bad block you
zeroed was 9022921; here's how you could have determined the file
and block number within that file:

test=> SELECT 9022921 / 131072 AS filenum, 9022921 % 131072 AS blocknum;
 filenum | blocknum
---------+----------
      68 |   110025
(1 row)

The new bad block is 9022937 so the query would be:

test=> SELECT 9022937 / 131072 AS filenum, 9022937 % 131072 AS blocknum;
 filenum | blocknum
---------+----------
      68 |   110041
(1 row)

If you're running 7.4.8 then consider upgrading to 7.4.12.  Offhand
I don't know if any bugs have been fixed that might cause the problem
you're seeing, but there have been other bug fixes.

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Noel Faux
Дата:
Thanks for all your help Michael, we wish to do a vacuum and dump before the upgrade to 8.02.  Do you believe this data corruption is a postgres issue of an OS / hardware issue?

Cheers
Noel

Michael Fuhr wrote:
On Thu, Mar 09, 2006 at 11:13:40AM +1100, Noel Faux wrote: 
Ok it worked but we ran into another bad block :(
/vacuumdb: vacuuming of database "monashprotein" failed: ERROR:  invalid
page header in block 9022937 of relation "gap"
/
So the command we used was:
dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero
of=/usr/local/postgresql/postgresql-7.4.8/data/base/37958/111685332.68

I'm tried to work out the formula for finding the file (i.e. the 
111685332.*) to fix and the value to seek to, but as a complete novice 
I'm lost, any pointers would be a great help.  We checked the block size 
and it's 8192.   
The database files are 1G, or 131072 8k blocks.  The bad block you
zeroed was 9022921; here's how you could have determined the file
and block number within that file:

test=> SELECT 9022921 / 131072 AS filenum, 9022921 % 131072 AS blocknum;filenum | blocknum 
---------+----------     68 |   110025
(1 row)

The new bad block is 9022937 so the query would be:

test=> SELECT 9022937 / 131072 AS filenum, 9022937 % 131072 AS blocknum;filenum | blocknum 
---------+----------     68 |   110041
(1 row)

If you're running 7.4.8 then consider upgrading to 7.4.12.  Offhand
I don't know if any bugs have been fixed that might cause the problem
you're seeing, but there have been other bug fixes.
 

Вложения

Re: Data corruption zero a file - help!!

От
Noel Faux
Дата:
I've been watching the post: Re: [GENERAL] Fixing up a corrupted toast table
In there they mention deletion of the bad rows from the table based on the citid.  If I could come up with a def of a back row, would this work, or are there other issues?

Cheers
Noel

Michael Fuhr wrote:
On Thu, Mar 09, 2006 at 11:13:40AM +1100, Noel Faux wrote: 
Ok it worked but we ran into another bad block :(
/vacuumdb: vacuuming of database "monashprotein" failed: ERROR:  invalid
page header in block 9022937 of relation "gap"
/
So the command we used was:
dd bs=8k seek=110025 conv=notrunc count=1 if=/dev/zero
of=/usr/local/postgresql/postgresql-7.4.8/data/base/37958/111685332.68

I'm tried to work out the formula for finding the file (i.e. the 
111685332.*) to fix and the value to seek to, but as a complete novice 
I'm lost, any pointers would be a great help.  We checked the block size 
and it's 8192.   
The database files are 1G, or 131072 8k blocks.  The bad block you
zeroed was 9022921; here's how you could have determined the file
and block number within that file:

test=> SELECT 9022921 / 131072 AS filenum, 9022921 % 131072 AS blocknum;filenum | blocknum 
---------+----------     68 |   110025
(1 row)

The new bad block is 9022937 so the query would be:

test=> SELECT 9022937 / 131072 AS filenum, 9022937 % 131072 AS blocknum;filenum | blocknum 
---------+----------     68 |   110041
(1 row)

If you're running 7.4.8 then consider upgrading to 7.4.12.  Offhand
I don't know if any bugs have been fixed that might cause the problem
you're seeing, but there have been other bug fixes.
 

Вложения

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Thu, Mar 09, 2006 at 12:29:17PM +1100, Noel Faux wrote:
> Thanks for all your help Michael, we wish to do a vacuum and dump before
> the upgrade to 8.02.

8.0.7 and 8.1.3 are the latest versions in their respective branches;
those are the versions to run to get the latest bug fixes.

> Do you believe this data corruption is a postgres issue of an
> OS / hardware issue?

Beats me; it could be any or all of them.  Certain filesystem and
hardware configurations are more prone to data corruption than
others, especially in the event of a system crash, so those are
among the usual suspects.  One reason to look at the data in the
bad block is to see what's there: if you see data that obviously
came from outside the database then that would tend to exonerate
PostgreSQL.

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Thu, Mar 09, 2006 at 12:37:52PM +1100, Noel Faux wrote:
> I've been watching the post: Re: [GENERAL] Fixing up a corrupted toast table
> In there they mention deletion of the bad rows from the table based on
> the citid.  If I could come up with a def of a back row, would this
> work, or are there other issues?

If you have a corrupt tuple within an otherwise good block then you
can try deleting that tuple, but if the block header is corrupt
then you have no way of addressing any of that block's tuples.
Errors implying a bad tuple include "missing chunk number" and
"invalid memory alloc request size"; but "invalid page header in
block" means the block itself is bad.

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Noel Faux
Дата:
Given that this seems problem has occurred a number of times for a number I've written a small step by step procedure to address this issue.  Is there any other comments you which to add.  I was thinking that this should be added to the FAQ / troubleshooting in the docs.

How to repair corrupted data due to "ERROR: invalid page header in block X of relation "Y":

CAUTION this will permanently remove the data defined in the bad block

   1. To identify which file(s) the relation is in:
         1. <THE SELECT STATEMENTS TO COLLECT THIS DATA> I can't remember how I did it, but will keep hunting for my notes :)
         2. so the file(s) are $PGDATA/<databaseDIR>/<relFile>.* The * is any number which is defined below
   2. To calculate the * value:
         1. SELECT <block> / 131072 AS filenum, <block> % 131072 AS blocknum;
            filenum    |  blocknum
         -----------+----------------
          <filenum> | <blocknum>
               1. 131072 comes from "each database file is 1G, or 131072 * 8k blocks"
               2. The block size is determined when compiling postgres
               3. Use "SHOW block_size in the database or use pg_controldata from the shell." to confirm this. The default is 8k.
   3. Now you need to re-zero this block using the following command:
         1. dd bs=8k seek=<blocknum> conv=notrunc count=1 if=/dev/zero of=$PGDATA/base/<databaseDIR>/<relFile>.<filenum>
               1. Before you do this it is best to backup the block:  "dd bs=8k skip=<blocknum> count=1 if=/path/file | hd"
                    See this post: http://archives.postgresql.org/pgsql-general/2006-03/msg00002.php

Your thoughts / comments...

Cheers
Noel

Michael Fuhr wrote:
On Thu, Mar 09, 2006 at 12:37:52PM +1100, Noel Faux wrote: 
I've been watching the post: Re: [GENERAL] Fixing up a corrupted toast table
In there they mention deletion of the bad rows from the table based on 
the citid.  If I could come up with a def of a back row, would this 
work, or are there other issues?   
If you have a corrupt tuple within an otherwise good block then you
can try deleting that tuple, but if the block header is corrupt
then you have no way of addressing any of that block's tuples.
Errors implying a bad tuple include "missing chunk number" and
"invalid memory alloc request size"; but "invalid page header in
block" means the block itself is bad.
 

Вложения

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Thu, Mar 09, 2006 at 03:57:46PM +1100, Noel Faux wrote:
> Given that this seems problem has occurred a number of times for a
> number I've written a small step by step procedure to address this
> issue.  Is there any other comments you which to add.  I was thinking
> that this should be added to the FAQ / troubleshooting in the docs.
>
> How to repair corrupted data due to "ERROR: invalid page header in block
> X of relation "Y":

The word "repair" might be misleading.  The operation repairs the
table in a sense, but as the following caution points out it does
so by completing the destruction that something else began.

> CAUTION this will permanently remove the data defined in the bad block
>
>   1. To identify which file(s) the relation is in:
>         1. <THE SELECT STATEMENTS TO COLLECT THIS DATA> I can't
> remember how I did it, but will keep hunting for my notes :)

Hint: pg_class.relfilenode

http://www.postgresql.org/docs/8.1/interactive/catalog-pg-class.html

>         2. so the file(s) are $PGDATA/<databaseDIR>/<relFile>.* The *
> is any number which is defined below

You might want to show how to determine <databaseDIR>.  The actual
location might not be under $PGDATA -- 8.0 and later have tablespaces
and earlier versions support alternative locations, so instructions
should account for that.  Also, relations smaller than 1G won't have
any .N files.

http://www.postgresql.org/docs/8.1/interactive/manage-ag-tablespaces.html
http://www.postgresql.org/docs/8.1/interactive/storage.html
http://www.postgresql.org/docs/7.4/interactive/manage-ag-alternate-locs.html

>   2. To calculate the * value:
>         1. SELECT <block> / 131072 AS filenum, <block> % 131072 AS blocknum;
>            filenum    |  blocknum
>         -----------+----------------
>          <filenum> | <blocknum>
>               1. 131072 comes from "each database file is 1G, or 131072
> * 8k blocks"

The 1G figure obviously applies only to tables that require that
much space.  If <filenum> comes back zero then you'd use the file
without any .N suffix.  If the bad block is less than 131072 (or
however many other-than-8k blocks fit in 1G) then you needn't bother
with the calculation.

>   3. Now you need to re-zero this block using the following command:
>         1. dd bs=8k seek=<blocknum> conv=notrunc count=1 if=/dev/zero
> of=$PGDATA/base/<databaseDIR>/<relFile>.<filenum>

I'd recommend testing the command on a throwaway file before working
with real data -- "measure twice, cut once" as it were.  To gain
confidence in what you're doing you could create a test table,
populate it with data, corrupt its data file, then zero its bad
blocks until you can select all of the remaining data.  Playing
around in a production database is probably a bad idea; a safer way
would be to initdb a test cluster and run a separate postmaster
(listening on a different port if you're on the same machine as the
real database).

It's probably best to shut down the postmaster while you're mucking
around with the data files.

>               1. Before you do this it is best to backup the block:
> "dd bs=8k skip=<blocknum> count=1 if=/path/file | hd"

This command doesn't back up the block, it pipes the block into a
command that on some systems will display a hex and ASCII dump of
the data (some systems will require a command other than hd).  You
could back up the block by redirecting the dd output to a file
instead of piping it into another command.

Incidentally, I was looking at your web site and your project
might make an interesting case study for the PostgreSQL web site
(Community -> In The Real World -> Case studies).

http://www.postgresql.org/about/casestudies/

Some users and potential users might be interested in reading about
how you're using PostgreSQL with a 100G+ database.  Post a message
to pgsql-www if you'd be interested in providing a write-up.

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Michael Fuhr
Дата:
On Thu, Mar 09, 2006 at 03:57:46PM +1100, Noel Faux wrote:
> Given that this seems problem has occurred a number of times for a
> number I've written a small step by step procedure to address this
> issue.  Is there any other comments you which to add.  I was thinking
> that this should be added to the FAQ / troubleshooting in the docs.
>
> How to repair corrupted data due to "ERROR: invalid page header in block
> X of relation "Y":

In pgsql-novice Christopher Goodfellow mentioned the zero_damaged_pages
option that I had forgotten about.  If you don't care about examining
the bad pages then you could set this option in a session and execute
a statement that hits every page in the file (Tom Lane mentioned
VACUUM and SELECT COUNT(*)).  Here's the example I posted in reply:

test=# select count(*) from foo;
ERROR:  invalid page header in block 10 of relation "foo"
test=# set zero_damaged_pages to on;
SET
test=# select count(*) from foo;
WARNING:  invalid page header in block 10 of relation "foo"; zeroing out page
WARNING:  invalid page header in block 20 of relation "foo"; zeroing out page
WARNING:  invalid page header in block 30 of relation "foo"; zeroing out page
 count
-------
  9445
(1 row)

test=# set zero_damaged_pages to off;
SET

--
Michael Fuhr

Re: Data corruption zero a file - help!!

От
Noel Faux
Дата:
To clarify, when set on, every time it hits this error, postgres will rezero that block?

Michael Fuhr wrote:
On Thu, Mar 09, 2006 at 03:57:46PM +1100, Noel Faux wrote: 
Given that this seems problem has occurred a number of times for a 
number I've written a small step by step procedure to address this 
issue.  Is there any other comments you which to add.  I was thinking 
that this should be added to the FAQ / troubleshooting in the docs.

How to repair corrupted data due to "ERROR: invalid page header in block 
X of relation "Y":   
In pgsql-novice Christopher Goodfellow mentioned the zero_damaged_pages
option that I had forgotten about.  If you don't care about examining
the bad pages then you could set this option in a session and execute
a statement that hits every page in the file (Tom Lane mentioned
VACUUM and SELECT COUNT(*)).  Here's the example I posted in reply:

test=# select count(*) from foo;
ERROR:  invalid page header in block 10 of relation "foo"
test=# set zero_damaged_pages to on;
SET
test=# select count(*) from foo;
WARNING:  invalid page header in block 10 of relation "foo"; zeroing out page
WARNING:  invalid page header in block 20 of relation "foo"; zeroing out page
WARNING:  invalid page header in block 30 of relation "foo"; zeroing out pagecount 
------- 9445
(1 row)

test=# set zero_damaged_pages to off;
SET
 

Вложения

Re: Data corruption zero a file - help!!

От
Tom Lane
Дата:
Noel Faux <noel.faux@med.monash.edu.au> writes:
> To clarify, when set on, every time it hits this error, postgres will
> rezero that block?

It'll only "re" zero if the page gets dropped from shared memory without
there having been any occasion to write it out.  Otherwise, the first
write will clobber the bad data on disk and that's the end of it.

My suggestion to use either VACUUM or SELECT COUNT(*) failed to take
that behavior into account --- VACUUM *will* rewrite the page, but a
SELECT scan won't dirty the page.  So you might consider a SELECT to see
how bad the situation is (how many bad pages) and then a VACUUM if you
want them cleaned up.

            regards, tom lane