Re: Database corruption: finding the bad block
From: Erik Jones
Subject: Re: Database corruption: finding the bad block
Msg-id: 3A298647-2011-4FA4-AB28-CD5B4BC1638A@myemma.com
In reply to: Database corruption: finding the bad block (Csaba Nagy <nagy@ecircle-ag.com>)
List: pgsql-general
On Jul 12, 2007, at 8:09 AM, Csaba Nagy wrote:

> Hi all,
>
> I just had the following error on one of our databases:
>
> ERROR: could not access status of transaction 1038286848
> DETAIL: could not open file "pg_clog/03DE": No such file or directory
>
> I researched on the mailing list and it looks like the usual suspect
> is disk page corruption. There are a few previous discussions about
> how to dump the suspect disk page, e.g.:
>
> http://archives.postgresql.org/pgsql-general/2006-10/msg01372.php
> http://groups.google.com/group/comp.databases.postgresql.hackers/browse_frm/thread/4988752a6939f45a/fd9f12468e86dd3?hl=en&lr=&ie=UTF-8&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8#fd9f12468e86dd3
>
> You can probably find more by searching for "ERROR: could not access
> status of transaction" or "pg_filedump".
>
> What I could not find was a simple and fast way to find the bad block.
> The error message itself is not useful in this regard, and the "binary
> search" method is anything but fast on a big table.
>
> So I wrote the following plpgsql function:
>
> create or replace function find_bad_block(p_TableName text)
> returns tid
> as $find_bad_block$
> declare
>     result tid;
>     crtRow record;
>     count bigint := 0;
> begin
>     for crtRow in execute 'select ctid from ' || p_TableName
>     loop
>         result = crtRow.ctid;
>         count := count + 1;
>         if count % 500000 = 0 then
>             raise notice 'rows processed: %', count;
>         end if;
>     end loop;
>     return result;
> exception
>     when OTHERS then
>         raise notice '%: %', SQLSTATE, SQLERRM;
>         return result;
> end;
> $find_bad_block$
> language plpgsql;
>
> This will spit out the error plus the last valid block visited. If
> there's no bad block, you won't get the notice with the error, just
> the last block of the table...
> It worked fine for me, resulting in:
>
> NOTICE: 58P01: could not access status of transaction 1038286848
>  find_bad_block
> ----------------
>  (657780,157)
> (1 row)
>
> Now to finding the file I should dump:
>
> select oid from pg_database where datname = 'dbname';
>   oid
> -------
>  16389
> (1 row)
>
> select relfilenode from pg_class where relname = 'table_name';
>  relfilenode
> -------------
>        20418
> (1 row)
>
> The file is then 'db_path/base/16389/20418', or a collection of
> '20418.x' files if the table's data is more than 1 GB, each file
> being a 1 GB chunk of the table... so which one to dump?
>
> First calculate the block count in one chunk: 1024 * 1024 / block_size,
> where block_size is the block size in KB with which postgres was
> compiled. That's normally 8, but we have systems where it is set to
> 32. If you didn't change it yourself, it is likely 8.
>
> So in my case the block count per chunk was: 1024 * 1024 / 8 = 131072.
>
> The chunk number is then:
>
> block_id / blocks_per_chunk = 657780 / 131072 = 5
>
> So I should look in the file 'db_path/base/16389/20418.5'... but wait,
> the block id has to be made relative to the chunk file's start:
>
> chunk_block_id = block_id % blocks_per_chunk
>
> So the block id to use with pg_filedump is: 657780 % 131072 = 2420
>
> My final command line was:
>
> pg_filedump -if -R 2420 db_path/base/16389/20418.5
>
> resulting in something like:
>
> [snip]
> <Data> ------
>  Item   1 -- Length: 44  Offset: 8148 (0x1fd4)  Flags: USED
>   XMIN: 1048587  CMIN: 90130188  XMAX: 0  CMAX|XVAC: 1036845056
>   Block Id: 655376  linp Index: 18451  Attributes: 1375  Size: 0
>   infomask: 0x0200 (XMIN_INVALID)
>   Error: Computed header length not equal to header size.
>          Computed <28>  Header: <0>
>
>   1fd4: 0b001000 0c475f05 00000000 0000cd3d  .....G_........=
>   1fe4: 0a001000 13485f05 00020000 2b030000  .....H_.....+...
>   1ff4: 2d030000 00000000 01000000           -...........
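[Editorial aside, not part of the original message: the segment arithmetic Csaba walks through above can be sketched in a few lines of Python. The function name `locate_block` and its defaults are hypothetical; the numbers are the ones from his example, assuming the default 8 KB block size and 1 GB segment files.]

```python
# Locate which 1 GB segment file of a relation holds a given block,
# and the block's index relative to that segment.

def locate_block(block_id, block_size_kb=8, segment_size_kb=1024 * 1024):
    """Return (segment_number, block_id_within_segment)."""
    blocks_per_segment = segment_size_kb // block_size_kb  # 131072 for 8 KB blocks
    return block_id // blocks_per_segment, block_id % blocks_per_segment

# The bad block reported by find_bad_block was ctid (657780,157),
# i.e. block 657780 of relfilenode 20418.
segment, rel_block = locate_block(657780)
print(segment, rel_block)  # -> 5 2420
print(f"pg_filedump -if -R {rel_block} db_path/base/16389/20418.{segment}")
```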
> [snip]
>
> So I found my bad block, and the previous and following ones looked OK.
>
> Now I want to fix just that one block even if the rows on it are lost,
> as the table data is not detail-critical (massive amounts of lost data
> would be critical, small loss is tolerable).
>
> Now to the actual fixing: from my searches it looks like zeroing out
> the complete block plus reindexing the table is the recommended
> solution if it is not possible to simply drop the table and restore
> from backup (in my case that is not possible: this error has been
> there since last year's October, all our backups have failed from
> then to now, and nobody checked the logs until now, when I tried to
> upgrade postgres via slony and failed because of this error; rather
> telling for the importance of this DB).
>
> So, how to zero out the page?
>
> The following message shows a way to do it:
>
> http://www.webservertalk.com/archive142-2004-7-319751.html
>
> Basically set the 'zero_damaged_pages' setting to 'on':
>
> http://www.postgresql.org/docs/8.1/interactive/runtime-config-developer.html
>
> ... and select that page again. Unfortunately this did not work for
> me... it looks like if the page header is not corrupt but only
> individual tuples are a problem, the 'zero_damaged_pages' thing
> doesn't work.
>
> Next try:
>
> http://usr-share-man.org/man1/dd.html
>
> The start of my bad block is at: 8 * 2420 = 19360K, so I first dumped
> the block to cross-check that I got the arguments right:
>
> dd ibs=1024 if=db_path/base/16389/20418.5 count=8 skip=19360 of=cross_check_file
>
> Then I used:
>
> pg_filedump -if -R 0 cross_check_file
>
> and diffed against the original dump, to make absolutely sure it was
> the right block... call me paranoid, but I'm not a dd expert and I
> had to be sure. It matched.
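[Editorial aside, not part of the original message: the dd arguments follow from the same numbers, since the 1 KB-unit offset of the block is block_size_kb * chunk_block_id = 8 * 2420 = 19360. A hypothetical sketch (the helper `dd_commands` is not from the post) that derives the offset and builds both command lines; note the conv=notrunc on the destructive one, which Csaba reports is easy to forget.]

```python
# Build the dd command lines used to dump and to zero one 8 KB block
# inside a 1 GB segment file. Paths and numbers are the ones from the
# example above; this is an illustrative sketch, not a tested recipe.

BLOCK_SIZE_KB = 8

def dd_commands(segment_path, block_in_segment):
    start_kb = BLOCK_SIZE_KB * block_in_segment  # offset in 1 KB units
    dump = (f"dd ibs=1024 if={segment_path} count={BLOCK_SIZE_KB} "
            f"skip={start_kb} of=cross_check_file")
    # conv=notrunc is essential when writing: without it, dd truncates
    # the segment file right after the zeroed block.
    zero = (f"dd ibs=1024 obs=1024 if=/dev/zero of={segment_path} "
            f"count={BLOCK_SIZE_KB} seek={start_kb} conv=notrunc")
    return dump, zero

dump_cmd, zero_cmd = dd_commands("db_path/base/16389/20418.5", 2420)
print(dump_cmd)
print(zero_cmd)
```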
> So now to the destructive part of the action:
>
> dd ibs=1024 obs=1024 if=/dev/zero of=db_path/base/16389/20418.5 count=8 seek=19360 conv=notrunc
>
> OK, I admit that on the first try I didn't put the 'conv=notrunc'
> there... so with all the former paranoia I still ended up with a
> truncated chunk. Luckily I remembered I have a WAL-logging-based
> replica, so I recovered the rest of the truncated file from the
> replica's copy of the same file... this being an insert-only table,
> I was lucky I guess that this was an option.
>
> To my surprise, the same block on the replica was not mangled... I
> say to my surprise because on other occasions the bad blocks readily
> replicated over. In any case, if you have a WAL-logged replica you
> might be lucky enough to recover the corrupt block(s) from there (or
> just switch over, but that is risky too: you can't know for sure what
> state the replica is in, and that is actually harder to investigate
> than the master, as you can execute no SQL on the replica).
>
> Anyway, that's it... it looks like the problem is gone, and the DB
> will be moved to another box running postgres 8.2.4 via slony. I'm
> not sure how much of that table I mangled with my dd experiments;
> I hope that others can learn from my mistakes...
>
> Cheers,
> Csaba.

Wow. Csaba, thanks for sharing. I'm definitely archiving this.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com