Re: Database corruption: finding the bad block

From: Erik Jones
Subject: Re: Database corruption: finding the bad block
Date:
Msg-id: 3A298647-2011-4FA4-AB28-CD5B4BC1638A@myemma.com
In response to: Database corruption: finding the bad block  (Csaba Nagy <nagy@ecircle-ag.com>)
List: pgsql-general
On Jul 12, 2007, at 8:09 AM, Csaba Nagy wrote:

> Hi all,
>
> I just had the following error on one of our databases:
>
> ERROR:  could not access status of transaction 1038286848
> DETAIL:  could not open file "pg_clog/03DE": No such file or directory
>
> I researched the mailing list archives and it looks like the usual
> suspect is disk page corruption. There are a few previous discussions
> about how to dump the suspect disk page, e.g.:
>
> http://archives.postgresql.org/pgsql-general/2006-10/msg01372.php
> http://groups.google.com/group/comp.databases.postgresql.hackers/browse_frm/thread/4988752a6939f45a/fd9f12468e86dd3?hl=en&lr=&ie=UTF-8&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8#fd9f12468e86dd3
>
> You can probably find more searching for "ERROR:  could not access
> status of transaction" or "pg_filedump".
>
> What I could not find was a simple and fast way to find the bad block.
> The error message itself is not useful in this regard, and the "binary
> search" method is anything but fast on a big table.
>
> So I wrote the following plpgsql function:
>
> create or replace function
>   find_bad_block(p_TableName text)
>   returns tid
> as $find_bad_block$
>   declare
>     result tid;
>     crtRow record;
>     count bigint := 0;
>   begin
>     for crtRow
>     in execute 'select ctid from ' || p_TableName
>     loop
>       result = crtRow.ctid;
>       count := count + 1;
>       if count % 500000 = 0 then
>         raise notice 'rows processed: %', count;
>       end if;
>     end loop;
>     return result;
>   exception
>     when OTHERS then
>         raise notice '%: %', SQLSTATE, SQLERRM;
>         return result;
>   end;
> $find_bad_block$
> language plpgsql;
>
>
> This will spit out the error + the last valid block visited. If
> there's no bad block, you won't get the notice with the error, just
> the last block of the table... it worked fine for me, resulting in:
>
> NOTICE:  58P01: could not access status of transaction 1038286848
>  find_bad_block
> ----------------
>    (657780,157)
> (1 row)
>
> Now to finding the file I should dump:
>
> select oid from pg_database where datname = 'dbname';
>   oid
> -------
>  16389
> (1 row)
>
> select relfilenode from pg_class where relname='table_name';
>  relfilenode
> -------------
>        20418
> (1 row)
>
> The file is then 'db_path/base/16389/20418'. Or a collection of
> '20418.x' files if the table's data is more than 1 GB, each file
> being a 1 GB chunk of the table... so which one to dump?
>
> First calculate the block count in one chunk: 1024 * 1024 /
> block_size, where block_size is the block size in KB with which
> postgres was compiled. That's normally 8, but we have systems where
> it is set to 32. If you didn't change that yourself, it is likely 8.
>
> So in my case the block count per chunk was: 1024 * 1024 / 8 = 131072.
>
> So the chunk number will be:
>
> blockId / blocks_per_chunk = 657780 / 131072 = 5
>
> So I should look in the file 'db_path/base/16389/20418.5'... but wait,
> the block id has to be made relative to the chunk file's start:
> chunk_block_id = block_id % blocks_per_chunk
>
> So the block id to use with pg_filedump is: 657780 % 131072 = 2420
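The segment/offset arithmetic above can be sketched as a small shell calculation. This is only a sketch of the steps described in the text: the numbers are the ones from this example, and an 8 KB block size with 1 GB segment files is assumed.

```shell
#!/bin/sh
# Locate the 1 GB segment file holding a given block, and the block's
# index relative to that segment. Example numbers from this thread.
block_id=657780      # block part of the ctid reported by find_bad_block
block_size_kb=8      # block size postgres was compiled with, in KB
                     # (newer postgres versions report it in bytes via
                     #  "SHOW block_size", e.g. 8192)

blocks_per_chunk=$(( 1024 * 1024 / block_size_kb ))  # blocks per 1 GB segment
chunk=$(( block_id / blocks_per_chunk ))             # segment file suffix
chunk_block_id=$(( block_id % blocks_per_chunk ))    # block within that file

echo "segment suffix: $chunk, block in segment: $chunk_block_id"
# prints: segment suffix: 5, block in segment: 2420
```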
>
> So my final command line was:
>
> pg_filedump -if -R 2420 db_path/base/16389/20418.5
>
> resulting in something like:
>
> [snip]
> <Data> ------
>  Item   1 -- Length:   44  Offset: 8148 (0x1fd4)  Flags: USED
>   XMIN: 1048587  CMIN: 90130188  XMAX: 0  CMAX|XVAC: 1036845056
>   Block Id: 655376  linp Index: 18451   Attributes: 1375   Size: 0
>   infomask: 0x0200 (XMIN_INVALID)
>   Error: Computed header length not equal to header size.
>          Computed <28>  Header: <0>
>
>   1fd4: 0b001000 0c475f05 00000000 0000cd3d  .....G_........=
>   1fe4: 0a001000 13485f05 00020000 2b030000  .....H_.....+...
>   1ff4: 2d030000 00000000 01000000           -...........
>
> [snip]
>
> So I found my bad block, and the previous and following looked OK.
>
> Now I want to fix just that one block even if the rows on it are lost,
> as the table data is not detail-critical (massive amounts of lost data
> would be critical, small loss is tolerable).
>
> Now to the actual fixing: from my searches it looks like zeroing out
> the complete block + reindexing the table is the recommended solution
> if it is not possible to simply drop the table and restore from backup
> (in my case that is not possible - this error has been there since
> last October, and all our backups have failed from then to now - and
> nobody checked the logs until now, when I tried to upgrade postgres
> via slony and failed because of this error - rather telling for the
> importance of this DB).
>
> So, how to zero out the page?
>
> The following message shows a way to do it:
>
> http://www.webservertalk.com/archive142-2004-7-319751.html
>
> Basically set the 'zero_damaged_pages' setting to 'on':
>
> http://www.postgresql.org/docs/8.1/interactive/runtime-config-developer.html
>
> ... and select that page again. Unfortunately this did not work for
> me... it looks like if the page header is not corrupt but only
> individual tuples are a problem, 'zero_damaged_pages' doesn't help.
>
> Next try:
>
> http://usr-share-man.org/man1/dd.html
>
> The start of my bad block is at 8 * 2420 = 19360 KB, so I first
> dumped the block to cross-check that I got the arguments right:
>
> dd ibs=1024 if=db_path/base/16389/20418.5 count=8 skip=19360 of=cross_check_file
>
> Then I used:
>
> pg_filedump -if -R 0 cross_check_file
>
> and diffed against the original dump, to make absolutely sure it's
> the right block... call me paranoid, but I'm not a dd expert and I
> had to be sure. It matched.
>
> So now to the destructive part of the action:
>
> dd ibs=1024 obs=1024 if=/dev/zero of=db_path/base/16389/20418.5 count=8 seek=19360 conv=notrunc
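After a destructive dd like the one above, the block can be read back to confirm it really is all zeros now. This check is an addition, not part of the original procedure; the path and offsets are the example's, and GNU cmp's -n option is assumed:

```shell
#!/bin/sh
# Re-read the freshly zeroed 8 KB block and compare it byte-for-byte
# against /dev/zero; cmp exits 0 only when all 8192 bytes match (zero).
dd ibs=1024 if=db_path/base/16389/20418.5 count=8 skip=19360 2>/dev/null \
  | cmp -s -n 8192 - /dev/zero \
  && echo "block is zeroed" \
  || echo "block still contains non-zero bytes"
```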
>
> Ok, I admit that on the first try I didn't put the 'conv=notrunc'
> there... so with all the former paranoia I still ended up with a
> truncated chunk. Luckily I remembered that I have a WAL-shipping based
> replica, so I recovered the rest of the truncated file from the
> replica's copy of the same file... this being an insert-only table, I
> guess I was lucky that this was an option.
> To my surprise, the same block on the replica was not mangled... I say
> to my surprise because on other occasions the bad blocks readily
> replicated over. In any case, if you have a WAL-shipped replica you
> might be lucky enough to recover the corrupt block(s) from there (or
> just switch over, but that is risky too: you can't know for sure what
> state the replica is in, and that is actually harder to investigate
> than the master, since you cannot execute SQL on the replica).
>
> Anyway, that's it... it looks like the problem is gone, and the DB
> will be moved to another box running postgres 8.2.4 via slony. I'm not
> sure how much of that table I mangled with my dd experiments; I hope
> that others can learn from my mistakes...
>
> Cheers,
> Csaba.

Wow.  Csaba, thanks for sharing.  I'm definitely archiving this.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com


