Discussion: Servercrash

Servercrash

From
Stefan Holzheu
Date:
Hallo Lists,

yesterday our database server crashed (RAID error). After repairing the
filesystem (ext2) we have a problem with our largest table in the database.

VACUUM aborts with the following message (sorry, it is in German):

DEBUG:  vacuume »messungen.massendaten«
HINWEIS:  Relation »massendaten« TID 211540/73:
DeleteTransactionInProgress 16785658 --- kann Relation nicht verkleinern
(cannot shrink relation)
DEBUG:  AbortCurrentTransaction
FEHLER:  ungültiger Seitenkopf in Block 354500 von Relation
»massendaten« (invalid page header in block ...)

All other operations using ctid 211540/73 give:

FEHLER:  konnte auf den Status von Transaktion 16785658 nicht zugreifen
(could not access status of transaction ...)
DETAIL:  konnte Datei »/var/lib/pgsql/data/pg_clog/0010« nicht öffnen:
Datei oder Verzeichnis nicht gefunden
(could not open file ...: No such file or directory)

Is there a way to repair the database?

Help welcome!

Regards

    Stefan


--
-----------------------------
Dr. Stefan Holzheu
Tel.: 0921/55-5720
Fax.: 0921/55-5799
BITOeK Wiss. Sekretariat
Universitaet Bayreuth
D-95440 Bayreuth
-----------------------------

Re: Servercrash

From
Tom Lane
Date:
Stefan Holzheu <stefan.holzheu@bitoek.uni-bayreuth.de> writes:
> yesterday our database server crashed (RAID error). After repairing the
> filesystem (ext2) we have a problem with our largest table in the database.

> VACUUM aborts with the following message (sorry, it is in German):

> DEBUG:  vacuume »messungen.massendaten«
> HINWEIS:  Relation »massendaten« TID 211540/73:
> DeleteTransactionInProgress 16785658 --- kann Relation nicht verkleinern
> (cannot shrink relation)
> DEBUG:  AbortCurrentTransaction
> FEHLER:  ungültiger Seitenkopf in Block 354500 von Relation
> »massendaten« (invalid page header in block ...)

> All other operations using ctid 211540/73 give:

> FEHLER:  konnte auf den Status von Transaktion 16785658 nicht zugreifen
> (could not access status of transaction ...)
> DETAIL:  konnte Datei »/var/lib/pgsql/data/pg_clog/0010« nicht öffnen:
> Datei oder Verzeichnis nicht gefunden

> Is there a way to repair the database?

You have at least two corrupted pages in that table: page 354500 has a
header problem, and in page 211540 there's a bogus transaction ID in a
tuple header.  These are the *minimum* descriptions of the data lossage;
it's entirely likely that large parts of the pages involved are junk.
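
As a side note on the transaction-status error quoted above: the file name
pg_clog/0010 is what one would expect for transaction 16785658 under
PostgreSQL's default commit-log layout. The Python sketch below shows the
arithmetic. The constants (8 kB pages, two status bits per transaction,
32 pages per segment file) are assumptions about a default build, and
clog_location is a hypothetical helper, not part of PostgreSQL.

```python
# Sketch only: assumes a default PostgreSQL build. None of these names are
# a PostgreSQL API; they mirror the commit-log layout for illustration.
BLCKSZ = 8192                     # bytes per commit-log page (assumed default)
CLOG_XACTS_PER_BYTE = 4           # 2 status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32       # pages per pg_clog segment file
XACTS_PER_SEGMENT = CLOG_XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT


def clog_location(xid):
    """Return (segment file name, byte offset in that file, bit shift)."""
    segment, rest = divmod(xid, XACTS_PER_SEGMENT)
    page, in_page = divmod(rest, CLOG_XACTS_PER_PAGE)
    byte_in_page, slot = divmod(in_page, CLOG_XACTS_PER_BYTE)
    return "%04X" % segment, page * BLCKSZ + byte_in_page, slot * 2


print(clog_location(16785658))    # the transaction ID from the error message
```

Under these assumptions the transaction maps to segment file 0010, which
matches the file the server could not open. If that segment lies well beyond
the files that actually exist in pg_clog, it fits the reading that the
transaction ID in the tuple header is garbage.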

I would suggest proceeding by examining those pages with pg_filedump or
another tool.  If you can make some sense of the damage it might be
possible to do a selective repair.  If not, your best bet is to just
zero out the damaged pages --- this will lose the rows that are on those
pages, but at least you can get the rest of the table operational again.
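
To examine or zero a page by hand, one first has to find it on disk. The
Python sketch below shows one way to do that, assuming the default 8 kB
block size and 1 GB segment files (131072 blocks each). The helper names and
the file path passed to zero_block are hypothetical. Anything like this
should only be run with the server stopped and on a copy of the file.

```python
import os

# Sketch only: assumes default build constants; helper names are hypothetical.
BLCKSZ = 8192          # bytes per heap page (assumed default)
RELSEG_SIZE = 131072   # blocks per 1 GB segment file (assumed default)


def locate_block(block):
    """Return (segment number, byte offset inside that segment file)."""
    segment, block_in_segment = divmod(block, RELSEG_SIZE)
    return segment, block_in_segment * BLCKSZ


def segment_file_name(relation_path, segment):
    """Segment 0 is the bare file; later segments get a .N suffix."""
    return relation_path if segment == 0 else "%s.%d" % (relation_path, segment)


def zero_block(path, byte_offset):
    """Overwrite one whole block with zero bytes; refuse to extend the file."""
    if byte_offset + BLCKSZ > os.path.getsize(path):
        raise ValueError("block lies beyond the end of %s" % path)
    with open(path, "r+b") as f:
        f.seek(byte_offset)
        f.write(b"\x00" * BLCKSZ)
        f.flush()
        os.fsync(f.fileno())


print(locate_block(354500))   # page with the invalid header
print(locate_block(211540))   # page with the bogus transaction ID
```

Under these assumptions, block 354500 sits in the third segment file (suffix
.2) and block 211540 in the second (suffix .1). The same offsets can be used
to point pg_filedump or a hex viewer at the right part of the file before
deciding whether to zero anything.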

You can find more about this by looking in the mail list archives.
Threads mentioning pg_filedump would be good places to start.

            regards, tom lane

Re: Servercrash

From
Stefan Holzheu
Date:
>>yesterday our database server crashed (RAID error). After repairing the
>>filesystem (ext2) we have a problem with our largest table in the database.
>
>>VACUUM aborts with the following message (sorry, it is in German):
>
>>DEBUG:  vacuume »messungen.massendaten«
>>HINWEIS:  Relation »massendaten« TID 211540/73:
>>DeleteTransactionInProgress 16785658 --- kann Relation nicht verkleinern
>>(cannot shrink relation)
>>DEBUG:  AbortCurrentTransaction
>>FEHLER:  ungültiger Seitenkopf in Block 354500 von Relation
>>»massendaten« (invalid page header in block ...)
>
>>All other operations using ctid 211540/73 give:
>
>>FEHLER:  konnte auf den Status von Transaktion 16785658 nicht zugreifen
>>(could not access status of transaction ...)
>>DETAIL:  konnte Datei »/var/lib/pgsql/data/pg_clog/0010« nicht öffnen:
>>Datei oder Verzeichnis nicht gefunden
>
>>Is there a way to repair the database?
>
> You have at least two corrupted pages in that table: page 354500 has a
> header problem, and in page 211540 there's a bogus transaction ID in a
> tuple header.  These are the *minimum* descriptions of the data lossage;
> it's entirely likely that large parts of the pages involved are junk.
>
> I would suggest proceeding by examining those pages with pg_filedump or
> another tool.  If you can make some sense of the damage it might be
> possible to do a selective repair.  If not, your best bet is to just
> zero out the damaged pages --- this will lose the rows that are on those
> pages, but at least you can get the rest of the table operational again.
>
> You can find more about this by looking in the mail list archives.
> Threads mentioning pg_filedump would be good places to start.
>
>             regards, tom lane
>
We set zero_damaged_pages to on and got rid of page 354500. But the
error in page 211540 remains :-(...

Any idea?

Stefan