Re: How to deal with corrupted database?

From: Ruslan A. Bondar
Subject: Re: How to deal with corrupted database?
Date:
Msg-id: 20111109173744.7592798e@list.ru
In reply to: Re: How to deal with corrupted database? (Craig Ringer <ringerc@ringerc.id.au>)
Responses: Re: How to deal with corrupted database? (Craig Ringer <ringerc@ringerc.id.au>)
List: pgsql-admin

There were no unexpected reboots.
The first issue was some kind of deadlock (a concurrent insert and a concurrent delete on a table); I saw these while reindexing
the database.
Messages like this also appeared in dmesg:
[3681001.529179] INFO: task postgres:12432 blocked for more than 120 seconds.
[3681001.529191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[3681001.529205] postgres      D ed8c0e98     0 12432    740 0x00000000
[3681001.529225]  ec83f700 00000286 ec361080 ed8c0e98 c1f8bee0 c145ae20 c145ae20 c1456354
[3681001.529262]  ec83f8bc c3e54e20 00000000 00000000 00000000 00000001 ffffffff ec361080
[3681001.529312]  c3e50354 ec83f8bc 36d8262f 00000000 ec83f700 00000000 00000000 00000000
[3681001.529369] Call Trace:
[3681001.529385]  [<c128d717>] ? __mutex_lock_common+0xe8/0x13b
[3681001.529401]  [<c128d779>] ? __mutex_lock_slowpath+0xf/0x11
[3681001.529416]  [<c128d80a>] ? mutex_lock+0x17/0x24
[3681001.529429]  [<c128d80a>] ? mutex_lock+0x17/0x24
[3681001.529444]  [<c10bc2a3>] ? generic_file_llseek+0x17/0x44
[3681001.529458]  [<c10bc28c>] ? generic_file_llseek+0x0/0x44
[3681001.529473]  [<c10bb145>] ? vfs_llseek+0x30/0x34
[3681001.529487]  [<c10bc1a1>] ? sys_llseek+0x3a/0x7a
[3681001.529501]  [<c1008efc>] ? syscall_call+0x7/0xb
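The task line reports the backend in state `D` (uninterruptible sleep), and the call trace shows it waiting on a kernel mutex inside `llseek`, i.e. stuck in the kernel rather than on a PostgreSQL lock. When several backends hang like this, it helps to extract which PIDs were blocked and when, so they can be matched against the PostgreSQL log. A minimal sketch, assuming the dmesg format shown above; the regex, `HungTask`, and `parse_hung_tasks` are illustrative names, not part of any existing tool:

```python
import re
from typing import List, NamedTuple

# Matches the kernel hung-task header line, e.g.
# "[3681001.529179] INFO: task postgres:12432 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


class HungTask(NamedTuple):
    uptime_s: float  # seconds since boot, taken from the dmesg timestamp
    comm: str        # process name as reported by the kernel
    pid: int         # PID of the blocked process
    blocked_s: int   # hung_task_timeout_secs threshold that was exceeded


def parse_hung_tasks(dmesg_text: str) -> List[HungTask]:
    """Return one HungTask per 'blocked for more than' line in dmesg output.

    Call-trace lines and the 'echo 0 > ...' hint line are ignored."""
    tasks = []
    for line in dmesg_text.splitlines():
        m = HUNG_TASK_RE.match(line.strip())
        if m:
            tasks.append(HungTask(float(m["ts"]), m["comm"],
                                  int(m["pid"]), int(m["secs"])))
    return tasks
```

Feeding it the output of `dmesg` yields, for the excerpt above, a single entry for `postgres` PID 12432 blocked for more than 120 seconds.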

So I stopped the software that caused these inserts and deletes, but reindexing still showed the same warnings. I restarted
the PostgreSQL server. PostgreSQL restarted successfully, but the database became inaccessible. The filesystem is clean. The file
base/16387/86057840 exists but is zero length. The file pg_subtrans/00F2 does not exist.
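The name of the missing file follows mechanically from the transaction ID in the `DROP DATABASE` error quoted below (15892843). `pg_subtrans` is an SLRU area that stores one 4-byte parent transaction ID per transaction, split into pages and fixed-size segment files. A sketch of that mapping, assuming the default 8 kB `BLCKSZ` and 32 pages per SLRU segment (the constants of the 9.x releases discussed here); `subtrans_location` is a hypothetical helper, not a PostgreSQL function:

```python
from typing import Tuple

# Assumed build constants (PostgreSQL defaults of the 9.x era).
BLCKSZ = 8192                # bytes per SLRU page
TRANSACTION_ID_SIZE = 4      # pg_subtrans keeps one 4-byte parent xid per xid
SLRU_PAGES_PER_SEGMENT = 32  # pages per segment file

XIDS_PER_PAGE = BLCKSZ // TRANSACTION_ID_SIZE               # 2048
XIDS_PER_SEGMENT = XIDS_PER_PAGE * SLRU_PAGES_PER_SEGMENT   # 65536


def subtrans_location(xid: int) -> Tuple[str, int, int]:
    """Return (segment file, page within segment, byte offset within page)
    where pg_subtrans would store the parent of transaction `xid`."""
    page = xid // XIDS_PER_PAGE              # global SLRU page number
    segment = page // SLRU_PAGES_PER_SEGMENT
    # SLRU segment files are named with four upper-case hex digits.
    path = "pg_subtrans/%04X" % segment
    offset = (xid % XIDS_PER_PAGE) * TRANSACTION_ID_SIZE
    return path, page % SLRU_PAGES_PER_SEGMENT, offset


path, page_in_segment, offset = subtrans_location(15892843)
print(path, page_in_segment, offset)  # pg_subtrans/00F2 16 1452
```

Comparing the computed segment with the lowest and highest segment files actually present in `pg_subtrans/` can hint whether the lookup was for a transaction older than what PostgreSQL still keeps there, or whether files went missing.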

Also, after several restarts PostgreSQL can't start. The messages are:

2011-11-09 16:25:04 MSK LOG:  database system shutdown was interrupted; last known up at 2011-11-09 14:13:38 MSK
2011-11-09 16:25:04 MSK LOG:  database system was not properly shut down; automatic recovery in progress
2011-11-09 16:25:04 MSK FATAL:  the database system is starting up
2011-11-09 16:25:04 MSK LOG:  consistent recovery state reached at 171/19BE8060
2011-11-09 16:25:04 MSK LOG:  redo starts at 171/19BE8060
2011-11-09 16:25:04 MSK LOG:  incomplete startup packet
2011-11-09 16:25:04 MSK LOG:  record with zero length at 171/19C26010
2011-11-09 16:25:04 MSK LOG:  redo done at 171/19C25FB4
2011-11-09 16:25:04 MSK LOG:  last completed transaction was at log time 2011-11-09 13:05:20.105323+03
2011-11-09 16:25:04 MSK FATAL:  xlog flush request 171/1B1374E0 is not satisfied --- flushed only to 171/19C26010
2011-11-09 16:25:04 MSK CONTEXT:  writing block 0 of relation base/16385/86064815_vm
2011-11-09 16:25:04 MSK LOG:  startup process (PID 3570) exited with exit code 1
2011-11-09 16:25:04 MSK LOG:  aborting startup due to startup process failure
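The decisive line is the FATAL: writing block 0 of the visibility-map fork `86064815_vm` required WAL to be flushed up to 171/1B1374E0, while replay stopped at 171/19C26010 ("record with zero length"). In other words, a page on disk claims to be newer than any WAL that survived. A small sketch that converts both locations, printed in the two-part hex `HI/LO` form these releases use, into single byte positions and measures the gap; `parse_lsn` is a hypothetical helper:

```python
def parse_lsn(text: str) -> int:
    """Turn a WAL location printed as 'HI/LO' (two hex numbers) into one
    64-bit byte position, so locations can be compared and subtracted."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Values copied from the FATAL line in the log above.
requested = parse_lsn("171/1B1374E0")  # LSN stamped on block 0 of the _vm fork
flushed = parse_lsn("171/19C26010")    # end of the WAL that recovery replayed

gap = requested - flushed
print(f"page LSN is {gap} bytes (about {gap / 2**20:.1f} MiB) beyond the end of WAL")
# page LSN is 22090960 bytes (about 21.1 MiB) beyond the end of WAL
```

A page LSN ahead of the end of WAL generally means either that WAL which should precede the page never reached disk, or that the page header itself is damaged; both fit the storage-level causes Craig asks about below.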

This database isn't mission-critical, so if you want, I can experiment on it.

On Wed, 09 Nov 2011 21:04:47 +0800
Craig Ringer <ringerc@ringerc.id.au> wrote:

> On 11/09/2011 07:02 PM, Ruslan A. Bondar wrote:
> > Hello all.
> >
> > This is the first time I've run into an issue like this.
> > My database has somehow become corrupted.
> Before you do ANYTHING else, make a copy of your database somewhere
> safe. See:
>
> http://wiki.postgresql.org/wiki/Corruption
> > When I try to access the database via psql, I get:
> >
> > root@udb:/etc/bacula# psql -U bacula
> > psql: FATAL:  could not read block 0 in file
> > "base/16387/86057840": read only 0 of 8192 bytes
> >
> >
> > So I want to drop it and restore from backup. But when I try
> > to drop the database, I see:
> >
> > postgres=# drop database bacula;
> > ERROR:  could not access status of transaction 15892843
> > DETAIL:  Could not open file "pg_subtrans/00F2": No such file or
> > directory.
> >
> >
> > Is there any way to restore the database to a working state, or to drop
> > it?
> >
> *ONLY* once you've made a full backup copy, you may be able to set
> zero_damaged_pages to get a usable dump.
>
> Do you know what caused this? The missing files suggest it was
> probably file system corruption - was there a disk failure? fsck run
> with errors? Unexpected reboot on a RAID controller with a dead
> backup battery?
>
> --
> Craig Ringer
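Craig's advice amounts to two ordered steps: first take a file-level copy of the whole data directory with the server stopped, and only then attempt a dump with `zero_damaged_pages` enabled. A minimal sketch of both steps, assuming Python 3.8+; `copy_cluster`, `dump_with_zeroed_pages`, and the paths are illustrative names, and the dump command is only prepared, not run:

```python
import os
import shutil
import time
from pathlib import Path
from typing import Dict, List, Tuple


def copy_cluster(pgdata: str, dest_root: str) -> Path:
    """Copy the whole data directory (server must be stopped) into a new,
    timestamped directory under dest_root and return its path."""
    dest = Path(dest_root) / ("pgdata-copy-" + time.strftime("%Y%m%d-%H%M%S"))
    # copytree refuses to write into an existing destination, which is what we
    # want for a forensic copy. With the default symlinks=False, symlinked
    # directories (e.g. a relocated pg_xlog) are followed and their contents copied.
    shutil.copytree(pgdata, dest)
    return dest


def dump_with_zeroed_pages(dbname: str, outfile: str) -> Tuple[List[str], Dict[str, str]]:
    """Build (argv, env) for a pg_dump run with zero_damaged_pages enabled.
    Only run it after copy_cluster() has produced an untouched copy."""
    env = dict(os.environ)
    # zero_damaged_pages is superuser-only; passing it via PGOPTIONS sets it for
    # this one connection instead of server-wide. Any existing PGOPTIONS is replaced.
    env["PGOPTIONS"] = "-c zero_damaged_pages=on"
    argv = ["pg_dump", "--format=custom", "--file", outfile, dbname]
    return argv, env
```

Running the dump would then be `subprocess.run(argv, env=env, check=True)` as a superuser. Rows on any page that gets zeroed are silently missing from the dump, which is why the untouched copy has to exist first.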

