Re: Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0)

From	Sim Zacks
Subject	Re: Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0)
Date	May 21, 2008, 12:51:56
Msg-id	48341746.3080902@compulab.co.il
In reply to	Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0) (Peter Petrov <peter@demabg.com>)
List	pgsql-general


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

If you have a backup, the easiest way would be to restore it. There is
also a way to replay the database log files into the database from a
point in time (i.e., from the time of the last backup) so that you can
recover your data. I've never actually seen it work, though.
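
For reference, the log replay mentioned above is what the PostgreSQL
documentation calls point-in-time recovery. It only helps if WAL
archiving (archive_mode and archive_command) was already configured
before the failure and a base backup exists. Below is a rough sketch of
the recovery.conf step only, assuming the base backup has already been
restored into a fresh data directory. The function name, the archive
path /mnt/wal_archive, and the target time are all hypothetical.

```shell
#!/bin/sh
# Sketch only: write a recovery.conf for point-in-time recovery (PostgreSQL 8.x).
# Assumes WAL archiving was enabled before the failure and a base backup
# has been restored into the data directory. All paths are hypothetical.

write_recovery_conf() {
    data_dir=$1      # data directory holding the restored base backup
    archive_dir=$2   # directory holding the archived WAL segments
    target_time=$3   # optional; leave empty to replay to the end of the archive

    {
        # %f = WAL file name the server asks for, %p = path to copy it to
        printf "restore_command = 'cp %s/%%f %%p'\n" "$archive_dir"
        if [ -n "$target_time" ]; then
            printf "recovery_target_time = '%s'\n" "$target_time"
        fi
    } > "$data_dir/recovery.conf"
}

# Demo against a throwaway directory; nothing touches a real cluster.
demo_dir=$(mktemp -d)
write_recovery_conf "$demo_dir" /mnt/wal_archive '2008-05-20 17:00:00 EEST'
cat "$demo_dir/recovery.conf"
```

With that file in place, starting the server would make it fetch and
replay the archived segments. Work on a copy of the damaged cluster, not
the only one you have.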



Peter Petrov wrote:
> Hi,
>
> Today one of the disks was marked as failed ... and now some files
> are corrupted.
> I've decided to copy the pgsqldata directory and try to fix the
> PG_VERSION files (see below for what PostgreSQL doesn't like) ... and
> see if the database will come up.
> While copying the files I'm open to any other ideas on how to
> deal with the problem ;)
>
> PostgreSQL's log suggests running initdb (HINT message in the log file).
> What will happen if I then try to copy the rest of the structure into
> the newly created database cluster?
>
> linux (Slackware 12.0.0), software RAID5 (partition based) + PostgreSQL
> 8.3.0:
>
> Here's what happened (from dmesg):
>
> ---------------------------------------
> # uname -a
> Linux xeonito 2.6.21.5 #3 SMP Tue Oct 2 16:20:48 EEST 2007 i686 Intel(R)
> Xeon(R) CPU           E5335  @ 2.00GHz GenuineIntel GNU/Linux
>
> ---------------------------------------
> # dmesg
> sd 0:0:3:0: SCSI error: return code = 0x08000002
> sdd: Current: sense key=0x4
>    ASC=0x44 ASCQ=0x0
> Info fld=0x0
> end_request: I/O error, dev sdd, sector 159620863
> sd 0:0:3:0: SCSI error: return code = 0x08000002
> sdd: Current: sense key=0x4
>    ASC=0x44 ASCQ=0x0
> Info fld=0x0
> end_request: I/O error, dev sdd, sector 159617119
> raid5: Disk failure on sdd1, disabling device. Operation continuing on 4
> devices
> ......
>
> RAID5 conf printout:
> --- rd:5 wd:4
> disk 0, o:1, dev:sdb1
> disk 1, o:1, dev:sdc1
> disk 2, o:0, dev:sdd1
> disk 3, o:1, dev:sde1
> disk 4, o:1, dev:sdf1
> RAID5 conf printout:
> --- rd:5 wd:4
> disk 0, o:1, dev:sdb1
> disk 1, o:1, dev:sdc1
> disk 3, o:1, dev:sde1
> disk 4, o:1, dev:sdf1
>
> ---------------------------------------
>
> # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
> [raid4] [multipath] [faulty]
> md1 : active raid5 sdb1[0] sdf1[4] sde1[3] sdd1[5](F) sdc1[1]
>      585924608 blocks level 5, 8192k chunk, algorithm 2 [5/4] [UU_UU]
>
> md0 : active raid5 sdb2[0] sdf2[4] sde2[3] sdd2[5](F) sdc2[1]
>      390053888 blocks level 5, 1024k chunk, algorithm 2 [5/4] [UU_UU]
>
> unused devices: <none>
>
> ---------------------------------------
>
> And here's what the partitions look like:
>
> # fdisk  -l /dev/sdb
>
> Disk /dev/sdb: 249.8 GB, 249865175040 bytes
> 255 heads, 63 sectors/track, 30377 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>   Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1               1       18237   146488671   83  Linux
> /dev/sdb2           18238       30377    97514550   83  Linux
>
> ---------------------------------------
> Kernel parameters:
>
> echo 4200000000 > /proc/sys/kernel/shmmax
> echo 4200000000 > /proc/sys/kernel/shmall
> sysctl -w vm.overcommit_memory=2
>
> echo 8192 >  /sys/block/md0/md/stripe_cache_size
> echo 8192 >  /sys/block/md1/md/stripe_cache_size
>
> ---------------------------------------
>
>
> Both md0 and md1 are used by PostgreSQL. Initially it was not designed
> to use the whole disks sdb-sdf, but due to size requirements I also
> added the other unused space for PostgreSQL to use.
>
>
> And here's the PostgreSQL log (the FATAL message appears when I try to
> connect to the database; of course this is the case for the most
> interesting database ... some other small databases are working fine):
>
> LOG:  received smart shutdown request
> LOG:  autovacuum launcher shutting down
> LOG:  shutting down
> LOG:  database system is shut down
> LOG:  could not create IPv6 socket: Address family not supported by
> protocol
> LOG:  database system was shut down at 2008-05-20 17:54:17 EEST
> LOG:  autovacuum launcher started
> LOG:  database system is ready to accept connections
> FATAL:  "base/16399" is not a valid data directory
> DETAIL:  File "base/16399/PG_VERSION" does not contain valid data.
> HINT:  You might need to initdb.
>
> Of course base/16399/PG_VERSION contains something strange, not the
> version information:
>
> # cat base/16399/PG_VERSION
> X
>
>
> ---------------------------------------
>
>
>
>
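
Before hand-editing anything, it may be worth finding out how many
PG_VERSION files are actually damaged. On 8.3 the file in the data
directory and in each base/<oid> directory should contain just the
major version, 8.3. Here is a read-only sketch, run against the copy of
the data directory; the function name and the demo layout are made up
for illustration.

```shell
#!/bin/sh
# Sketch only: list PG_VERSION files that are missing or do not contain
# the expected major version. Read-only; run it against a COPY of the cluster.

check_pg_version() {
    data_dir=$1   # path to the (copied) data directory
    expected=$2   # expected contents, e.g. 8.3

    for d in "$data_dir" "$data_dir"/base/*; do
        [ -d "$d" ] || continue                    # skip unmatched globs and plain files
        f="$d/PG_VERSION"
        actual=$(head -n 1 "$f" 2>/dev/null)       # empty if the file is missing
        if [ "$actual" != "$expected" ]; then
            echo "BAD $f"
        fi
    done
}

# Demo on a fake layout that mimics the report above (OID 16399 is damaged).
demo=$(mktemp -d)
mkdir -p "$demo/base/1" "$demo/base/16399"
echo 8.3 > "$demo/PG_VERSION"
echo 8.3 > "$demo/base/1/PG_VERSION"
echo X   > "$demo/base/16399/PG_VERSION"
bad_files=$(check_pg_version "$demo" 8.3)
echo "$bad_files"
```

Keep in mind that a bad PG_VERSION is probably only the first corrupted
file the server notices. Putting 8.3 back into it says nothing about the
state of the table and index files in the same directory.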

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg0F0YACgkQjDX6szCBa+r5wwCg5Dzms7G3ipmVaoBbCZd+jPp8
TmIAnRrehvG1m+wvERsZ8J8Xw8v9scO5
=5AgU
-----END PGP SIGNATURE-----
