Discussion: 9.1.2 Postgres corruption, any way to recover?

9.1.2 Postgres corruption, any way to recover?

From
Tory M Blue
Date:
It appears that one of my bigger, but older, DBs dumped core (or something similar) this morning, and when it came back up the DB reports that it can't start and is possibly corrupted. I've read that this was due to a kernel bug some time back (or at least tied to the kernel bug).

I'm wondering if there are any other workarounds or "tricks" that I could try in order to recover, rather than doing a restore from backup?

2014-02-23 03:46:08 PST    LOG:  aborting startup due to startup process failure
2014-02-23 11:10:09 PST    LOG:  database system was interrupted while in recovery at 2014-02-23 03:46:04 PST
2014-02-23 11:10:09 PST    HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
2014-02-23 11:10:09 PST    LOG:  database system was not properly shut down; automatic recovery in progress
2014-02-23 11:10:09 PST    LOG:  consistent recovery state reached at 1493/24398AA8
2014-02-23 11:10:09 PST    LOG:  redo starts at 1493/5306FC8
2014-02-23 11:10:09 PST    PANIC:  heap_update_redo: invalid lp
2014-02-23 11:10:09 PST    CONTEXT:  xlog redo hot_update: rel 16399/868691025/959835680; tid 1180404/38; new 1180404/40
2014-02-23 11:10:09 PST    LOG:  startup process (PID 3175) was terminated by signal 6: Aborted
2014-02-23 11:10:09 PST    LOG:  aborting startup due to startup process failure
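
For readers following along, the CONTEXT line is the part of this log that identifies the page the redo step failed on. A minimal sketch of pulling its fields apart is below. It assumes the usual reading of such lines (rel is tablespace OID / database OID / relfilenode, tid and new are block number / line pointer offset); the function name and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# CONTEXT line copied verbatim from the log above.
CONTEXT = ("xlog redo hot_update: rel 16399/868691025/959835680; "
           "tid 1180404/38; new 1180404/40")

# Assumed field meaning: rel = tablespace/database/relfilenode,
# tid and new = block number/line pointer offset.
PATTERN = re.compile(
    r"rel (?P<tablespace>\d+)/(?P<database>\d+)/(?P<relfilenode>\d+); "
    r"tid (?P<block>\d+)/(?P<old_offset>\d+); "
    r"new (?P<new_block>\d+)/(?P<new_offset>\d+)"
)


def parse_redo_context(line):
    """Return the numeric fields of a hot_update CONTEXT line as a dict."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised CONTEXT line: %r" % line)
    return {name: int(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    for name, value in parse_redo_context(CONTEXT).items():
        print(name, value)
```

Read this way, the failing record touches block 1180404 of relfilenode 959835680, moving a tuple from line pointer 38 to line pointer 40 on the same page.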


Not holding out hope, but maybe, just maybe, someone has some ideas/shortcuts to get this DB back up.

Thanks
Tory

Re: 9.1.2 Postgres corruption, any way to recover?

From
Tomas Vondra
Date:
On 23.2.2014 21:42, Tory M Blue wrote:
> Appears one of my bigger, but older DB's cored or other this morning and
> when it came back up the DB shows that it can't start and is possibly
> corrupted. I've read this was actually due to a kernel bug sometime back
> (or at least tied to the kernel bug).
...
>
> Not holding out hope, but maybe just maybe someone has some
> ideas/shortcuts to maybe get this DB back up


I think the first thing you should ask yourself is why you're running
9.1.2, i.e. a 3-year-old revision, instead of the current 9.1.12. Maybe
it's not the cause of the bug, but still ...

Also, it seems to me that the corruption happened some time ago and you
only discovered it now. That is strange, because a corrupted page header
should kill every backup attempt. Are you sure you really have backups?
I mean, tested and working backups?

Do you have an idea how many blocks are actually corrupted? Is it just
this single one, or are there more? Are you sure it's actually due to
a kernel bug, and not a storage failure (for example)? And what kernel
do you have in mind?

There are certainly tricks to make it work (e.g. zeroing the block with
the corrupted header), but that means data loss (you won't have the data
from that block) and it's tedious / time-consuming. If you have a working
backup, and if it's acceptable to lose the data written since then, you
should probably restore from it.
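
To make the "zeroing the block" idea concrete, here is a rough sketch of how one might work out where a given block lives on disk. It assumes the default build settings (8 kB blocks, 1 GB segment files), which the thread does not confirm; the function name is hypothetical. It only computes a location and does not touch any file.

```python
BLOCK_SIZE = 8192                 # assumed default BLCKSZ (8 kB)
SEGMENT_BYTES = 1 << 30           # assumed default segment file size (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE   # 131072 blocks per file


def locate_block(relfilenode, block):
    """Return (segment file name, byte offset) for a relation block.

    The first segment is named after the relfilenode alone; later
    segments get a ".N" suffix.
    """
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    if segment == 0:
        file_name = str(relfilenode)
    else:
        file_name = "%d.%d" % (relfilenode, segment)
    return file_name, block_in_segment * BLOCK_SIZE


if __name__ == "__main__":
    # Relfilenode and block number taken from the CONTEXT line in the log.
    print(locate_block(959835680, 1180404))
```

Under those assumptions, block 1180404 of relfilenode 959835680 falls in segment file 959835680.9 at byte offset 6193152 (block 756 within that segment). The directory containing the file depends on the tablespace and database OIDs and is not derived here.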

The only thing that might help you to recover all the data is probably
PITR, i.e. a base backup + WAL archive (or replication).

regards
Tomas