Discussion: 9.1.2 Postgres corruption, any way to recover?
It appears that one of my bigger, but older, DBs dumped core (or otherwise crashed) this morning, and when it came back up the DB reported that it can't start and is possibly corrupted. I've read that this was due to a kernel bug from some time back (or at least tied to that kernel bug).
I'm wondering if there are any workarounds or "tricks" that I could try to recover it, rather than doing a restore from backup?
2014-02-23 03:46:08 PST LOG: aborting startup due to startup process failure
2014-02-23 11:10:09 PST LOG: database system was interrupted while in recovery at 2014-02-23 03:46:04 PST
2014-02-23 11:10:09 PST HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2014-02-23 11:10:09 PST LOG: database system was not properly shut down; automatic recovery in progress
2014-02-23 11:10:09 PST LOG: consistent recovery state reached at 1493/24398AA8
2014-02-23 11:10:09 PST LOG: redo starts at 1493/5306FC8
2014-02-23 11:10:09 PST PANIC: heap_update_redo: invalid lp
2014-02-23 11:10:09 PST CONTEXT: xlog redo hot_update: rel 16399/868691025/959835680; tid 1180404/38; new 1180404/40
2014-02-23 11:10:09 PST LOG: startup process (PID 3175) was terminated by signal 6: Aborted
2014-02-23 11:10:09 PST LOG: aborting startup due to startup process failure
Not holding out hope, but maybe, just maybe, someone has some ideas/shortcuts to get this DB back up.
Thanks
Tory
On 23.2.2014 21:42, Tory M Blue wrote:
> Appears one of my bigger, but older DB's cored or other this morning and
> when it came back up the DB shows that it can't start and is possibly
> corrupted. I've read this was actually due to a kernel bug sometime back
> (or at least tied to the kernel bug).
...
>
> Not holding out hope, but maybe just maybe someone has some
> ideas/shortcuts to maybe get this DB back up

I think the first thing you should ask yourself is why you're running 9.1.2, i.e. a 3-year-old revision, instead of the current 9.1.12. Maybe it's not the cause of the bug, but still ...

Also, it seems to me that the corruption happened some time ago and you only discovered it now. Which is strange, because a corrupted page header should kill every backup attempt. Are you sure you really have backups? I mean, tested and working backups?

Do you have an idea how many blocks are actually corrupted? Is it just this single one, or are there more?

Are you sure it's actually due to a kernel bug, and not a storage failure (for example)? And what kernel do you have in mind?

There are certainly tricks to make it work (e.g. zeroing the block with the corrupted header), but that means data loss (you won't have the data from that block) and it's tedious / time consuming.

If you have a working backup, and if it's acceptable to lose the data written since then, you should probably restore from it. The only thing that might help you recover all the data is probably PITR, i.e. a base backup + WAL archive (or replication).

regards
Tomas
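
For anyone who does end up going the "zero the block" route, the first step is finding where the block named in the PANIC context lives on disk. Here is a minimal sketch of that arithmetic, assuming a default build (8 kB blocks, 1 GB segment files) and reading the `tid 1180404/38` in the log as block 1180404, line pointer 38. The helper name `locate_block` is made up for illustration and is not part of PostgreSQL; check the real values with `SHOW block_size` and `SHOW segment_size` on a comparable server before trusting the result.

```python
# Assumption: default PostgreSQL build settings (not confirmed in the thread).
BLOCK_SIZE = 8192                                 # bytes per heap page
SEGMENT_SIZE = 1024 ** 3                          # bytes per relation segment file
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE   # 131072 with the defaults


def locate_block(relfilenode, block_number):
    """Hypothetical helper: map a relation block number to
    (segment file name, block index inside that file, byte offset)."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later ones are named <relfilenode>.<n>.
    if segment == 0:
        file_name = str(relfilenode)
    else:
        file_name = f"{relfilenode}.{segment}"
    return file_name, block_in_segment, block_in_segment * BLOCK_SIZE


if __name__ == "__main__":
    # Values taken from the log: rel 16399/868691025/959835680; tid 1180404/38
    print(locate_block(959835680, 1180404))
```

Under those assumptions the block would sit in the segment file `959835680.9`, as block 756 of that file. Anything that writes to the file should only be attempted with the server stopped and after taking a file-level copy of the whole data directory, since zeroing discards every row on that page.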