Обсуждение: hardware failure - data recovery
To make a long story short lets just say that I had a bit of a hardware failure recently. If I got an error like this when trying to dump a db from the mangled data directory is it safe to say it's totally hosed or is there some chance of recovery? pg_dump: ERROR: could not open relation 1663/18392/18400: No such file or directory pg_dump: SQL command to dump the contents of table "file" failed: PQendcopy() failed. pg_dump: Error message from server: ERROR: could not open relation 1663/18392/18400: No such file or directory pg_dump: The command was: COPY public.file (vfs_id, vfs_type, vfs_path, vfs_name, vfs_modified, vfs_owner, vfs_data) TO stdout; Thanks, Rick
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/18/06 19:57, Rick Gigger wrote: > To make a long story short lets just say that I had a bit of a hardware > failure recently. > > If I got an error like this when trying to dump a db from the mangled > data directory is it safe to say it's totally hosed or is there some > chance of recovery? > > pg_dump: ERROR: could not open relation 1663/18392/18400: No such file > or directory > pg_dump: SQL command to dump the contents of table "file" failed: > PQendcopy() failed. > pg_dump: Error message from server: ERROR: could not open relation > 1663/18392/18400: No such file or directory > pg_dump: The command was: COPY public.file (vfs_id, vfs_type, vfs_path, > vfs_name, vfs_modified, vfs_owner, vfs_data) TO stdout; What happens when you fsck the relevant partitions? - -- Ron Johnson, Jr. Jefferson LA USA Is "common sense" really valid? For example, it is "common sense" to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that "common sense" is obviously wrong. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFFNtOdS9HxQb37XmcRAqswAJ0X6up02yGXkky6YBrVHIK3DXuO1ACgqbI/ //TGGkVOIe9o4sKzuUHyuXI= =fZuZ -----END PGP SIGNATURE-----
Ron Johnson wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 10/18/06 19:57, Rick Gigger wrote: >> To make a long story short lets just say that I had a bit of a hardware >> failure recently. >> >> If I got an error like this when trying to dump a db from the mangled >> data directory is it safe to say it's totally hosed or is there some >> chance of recovery? >> >> pg_dump: ERROR: could not open relation 1663/18392/18400: No such file >> or directory >> pg_dump: SQL command to dump the contents of table "file" failed: >> PQendcopy() failed. >> pg_dump: Error message from server: ERROR: could not open relation >> 1663/18392/18400: No such file or directory >> pg_dump: The command was: COPY public.file (vfs_id, vfs_type, vfs_path, >> vfs_name, vfs_modified, vfs_owner, vfs_data) TO stdout; > > What happens when you fsck the relevant partitions? Errors about a bunch of duplicate inodes, missing inodes, etc. Should I do it again and get some of the exact text for you?
Rick Gigger wrote: > Ron Johnson wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On 10/18/06 19:57, Rick Gigger wrote: >>> To make a long story short lets just say that I had a bit of a hardware >>> failure recently. >>> >>> If I got an error like this when trying to dump a db from the mangled >>> data directory is it safe to say it's totally hosed or is there some >>> chance of recovery? >>> >>> pg_dump: ERROR: could not open relation 1663/18392/18400: No such file >>> or directory >>> pg_dump: SQL command to dump the contents of table "file" failed: >>> PQendcopy() failed. >>> pg_dump: Error message from server: ERROR: could not open relation >>> 1663/18392/18400: No such file or directory >>> pg_dump: The command was: COPY public.file (vfs_id, vfs_type, vfs_path, >>> vfs_name, vfs_modified, vfs_owner, vfs_data) TO stdout; >> >> What happens when you fsck the relevant partitions? > > Errors about a bunch of duplicate inodes, missing inodes, etc. Should I > do it again and get some of the exact text for you? Also this is an example of the type of errors that were being logged before it died: LOG: checkpoint record is at 26/41570488 LOG: redo record is at 26/41570488; undo record is at 0/0; shutdown TRUE
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/18/06 23:52, Rick Gigger wrote: > Rick Gigger wrote: >> Ron Johnson wrote: >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> On 10/18/06 19:57, Rick Gigger wrote: >>>> To make a long story short lets just say that I had a bit of a hardware >>>> failure recently. >>>> >>>> If I got an error like this when trying to dump a db from the mangled >>>> data directory is it safe to say it's totally hosed or is there some >>>> chance of recovery? >>>> >>>> pg_dump: ERROR: could not open relation 1663/18392/18400: No such file >>>> or directory >>>> pg_dump: SQL command to dump the contents of table "file" failed: >>>> PQendcopy() failed. >>>> pg_dump: Error message from server: ERROR: could not open relation >>>> 1663/18392/18400: No such file or directory >>>> pg_dump: The command was: COPY public.file (vfs_id, vfs_type, vfs_path, >>>> vfs_name, vfs_modified, vfs_owner, vfs_data) TO stdout; >>> >>> What happens when you fsck the relevant partitions? >> >> Errors about a bunch of duplicate inodes, missing inodes, etc. Should >> I do it again and get some of the exact text for you? > > Also this is an example of the type of errors that were being logged > before it died: > > LOG: checkpoint record is at 26/41570488 > LOG: redo record is at 26/41570488; undo record is at 0/0; shutdown TRUE What does Google say about these error messages and your fs? - -- Ron Johnson, Jr. Jefferson LA USA Is "common sense" really valid? For example, it is "common sense" to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that "common sense" is obviously wrong. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFFNwhNS9HxQb37XmcRAksiAJ4zWuLJKcOaAR5PI5QSgZt3NhAQEgCgplJn p4Xfy5+I/K6uYD+MOg+e6Nc= =nhTB -----END PGP SIGNATURE-----
Ron Johnson wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 10/18/06 23:52, Rick Gigger wrote: >> Rick Gigger wrote: >>> Ron Johnson wrote: >>>> -----BEGIN PGP SIGNED MESSAGE----- >>>> Hash: SHA1 >>>> >>>> On 10/18/06 19:57, Rick Gigger wrote: >>>>> To make a long story short lets just say that I had a bit of a hardware >>>>> failure recently. >>>>> >>>>> If I got an error like this when trying to dump a db from the mangled >>>>> data directory is it safe to say it's totally hosed or is there some >>>>> chance of recovery? >>>>> >>>>> pg_dump: ERROR: could not open relation 1663/18392/18400: No such file >>>>> or directory >>>>> pg_dump: SQL command to dump the contents of table "file" failed: >>>>> PQendcopy() failed. >>>>> pg_dump: Error message from server: ERROR: could not open relation >>>>> 1663/18392/18400: No such file or directory >>>>> pg_dump: The command was: COPY public.file (vfs_id, vfs_type, vfs_path, >>>>> vfs_name, vfs_modified, vfs_owner, vfs_data) TO stdout; >>>> What happens when you fsck the relevant partitions? >>> Errors about a bunch of duplicate inodes, missing inodes, etc. Should >>> I do it again and get some of the exact text for you? >> Also this is an example of the type of errors that were being logged >> before it died: >> >> LOG: checkpoint record is at 26/41570488 >> LOG: redo record is at 26/41570488; undo record is at 0/0; shutdown TRUE > > What does Google say about these error messages and your fs? Not much that is useful. I think this is a little beyond that scope. A hardware failure basically left the fs and the db in an inconsistent state. There is one table in one database that has a bunch of data in it that I need to get out. I'm guessing I'm going to need to find someone who understands the the internal structure of the files to go in and pull out whatever data is still in tact. I have been poking around and as far as I can tell, although one of the toast indexes is gone the actual table files appear to be in tact. That is they are still in the file system. I don't know if they are ok internally. I also get this error when trying to access the non-toasted data: ERROR: could not access status of transaction 307904873 DETAIL: could not open file "pg_clog/0125": No such file or directory I'm guessing that this means that I may have get someone to pull out all versions of a given tuple because I have lost some of the visibility info. This shouldn't matter as most likely very few tuples would have had more than one version when the system went down. I just hope that the relations are need are in tact and that there is someone out there who can help me get it out.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/19/06 00:46, Rick Gigger wrote: > Ron Johnson wrote: >> >> On 10/18/06 23:52, Rick Gigger wrote: >>> Rick Gigger wrote: >>>> Ron Johnson wrote: >>>>> >>>>> On 10/18/06 19:57, Rick Gigger wrote: [snip] > Not much that is useful. I think this is a little beyond that scope. A > hardware failure basically left the fs and the db in an inconsistent > state. There is one table in one database that has a bunch of data in > it that I need to get out. I'm guessing I'm going to need to find > someone who understands the the internal structure of the files to go in > and pull out whatever data is still in tact. So, no backup tapes? > I have been poking around and as far as I can tell, although one of the > toast indexes is gone the actual table files appear to be in tact. That > is they are still in the file system. I don't know if they are ok > internally. > > I also get this error when trying to access the non-toasted data: > > ERROR: could not access status of transaction 307904873 > DETAIL: could not open file "pg_clog/0125": No such file or directory > > I'm guessing that this means that I may have get someone to pull out all > versions of a given tuple because I have lost some of the visibility > info. This shouldn't matter as most likely very few tuples would have > had more than one version when the system went down. > > I just hope that the relations are need are in tact and that there is > someone out there who can help me get it out. What kernel, version & fs? - -- Ron Johnson, Jr. Jefferson LA USA Is "common sense" really valid? For example, it is "common sense" to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that "common sense" is obviously wrong. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) iD8DBQFFNxObS9HxQb37XmcRAkaJAKCKqIN76cHjvCqMJN3JqO7LBJZWfQCff2AT WZWfEy3dWyByN2Q06IAt1kg= =xmv+ -----END PGP SIGNATURE-----
Ron Johnson wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 10/19/06 00:46, Rick Gigger wrote: >> Ron Johnson wrote: >>> On 10/18/06 23:52, Rick Gigger wrote: >>>> Rick Gigger wrote: >>>>> Ron Johnson wrote: >>>>>> On 10/18/06 19:57, Rick Gigger wrote: > [snip] >> Not much that is useful. I think this is a little beyond that scope. A >> hardware failure basically left the fs and the db in an inconsistent >> state. There is one table in one database that has a bunch of data in >> it that I need to get out. I'm guessing I'm going to need to find >> someone who understands the the internal structure of the files to go in >> and pull out whatever data is still in tact. > > So, no backup tapes? > >> I have been poking around and as far as I can tell, although one of the >> toast indexes is gone the actual table files appear to be in tact. That >> is they are still in the file system. I don't know if they are ok >> internally. >> >> I also get this error when trying to access the non-toasted data: >> >> ERROR: could not access status of transaction 307904873 >> DETAIL: could not open file "pg_clog/0125": No such file or directory >> >> I'm guessing that this means that I may have get someone to pull out all >> versions of a given tuple because I have lost some of the visibility >> info. This shouldn't matter as most likely very few tuples would have >> had more than one version when the system went down. >> >> I just hope that the relations are need are in tact and that there is >> someone out there who can help me get it out. > > What kernel, version & fs? Linux. 2.4.22. and ext3.
Rick Gigger wrote: > Ron Johnson wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On 10/19/06 00:46, Rick Gigger wrote: >>> Ron Johnson wrote: >>>> On 10/18/06 23:52, Rick Gigger wrote: >>>>> Rick Gigger wrote: >>>>>> Ron Johnson wrote: >>>>>>> On 10/18/06 19:57, Rick Gigger wrote: >> [snip] >>> Not much that is useful. I think this is a little beyond that scope. A >>> hardware failure basically left the fs and the db in an inconsistent >>> state. There is one table in one database that has a bunch of data in >>> it that I need to get out. I'm guessing I'm going to need to find >>> someone who understands the the internal structure of the files to go in >>> and pull out whatever data is still in tact. >> >> So, no backup tapes? Oh. Yeah. Everything was backed up but this one thing. Hence the need to get the data from the messed up db. I think we've got it figure out though. We were able to patch up the db enough to extract the data with some help from google and old postings from Tom. Thanks, Rick
On Thu, 19 Oct 2006 06:14:46 -0600, Rick Gigger wrote: > I think we've got it figure out though. We were able to patch up the > db enough to extract the data with some help from google and old postings > from Tom. It would be really great if you put down the specifics of what you googled/old postings/Tom together. You might need it next time. I know I'd like to be ready.
Rick Gigger <rick@alpinenetworking.com> writes: > To make a long story short lets just say that I had a bit of a > hardware failure recently. > > If I got an error like this when trying to dump a db from the mangled > data directory is it safe to say it's totally hosed or is there some > chance of recovery? Why don't you try dumping just the critical tables using pg_dump -t? Also perhaps use psql and try looking up in pg_class for the damaged tables by OID. You may be able to drop just a few tables and then dump the DB normally. This assumes the damaged table(s) are non-critical... I suggest you stop Pg first, take an FS backup of the entire cluster before this so you cahn try various approaches if needed. > pg_dump: ERROR: could not open relation 1663/18392/18400: No such > file or directory > pg_dump: SQL command to dump the contents of table "file" failed: > PQendcopy() failed. > pg_dump: Error message from server: ERROR: could not open relation > 1663/18392/18400: No such file or directory > pg_dump: The command was: COPY public.file (vfs_id, vfs_type, > vfs_path, vfs_name, vfs_modified, vfs_owner, vfs_data) TO stdout; > > Thanks, > > Rick > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- ------------------------------------------------------------------------------- Jerry Sievers 305 854-3001 (home) WWW ECommerce Consultant 305 321-1144 (mobile http://www.JerrySievers.com/
I could have my developer do this if it would be useful to someone else. But in general I think my time would be much better served fixing my backup situation and monitoring them so that this CAN'T happen again. It shouldn't have happened this time. On Oct 19, 2006, at 8:35 AM, Ray Stell wrote: > On Thu, 19 Oct 2006 06:14:46 -0600, Rick Gigger wrote: >> I think we've got it figure out though. We were able to patch up the >> db enough to extract the data with some help from google and old >> postings >> from Tom. > > > It would be really great if you put down the specifics of what you > googled/old postings/Tom together. You might need it next time. > I know I'd like to be ready. > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >
> Rick Gigger <rick@alpinenetworking.com> writes: > >> To make a long story short lets just say that I had a bit of a >> hardware failure recently. >> >> If I got an error like this when trying to dump a db from the mangled >> data directory is it safe to say it's totally hosed or is there some >> chance of recovery? > > Why don't you try dumping just the critical tables using pg_dump -t? There was only one table in the db. > Also perhaps use psql and try looking up in pg_class for the damaged > tables by OID. You may be able to drop just a few tables and then > dump the DB normally. This assumes the damaged table(s) are > non-critical... The table I needed was damaged. I dropped the indexes from it that were also damaged but then the table had to be repaired. > I suggest you stop Pg first, take an FS backup of the entire cluster > before this so you cahn try various approaches if needed. That was the first thing I did. Thanks everyone for the help. Luckily one of my developers was able to patch up the table and get 99% of the data out. All of the truly critical data was in another database and was backed up, so the 99% was enough to get through the crisis. Thanks, Rick