Discussion: Crash with data corruption under Windows
We ran into the following issue with Windows 2003 and Postgres 8.2.6 while the database was running:

FATAL: "pg_tblspc/16405/37638" is not a valid data directory
DETAIL: File "pg_tblspc/16405/37638/PG_VERSION" is missing.
ERROR: could not open relation 16405/37638/2661: No such file or directory
ERROR: could not open relation 16405/37638/2659: No such file or directory
ERROR: could not write block 4 of relation 16405/37638/37656: Permission denied
CONTEXT: writing block 4 of relation 16405/37638/37656
...
WARNING: could not write block 4 of 16405/37638/37656
DETAIL: Multiple failures --- write error may be permanent.

This has happened 4 times in the last few months! Usually the data files appear to be corrupted after the crash. In other cases they disappear from the filesystem completely (the tablespace directory is empty), and we have to recreate the entire database from the last dump.

We have detected no suspicious activity on the server (unauthorized access, anti-virus intervention). Undelete utilities also find no trace of the missing files. The disk hardware is healthy, and these strange deletions do not seem to affect any other part of the filesystem. Several other applications, including an Oracle database, run correctly on the server.

The problem seems to involve only the directories containing tablespaces (stored on the local partition E:\). For that reason we are focusing on the "Reparse Point" and "NTFS Junction" mechanisms.

Could there be issues with those features?

Thanks in advance,
Nicola Mauri
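The numbers in these messages follow the 8.x on-disk naming `pg_tblspc/<tablespace OID>/<database OID>/<relfilenode>`. On Windows, each `pg_tblspc` entry is an NTFS junction pointing at the tablespace location. The following is a minimal sketch, assuming Python 3 and read access to the data directory. It uses two hypothetical helpers. `affected_relations` pulls the relation triples out of a log excerpt. `databases_missing_pg_version` lists the database directories under `pg_tblspc` that lack the `PG_VERSION` file named in the FATAL message.

```python
import re
from pathlib import Path

# "relation <tablespace OID>/<database OID>/<relfilenode>" as printed in the server log
REL_RE = re.compile(r"relation (\d+)/(\d+)/(\d+)")


def affected_relations(log_text):
    """Return the distinct (tablespace, database, relfilenode) triples, in order of appearance."""
    triples = []
    for match in REL_RE.finditer(log_text):
        triple = tuple(int(part) for part in match.groups())
        if triple not in triples:
            triples.append(triple)
    return triples


def databases_missing_pg_version(data_dir):
    """List 'tablespace/database' directories under pg_tblspc that have no PG_VERSION file.

    A tablespace entry whose link target cannot be reached is reported by its OID alone.
    """
    problems = []
    for ts in sorted((Path(data_dir) / "pg_tblspc").iterdir()):
        if not ts.is_dir():  # is_dir() follows the link; False means a dangling junction/symlink
            problems.append(ts.name)
            continue
        for db in sorted(ts.iterdir()):
            if db.is_dir() and not (db / "PG_VERSION").is_file():
                problems.append(f"{ts.name}/{db.name}")
    return problems
```

You could run the second helper periodically against the data directory. A result such as `['16405/37638']` would mean a database directory has lost its `PG_VERSION` file, so the problem shows up before the server reports it.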
On Fri, Feb 20, 2009 at 03:32:16PM +0100, Nicola Mauri wrote:
> We run into the following issue with Windows 2003 and Postgres 8.2.6 while
> database was running:
> ...

I did not check the release notes, but you do realize that you are 6 releases behind the latest stable 8.2 version. Maybe an upgrade would help.

Cheers,
Ken
On Fri, Feb 20, 2009 at 7:32 AM, Nicola Mauri <nicola.mauri@saga.it> wrote:
> ERROR: could not write block 4 of relation 16405/37638/37656: Permission
> denied

Usually when I see the permission denied thing, there's anti-virus software hard-locking pgsql files in the middle of the day.
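One way to test the anti-virus theory while the errors are occurring is to check whether the files can still be opened for writing. This is a minimal sketch, assuming Python 3 on the server. `unwritable_files` is a hypothetical helper. Mode `"r+b"` opens an existing file for reading and writing without truncating or changing it. On Windows, a file that another process (such as a scanner) has locked exclusively typically raises `PermissionError`. The sketch assumes you point it at the tablespace location itself (for example, the folder on E:\) rather than at `pg_tblspc`. Whether `os.walk` descends into junctions varies between Python versions.

```python
import os
from pathlib import Path


def unwritable_files(root):
    """Return the files under root that cannot be opened for read/write at this moment."""
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # "r+b" requires the file to exist and does not truncate it
                with open(path, "r+b"):
                    pass
            except PermissionError:
                blocked.append(str(path))  # locked or access denied right now
            except FileNotFoundError:
                continue  # file vanished between listing and opening
    return blocked
```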
> I did not check the release note, but you do realize that you are
> 6 releases back from the latest stable 8.2 version. Maybe an upgrade
> would help.
>
> Cheers,
> Ken

Yes, we have now upgraded to 8.2.11, but we are not sure this will prevent data loss in the future. So we wonder if anyone has some hints.

Thanks,
Nicola
-------- Original Message --------
Subject: Re: [ADMIN] Crash with data corruption under Windows
From: Scott Marlowe <scott.marlowe@gmail.com>
> Usually when I see the permission denied thing there's anti-virus
> software hard locking pgsql files in the middle of the day.

Have you also experienced data loss in that case? (We have, however, already excluded the pgsql directories from anti-virus scanning.)

Regards,
Nicola
On Mon, Feb 23, 2009 at 9:49 AM, Nicola Mauri <nicola.mauri@saga.it> wrote:
> Have you experienced data loss too in that case?
> (however, we already excluded pgsql directories from anti-virus scanning).

I've not, but I rarely run PostgreSQL under Windows. When we do, it's on a laptop for personal or demo use. All our servers run Linux. However, when I've seen this happen on this list, it has been known to result in data loss.

Note that once your database is corrupted, no amount of fixing can make it safe to keep using. You need to get a good, reliable dump from it, re-run initdb, and reload your data. So do not trust a database that's gotten corrupted and seems mostly fixed. It's a time bomb.
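The dump, initdb and reload cycle can be written as a fixed sequence of standard client commands. This is a hedged outline, not a tested procedure. The data directory paths, the dump file name and the superuser name `postgres` are placeholders. `rebuild_steps` and `run` are hypothetical helpers. On Windows the server usually runs as a service, so the `pg_ctl` steps may need adapting. By default the helper only prints the commands.

```python
import subprocess


def rebuild_steps(old_datadir, new_datadir, dump_file, superuser="postgres"):
    """Dump -> stop damaged cluster -> initdb -> start new cluster -> reload, as argv lists."""
    return [
        # 1. Dump every database while the damaged cluster can still be read.
        ["pg_dumpall", "-U", superuser],
        # 2. Stop the damaged cluster so the new one can take over its port.
        ["pg_ctl", "stop", "-D", old_datadir, "-m", "fast"],
        # 3. Create a brand-new cluster instead of reusing the damaged files.
        ["initdb", "-D", new_datadir, "-U", superuser],
        # 4. Start it and wait (-w) until it accepts connections.
        ["pg_ctl", "start", "-D", new_datadir, "-w"],
        # 5. Replay the dump into the fresh cluster.
        ["psql", "-U", superuser, "-f", dump_file, "postgres"],
    ]


def run(steps, dump_file, dry_run=True):
    """Print the steps (dry run) or execute them in order, stopping at the first failure."""
    for argv in steps:
        is_dump = argv[0] == "pg_dumpall"
        if dry_run:
            print(" ".join(argv) + (f" > {dump_file}" if is_dump else ""))
        elif is_dump:
            # pg_dumpall writes its SQL script to stdout, so redirect it into the dump file
            with open(dump_file, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)
```

The sequence uses `check=True`, so a failed dump stops the run before the old cluster is shut down.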
-------- Original Message --------
Subject: Re: [ADMIN] Crash with data corruption under Windows
From: Scott Marlowe <scott.marlowe@gmail.com>
> Note that once your database is corrupted, no amount of fixing can
> make it safe to keep using. you need to get a good reliable dump from
> it, reinitdb and reload your data. So not trust a database that's
> gotten corrupted and seems mostly fixed. It's a timebomb.

Hi Scott,
we had to recreate the cluster and reload the data from a dump, because we were unable to read any table (some critical indexes were broken). However, we would like to try to recover the most recent data from the corrupted data files, to help us re-enter it manually. Is there any method to extract raw data from the pgsql table files?

Regards,
Nicola
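Recovering rows by hand is laborious. A reasonable first step is to find out which blocks of a damaged relation file still look like valid pages. A dedicated utility such as pg_filedump goes much further and decodes individual items. The sketch below uses the hypothetical helpers `scan_pages` and `classify` and should be run on a copy of the file. It assumes the default 8192-byte block size and a little-endian machine. It also assumes the 8.x page header layout, in which `pd_lower`, `pd_upper`, `pd_special` and `pd_pagesize_version` are consecutive 16-bit fields starting at byte offset 12. Even for blocks reported as plausible, you still need the table's column definitions to decode any row.

```python
import struct

BLCKSZ = 8192  # assumption: default PostgreSQL block size; check the server build


def classify(page, blcksz=BLCKSZ):
    """Give a rough verdict on one block, using only its page header fields."""
    if not any(page):
        return "zeroed"  # never written or zero-filled
    # Assumed 8.x layout: four little-endian uint16 fields starting at byte 12
    lower, upper, special, size_version = struct.unpack_from("<4H", page, 12)
    if (size_version & 0xFF00) != blcksz:  # high byte holds the page size
        return "bad page size"
    if not (lower <= upper <= special <= blcksz):
        return "bad header"
    return "plausible"


def scan_pages(path, blcksz=BLCKSZ):
    """Yield (block number, verdict) for every full block of a relation file."""
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(blcksz)
            if len(page) < blcksz:
                break  # end of file (a trailing partial block is ignored)
            yield blkno, classify(page, blcksz)
            blkno += 1
```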