Thread: Crash with data corruption under Windows

Crash with data corruption under Windows

From
Nicola Mauri
Date:
We ran into the following issue on Windows 2003 with Postgres 8.2.6
while the database was running:

   FATAL:  "pg_tblspc/16405/37638" is not a valid data directory
   DETAIL:  File "pg_tblspc/16405/37638/PG_VERSION" is missing.
   ERROR:  could not open relation 16405/37638/2661: No such file or directory
   ERROR:  could not open relation 16405/37638/2659: No such file or directory
   ERROR:  could not write block 4 of relation 16405/37638/37656: Permission denied
   CONTEXT:  writing block 4 of relation 16405/37638/37656
   ...
   WARNING:  could not write block 4 of 16405/37638/37656
   DETAIL:  Multiple failures --- write error may be permanent.
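
For readers following the log above: the three numbers in each relation path appear to be the tablespace OID, the database OID and the relation's file node, which matches the "pg_tblspc/16405/37638" directory named in the FATAL line. A minimal Python sketch for pulling those numbers out of such log lines follows; the names `RelationPath` and `parse_relation_path` are made up for this illustration and are not part of PostgreSQL.

```python
import re
from typing import NamedTuple, Optional


class RelationPath(NamedTuple):
    """Hypothetical helper type: the three components of a relation path."""
    tablespace_oid: int
    database_oid: int
    relfilenode: int


# Three slash-separated numbers that are not part of a longer path or word.
# "pg_tblspc/16405/37638/PG_VERSION" does not match: it has only two numbers.
_REL_PATH_RE = re.compile(r"(?<![\w/])(\d+)/(\d+)/(\d+)(?![\w/])")


def parse_relation_path(log_line: str) -> Optional[RelationPath]:
    """Return the relation path found in a log line, or None if there is none."""
    match = _REL_PATH_RE.search(log_line)
    if match is None:
        return None
    return RelationPath(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    sample = [
        'FATAL:  "pg_tblspc/16405/37638" is not a valid data directory',
        "ERROR:  could not open relation 16405/37638/2661: No such file or directory",
        "WARNING:  could not write block 4 of 16405/37638/37656",
    ]
    for line in sample:
        print(parse_relation_path(line))
```

Grouping the parsed paths by their first two components shows at a glance that every error here concerns the same tablespace and the same database.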

This has happened 4 times in the last few months! Usually the data files
appear to be corrupted after the crash, but in some cases they disappear
from the filesystem completely (the tablespace directory is empty)
and we have to recreate the entire database from the last dump.

No suspicious activity has been detected on the server (unauthorized
access, anti-virus intervention), and no trace of the missing files
can be found using undelete utilities. The disk hardware is healthy
and no other part of the filesystem seems to be affected by such
strange deletions (several applications, including an Oracle database,
are running correctly on the server).

Since the problem seems to involve only the directories containing
tablespaces (stored on the local partition E:\), we are focusing our
attention on the "Reparse Point" and "NTFS Junction" mechanisms.

Could there be issues with those features?

Thanks in advance,
Nicola Mauri
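
A check along the lines described in this message could be scripted so that a vanished tablespace directory is noticed before the server trips over it. The sketch below is only an illustration in Python: it assumes the layout implied by the error messages (pg_tblspc/<tablespace OID>/<database OID>/PG_VERSION), the function name `find_suspect_tablespace_dirs` is hypothetical, and it relies on `os.path.isdir` following links or junctions, which should be verified on the Windows version in use.

```python
import os
from typing import List, Tuple


def find_suspect_tablespace_dirs(data_dir: str) -> List[Tuple[str, str]]:
    """Return (path, reason) pairs for tablespace entries that look damaged.

    Assumed layout: <data_dir>/pg_tblspc/<tablespace OID>/<database OID>/PG_VERSION
    """
    tblspc_root = os.path.join(data_dir, "pg_tblspc")
    suspects: List[Tuple[str, str]] = []
    if not os.path.isdir(tblspc_root):
        return suspects

    for spc_name in sorted(os.listdir(tblspc_root)):
        spc_path = os.path.join(tblspc_root, spc_name)
        # isdir() is False when the entry is not a directory or when the
        # link target no longer exists.
        if not os.path.isdir(spc_path):
            suspects.append((spc_path, "target missing"))
            continue

        entries = sorted(os.listdir(spc_path))
        if not entries:
            # May be legitimate if nothing was ever created in the tablespace.
            suspects.append((spc_path, "empty"))
            continue

        for db_name in entries:
            db_path = os.path.join(spc_path, db_name)
            if os.path.isdir(db_path) and not os.path.isfile(
                os.path.join(db_path, "PG_VERSION")
            ):
                suspects.append((db_path, "PG_VERSION missing"))
    return suspects


if __name__ == "__main__":
    import sys

    for path, reason in find_suspect_tablespace_dirs(sys.argv[1]):
        print(f"{reason}: {path}")
```

Run periodically against the data directory, a script like this would at least timestamp the moment the files disappear, which can then be compared with other activity on the server.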

Re: Crash with data corruption under Windows

From
Kenneth Marshall
Date:
On Fri, Feb 20, 2009 at 03:32:16PM +0100, Nicola Mauri wrote:
>
> We run into the following issue with Windows 2003 and Postgres 8.2.6 while
> database was running:
>
>   FATAL:  "pg_tblspc/16405/37638" is not a valid data directory
>   DETAIL:  File "pg_tblspc/16405/37638/PG_VERSION" is missing.
>   ERROR:  could not open relation 16405/37638/2661: No such file or
> directory
>   ERROR:  could not open relation 16405/37638/2659: No such file or
> directory
>   ERROR:  could not write block 4 of relation 16405/37638/37656: Permission
> denied
>   CONTEXT:  writing block 4 of relation 16405/37638/37656
>   ...
>   WARNING:  could not write block 4 of 16405/37638/37656
>   DETAIL:  Multiple failures --- write error may be permanent.
>
> This happened 4 times in the last few months! Usually after the crash
> datafiles appear to be corrupted, but in some other cases they completely
> disappear from the filesystem (tablespace directory is empty) and we have
> to recreate the entire db from the last dump.
>
> No suspicious activities have been detected on the server (unauthorized
> accesses, anti-virus intervention) and information about disappeared files
> cannot be found using an undelete utilities. Disk hardware is healthy and
> no other part of the filesystem seems to be affected by such strange
> deletions (several applications, including an oracle database, are
> correctly running on the server).
>
> Since the problem seems involving only directories containing tablespaces
> (stored on local partition E:\) we are pointing our attention to "Reparse
> Point" and "NTFS Junction" mechanism.
>
> Could be there issues in those features?
>
> Thanks in advance,
> Nicola Mauri
>
I did not check the release notes, but you do realize that you are
6 releases behind the latest stable 8.2 version. Maybe an upgrade
would help.

Cheers,
Ken

Re: Crash with data corruption under Windows

From
Scott Marlowe
Date:
On Fri, Feb 20, 2009 at 7:32 AM, Nicola Mauri <nicola.mauri@saga.it> wrote:
>
> We run into the following issue with Windows 2003 and Postgres 8.2.6 while
> database was running:
>
>  FATAL:  "pg_tblspc/16405/37638" is not a valid data directory
>  DETAIL:  File "pg_tblspc/16405/37638/PG_VERSION" is missing.
>  ERROR:  could not open relation 16405/37638/2661: No such file or directory
>  ERROR:  could not open relation 16405/37638/2659: No such file or directory
>  ERROR:  could not write block 4 of relation 16405/37638/37656: Permission
> denied

Usually when I see the permission denied thing there's anti-virus
software hard locking pgsql files in the middle of the day.

Re: Crash with data corruption under Windows

From
Nicola Mauri
Date:
> On Fri, Feb 20, 2009 at 03:32:16PM +0100, Nicola Mauri wrote:
>
>> We run into the following issue with Windows 2003 and Postgres 8.2.6 while
>> database was running:
>>
>>   FATAL:  "pg_tblspc/16405/37638" is not a valid data directory
>>   DETAIL:  File "pg_tblspc/16405/37638/PG_VERSION" is missing.
>>   ERROR:  could not open relation 16405/37638/2661: No such file or
>> directory
>>   ERROR:  could not open relation 16405/37638/2659: No such file or
>> directory
>>   ERROR:  could not write block 4 of relation 16405/37638/37656: Permission
>> denied
>>   CONTEXT:  writing block 4 of relation 16405/37638/37656
>>   ...
>>   WARNING:  could not write block 4 of 16405/37638/37656
>>   DETAIL:  Multiple failures --- write error may be permanent.
>>
>> This happened 4 times in the last few months! Usually after the crash
>> datafiles appear to be corrupted, but in some other cases they completely
>> disappear from the filesystem (tablespace directory is empty) and we have
>> to recreate the entire db from the last dump.
>>
>> No suspicious activities have been detected on the server (unauthorized
>> accesses, anti-virus intervention) and information about disappeared files
>> cannot be found using an undelete utilities. Disk hardware is healthy and
>> no other part of the filesystem seems to be affected by such strange
>> deletions (several applications, including an oracle database, are
>> correctly running on the server).
>>
>> Since the problem seems involving only directories containing tablespaces
>> (stored on local partition E:\) we are pointing our attention to "Reparse
>> Point" and "NTFS Junction" mechanism.
>>
>> Could be there issues in those features?
>>
>> Thanks in advance,
>> Nicola Mauri
>>
>>
> I did not check the release note, but you do realize that you are
> 6 releases back from the latest stable 8.2 version. Maybe an upgrade
> would help.
>
> Cheers,
> Ken
>
Yes, we have now upgraded to 8.2.11, but we are not sure whether this
will help prevent data loss in the future.
So we wonder if anyone has some hints.
Thanks,
Nicola

Re: Crash with data corruption under Windows

From
Nicola Mauri
Date:
-------- Original Message --------
Subject: Re: [ADMIN] Crash with data corruption under Windows
From: Scott Marlowe <scott.marlowe@gmail.com>
> Usually when I see the permission denied thing there's anti-virus
> software hard locking pgsql files in the middle of the day.
>
Have you experienced data loss too in that case?
(Note, however, that we had already excluded the pgsql directories from
anti-virus scanning.)

regards,
Nicola

Re: Crash with data corruption under Windows

From
Scott Marlowe
Date:
On Mon, Feb 23, 2009 at 9:49 AM, Nicola Mauri <nicola.mauri@saga.it> wrote:
>
> -------- Original Message --------
> Subject: Re: [ADMIN] Crash with data corruption under Windows
> From: Scott Marlowe <scott.marlowe@gmail.com>
>>
>> Usually when I see the permission denied thing there's anti-virus
>> software hard locking pgsql files in the middle of the day.
>>
>
> Have you experienced data loss too in that case?
> (however, we already excluded pgsql directories from anti-virus scanning).

I've not, but I rarely run postgresql under windows.  When we do it's
on a laptop for personal use / demo use.  All our servers run Linux.
However, on this list, when I've seen this happen, it has been known
to result in data loss.

Note that once your database is corrupted, no amount of fixing can
make it safe to keep using.  You need to get a good reliable dump from
it, reinitdb and reload your data.  Do not trust a database that has
gotten corrupted and seems mostly fixed.  It's a time bomb.
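
The dump, reinitdb and reload sequence described above can be written down as an ordered list of commands. The Python sketch below only builds that list and does not run anything; the function name `rebuild_plan`, the database name, the dump file and the new data directory are all placeholders invented for this example. It also leaves out roles and tablespace definitions, which pg_dump does not include, and it assumes the old server has been stopped before the new one is started.

```python
from typing import List


def rebuild_plan(dbname: str, dump_file: str, new_data_dir: str) -> List[List[str]]:
    """Return the commands for a dump / initdb / reload cycle, in order.

    Nothing is executed here; the caller decides how and when to run them.
    """
    return [
        # 1. Take a dump of the (still readable) database in custom format.
        ["pg_dump", "-Fc", "-f", dump_file, dbname],
        # 2. Create a brand-new cluster in a fresh data directory.
        ["initdb", "-D", new_data_dir],
        # 3. Start the new cluster and wait until it accepts connections.
        ["pg_ctl", "start", "-w", "-D", new_data_dir],
        # 4. Recreate the empty database in the new cluster.
        ["createdb", dbname],
        # 5. Reload the data from the dump.
        ["pg_restore", "-d", dbname, dump_file],
    ]


if __name__ == "__main__":
    for step in rebuild_plan("mydb", "mydb.dump", "D:/pgdata_new"):
        print(" ".join(step))
```

If the dump in step 1 fails because the corruption makes tables unreadable, the only remaining source is an older dump, which is the situation described later in this thread.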

Re: Crash with data corruption under Windows

From
Nicola Mauri
Date:

-------- Original Message --------
Subject: Re: [ADMIN] Crash with data corruption under Windows
From: Scott Marlowe <scott.marlowe@gmail.com>
> On Mon, Feb 23, 2009 at 9:49 AM, Nicola Mauri <nicola.mauri@saga.it> wrote:
>
>> -------- Original Message --------
>> Subject: Re: [ADMIN] Crash with data corruption under Windows
>> From: Scott Marlowe <scott.marlowe@gmail.com>
>>
>>> Usually when I see the permission denied thing there's anti-virus
>>> software hard locking pgsql files in the middle of the day.
>>>
>>>
>> Have you experienced data loss too in that case?
>> (however, we already excluded pgsql directories from anti-virus scanning).
>>
>
> I've not, but I rarely run postgresql under windows.  When we do it's
> on a laptop for personal use / demo use.  All our servers run Linux.
> However, on this list, when I've seen this happen, it has been known
> to result in data loss.
>
> Note that once your database is corrupted, no amount of fixing can
> make it safe to keep using.  you need to get a good reliable dump from
> it, reinitdb and reload your data. So not trust a database that's
> gotten corrupted and seems mostly fixed.  It's a timebomb.
>
>
Hi Scott,
we had to recreate the cluster and reload the data from a dump, because
we were unable to read any table (some critical indexes were broken).
However, we would like to try to recover the most recent data from the
corrupted data files (as an aid for manual re-entry). Is there any
method to extract raw data from the pgsql table files?

regards,
Nicola