Thread: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
"Jean-Pierre Pelletier"
Date:
Hi,

I installed PostgreSQL 8.1 beta2 five days ago and it has crashed 3 times
since then.
Here is what was logged for the last crash:

2005-10-04 11:00:19 FATAL:  could not read block 121 of relation 1663/16384/2608: Invalid argument
2005-10-04 11:00:20 LOG:  server process (PID 2592) was terminated by signal 1
2005-10-04 11:00:20 LOG:  terminating any other active server processes

Then, for each connection, the log has:
2005-10-04 11:00:20 WARNING:  terminating connection because of crash of another server process
2005-10-04 11:00:20 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2005-10-04 11:00:20 HINT:  In a moment you should be able to reconnect to the database and repeat your command.

With this at the end:
2005-10-04 11:00:20 LOG:  all server processes terminated; reinitializing
2005-10-04 11:00:21 LOG:  database system was interrupted at 2005-10-04 10:59:43 Eastern Daylight Time

Relation 2608 is pg_depend.
----------------------------------------------------------------------------------
The crash before that was on relation pg_type; the first line logged was:
2005-10-03 10:51:06 FATAL:  could not read block 38 of relation 1663/16384/1247: Invalid argument
----------------------------------------------------------------------------------
The first crash was also on relation pg_depend, but on open instead of read:
2005-09-30 18:38:53 FATAL:  could not open relation 1663/16384/2608: Invalid argument
----------------------------------------------------------------------------------

There were between 14 and 17 connections when these crashes happened.

The database was not reloaded from a backup but created from
.sql scripts for DDL, and the data for user tables was reloaded
from files with "copy from".

We are using Windows 2000 Server, Service Pack 4.

Thanks,
Jean-Pierre Pelletier
e-djuster
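
For readers decoding the identifiers in these log lines: the three numbers in "relation 1663/16384/2608" appear in the same order as the spcNode, dbNode and relNode fields printed by the patch later in this thread, that is, tablespace, database and relation file. The C sketch below splits such an identifier into its parts. The names `RelPath`, `parse_rel_path` and `known_catalog_name` are hypothetical, and the lookup covers only the two catalogs identified in this thread; on a live server the name would normally be looked up in pg_class rather than hard-coded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The three numbers printed in "relation A/B/C" log messages. */
typedef struct RelPath
{
	unsigned int spcNode;		/* tablespace */
	unsigned int dbNode;		/* database */
	unsigned int relNode;		/* relation file */
} RelPath;

/*
 * Parse a string such as "1663/16384/2608".
 * Returns 1 when exactly three numeric fields were found, 0 otherwise.
 */
static int
parse_rel_path(const char *text, RelPath *out)
{
	char		trailing;

	assert(text != NULL && out != NULL);

	/* The extra %c only matches if unexpected characters follow. */
	return sscanf(text, "%u/%u/%u%c", &out->spcNode, &out->dbNode,
				  &out->relNode, &trailing) == 3;
}

/* Hypothetical lookup: only the catalogs named in this thread. */
static const char *
known_catalog_name(unsigned int relNode)
{
	switch (relNode)
	{
		case 1247:
			return "pg_type";
		case 2608:
			return "pg_depend";
		default:
			return "unknown";
	}
}
```

Calling `parse_rel_path("1663/16384/2608", &p)` fills `p` with the three parts, and `known_catalog_name(p.relNode)` then names the catalog involved in the first and last crash.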

Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
"Qingqing Zhou"
Date:
"Jean-Pierre Pelletier" <pelletier_32@sympatico.ca> wrote:
>
> I've installed PostgreSQL 8.1 beta2 five days ago and it crashed 3 times
> since then.
> Here is what's been logged for the last crash
>
> 2005-10-04 11:00:19 FATAL:  could not read block 121 of relation
> 1663/16384/2608: Invalid argument
>
> relation 2608 is pg_depend
> ----------------------------------------------------------------------------------
> The crash before that was on relation pg_type, the first line logged was:
> 2005-10-03 10:51:06 FATAL:  could not read block 38 of relation
> 1663/16384/1247: Invalid argument
> ----------------------------------------------------------------------------------
> The first crash was also on relation pg_depend, but on open instead of
> read
> 2005-09-30 18:38:53 FATAL:  could not open relation 1663/16384/2608:
> Invalid argument
> ----------------------------------------------------------------------------------
>

This problem has been reported several times before, though not necessarily
on system tables. Is any anti-virus software installed on the same machine?
Is the database under intensive I/O pressure?

Regards,
Qingqing

Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
"Jean-Pierre Pelletier"
Date:
Yes, there is antivirus software on the machine. Turning it off requires a
reboot; I'll be allowed to reboot tonight, or I'll do it sooner if it crashes
before that.

There are around 15 connections to PostgreSQL when it crashes, but most are
idle. There may be a few inserts but no bulk inserts; the biggest load would
come from select statements.

Jean-Pierre Pelletier

----- Original Message -----
From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
To: <pgsql-bugs@postgresql.org>
Sent: Wednesday, October 05, 2005 3:03 AM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2,
Windows 2000



Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
"Qingqing Zhou"
Date:
"Jean-Pierre Pelletier" <pelletier_32@sympatico.ca> wrote in message
news:003801c5c9b0$03e08500$6401a8c0@JP...
>
> Yes, there is antivirus software on the machine. Turning it off requires a
> reboot; I'll be allowed to reboot tonight, or I'll do it sooner if it
> crashes before that.
>
> There are around 15 connections to PostgreSQL when it crashes, but most
> are idle. There may be a few inserts but no bulk inserts; the biggest load
> would come from select statements.
>

We have not yet established whether the failed reads/writes are caused by
anti-virus software or by intensive I/O. If you can compile the source, could
you patch smgrread()/smgrwrite() like this to capture the native Windows error:

void
smgrwrite(SMgrRelation reln, BlockNumber blocknum, char *buffer, bool isTemp)
{
	if (!(*(smgrsw[reln->smgr_which].smgr_write)) (reln, blocknum, buffer,
	                                               isTemp))
		ereport(ERROR,
				(errcode_for_file_access(),
				 /* the extra %d reports the native Windows error code */
				 errmsg("could not write block %u of relation %u/%u/%u:%d: %m",
						blocknum,
						reln->smgr_rnode.spcNode,
						reln->smgr_rnode.dbNode,
						reln->smgr_rnode.relNode,
						(int) GetLastError())));	/* DWORD cast to match %d */
}

Regards,
Qingqing
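
One caveat with reading GetLastError() inside the ereport() argument list, as the patch above does: other calls may run between the failed I/O call and the point where the value is read, and any of them could in principle overwrite it. A common pattern is to save both error codes immediately after the failing call. The following is a minimal sketch of that idea, not code from PostgreSQL; `IoErrorSnapshot`, `capture_io_error` and `format_io_error` are hypothetical names, and on non-Windows builds the native code is simply reported as 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Both error codes, saved right after a failed I/O call. */
typedef struct IoErrorSnapshot
{
	unsigned long native_code;	/* GetLastError() on Windows, else 0 */
	int			saved_errno;	/* C runtime errno */
} IoErrorSnapshot;

/* Call this immediately after the failing read/write/open. */
static IoErrorSnapshot
capture_io_error(void)
{
	IoErrorSnapshot snap;

#ifdef _WIN32
	/* Read the native code first, before anything else can reset it. */
	snap.native_code = (unsigned long) GetLastError();
#else
	snap.native_code = 0;
#endif
	snap.saved_errno = errno;
	return snap;
}

/*
 * Format the saved codes into buf.  Returns the snprintf() result, so a
 * value >= buflen means the text was truncated.
 */
static int
format_io_error(const IoErrorSnapshot *snap, const char *what,
				char *buf, size_t buflen)
{
	assert(snap != NULL && what != NULL && buf != NULL);

	return snprintf(buf, buflen, "%s: errno=%d, native=%lu",
					what, snap->saved_errno, snap->native_code);
}
```

With this arrangement the logging code can run as much intermediate work as it likes, because it reports the saved values rather than the live ones.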

Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
"Jean-Pierre Pelletier"
Date:
Recompiling with the trace is no problem;
I'll install the patched build tonight.

After your last email, I excluded the PostgreSQL
directory from the antivirus, since I could do that without
rebooting.

I was also sometimes getting read/write or open
"Invalid argument" errors without the server crashing.
If I haven't seen any of these error messages after
two days, there is a very high chance that it has
been fixed by turning off the antivirus.

Jean-Pierre Pelletier
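
Watching the log for these messages over two days can be automated. The sketch below is one possible check, assuming each log entry sits on a single line as the server writes it (the entries quoted in this thread were wrapped by the mail client). The function name `is_invalid_argument_io_error` is hypothetical; the message fragments are the ones that appear in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the log line is one of the read/write/open failures
 * discussed in this thread that ended in "Invalid argument", else 0.
 */
static int
is_invalid_argument_io_error(const char *line)
{
	static const char *const fragments[] = {
		"could not read block",
		"could not write block",
		"could not open relation"
	};
	size_t		i;

	assert(line != NULL);

	/* Every message of interest carries this system error text. */
	if (strstr(line, ": Invalid argument") == NULL)
		return 0;

	for (i = 0; i < sizeof(fragments) / sizeof(fragments[0]); i++)
	{
		if (strstr(line, fragments[i]) != NULL)
			return 1;
	}
	return 0;
}
```

Feeding each line of the server log through this function and counting the matches gives a simple yes/no answer at the end of the observation period.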

----- Original Message -----
From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
To: <pgsql-bugs@postgresql.org>
Sent: Wednesday, October 05, 2005 5:16 PM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2,
Windows 2000



Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
"Jean-Pierre Pelletier"
Date:
Turning off the antivirus fixed the problem.
We haven't had any read/write/open errors in more
than two days.

Thank you very much for your help and keep up the good work.

Our only remaining PostgreSQL problem is with pg_stat_activity
being unreliable and the statistics collector being restarted many times
every day.

Any idea what might be causing that?

Jean-Pierre Pelletier

----- Original Message -----
From: "Jean-Pierre Pelletier" <pelletier_32@sympatico.ca>
To: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
Cc: <pgsql-bugs@postgresql.org>
Sent: Wednesday, October 05, 2005 2:58 PM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2,
Windows 2000



Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
Alvaro Herrera
Date:
On Fri, Oct 07, 2005 at 11:19:25AM -0400, Jean-Pierre Pelletier wrote:

> Our only remaining PostgreSQL problem is with pg_stat_activity
> being unreliable and the statistics collector being restarted many times
> every day.

The stats collector (which maintains pg_stat_activity among other things)
uses a UDP socket to receive info from the backends, so if UDP
communication is crippled, it's going to be unreliable.  Maybe there are
too many lost packets.  I don't know what could cause it to die though
-- certainly not lost packets.  (The postmaster restarts it
automatically if it detects it's not running.)

--
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"Everybody understands Mickey Mouse. Few understand Hermann Hesse.
Hardly anybody understands Einstein. And nobody understands Emperor Norton."
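
To illustrate the point about UDP: a sender gets no error when a datagram is dropped, so the only way to notice a loss is for the receiver to time out. The sketch below sends one datagram to a loopback socket and waits for it with a timeout. It uses POSIX sockets as an assumption for brevity (a Windows build would need Winsock setup instead), it is not the stats collector's actual code, and `udp_loopback_roundtrip` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Send one datagram to a UDP socket bound on 127.0.0.1 and wait up to
 * timeout_ms for it.  Returns 1 if it arrived intact, 0 if it was not
 * seen in time, and -1 if the socket could not be set up.
 */
static int
udp_loopback_roundtrip(const char *msg, int timeout_ms)
{
	struct sockaddr_in addr;
	socklen_t	addrlen = sizeof(addr);
	char		buf[512];
	fd_set		readable;
	struct timeval tv;
	size_t		len;
	ssize_t		got;
	int			sock;
	int			result = -1;

	assert(msg != NULL);
	len = strlen(msg);
	if (len > sizeof(buf))
		return -1;

	sock = socket(AF_INET, SOCK_DGRAM, 0);
	if (sock < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;			/* let the kernel pick a free port */

	/* Bind, then ask which port was assigned so we can send to it. */
	if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
		getsockname(sock, (struct sockaddr *) &addr, &addrlen) < 0)
		goto done;

	/* Success here only means the kernel accepted the datagram. */
	if (sendto(sock, msg, len, 0,
			   (struct sockaddr *) &addr, sizeof(addr)) < 0)
		goto done;

	FD_ZERO(&readable);
	FD_SET(sock, &readable);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	result = 0;					/* assume lost until we actually read it */
	if (select(sock + 1, &readable, NULL, NULL, &tv) > 0)
	{
		got = recv(sock, buf, sizeof(buf), 0);
		if (got == (ssize_t) len && memcmp(buf, msg, len) == 0)
			result = 1;
	}

done:
	close(sock);
	return result;
}
```

A return value of 0 is the interesting case: the send succeeded, nothing reported an error, and the data still never arrived. That is the kind of silent loss that could leave statistics stale without explaining why the collector process itself dies.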

Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
"Qingqing Zhou"
Date:
"Jean-Pierre Pelletier" <pelletier_32@sympatico.ca> wrote:
> Turning off the antivirus fixed the problem.
> We haven't had any read/write/open errors in more
> than two days.
>
> Thank you very much for your help and keep up the good work.
>

You are welcome :-) But I still doubt whether this really solves the problem.
By the way, may I ask which anti-virus software you are using? And, if
possible, could you turn the anti-virus software back on and check the
GetLastError() value?

A more detailed "guess" of the problem is here:
http://archives.postgresql.org/pgsql-hackers/2005-07/msg00489.php

Thanks a lot,
Qingqing

Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

From
"Jean-Pierre Pelletier"
Date:
The antivirus is CA eTrust EZ v 7.0.6.7.

I cannot re-enable the antivirus on that server
because it is now in production.

Jean-Pierre Pelletier

----- Original Message -----
From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
To: <pgsql-bugs@postgresql.org>
Sent: Friday, October 07, 2005 3:08 PM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2,
Windows 2000

