Discussion: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
From
"Jean-Pierre Pelletier"
Date:
Hi,

I installed PostgreSQL 8.1 beta2 five days ago and it has crashed 3 times since then.

Here is what was logged for the last crash:

2005-10-04 11:00:19 FATAL: could not read block 121 of relation 1663/16384/2608: Invalid argument
2005-10-04 11:00:20 LOG: server process (PID 2592) was terminated by signal 1
2005-10-04 11:00:20 LOG: terminating any other active server processes

Then, for each connection, the log has:

2005-10-04 11:00:20 WARNING: terminating connection because of crash of another server process
2005-10-04 11:00:20 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2005-10-04 11:00:20 HINT: In a moment you should be able to reconnect to the database and repeat your command.

With this at the end:

2005-10-04 11:00:20 LOG: all server processes terminated; reinitializing
2005-10-04 11:00:21 LOG: database system was interrupted at 2005-10-04 10:59:43 Eastern Daylight Time

Relation 2608 is pg_depend.
----------------------------------------------------------------------------------
The crash before that was on relation pg_type. The first line logged was:

2005-10-03 10:51:06 FATAL: could not read block 38 of relation 1663/16384/1247: Invalid argument
----------------------------------------------------------------------------------
The first crash was also on relation pg_depend, but with open instead of read:

2005-09-30 18:38:53 FATAL: could not open relation 1663/16384/2608: Invalid argument
----------------------------------------------------------------------------------
There were between 14 and 17 connections when these crashes happened.

The database was not reloaded from a backup. It was created from .sql scripts for the DDL, and the data for user tables was reloaded from files with "copy from".

We are using Windows 2000 Server, Service Pack 4.

Thanks,
Jean-Pierre Pelletier
e-djuster
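The three numbers in an identifier such as 1663/16384/2608 are the tablespace, database and relation file node, in that order (they correspond to the spcNode, dbNode and relNode fields printed by the patch later in this thread). As a rough sketch for anyone scanning logs for these failures, the identifier can be split like this. The RelPath type and parse_rel_path() are hypothetical names made up for this example; they are not part of PostgreSQL.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: the three parts of "1663/16384/2608". */
typedef struct RelPath {
    unsigned int spc_node;  /* tablespace */
    unsigned int db_node;   /* database */
    unsigned int rel_node;  /* relation file node, e.g. 2608 for pg_depend here */
} RelPath;

/*
 * Parse "spc/db/rel" into *out.
 * Returns 1 on success, 0 if the text is not exactly three
 * slash-separated unsigned integers.
 */
int parse_rel_path(const char *text, RelPath *out)
{
    char trailing;

    assert(text != NULL && out != NULL);

    /* The final %c only matches if junk follows the third number,
     * so exactly 3 conversions means the input was clean. */
    return sscanf(text, "%u/%u/%u%c",
                  &out->spc_node, &out->db_node, &out->rel_node,
                  &trailing) == 3;
}
```

Calling parse_rel_path("1663/16384/2608", &p) fills p.rel_node with 2608, which can then be compared against the relation numbers quoted above.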
"Jean-Pierre Pelletier" <pelletier_32@sympatico.ca> wrote
>
> I've installed PostgreSQL 8.1 beta2 five days ago and it crashed 3 times
> since then.
> Here is what's been logged for the last crash
>
> 2005-10-04 11:00:19 FATAL: could not read block 121 of relation
> 1663/16384/2608: Invalid argument
>
> relation 2608 is pg_depend
> ----------------------------------------------------------------------------------
> The crash before that was on relation pg_type, the first line logged was:
> 2005-10-03 10:51:06 FATAL: could not read block 38 of relation
> 1663/16384/1247: Invalid argument
> ----------------------------------------------------------------------------------
> The first crash was also on relation pg_depend, but with open instead of
> read
> 2005-09-30 18:38:53 FATAL: could not open relation 1663/16384/2608:
> Invalid argument
> ----------------------------------------------------------------------------------
>

This problem has been reported several times before, though not necessarily on system tables. Is any anti-virus software installed on the same machine? Is the database under intensive I/O pressure?

Regards,
Qingqing
Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
From
"Jean-Pierre Pelletier"
Date:
Yes, there is antivirus software on the machine. A reboot is needed to turn it off; I'll be allowed to reboot tonight, or I'll do it sooner if it crashes before that.

There are around 15 connections to PostgreSQL when it crashes, but most are idle. There may be a few inserts but no bulk inserts; the biggest load would come from select statements.

Jean-Pierre Pelletier

----- Original Message -----
From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
To: <pgsql-bugs@postgresql.org>
Sent: Wednesday, October 05, 2005 3:03 AM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
"Jean-Pierre Pelletier" <pelletier_32@sympatico.ca> wrote in message news:003801c5c9b0$03e08500$6401a8c0@JP...
>
> Yes, there is an antivirus software on the machine, a reboot is needed
> when it's turned off,
> I'll be allowed to reboot it tonight or I'll do it sooner if it crashes
> before that.
>
> There are around 15 connections to PostgreSQL when it crashes but most are
> idle
> there may be a few inserts but no bulk inserts, the biggest load would
> come from
> select statements.
>

We haven't established whether the failed reads/writes are caused by anti-virus software or by intensive read/write activity. If you can compile the source, could you patch smgrread()/smgrwrite() like this to capture the native Windows error:

    void
    smgrwrite(SMgrRelation reln, BlockNumber blocknum, char *buffer, bool isTemp)
    {
        if (!(*(smgrsw[reln->smgr_which].smgr_write)) (reln, blocknum, buffer, isTemp))
            ereport(ERROR,
                    (errcode_for_file_access(),
                     /* the extra %d prints the native Windows error code */
                     errmsg("could not write block %u of relation %u/%u/%u:%d: %m",
                            blocknum,
                            reln->smgr_rnode.spcNode,
                            reln->smgr_rnode.dbNode,
                            reln->smgr_rnode.relNode,
                            /* GetLastError() returns a DWORD; cast it to match %d */
                            (int) GetLastError())));
    }

Regards,
Qingqing
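One possible reason to ask for the native code rather than rely on the "Invalid argument" text is that an errno-style message can be the result of a lossy translation. The sketch below shows the general idea only: several distinct native codes collapse into the same EINVAL. The table and the native_to_errno() function are invented for this illustration and are not PostgreSQL's actual mapping; the numeric values are the documented Win32 codes for the named errors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Win32 error codes, written as plain numbers so the sketch builds anywhere. */
enum {
    W32_ERROR_FILE_NOT_FOUND    = 2,
    W32_ERROR_ACCESS_DENIED     = 5,
    W32_ERROR_SHARING_VIOLATION = 32,
    W32_ERROR_LOCK_VIOLATION    = 33
};

/* Hypothetical, deliberately incomplete translation table. */
static const struct {
    unsigned long native;
    int           posix;
} error_map[] = {
    { W32_ERROR_FILE_NOT_FOUND, ENOENT },
    { W32_ERROR_ACCESS_DENIED,  EACCES }
};

/* Translate a native error code to an errno value (illustration only). */
int native_to_errno(unsigned long native)
{
    size_t i;

    assert(native != 0);    /* 0 means success; there is nothing to translate */

    for (i = 0; i < sizeof(error_map) / sizeof(error_map[0]); i++)
        if (error_map[i].native == native)
            return error_map[i].posix;

    /* Every unlisted code ends up here, so the original cause is lost. */
    return EINVAL;
}
```

With a table like this, a sharing violation (32) and a lock violation (33) would both surface as "Invalid argument", which is why logging the raw GetLastError() value next to %m tells you more than the message alone.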
Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
From
"Jean-Pierre Pelletier"
Date:
I'll recompile with the trace, that's no problem, and install the patched release tonight.

After your last email, I excluded the PostgreSQL directory from the antivirus, because I could do that without rebooting.

I was also sometimes getting read/write or open "Invalid argument" errors without the server crashing. If I haven't seen any of these error messages after two days, there is a very high chance that the problem was fixed by turning off the antivirus.

Jean-Pierre Pelletier

----- Original Message -----
From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
To: <pgsql-bugs@postgresql.org>
Sent: Wednesday, October 05, 2005 5:16 PM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
From
"Jean-Pierre Pelletier"
Date:
Turning off the antivirus fixed the problem. We haven't had any read/write/open errors in more than two days.

Thank you very much for your help and keep up the good work.

Our only remaining PostgreSQL problem is with pg_stat_activity being unreliable and the statistics collector being restarted many times every day. Any idea what might be causing that?

Jean-Pierre Pelletier

----- Original Message -----
From: "Jean-Pierre Pelletier" <pelletier_32@sympatico.ca>
To: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
Cc: <pgsql-bugs@postgresql.org>
Sent: Wednesday, October 05, 2005 2:58 PM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
On Fri, Oct 07, 2005 at 11:19:25AM -0400, Jean-Pierre Pelletier wrote:

> Our only remaining PostgreSQL problem is with pg_stat_activity
> being unreliable and the statistics collector being restarted many times
> every day.

The stats collector (which maintains pg_stat_activity among other things) uses a UDP socket to receive info from the backends, so if UDP communication is crippled, it's going to be unreliable. Maybe there are too many lost packets.

I don't know what could cause it to die though -- certainly not lost packets. (The postmaster restarts it automatically if it detects it's not running.)

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Everybody understands Mickey Mouse. Few understand Hermann Hesse. Hardly anybody understands Einstein. And nobody understands Emperor Norton."
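To make the "lost packets" point concrete, here is a small model of a fire-and-forget channel with a bounded receive buffer. It uses no real sockets and is not the stats collector's code: LossyChannel, its capacity of 4 and all function names are assumptions invented for this sketch. It only illustrates how a sender that never waits for an acknowledgement can lose reports without noticing, leaving the receiver with an incomplete picture.

```c
#include <assert.h>
#include <stddef.h>

#define RECV_BUFFER_SLOTS 4     /* hypothetical receive buffer capacity */

/* Model of a datagram channel: a small buffer plus a drop counter. */
typedef struct LossyChannel {
    int    slots[RECV_BUFFER_SLOTS];
    size_t used;
    size_t dropped;     /* kept for the demo; a real sender cannot see this */
} LossyChannel;

void channel_init(LossyChannel *ch)
{
    assert(ch != NULL);
    ch->used = 0;
    ch->dropped = 0;
}

/*
 * Sender side: fire and forget. There is no return value, so the
 * sender never learns whether the report was queued or thrown away.
 */
void channel_send(LossyChannel *ch, int value)
{
    assert(ch != NULL);
    if (ch->used == RECV_BUFFER_SLOTS) {
        ch->dropped++;          /* buffer full: the report is silently lost */
        return;
    }
    ch->slots[ch->used++] = value;
}

/* Receiver side: consume everything queued and return the sum seen. */
long channel_drain(LossyChannel *ch)
{
    long   total = 0;
    size_t i;

    assert(ch != NULL);
    for (i = 0; i < ch->used; i++)
        total += ch->slots[i];
    ch->used = 0;
    return total;
}
```

If ten reports of value 1 are sent before the receiver drains the buffer, channel_drain() sees only four of them; the other six are gone and nothing on the sending side signals it.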
"Jean-Pierre Pelletier" <pelletier_32@sympatico.ca> wrote
> Turning off the antivirus fixed the problem.
> We haven't have any read/write/open error in more
> than two days.
>
> Thank you very much for your help and keep up the good work.
>

You are welcome :-) But I still doubt whether this really solves the problem ... By the way, may I know which anti-virus software you are using? And, if possible, could you please turn the anti-virus software on again and check the GetLastError() value?

A more detailed "guess" at the problem is here:
http://archives.postgresql.org/pgsql-hackers/2005-07/msg00489.php

Thanks a lot,
Qingqing
Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
From
"Jean-Pierre Pelletier"
Date:
The antivirus is CA eTrust EZ v 7.0.6.7.

I cannot put the antivirus back on that server because it is now in production.

Jean-Pierre Pelletier

----- Original Message -----
From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
To: <pgsql-bugs@postgresql.org>
Sent: Friday, October 07, 2005 3:08 PM
Subject: Re: [BUGS] Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000