Discussion: shmctl EIDRM preventing startup
One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64) appears to be in the same state after a reboot as the server in the "Restart after poweroutage" thread from a few months ago:

http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php

As in the thread, "ipcs -a" shows no postgres-owned shared memory segments and strace shows shmctl() failing with EIDRM.

http://archives.postgresql.org/pgsql-general/2007-03/msg00743.php

I have only limited access to the box and I haven't found out why it was rebooted. I don't think it was a scheduled reboot so it might have been due to a power outage.

Has anybody figured out if this is a Linux kernel bug? I might have until Monday morning if anybody can suggest something to look at; after that the admins will probably reboot and/or remove postmaster.pid to get the database running again.

Thanks.

-- 
Michael Fuhr
Michael Fuhr <mike@fuhr.org> writes:
> One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
> appears to be in the same state after a reboot as the server in the
> "Restart after poweroutage" thread from a few months ago:
> http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php

Interesting indeed. Lapham's report was on FC6, which uses a vastly newer kernel (2.6.20) than RHEL4, but his was also x86_64, which might be relevant. I recall trying a little bit to reproduce the problem after updating my own x86_64 box to FC6, but without success.

> Has anybody figured out if this is a Linux kernel bug? I might
> have until Monday morning if anybody can suggest something to look
> at; after that the admins will probably reboot and/or remove
> postmaster.pid to get the database running again.

Is it possible/reasonable/practical to (a) hold off longer than that and (b) get me access to the box? On Monday I'd have a chance to involve some Red Hat kernel folk in looking at it.

regards, tom lane
Michael Fuhr wrote:
> One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
> appears to be in the same state after a reboot as the server in the
> "Restart after poweroutage" thread from a few months ago:
>
> http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php
>
> As in the thread, "ipcs -a" shows no postgres-owned shared memory
> segments and strace shows shmctl() failing with EIDRM.
>
> http://archives.postgresql.org/pgsql-general/2007-03/msg00743.php

Maybe what is happening is that an entirely unrelated process created a segment with that ID, attached to it, and then it was deleted. I don't know how to check however.

-- 
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Maybe what is happening is that an entirely unrelated process created a
> segment with that ID, attached to it, and then it was deleted. I don't
> know how to check however.

AFAIK, EIDRM should imply that the segment has been IPC_RMID'd but still exists because there are still processes attached to it. So the thing to look for is processes still attached. Not 100% sure how to do that, but I'm sure the info is exposed under /proc somehow...

regards, tom lane
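One place under /proc to start from is the segment table in /proc/sysvipc/shm. The following is only a sketch, assuming that table has a header row naming the columns key, shmid, perms and nattch, and that perms is printed in octal with the SHM_DEST bit (01000) set for a segment that has been IPC_RMID'd but is still attached. The function names and the sample values in the usage line are hypothetical.

```python
SHM_DEST = 0o1000  # assumed: mode bit set once a segment has been IPC_RMID'd


def parse_shm_table(text):
    """Parse /proc/sysvipc/shm-style text into a list of dicts.

    Columns are located by the names in the header row, so extra
    columns at the end of each row are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        rows.append({
            "key": int(row["key"]),          # printed in decimal
            "shmid": int(row["shmid"]),
            "mode": int(row["perms"], 8),    # printed in octal
            "nattch": int(row["nattch"]),    # number of current attaches
        })
    return rows


def removed_but_attached(rows):
    """Return segments marked for destruction that still have attaches."""
    return [r for r in rows if r["mode"] & SHM_DEST and r["nattch"] > 0]
```

On a live box this would be used as `removed_but_attached(parse_shm_table(open("/proc/sysvipc/shm").read()))`. It only reports which segments are in that state, not which processes hold them.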
On Sun, Jul 01, 2007 at 10:06:58PM -0400, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Has anybody figured out if this is a Linux kernel bug? I might
> > have until Monday morning if anybody can suggest something to look
> > at; after that the admins will probably reboot and/or remove
> > postmaster.pid to get the database running again.
>
> Is it possible/reasonable/practical to (a) hold off longer than that
> and (b) get me access to the box? On Monday I'd have a chance to
> involve some Red Hat kernel folk in looking at it.

Possibly; I'll see what I can do. How early Monday do you think everybody would be available?

-- 
Michael Fuhr
On Sun, Jul 01, 2007 at 10:39:01PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Maybe what is happening is that an entirely unrelated process created a
> > segment with that ID, attached to it, and then it was deleted. I don't
> > know how to check however.
>
> AFAIK, EIDRM should imply that the segment has been IPC_RMID'd but still
> exists because there are still processes attached to it. So the thing
> to look for is processes still attached. Not 100% sure how to do that,
> but I'm sure the info is exposed under /proc somehow...

If it's installed, this:

    lsof |grep SYSV

Will list all processes attached to a SHM segment on the system. I think ipcs can do the same. You can grep /proc/*/maps for the same info.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
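The /proc/*/maps check can also be scripted. This is a minimal sketch, assuming the usual Linux convention that a System V segment appears in a maps line with a path of the form /SYSV followed by the key as eight hex digits, and that the inode column holds the shmid. The function names are hypothetical, and the key in the usage line is a made-up sample, not a value from this thread.

```python
import glob
import re

# Assumed maps layout: address perms offset dev inode pathname
# A SysV segment shows up as "/SYSV<key as 8 hex digits>", often
# followed by " (deleted)".
SYSV_RE = re.compile(
    r"^\S+\s+\S+\s+\S+\s+\S+\s+(?P<inode>\d+)\s+"
    r"/SYSV(?P<key>[0-9a-fA-F]{8})(?:\s|$)"
)


def parse_sysv_mapping(line):
    """Return (key, shmid) for a SysV shared memory mapping line, else None."""
    m = SYSV_RE.match(line)
    if m is None:
        return None
    return int(m.group("key"), 16), int(m.group("inode"))


def find_attached(key):
    """Yield (pid, shmid) for each readable process that maps the given key."""
    for path in glob.glob("/proc/[0-9]*/maps"):
        pid = int(path.split("/")[2])
        try:
            with open(path) as f:
                for line in f:
                    parsed = parse_sysv_mapping(line)
                    if parsed and parsed[0] == key:
                        yield pid, parsed[1]
        except OSError:
            # The process exited in the meantime or we lack permission.
            continue
```

For example, `list(find_attached(0x0052e2c1))` would return the processes mapping that sample key. Run it as root, since other users' maps files are skipped when they cannot be read.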
On Mon, Jul 02, 2007 at 01:05:35PM +0200, Martijn van Oosterhout wrote:
> If it's installed, this:
>
> lsof |grep SYSV
>
> Will list all processes attached to a SHM segment on the system. I
> think ipcs can do the same. You can grep /proc/*/maps for the same
> info.

I already tried those; none show the shared memory key that the postmaster is complaining about.

-- 
Michael Fuhr