Thread: shmctl EIDRM preventing startup

shmctl EIDRM preventing startup

From
Michael Fuhr
Date:
One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
appears to be in the same state after a reboot as the server in the
"Restart after poweroutage" thread from a few months ago:

http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php

As in the thread, "ipcs -a" shows no postgres-owned shared memory
segments and strace shows shmctl() failing with EIDRM.

http://archives.postgresql.org/pgsql-general/2007-03/msg00743.php

I have only limited access to the box and I haven't found out why
it was rebooted.  I don't think it was a scheduled reboot so it
might have been due to a power outage.

Has anybody figured out if this is a Linux kernel bug?  I might
have until Monday morning if anybody can suggest something to look
at; after that the admins will probably reboot and/or remove
postmaster.pid to get the database running again.

Thanks.

--
Michael Fuhr

Re: shmctl EIDRM preventing startup

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
> appears to be in the same state after a reboot as the server in the
> "Restart after poweroutage" thread from a few months ago:

> http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php

Interesting indeed.  Lapham's report was on FC6 which uses a kernel
vastly newer than RHEL4 (2.6.20) but his was also x86_64, which might
be relevant.  I recall trying a little bit to reproduce the problem
after updating my own x86_64 box to FC6, but without success.

> Has anybody figured out if this is a Linux kernel bug?  I might
> have until Monday morning if anybody can suggest something to look
> at; after that the admins will probably reboot and/or remove
> postmaster.pid to get the database running again.

Is it possible/reasonable/practical to (a) hold off longer than that
and (b) get me access to the box?  On Monday I'd have a chance to
involve some Red Hat kernel folk in looking at it.

            regards, tom lane

Re: shmctl EIDRM preventing startup

From
Alvaro Herrera
Date:
Michael Fuhr wrote:
> One of the servers I use (RHEL AS 4; Linux 2.6.9-34.ELsmp x86_64)
> appears to be in the same state after a reboot as the server in the
> "Restart after poweroutage" thread from a few months ago:
>
> http://archives.postgresql.org/pgsql-general/2007-03/msg00738.php
>
> As in the thread, "ipcs -a" shows no postgres-owned shared memory
> segments and strace shows shmctl() failing with EIDRM.
>
> http://archives.postgresql.org/pgsql-general/2007-03/msg00743.php

Maybe what is happening is that an entirely unrelated process created a
segment with that ID, attached to it, and then it was deleted.  I don't
know how to check however.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: shmctl EIDRM preventing startup

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Maybe what is happening is that an entirely unrelated process created a
> segment with that ID, attached to it, and then it was deleted.  I don't
> know how to check however.

AFAIK, EIDRM should imply that the segment has been IPC_RMID'd but still
exists because there are still processes attached to it.  So the thing
to look for is processes still attached.  Not 100% sure how to do that,
but I'm sure the info is exposed under /proc somehow...
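
[Editor's sketch, not part of the original message.] One place this information is exposed on Linux is /proc/sysvipc/shm. The Python sketch below parses that file and reports segments that carry the SHM_DEST mode bit (set after IPC_RMID) while still having attached processes. It assumes the usual column header (key, shmid, perms, nattch, ...) with perms printed in octal; the helper names parse_sysvipc_shm and removed_but_attached are hypothetical, and column layout may differ between kernel versions.

```python
# Assumption: Linux sets this mode bit on a segment that has been IPC_RMID'd
# but is kept alive because processes are still attached.
SHM_DEST = 0o1000


def parse_sysvipc_shm(text):
    """Parse the text of /proc/sysvipc/shm into a list of dicts.

    Assumes the first non-empty line is a header naming the columns.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    segments = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        segments.append({
            "key": int(row["key"]),            # decimal, may be negative
            "shmid": int(row["shmid"]),
            "perms": int(row["perms"], 8),     # printed in octal
            "nattch": int(row["nattch"]),      # number of attached processes
        })
    return segments


def removed_but_attached(segments):
    """Return segments marked for destruction that still have attachments."""
    return [s for s in segments if s["perms"] & SHM_DEST and s["nattch"] > 0]


if __name__ == "__main__":
    try:
        with open("/proc/sysvipc/shm") as f:
            segs = parse_sysvipc_shm(f.read())
    except OSError as exc:
        print("cannot read /proc/sysvipc/shm:", exc)
    else:
        for s in removed_but_attached(segs):
            print("shmid %d: removed, %d process(es) attached"
                  % (s["shmid"], s["nattch"]))
```

An empty result would suggest that no removed-but-attached segment is visible to the kernel's own listing, which is itself a useful data point when shmctl() still returns EIDRM.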

            regards, tom lane

Re: shmctl EIDRM preventing startup

From
Michael Fuhr
Date:
On Sun, Jul 01, 2007 at 10:06:58PM -0400, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Has anybody figured out if this is a Linux kernel bug?  I might
> > have until Monday morning if anybody can suggest something to look
> > at; after that the admins will probably reboot and/or remove
> > postmaster.pid to get the database running again.
>
> Is it possible/reasonable/practical to (a) hold off longer than that
> and (b) get me access to the box?  On Monday I'd have a chance to
> involve some Red Hat kernel folk in looking at it.

Possibly; I'll see what I can do.  How early Monday do you think
everybody would be available?

--
Michael Fuhr

Re: shmctl EIDRM preventing startup

From
Martijn van Oosterhout
Date:
On Sun, Jul 01, 2007 at 10:39:01PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Maybe what is happening is that an entirely unrelated process created a
> > segment with that ID, attached to it, and then it was deleted.  I don't
> > know how to check however.
>
> AFAIK, EIDRM should imply that the segment has been IPC_RMID'd but still
> exists because there are still processes attached to it.  So the thing
> to look for is processes still attached.  Not 100% sure how to do that,
> but I'm sure the info is exposed under /proc somehow...

If it's installed, this:

lsof |grep SYSV

Will list all processes attached to a SHM segment on the system. I
think ipcs can do the same. You can grep /proc/*/maps for the same
info.
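
[Editor's sketch, not part of the original message.] The /proc/*/maps approach can be scripted roughly as follows. It assumes that System V shared memory mappings appear in a maps file with a pathname of the form /SYSV followed by the key as 8 hex digits, and that the inode column holds the shmid; the function name sysv_mappings is hypothetical. Reading other users' maps files generally requires root.

```python
import glob
import re

# Assumption: SysV shm mappings show up as "/SYSV<8 hex digits of key>",
# optionally followed by " (deleted)".
SYSV_RE = re.compile(r"/SYSV([0-9a-fA-F]{8})")


def sysv_mappings(maps_text):
    """Return (key, shmid) pairs for SysV shm mappings in one maps file."""
    found = []
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        m = SYSV_RE.match(fields[5])
        if m:
            found.append((int(m.group(1), 16), int(fields[4])))
    return found


if __name__ == "__main__":
    for path in glob.glob("/proc/[0-9]*/maps"):
        pid = path.split("/")[2]
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            continue  # process exited or permission denied
        for key, shmid in sysv_mappings(text):
            print("pid %s: key 0x%08x shmid %d" % (pid, key, shmid))
```

Because a removed segment typically has its key reset, matching on the shmid from the inode column is likely more reliable than matching on the key the postmaster reports.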

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: shmctl EIDRM preventing startup

From
Michael Fuhr
Date:
On Mon, Jul 02, 2007 at 01:05:35PM +0200, Martijn van Oosterhout wrote:
> If it's installed, this:
>
> lsof |grep SYSV
>
> Will list all processes attached to a SHM segment on the system. I
> think ipcs can do the same. You can grep /proc/*/maps for the same
> info.

I already tried those; none show the shared memory key that the
postmaster is complaining about.

--
Michael Fuhr