Thread: Protecting against case where shmget says EINVAL instead of EEXIST


Protecting against case where shmget says EINVAL instead of EEXIST

From
Tom Lane
Date:
The thread here
http://archives.postgresql.org/pgsql-admin/2010-04/msg00358.php
shows that current OS X contains the same issue that was complained of
a year or so ago with respect to NetBSD.  Namely, that if shmget finds
an existing shared memory segment that is smaller than the current
request, it will return EINVAL, rather than EEXIST which is what
InternalIpcMemoryCreate is expecting to get for a collision.  This
leads to an unnecessary startup failure with a completely misleading
error message.  It's easy to reproduce on a Mac:

1. kill -9 an existing postmaster.
2. edit postgresql.conf to increase max_connections by 1.
3. try to start postmaster.

You get

FATAL:  could not create shared memory segment: Invalid argument
DETAIL:  Failed system call was shmget(key=5432001, size=29622272, 03600).
HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX
parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request
size (currently 29622272 bytes), reduce PostgreSQL's shared_buffers parameter (currently 3072) and/or its
max_connections parameter (currently 105).  If the request size is already small, it's possible that it is less
than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.
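
[Editor's sketch] The kernel behavior behind this message can be probed without PostgreSQL at all. The C function below is a minimal sketch, not code from the thread: it creates a small segment, then requests a larger one under the same key with IPC_CREAT | IPC_EXCL, and returns the errno of that second request. The name probe_collision_errno and the key PROBE_KEY are made up for illustration. On a kernel with the ordering described above the result should be EINVAL; a kernel that checks for the collision first should report EEXIST.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Arbitrary key used by this sketch only (an assumption, not a PostgreSQL key). */
#define PROBE_KEY ((key_t) 0x50477368)

/*
 * Create a small segment, then ask for a larger one with the same key and
 * IPC_CREAT | IPC_EXCL.
 *
 * Returns the errno of the second request, 0 if that request unexpectedly
 * succeeded, or -1 if the small segment could not be created in the first
 * place (key already in use, System V shared memory unavailable, ...).
 */
int
probe_collision_errno(size_t small_size, size_t large_size)
{
    int flags = IPC_CREAT | IPC_EXCL | 0600;
    int first;
    int second;
    int result;

    first = shmget(PROBE_KEY, small_size, flags);
    if (first < 0)
        return -1;

    second = shmget(PROBE_KEY, large_size, flags);
    result = (second < 0) ? errno : 0;

    /* Clean up whatever this probe created. */
    if (second >= 0 && second != first)
        (void) shmctl(second, IPC_RMID, NULL);
    (void) shmctl(first, IPC_RMID, NULL);

    return result;
}
```

Calling probe_collision_errno(4096, 1 << 20) and printing strerror() of the result shows which errno a given platform chooses for the collision.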
 

In the previous go-round, the misleading errno was reported to NetBSD
as a kernel bug.  I see from their CVS that they did fix it: see 1.113 in
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/sysv_shm.c

But what now seems clear to me is that this behavior probably exists
in *every* BSD-derived kernel.  It's unlikely that we can get them all
fixed, especially in view of the POSIX standard's wording saying that a
kernel's order of error checking is not guaranteed.  It'd be smarter for
us to install a workaround.

The workaround I'm thinking of is, when we see EINVAL, to try another
shmget with the same key and flags, and size zero.  If this results in
EEXIST or EACCES then handle it as a collision.  Otherwise clean up the
new segment (if we managed to make one, which is unlikely) and report
the original EINVAL.  This depends on the knowledge that these kernels
don't check the size against shmmin/shmmax in the code path where
there's an existing segment, so we will not get an EINVAL on the basis
of the size and will instead see an errno that reflects the collision,
if there is one.
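
[Editor's sketch] The retry logic proposed above could look roughly like the following C. This is a minimal sketch under stated assumptions, not the actual InternalIpcMemoryCreate code: the names SegResult, create_segment and shmget_fn are hypothetical, and the shmget call is passed in as a function pointer purely so the logic can be exercised against a stand-in kernel. fake_bsd_shmget is such a stand-in; it imitates the ordering described in this message (size compared before IPC_EXCL is honored), with made-up key and size limits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical result codes for this sketch; not PostgreSQL's real API. */
typedef enum { SEG_CREATED, SEG_COLLISION, SEG_FAILED } SegResult;

/* Same signature as shmget(), so a test double can stand in for the kernel. */
typedef int (*shmget_fn)(key_t key, size_t size, int flags);

/*
 * Try to create a segment exclusively.  On EINVAL, retry with size zero to
 * learn whether the real problem is a pre-existing segment.  *saved_errno
 * always receives the errno of the ORIGINAL request (0 on success).
 */
SegResult
create_segment(shmget_fn do_shmget, key_t key, size_t size, int perms,
               int *shmid, int *saved_errno)
{
    int flags = IPC_CREAT | IPC_EXCL | perms;
    int id;

    assert(shmid != NULL && saved_errno != NULL);
    *saved_errno = 0;

    id = do_shmget(key, size, flags);
    if (id >= 0)
    {
        *shmid = id;
        return SEG_CREATED;
    }

    *saved_errno = errno;
    if (errno == EEXIST || errno == EACCES)
        return SEG_COLLISION;

    if (errno == EINVAL)
    {
        /* Same key and flags, size zero: skips the size comparison. */
        id = do_shmget(key, 0, flags);
        if (id < 0)
        {
            if (errno == EEXIST || errno == EACCES)
                return SEG_COLLISION;
        }
        else
        {
            /* Unlikely: the zero-size request made a segment; remove it. */
            (void) shmctl(id, IPC_RMID, NULL);
        }
    }
    return SEG_FAILED;          /* caller reports *saved_errno */
}

/*
 * Stand-in for a BSD-style kernel (illustration only).  One segment of
 * 1024 bytes exists under key 5432001; new segments are limited to a
 * pretend SHMMAX of 4096 bytes and a pretend SHMMIN of 1 byte.
 */
int
fake_bsd_shmget(key_t key, size_t size, int flags)
{
    const key_t  existing_key = 5432001;
    const size_t existing_size = 1024;
    const size_t fake_shmmax = 4096;

    if (key == existing_key)
    {
        if (size != 0 && size > existing_size)
        {
            errno = EINVAL;     /* size checked before exclusivity */
            return -1;
        }
        if ((flags & IPC_CREAT) && (flags & IPC_EXCL))
        {
            errno = EEXIST;
            return -1;
        }
        return 7;
    }
    if (size < 1 || size > fake_shmmax)
    {
        errno = EINVAL;         /* genuine SHMMIN/SHMMAX violation */
        return -1;
    }
    return 8;
}
```

Against the stand-in, a too-large request for key 5432001 comes back as SEG_COLLISION with the original EINVAL preserved, while the same request for an unused key stays SEG_FAILED. Passing the real shmget as the first argument would run the same logic against the actual kernel.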

Comments?
        regards, tom lane


Re: Protecting against case where shmget says EINVAL instead of EEXIST

From
Robert Haas
Date:
On Sat, May 1, 2010 at 12:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The thread here
> http://archives.postgresql.org/pgsql-admin/2010-04/msg00358.php
> shows that current OS X contains the same issue that was complained of
> a year or so ago with respect to NetBSD.  Namely, that if shmget finds
> an existing shared memory segment that is smaller than the current
> request, it will return EINVAL, rather than EEXIST which is what
> InternalIpcMemoryCreate is expecting to get for a collision.  This
> leads to an unnecessary startup failure with a completely misleading
> error message.  It's easy to reproduce on a Mac:
>
> 1. kill -9 an existing postmaster.
> 2. edit postgresql.conf to increase max_connections by 1.
> 3. try to start postmaster.
>
> You get
>
> FATAL:  could not create shared memory segment: Invalid argument
> DETAIL:  Failed system call was shmget(key=5432001, size=29622272, 03600).
> HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX
> parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request
> size (currently 29622272 bytes), reduce PostgreSQL's shared_buffers parameter (currently 3072) and/or its
> max_connections parameter (currently 105).
>        If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in
> which case raising the request size or reconfiguring SHMMIN is called for.
>        The PostgreSQL documentation contains more information about shared memory configuration.
>
> In the previous go-round, the misleading errno was reported to NetBSD
> as a kernel bug.  I see from their CVS that they did fix it: see 1.113 in
> http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/sysv_shm.c
>
> But what now seems clear to me is that this behavior probably exists
> in *every* BSD-derived kernel.  It's unlikely that we can get them all
> fixed, especially in view of the POSIX standard's wording saying that a
> kernel's order of error checking is not guaranteed.  It'd be smarter for
> us to install a workaround.
>
> The workaround I'm thinking of is, when we see EINVAL, to try another
> shmget with the same key and flags, and size zero.  If this results in
> EEXIST or EACCES then handle it as a collision.  Otherwise clean up the
> new segment (if we managed to make one, which is unlikely) and report
> the original EINVAL.  This depends on the knowledge that these kernels
> don't check the size against shmmin/shmmax in the code path where
> there's an existing segment, so we will not get an EINVAL on the basis
> of the size and will instead see an errno that reflects the collision,
> if there is one.
>
> Comments?

It seems reasonable, though I couldn't speak to whether it's going to
fully solve the problem.

...Robert