Thread: semaphore usage "port based"?

semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
I've got an odd issue that I'm not sure how to fix ... or, if fixing is 
even possible ...

I just put into place a FreeBSD 6.x server ... it has 2 jails running on 
it, and inside of each, I'm trying to run a PostgreSQL 7.4.12 server 
(OpenACS requirement, no choice there) ...

Now, on my older FreeBSD 4.x servers, I have about 17 PostgreSQL servers 
(some 7.2, some 7.4, some 8.x) ... and they all run fine, and they all run 
on port 5432 ...

Now, something in FreeBSD has changed since 4.x that, if you start up a 
second PostgreSQL server on port 5432, the first one starts to generate 
"semctl: Invalid argument" errors ...

If I move one to port 5433, both run great ...

Now, since this *did* work fine with 4.x, the FreeBSD developers have 
obviously changed something that is causing it not to work ... but, since 
'changing port' appears to fix it, I'm wondering if there is something in 
our Semaphore creation code that can be tweaked so that the semaphore side 
of things *thinks* it's running on a different port, but it still responds 
to port 5432?

Or, more simply, I think ... is there somewhere in the Semaphore code that 
is using the port # as a 'seed'?

I'm trying to attack things from the FreeBSD side too, to find out what 
has changed, and how to fix it, but figured I might be able to come up 
with a quicker fix from this group ...

Thx ...


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
'k, an excerpt from a thread on the freebsd lists ... I'm not sure how to 
answer:

----
On Sun, Apr 02, 2006 at 05:24:10PM -0300, Marc G. Fournier wrote:
> On Sun, 2 Apr 2006, Kris Kennaway wrote:
>
> >>Right, but why are they doing it *consistently* in FreeBSD 6.x, when they
> >>never did it in FreeBSD 4.x?  I have postmaster processes running on the
> >>FreeBSD box as far back as November 27th, 2005 ... and have *never*
> >>experienced this problem ... so it isn't PostgreSQL that has changed,
> >>something in FreeBSD has changed :(
> >
> >You'll need to do some debugging to find out which of the two causes
> >of EINVAL are true here (or some undocumented cause).
>
> 'k, right now, the checks in PostgreSQL are just seeing if the result of
> semctl < 0 ... i see from the man page what 'two values' of EINVAL you are
> referring to ... but, if they both return the same ERRNO, how do I
> determine which of the two is the cause of the problem? :(

Evaluate context: what other semaphore operations have been performed
previously?

Kris
------

is there any easy way to answer this?  I'm getting the Invalid Argument 
error for SETVAL and IPC_RMID ...



Re: semaphore usage "port based"?

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> Or, more simply, I think ... is there somewhere in the Semaphore code that 
> is using the port # as a 'seed'?

We use the port number as a basis for selecting the semaphore key (see
semget(2)).  There is code in there to pick a different key value if the
one we first selected appears to be in use; that has to work correctly
if you're going to run multiple postmasters on the same port number.  It
sounds like FBSD 6 has done something that broke the key-in-use check.

Look at IpcSemaphoreCreate and InternalIpcSemaphoreCreate in
src/backend/port/sysv_sema.c.  It may be worth stepping through them
with gdb to see what the semget calls are returning.
        regards, tom lane
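
A minimal sketch of the key-selection idea described above, not the actual PostgreSQL code. It assumes the first candidate key is the port number times 1000 plus 1 (consistent with the key 5432001 reported for port 5432 elsewhere in this thread) and that later candidates simply count upward. The names `try_create_fn`, `first_sema_key`, `pick_sema_key` and `pretend_first_two_taken` are hypothetical and exist only for illustration.

```c
/* Hypothetical stand-in for "try to create a semaphore set with this key":
 * returns a non-negative id on success, or -1 if the key is already in use. */
typedef int (*try_create_fn)(long key);

/* Assumed formula: port * 1000 + 1 (e.g. port 5432 -> key 5432001). */
static long first_sema_key(int port)
{
    return (long) port * 1000 + 1;
}

/* Count upward from the first candidate until a key can be created.
 * Returns the key that worked, or -1 after max_tries failures. */
static long pick_sema_key(int port, try_create_fn try_create, int max_tries)
{
    long key = first_sema_key(port);

    for (int i = 0; i < max_tries; i++, key++)
    {
        if (try_create(key) >= 0)
            return key;
    }
    return -1;
}

/* Toy callback for experimenting: pretends keys 5432001 and 5432002 are
 * held by another postmaster and everything above them is free. */
static int pretend_first_two_taken(long key)
{
    return (key <= 5432002) ? -1 : 0;
}
```

In this toy setup a second postmaster on port 5432 would be pushed to a higher key, which is the behaviour that depends on the key-in-use check working correctly.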


Re: semaphore usage "port based"?

From
Tom Lane
Date:
I wrote:
> Look at IpcSemaphoreCreate and InternalIpcSemaphoreCreate in
> src/backend/port/sysv_sema.c.  It may be worth stepping through them
> with gdb to see what the semget calls are returning.

BTW, even before doing that, you should look at "ipcs -s" output to try
to get a clue what's going on.  The EINVAL failures may be because the
second postmaster to start deletes the semaphores created by the first
one.  You could easily see this happening in before-and-after ipcs data
if so.

strace'ing startup of the second postmaster is another approach that
might be easier than gdb'ing.
        regards, tom lane


Re: semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
On Sun, 2 Apr 2006, Tom Lane wrote:

> I wrote:
>> Look at IpcSemaphoreCreate and InternalIpcSemaphoreCreate in
>> src/backend/port/sysv_sema.c.  It may be worth stepping through them
>> with gdb to see what the semget calls are returning.
>
> BTW, even before doing that, you should look at "ipcs -s" output to try 
> to get a clue what's going on.  The EINVAL failures may be because the 
> second postmaster to start deletes the semaphores created by the first 
> one.  You could easily see this happening in before-and-after ipcs data 
> if so.

You are right ...

Before:

Semaphores:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP          NSEMS OTIME    CTIME 
s       524288      5432001 --rw-------       70       70       70       70           17 14:44:19 14:44:19
s       524289      5432002 --rw-------       70       70       70       70           17 14:44:19 14:44:19
s       524290      5432003 --rw-------       70       70       70       70           17 14:44:19 14:44:19
s       524291      5432004 --rw-------       70       70       70       70           17 14:44:19 14:44:19
s       524292      5432005 --rw-------       70       70       70       70           17 14:44:19 14:44:19
s       524293      5432006 --rw-------       70       70       70       70           17 20:23:56 14:44:19
s       524294      5432007 --rw-------       70       70       70       70           17 20:23:58 14:44:19

After:

Semaphores:
T           ID          KEY MODE        OWNER    GROUP    CREATOR  CGROUP          NSEMS OTIME    CTIME
s       589824      5432001 --rw-------       70       70       70       70           17 21:38:03 21:38:03
s       589825      5432002 --rw-------       70       70       70       70           17 21:38:03 21:38:03
s       589826      5432003 --rw-------       70       70       70       70           17 21:38:03 21:38:03
s       589827      5432004 --rw-------       70       70       70       70           17 21:38:03 21:38:03
s       589828      5432005 --rw-------       70       70       70       70           17 21:38:03 21:38:03
s       589829      5432006 --rw-------       70       70       70       70           17 21:38:03 21:38:03
s       589830      5432007 --rw-------       70       70       70       70           17 21:38:03 21:38:03

So, our semget() is trying to acquire 5432001, FreeBSD's semget is 
reporting back that it's not in use, so the second instance is basically 
'punting' the original one off of it ...

Kris, from the PostgreSQL sources, here is where we try and set the semId 
to use ... is there something we are doing wrong with our code as far as 
FreeBSD 6.x is concerned, such that semget is not returning a negative 
value when the key is already in use?  Or is there a problem with semget() 
in a jail such that it is allowing for the KEY to be reused, instead of 
returning a negative value?

========
static IpcSemaphoreId
InternalIpcSemaphoreCreate(IpcSemaphoreKey semKey, int numSems)
{
    int         semId;

    semId = semget(semKey, numSems, IPC_CREAT | IPC_EXCL | IPCProtection);

    if (semId < 0)
    {
        /*
         * Fail quietly if error indicates a collision with existing set.
         * One would expect EEXIST, given that we said IPC_EXCL, but
         * perhaps we could get a permission violation instead?  Also,
         * EIDRM might occur if an old set is slated for destruction but
         * not gone yet.
         */
        if (errno == EEXIST || errno == EACCES
#ifdef EIDRM
            || errno == EIDRM
#endif
            )
            return -1;

        /*
         * Else complain and abort
         */
        ereport(FATAL,
                (errmsg("could not create semaphores: %m"),
                 errdetail("Failed system call was semget(%d, %d, 0%o).",
                           (int) semKey, numSems,
                           IPC_CREAT | IPC_EXCL | IPCProtection),
                 (errno == ENOSPC) ?
                 errhint("This error does *not* mean that you have run out of disk space.\n"
                         "It occurs when either the system limit for the maximum number of "
                         "semaphore sets (SEMMNI), or the system wide maximum number of "
                         "semaphores (SEMMNS), would be exceeded.  You need to raise the "
                         "respective kernel parameter.  Alternatively, reduce PostgreSQL's "
                         "consumption of semaphores by reducing its max_connections parameter "
                         "(currently %d).\n"
                         "The PostgreSQL documentation contains more information about "
                         "configuring your system for PostgreSQL.",
                         MaxBackends) : 0));
    }

    return semId;
}
========
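
One way to check the question above directly is a small experiment, sketched here under the assumption that SysV semaphores are enabled where it runs: create a set with IPC_CREAT | IPC_EXCL, ask for the same key again, and see whether the kernel reports EEXIST. The function name and the key passed to it are illustrative and not part of PostgreSQL.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Returns 1 if a second exclusive create on the same key fails with EEXIST,
 * 0 if it does anything else, and -1 if the first set could not be created
 * (key already taken, SysV IPC unavailable, and so on). */
static int excl_collision_detected(key_t key)
{
    int first = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
    if (first < 0)
        return -1;

    errno = 0;
    int second = semget(key, 1, IPC_CREAT | IPC_EXCL | 0600);
    int saved_errno = errno;
    int result = (second < 0 && saved_errno == EEXIST) ? 1 : 0;

    /* Clean up whatever was created so the experiment leaves no sets behind. */
    if (second >= 0 && second != first)
        semctl(second, 0, IPC_RMID);
    semctl(first, 0, IPC_RMID);

    return result;
}
```

Running this once inside each jail with an otherwise unused key would show whether semget() itself honours IPC_EXCL there, independently of anything PostgreSQL does afterwards.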




Re: semaphore usage "port based"?

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> On Sun, 2 Apr 2006, Tom Lane wrote:
>> BTW, even before doing that, you should look at "ipcs -s" output to try 
>> to get a clue what's going on.  The EINVAL failures may be because the 
>> second postmaster to start deletes the semaphores created by the first 
>> one.  You could easily see this happening in before-and-after ipcs data 
>> if so.

> You are right ...

OK, could we see strace (or whatever BSD calls it) output for the second
postmaster?  I'd like to see exactly what results it's getting for the
kernel calls it makes during IpcSemaphoreCreate.
        regards, tom lane


Re: semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
On Sun, 2 Apr 2006, Tom Lane wrote:

> "Marc G. Fournier" <scrappy@postgresql.org> writes:
>> On Sun, 2 Apr 2006, Tom Lane wrote:
>>> BTW, even before doing that, you should look at "ipcs -s" output to try
>>> to get a clue what's going on.  The EINVAL failures may be because the
>>> second postmaster to start deletes the semaphores created by the first
>>> one.  You could easily see this happening in before-and-after ipcs data
>>> if so.
>
>> You are right ...
>
> OK, could we see strace (or whatever BSD calls it) output for the second
> postmaster?  I'd like to see exactly what results it's getting for the
> kernel calls it makes during IpcSemaphoreCreate.

'k, don't know what strace is ... we have ktrace and truss ... truss is 
what I usually use, and is:

DESCRIPTION
     The truss utility traces the system calls called by the specified process
     or program.  Output is to the specified output file, or standard error by
     default.  It does this by stopping and restarting the process being
     monitored via procfs(5).
 

And shows output like:

# truss ls
ioctl(1,TIOCGETA,0x7fbff514)                     = 0 (0x0)
ioctl(1,TIOCGWINSZ,0x7fbff588)                   = 0 (0x0)
getuid()                                         = 0 (0x0)
readlink("/etc/malloc.conf",0x7fbff470,63)       ERR#2 'No such file or directory'
mmap(0x0,4096,0x3,0x1002,-1,0x0)                 = 671666176 (0x2808d000)
break(0x809b000)                                 = 0 (0x0)
break(0x809c000)                                 = 0 (0x0)
break(0x809d000)                                 = 0 (0x0)
break(0x809e000)                                 = 0 (0x0)
stat(".",0x7fbff470)                             = 0 (0x0)
open(".",0x0,00)                                 = 3 (0x3)
fchdir(0x3)                                      = 0 (0x0)
open(".",0x0,00)                                 = 4 (0x4)
stat(".",0x7fbff430)                             = 0 (0x0)
open(".",0x4,00)                                 = 5 (0x5)
fstat(5,0x7fbff430)                              = 0 (0x0)
fcntl(0x5,0x2,0x1)                               = 0 (0x0)
__sysctl(0x7fbff2e8,0x2,0x8098760,0x7fbff2e4,0x0,0x0) = 0 (0x0)
fstatfs(0x5,0x7fbff330)                          = 0 (0x0)
break(0x809f000)                                 = 0 (0x0)
getdirentries(0x5,0x809e000,0x1000,0x809a0b4)    = 512 (0x200)
getdirentries(0x5,0x809e000,0x1000,0x809a0b4)    = 0 (0x0)
lseek(5,0x0,0)                                   = 0 (0x0)
close(5)                                         = 0 (0x0)
fchdir(0x4)                                      = 0 (0x0)
close(4)                                         = 0 (0x0)
fstat(1,0x7fbff270)                              = 0 (0x0)
break(0x80a0000)                                 = 0 (0x0)
ioctl(1,TIOCGETA,0x7fbff2a4)                     = 0 (0x0)
.cshrc          .cvspass        .history        .login          .psql_history   .ssh
write(1,0x809f000,53)                            = 53 (0x35)
.cshrc~         .emacs.d        .klogin         .profile        .rnd            ktrace.out
write(1,0x809f000,53)                            = 53 (0x35)
exit(0x0)                                       process exit, rval = 0


ktrace is:

DESCRIPTION
     The ktrace utility enables kernel trace logging for the specified
     processes.  Kernel trace data is logged to the file ktrace.out.  The kernel
     operations that are traced include system calls, namei translations,
     signal processing, and I/O.
 

And shows output like:
 86523 ls       RET   __sysctl 0
 86523 ls       CALL  fstatfs(0x5,0x7fbff330)
 86523 ls       RET   fstatfs 0
 86523 ls       CALL  break(0x809f000)
 86523 ls       RET   break 0
 86523 ls       CALL  getdirentries(0x5,0x809e000,0x1000,0x809a0b4)
 86523 ls       RET   getdirentries 512/0x200
 86523 ls       CALL  getdirentries(0x5,0x809e000,0x1000,0x809a0b4)
 86523 ls       RET   getdirentries 0
 86523 ls       CALL  lseek(0x5,0,0,0,0)




Re: semaphore usage "port based"?

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> On Sun, 2 Apr 2006, Tom Lane wrote:
>> OK, could we see strace (or whatever BSD calls it) output for the second
>> postmaster?  I'd like to see exactly what results it's getting for the
>> kernel calls it makes during IpcSemaphoreCreate.

> 'k, don't know what strace is ... we have ktrace and truss ... truss is 
> what I usually use, and is:

truss seems to have an output format closer to what I'm used to, but
either will do.
        regards, tom lane


Re: semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
Sent offlist ...

On Sun, 2 Apr 2006, Tom Lane wrote:

> "Marc G. Fournier" <scrappy@postgresql.org> writes:
>> On Sun, 2 Apr 2006, Tom Lane wrote:
>>> OK, could we see strace (or whatever BSD calls it) output for the second
>>> postmaster?  I'd like to see exactly what results it's getting for the
>>> kernel calls it makes during IpcSemaphoreCreate.
>
>> 'k, don't know what strace is ... we have ktrace and truss ... truss is
>> what I usually use, and is:
>
> truss seems to have an output format closer to what I'm used to, but
> either will do.
>
>             regards, tom lane


Re: semaphore usage "port based"?

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> 'k, try this one ... looks better, actually has semget() calls in it :)

OK, here's our problem:

84250: semget(0x52e2c1,0x11,0x780)         ERR#17 'File exists'

This is InternalIpcSemaphoreCreate failing because of key collision.
As it should.

84250: semget(0x52e2c1,0x11,0x0)         = 1114112 (0x110000)

This is IpcSemaphoreCreate trying to see what's up.  OK.

84250: __semctl(0x110000,0x10,0x5,0x0)         = 537 (0x219)

IpcSemaphoreGetValue indicates it has the right "magic number" to be
a Postgres semaphore set.  Still expected.

84250: __semctl(0x110000,0x10,0x4,0x0)         = 83699 (0x146f3)

IpcSemaphoreGetLastPID says the sema set is last touched by pid 83699.
Looks reasonable (but do you want to double-check that it matched the
first postmaster's PID?)

84250: getpid()                     = 84250 (0x1491a)

our pid ... as expected ...

84250: kill(0x146f3,0x0)             ERR#3 'No such process'

Oops.  Here is the problem: kill() is lying by claiming there is no such
process as 83699.  It looks to me like there in fact is such a process,
but it's in a different jail.

I venture that FBSD 6 has decided to return ESRCH (no such process)
where FBSD 4 returned some other error that acknowledged that the
process did exist (EPERM would be a reasonable guess).

If this is the story, then FBSD have broken their system and must revert
their change.  They do not have kernel behavior that totally hides the
existence of the other process, and therefore having some calls that
pretend it's not there is simply inconsistent.
        regards, tom lane
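
The hexadecimal arguments in the truss lines above can be decoded with a little arithmetic. The sketch below is only an illustration: it assumes keys are laid out as port * 1000 + sequence number, and it assumes the permission bits are 0600, which is inferred from the `--rw-------` mode in the earlier ipcs output. The constant and function names are hypothetical.

```c
#include <sys/types.h>
#include <sys/ipc.h>

/* Values copied from the truss lines quoted above. */
enum
{
    TRACE_KEY   = 0x52e2c1,  /* first argument of both semget() calls */
    TRACE_NSEMS = 0x11,      /* second argument: semaphores per set */
    TRACE_FLAGS = 0x780      /* third argument of the failing semget() */
};

/* Assumption: key = port * 1000 + sequence number. */
static int trace_port(void)     { return TRACE_KEY / 1000; }
static int trace_sequence(void) { return TRACE_KEY % 1000; }

/* Exclusive create plus owner read/write permission. */
static int expected_create_flags(void)
{
    return IPC_CREAT | IPC_EXCL | 0600;
}
```

Under these assumptions the trace reads as: key 5432001 (port 5432, first key in the sequence), 17 semaphores per set as in the ipcs listing, and an exclusive create, which is why the 'File exists' result is the expected one.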


Re: semaphore usage "port based"?

From
Tom Lane
Date:
Kris Kennaway <kris@obsecurity.org> writes:
> On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote:
>> If this is the story, then FBSD have broken their system and must revert
>> their change.  They do not have kernel behavior that totally hides the
>> existence of the other process, and therefore having some calls that
>> pretend it's not there is simply inconsistent.

> I'm guessing it's a deliberate change to prevent the information
> leakage between jails.

I have no objection to doing that, so long as you are actually doing it
correctly.  This example shows that each jail must have its own SysV
semaphore key space, else information leaks anyway.  The current
situation breaks Postgres, and therefore I suggest reverting the errno
change until you are prepared to fix the SysV IPC stuff to be per-jail.
        regards, tom lane


Re: semaphore usage "port based"?

From
Tom Lane
Date:
Kris Kennaway <kris@obsecurity.org> writes:
> On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote:
>> I have no objection to doing that, so long as you are actually doing it
>> correctly.  This example shows that each jail must have its own SysV
>> semaphore key space, else information leaks anyway.

> By default SysV shared memory is disallowed in jails.

Hm, the present problem seems to be about semaphores not shared memory
... although I'd not be surprised to find that there's a similar issue
around shared memory.  Anyway, if FBSD's position is that they are
uninterested in supporting SysV IPC in connection with jails, then I
think the Postgres project position has to be that we are uninterested
in supporting Postgres inside FBSD jails.  Sorry Marc :-(
        regards, tom lane


Re: semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
On Sun, 2 Apr 2006, Kris Kennaway wrote:

> On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote:
>> Kris Kennaway <kris@obsecurity.org> writes:
>>> On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote:
>>>> If this is the story, then FBSD have broken their system and must revert
>>>> their change.  They do not have kernel behavior that totally hides the
>>>> existence of the other process, and therefore having some calls that
>>>> pretend it's not there is simply inconsistent.
>>
>>> I'm guessing it's a deliberate change to prevent the information
>>> leakage between jails.
>>
>> I have no objection to doing that, so long as you are actually doing it
>> correctly.  This example shows that each jail must have its own SysV
>> semaphore key space, else information leaks anyway.
>
> By default SysV shared memory is disallowed in jails.

'k, but how do I fix kill so that it has the proper behaviour if SysV is 
enabled?  Maybe a mount option for procfs that allows for pre-5.x 
behaviour? I'm not the first one to point out that this is a problem, just 
the first to follow it through to the cause ;(  And I believe there is 
more than just PostgreSQL that is affected by shared memory (i.e. apache2 
needs SysV IPC enabled, so anyone doing that in a jail has it enabled 
also) ...

Basically, I don't care if 'default' is ultra-secure ... but some means to 
bring it down a notch would be nice ... :(



Re: semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
On Sun, 2 Apr 2006, Kris Kennaway wrote:

> No-one is taking a position of being "uninterested", so please don't
> be hasty to reciprocate.

I just posted it off the -hackers list, but there is an ancient patch in 
the FreeBSD queue for implementing Private IPCs for Jails ... not sure why 
it was never committed, or what is involved in bringing it up to speed with 
the current 6.x and / or -current kernels though ... but, as I mentioned 
in another thread, I know that *at least* Apache2 makes use of shared 
memory for some of its stuff ...



Re: semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
Thanks all ... have moved this to just the freebsd-stable list, since I 
don't imagine most here are interested in FreeBSD :(

On Mon, 3 Apr 2006, Andrew Thompson wrote:

> On Sun, Apr 02, 2006 at 11:41:01PM -0400, Kris Kennaway wrote:
>> On Mon, Apr 03, 2006 at 12:30:58AM -0300, Marc G. Fournier wrote:
>>> 'k, but how do I fix kill so that it has the proper behaviour if SysV is
>>> enabled?
>>
>> Check the source, perhaps there's already a way.  If not, talk to
>> whoever made the change.
>>
>>> Maybe a mount option for procfs that allows for pre-5.x
>>> behaviour?
>>
>> procfs has nothing to do with this though.
>>
>>> I'm not the first one to point out that this is a problem, just
>>> the first to follow it through to the cause ;(  And I believe there is
>>> more than just PostgreSQL that is affected by shared memory (i.e. apache2
>>> needs SysV IPC enabled, so anyone doing that in a jail has it enabled
>>> also) ...
>>
>> Also note that SysV IPC is not the problem here, it's the change in
>> the behaviour of kill() that is causing postgresql to become confused.
>> That's what you should investigate.
>
> The ESRCH error is being returned from prison_check(), that would be a
> good starting place.
>
>
> Andrew


Re: semaphore usage "port based"?

From
Tom Lane
Date:
Robert Watson <rwatson@FreeBSD.org> writes:
> However, pid's in general uniquely identify a process only at the time they 
> are recorded.  So any pid returned here is necessarily stale -- even if there
> is another process with the pid returned by GETPID, it may actually be a 
> different process that has ended up with the same pid.  The longer the gap 
> since the last semaphore operation, the more likely (presumably) it is that 
> the pid has been recycled.  And on modern systems with thousands of processes
> and high process turn-over (i.e., systems with CGI and other sorts of 
> scripting), pid reuse can happen quickly.  Is your use of the pid here 
> consistent with the fact that pid's are reused quickly after process exit?

That's a fair question, but in the context of the code I believe we are
behaving reasonably.  The reason this code exists is to provide some
insurance against leaking semaphores when a postmaster process is
terminated unexpectedly (ye olde often-recommended-against "kill -9
postmaster", for instance).  If the PID returned by GETPID is
nonexistent or belongs to a process not owned by the postgres userid
then we assume that the semaphore set can be recycled.  We could get
fooled by PID recycling if the PID returned by GETPID belongs to a
postgres-owned process that isn't actually the original owner, but
the penalty is just that we'll fail to recycle semaphores that could
be released.  Not very harmful, and not very probable either, unless
you're running postgres under a userid that's used for a lot of other
stuff too.  There is not much risk of long-term leakage of many
semaphore sets, even if you've got lots of postmaster crashes going on
(which I sure hope you don't).  The code is designed to retry the same
semaphore keys on each cycle of life, so you'd have to get fooled by
chance coincidence of existing PIDs every time over many cycles to
have a severe resource-leakage problem.  (BTW, Marc, that's the reason
for *not* randomizing the key selection as you suggested.)

So I think the code is pretty bulletproof as long as it's in a system
that is behaving per SysV spec.  The problem in the current FBSD
situation is that the jail mechanism is exposing semaphore sets across
jails, but not exposing the existence of the owning processes.  That
behavior is inconsistent: if process A can affect the state of a sema
set that process B can see, it's surely unreasonable to pretend that A
doesn't exist.
        regards, tom lane
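
The GETPID lookup discussed above can be tried in isolation. This is a sketch under the assumption that SysV semaphores are available to the calling process; it is not taken from PostgreSQL, and the function name is hypothetical. It creates a private set, performs one semop() on it, and asks the kernel which PID last operated on the semaphore.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

/* Returns 1 if GETPID on a freshly touched private set reports our own PID,
 * 0 if it reports something else, and -1 if any step fails (for example
 * when SysV IPC is not available). */
static int fresh_set_reports_our_pid(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    /* Increment semaphore 0 by one; this never blocks. */
    struct sembuf up = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
    int result = -1;

    if (semop(semid, &up, 1) == 0)
    {
        int last_pid = semctl(semid, 0, GETPID);
        if (last_pid >= 0)
            result = (last_pid == (int) getpid()) ? 1 : 0;
    }

    semctl(semid, 0, IPC_RMID);   /* always remove the temporary set */
    return result;
}
```

The value returned by GETPID is only a record of who touched the semaphore last, which is why the staleness concern raised in the quoted question applies to any later use of it.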


Re: semaphore usage "port based"?

From
Tom Lane
Date:
Robert Watson <rwatson@FreeBSD.org> writes:
> Maybe I've misunderstood the problem here -- is the use of the GETPID 
> operation occurring within a coordinated set of server processes, or does it 
> also occur between client and server processes?  I think it's quite reasonable 
> to argue that a coordinated set of server processes should be able to see each 
> other, especially if they're running as the same user, in the same jail, 
> started at the same time.

We use the semaphore sets only within postgres server processes; we
could hardly expect client processes to be able to get at them, since
in general clients aren't on the same machine.  The issue here, though,
is that Marc is trying to start multiple postgres servers in different
jails, and in that context the different postgres servers aren't
"coordinated" in any real sense.  We'd prefer that they didn't interact
at all, but they are interacting because the SysV code isn't restricting
IPC to occur only within a jail.

BTW, Marc, it occurs to me that a workaround for you would be to create
a separate userid for postgres to run under in each jail; then the
regular protection mechanisms would prevent the different postmasters
from interfering with each others' semaphore sets.  But I think that
workaround just makes it even clearer that the jail mechanism isn't
behaving very sanely.

> I would, in general, consider the use of System 
> V IPC across jails (as opposed to in a single jail) unsupported, since it's 
> not consistent with the security model.

That'd be fine with me --- the problem here is that we've got unwanted
communication across jails.  If, say, the jail ID were considered part
of semaphore keys, we'd be in fine shape.
        regards, tom lane


Re: semaphore usage "port based"?

From
Tom Lane
Date:
Robert Watson <rwatson@FreeBSD.org> writes:
> Any multi-instance application that uses unvirtualized System V IPC must know
> how to distinguish between those instances.

Sure.

> How is PostgreSQL deciding what semaphores to use?  Can it be instructed to 
> use non-colliding ones by specifying an alternative argument to pass to 
> ftok(), or ID to use directly?

The problem here is not that we don't know how to avoid a collision.
The problem is stemming from code that we added to prevent semaphore
leakage during failure recoveries.  The code believes that it is
deleting a semaphore set left over from a crashed previous instance
of the same postmaster.

We don't use ftok() to determine the keys, and I'm disinclined to think
that doing so would improve the situation: you could still have key
collisions, they'd just be unpredictable and there'd be no convenient
mechanism for escaping one if you hit it.

> However, if applications behave incorrectly when treading over each other 
> because either they aren't written to support specifying how not to walk over
> each other, or if they are not configured to use that support, then they're 
> not going to behave well :-).

Postgres is absolutely designed not to walk all over itself.  It is,
however, designed to clean up after itself, and I don't consider that a
bug.  The problem here is that by redefining the behavior of kill, you've
prevented the code from detecting the existence of the other postmaster,
and thereby triggered the cleanup behavior.

I don't exactly see why it's considered such a critical security feature
that kill return ESRCH rather than, say, EPERM for processes in another
jail.  kill won't tell you what that process is or what it's doing,
so the amount of information leaked is certainly pretty trivial.  It'd
be fine if FBSD actually had a jail implementation that leaked zero
information, but you don't --- in fact, you're saying it's a feature
that you don't.

Perhaps a reasonable compromise would be to have the
SysV-IPC-allowed-in-jails switch also restore the normal return value
of kill().  This seems sensible to me because the GETPID feature makes
PIDs be part of the API that is exposed across jails.

        regards, tom lane


Re: semaphore usage "port based"?

From
Vivek Khera
Date:
On Apr 3, 2006, at 12:37 PM, Tom Lane wrote:

> semaphore keys on each cycle of life, so you'd have to get fooled by
> chance coincidence of existing PIDs every time over many cycles to
> have a severe resource-leakage problem.  (BTW, Marc, that's the reason
> for *not* randomizing the key selection as you suggested.)

Seems to me the way around this with minimal fuss is to add a flag to
postgres to have it start at different points in the ID sequence.
So pg#1 would start at the first position, pg#2 at the second ID position,
etc.; then just hard-code an "instance ID" into the startup script for each
pg.  No randomization makes it easier to debug, and unique IDs let it
avoid clashes under normal cases.
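
A rough sketch of that idea in Python, assuming keys are derived from the port by a fixed multiplier. The multiplier of 1000, the slice size of 100, and the names `candidate_keys` and `instance_id` are all illustrative assumptions, not existing postgres options:

```python
def candidate_keys(port, instance_id=0, keys_per_port=1000, slots_per_instance=100):
    """Return the deterministic range of keys one instance would try, in order.

    Each instance gets its own slice of the per-port key range, so two
    instances on the same port never try the same key.
    """
    if instance_id < 0 or (instance_id + 1) * slots_per_instance > keys_per_port:
        raise ValueError("instance_id does not fit in the per-port key range")
    start = port * keys_per_port + instance_id * slots_per_instance
    return range(start, start + slots_per_instance)
```

For example, `candidate_keys(5432, 0)` starts at 5432000 and `candidate_keys(5432, 1)` starts at 5432100, and the same instance walks the same keys on every restart.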



Re: semaphore usage "port based"?

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> That's a fair question, but in the context of the code I believe we are
> behaving reasonably.  The reason this code exists is to provide some
> insurance against leaking semaphores when a postmaster process is
> terminated unexpectedly (ye olde often-recommended-against "kill -9
> postmaster", for instance).  If the PID returned by GETPID is

Could this be handled sensibly by using SEM_UNDO?  Just a thought.

> So I think the code is pretty bulletproof as long as it's in a system
> that is behaving per SysV spec.  The problem in the current FBSD
> situation is that the jail mechanism is exposing semaphore sets across
> jails, but not exposing the existence of the owning processes.  That
> behavior is inconsistent: if process A can affect the state of a sema
> set that process B can see, it's surely unreasonable to pretend that A
> doesn't exist.

This is certainly a problem with FBSD jails...  Not only the
inconsistency, but what happens if someone manages to get access to the
appropriate uid under one jail and starts sniffing or messing with the
semaphores or shared memory segments from other jails?  If that's
possible then that's a rather glaring security problem...

Thanks,
    Stephen

Re: semaphore usage "port based"?

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> BTW, Marc, it occurs to me that a workaround for you would be to create
> a separate userid for postgres to run under in each jail; then the
> regular protection mechanisms would prevent the different postmasters
> from interfering with each others' semaphore sets.  But I think that
> workaround just makes it even clearer that the jail mechanism isn't
> behaving very sanely.

Just to toss it in there, I do this on some systems where we use Linux
VServers.  It's just so that when I'm looking at a process list across
the whole system it's easy to tell which processes are inside which
vservers (since the only thing which should be running in a given
vserver is a single Postgres instance which should only be running with
the uid/gid corresponding to that vserver, and that uid/gid is recorded
in the host passwd file with a name associated with it since that's the
passwd file used when looking at all pids).

I also just double-checked with the Linux VServer folks and they confirm
that IPC objects inside a vserver are isolated from all other IPC objects
on the system.

Thanks,
    Stephen

Re: semaphore usage "port based"?

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Could this be handled sensibly by using SEM_UNDO?  Just a thought.

Interesting thought, but I think it doesn't work for the special case
where the semaphore's "previous owner" is actually our same PID ---
which is actually the more commonly exercised path, since true
postmaster crashes are pretty rare.  More commonly we're trying to
detach from and recreate our own shmem and semas following a backend
crash.  We can special-case that pretty easily with the GETPID solution
(pid == ours is obviously not some other process's sema), but with
SEM_UNDO it wouldn't work right.
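
The decision being described can be sketched as below. This is an illustration in Python only, not the real code; `may_recycle` and the `exists` callback are hypothetical names, and treating a non-positive PID as "leave it alone" is a conservative assumption of this sketch:

```python
import os


def may_recycle(last_pid, exists, my_pid=None):
    """Decide whether a pre-existing semaphore set may be deleted and recreated.

    last_pid: the PID reported by semctl(GETPID) for the set.
    exists:   callable(pid) -> bool, e.g. a kill(pid, 0) style probe.
    """
    if my_pid is None:
        my_pid = os.getpid()
    if last_pid <= 0:
        # Nothing usable came back from GETPID; do not touch the set.
        return False
    if last_pid == my_pid:
        # Our own previous cycle of life (e.g. after a backend crash).
        return True
    # Someone else's PID: recycle only if that process is really gone.
    return not exists(last_pid)
```

The whole scheme rests on `exists` telling the truth. If it reports False for a live postmaster in another jail, the last branch hands back True and a set that is still in use gets removed.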

I'm also concerned about the portability risks of depending on SEM_UNDO.
I think a lot of systems set the semaphore-undo limits pretty small,
maybe even zero.

BTW, as long as we're annoying the freebsd-stable list with discussions
of workarounds, I'm wondering whether our shared memory code might have
similar risks.  Does FBSD 6 also lie about the existence of other-jail
processes connected to a shared memory segment --- ie, in
shmctl(IPC_STAT)'s result, does shm_nattch count only processes in our
own jail?

        regards, tom lane


Re: semaphore usage "port based"?

From
Stephen Frost
Date:
* Robert Watson (rwatson@FreeBSD.org) wrote:
> On Mon, 3 Apr 2006, Stephen Frost wrote:
> >This is certainly a problem with FBSD jails...  Not only the
> >inconsistency, but what happens if someone manages to get access to the
> >appropriate uid under one jail and starts sniffing or messing with the
> >semaphores or shared memory segments from other jails?  If that's possible
> >then that's a rather glaring security problem...
>
> This is why it's disabled by default, and the jail documentation
> specifically advises of this possibility.  Excerpt below.

Ah, I see, glad to see it's accurately documented.  Given the rather
significant use of shared memory by Postgres it seems to me that
jail'ing it under FBSD is unlikely to get you the kind of isolation
between instances that you want (the assumption being that you want to
avoid the possibility of a user under one jail impacting a user in
another jail).  As such, I'd suggest finding something else if you
truly need that isolation for Postgres or dropping the jails entirely.

Running the Postgres instances under different uids (as you'd probably
expect to do anyway if not using the jails) is probably the right
approach.  Doing that and using jails would probably work, just don't
delude yourself into thinking that you're safe from a malicious user in
one jail.

Thanks,
    Stephen

Re: semaphore usage "port based"?

From
"Marc G. Fournier"
Date:
On Mon, 3 Apr 2006, Stephen Frost wrote:

> * Robert Watson (rwatson@FreeBSD.org) wrote:
>> On Mon, 3 Apr 2006, Stephen Frost wrote:
>>> This is certainly a problem with FBSD jails...  Not only the
>>> inconsistency, but what happens if someone manages to get access to the
>>> appropriate uid under one jail and starts sniffing or messing with the
>>> semaphores or shared memory segments from other jails?  If that's possible
>>> then that's a rather glaring security problem...
>>
>> This is why it's disabled by default, and the jail documentation
>> specifically advises of this possibility.  Excerpt below.
>
> Ah, I see, glad to see it's accurately documented.  Given the rather
> significant use of shared memory by Postgres it seems to me that
> jail'ing it under FBSD is unlikely to get you the kind of isolation
> between instances that you want (the assumption being that you want to
> avoid the possibility of a user under one jail impacting a user in
> another jail).  As such, I'd suggest finding something else if you
> truly need that isolation for Postgres or dropping the jails entirely.
>
> Running the Postgres instances under different uids (as you'd probably
> expect to do anyway if not using the jails) is probably the right
> approach.  Doing that and using jails would probably work, just don't
> delude yourself into thinking that you're safe from a malicious user in
> one jail.

We don't ... we put all our databases on a central database server, even 
private ones, that nobody has shell access to ... we keep them isolated 
...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: semaphore usage "port based"?

From
Stephen Frost
Date:
* Marc G. Fournier (scrappy@postgresql.org) wrote:
> On Mon, 3 Apr 2006, Stephen Frost wrote:
> >Running the Postgres instances under different uids (as you'd probably
> >expect to do anyway if not using the jails) is probably the right
> >approach.  Doing that and using jails would probably work, just don't
> >delude yourself into thinking that you're safe from a malicious user in
> >one jail.
>
> We don't ... we put all our databases on a central database server, even
> private ones, that nobody has shell access to ... we keep them isolated
> ...

I guess what I was trying to get at is this:

Running 2 Postgres instances under FreeBSD with (or without really, but
I guess that's more obvious) jails but with the same UID is a bad idea.
Even if Postgres could be modified to allow this to work you're going to
be in a position where the jail isn't really helping much except to give
a somewhat false (in this case) sense of security.  We probably
shouldn't encourage it and in fact it's something of a nice feature that
it breaks.

The reasoning is pretty simple: if someone manages to get control of
one of the Postgres instances they're going to be able to wreak havoc on
the other.  With different UIDs, with or without jails, this would be
much more difficult (need to get root first).

Running 2 Postgres instances under FreeBSD with jails *and* different
UIDs is *probably* better than w/o jails but since you have to enable
the single-instance IPC system it might not be that great of a benefit
over a simple chroot or similar.

Hope that helps...

Thanks,
    Stephen

Re: semaphore usage "port based"?

From
Kris Kennaway
Date:
On Sun, Apr 02, 2006 at 11:26:52PM -0400, Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> writes:
> > On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote:
> >> I have no objection to doing that, so long as you are actually doing it
> >> correctly.  This example shows that each jail must have its own SysV
> >> semaphore key space, else information leaks anyway.
>
> > By default SysV shared memory is disallowed in jails.
>
> Hm, the present problem seems to be about semaphores not shared memory

Sorry, I meant IPC.

> ... although I'd not be surprised to find that there's a similar issue
> around shared memory.  Anyway, if FBSD's position is that they are
> uninterested in supporting SysV IPC in connection with jails, then I
> think the Postgres project position has to be that we are uninterested
> in supporting Postgres inside FBSD jails.

No-one is taking a position of being "uninterested", so please don't
be hasty to reciprocate.

Kris

Re: semaphore usage "port based"?

From
Kris Kennaway
Date:
On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote:
> Kris Kennaway <kris@obsecurity.org> writes:
> > On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote:
> >> If this is the story, then FBSD have broken their system and must revert
> >> their change.  They do not have kernel behavior that totally hides the
> >> existence of the other process, and therefore having some calls that
> >> pretend it's not there is simply inconsistent.
>
> > I'm guessing it's a deliberate change to prevent the information
> > leakage between jails.
>
> I have no objection to doing that, so long as you are actually doing it
> correctly.  This example shows that each jail must have its own SysV
> semaphore key space, else information leaks anyway.

By default SysV shared memory is disallowed in jails.

Kris

Re: semaphore usage "port based"?

From
Kris Kennaway
Date:
On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote:

> I venture that FBSD 6 has decided to return ESRCH (no such process)
> where FBSD 4 returned some other error that acknowledged that the
> process did exist (EPERM would be a reasonable guess).
>
> If this is the story, then FBSD have broken their system and must revert
> their change.  They do not have kernel behavior that totally hides the
> existence of the other process, and therefore having some calls that
> pretend it's not there is simply inconsistent.

I'm guessing it's a deliberate change to prevent the information
leakage between jails.

Kris

Re: semaphore usage "port based"?

From
Andrew Thompson
Date:
On Sun, Apr 02, 2006 at 11:41:01PM -0400, Kris Kennaway wrote:
> On Mon, Apr 03, 2006 at 12:30:58AM -0300, Marc G. Fournier wrote:
> > 'k, but how do I fix kill so that it has the proper behaviour if SysV is 
> > enabled?
> 
> Check the source, perhaps there's already a way.  If not, talk to
> whoever made the change.
> 
> > Maybe a mount option for procfs that allows for pre-5.x 
> > behaviour?
> 
> procfs has nothing to do with this though.
> 
> > I'm not the first one to point out that this is a problem, just 
> > the first to follow it through to the cause ;(  And I believe there is 
> > more than just PostgreSQL that is affected by shared memory (ie. apache2 
> > needs SysV IPC enabled, so anyone doing that in a jail has it enabled 
> > also) ...
> 
> Also note that SysV IPC is not the problem here, it's the change in
> the behaviour of kill() that is causing postgresql to become confused.
> That's what you should investigate.

The ESRCH error is being returned from prison_check(), that would be a
good starting place.


Andrew


Re: semaphore usage "port based"?

From
Kris Kennaway
Date:
On Mon, Apr 03, 2006 at 12:30:58AM -0300, Marc G. Fournier wrote:
> On Sun, 2 Apr 2006, Kris Kennaway wrote:
>
> >On Sun, Apr 02, 2006 at 11:17:49PM -0400, Tom Lane wrote:
> >>Kris Kennaway <kris@obsecurity.org> writes:
> >>>On Sun, Apr 02, 2006 at 11:08:11PM -0400, Tom Lane wrote:
> >>>>If this is the story, then FBSD have broken their system and must revert
> >>>>their change.  They do not have kernel behavior that totally hides the
> >>>>existence of the other process, and therefore having some calls that
> >>>>pretend it's not there is simply inconsistent.
> >>
> >>>I'm guessing it's a deliberate change to prevent the information
> >>>leakage between jails.
> >>
> >>I have no objection to doing that, so long as you are actually doing it
> >>correctly.  This example shows that each jail must have its own SysV
> >>semaphore key space, else information leaks anyway.
> >
> >By default SysV shared memory is disallowed in jails.
>
> 'k, but how do I fix kill so that it has the proper behaviour if SysV is
> enabled?

Check the source, perhaps there's already a way.  If not, talk to
whoever made the change.

> Maybe a mount option for procfs that allows for pre-5.x
> behaviour?

procfs has nothing to do with this though.

> I'm not the first one to point out that this is a problem, just
> the first to follow it through to the cause ;(  And I believe there is
> more than just PostgreSQL that is affected by shared memory (ie. apache2
> needs SysV IPC enabled, so anyone doing that in a jail has it enabled
> also) ...

Also note that SysV IPC is not the problem here, it's the change in
the behaviour of kill() that is causing postgresql to become confused.
That's what you should investigate.

Kris

Re: semaphore usage "port based"?

From
Robert Watson
Date:
On Sun, 2 Apr 2006, Tom Lane wrote:

> Oops.  Here is the problem: kill() is lying by claiming there is no such 
> process as 83699.  It looks to me like there in fact is such a process, but 
> it's in a different jail.
>
> I venture that FBSD 6 has decided to return ESRCH (no such process) where 
> FBSD 4 returned some other error that acknowledged that the process did 
> exist (EPERM would be a reasonable guess).
>
> If this is the story, then FBSD have broken their system and must revert 
> their change.  They do not have kernel behavior that totally hides the 
> existence of the other process, and therefore having some calls that pretend 
> it's not there is simply inconsistent.

FreeBSD's mandatory access control models, such as multi-level security, biba 
integrity, and type enforcement, will generally provide consistent protection 
under the circumstances you describe: specifically, that information flow 
invariants across IPC types, including System V IPC and inter-process 
signalling, will allow flow only in keeping with the policy.

However, I guess I would counter with the following concern: the PID returned 
by semctl() has the following definition:
     GETPID       Return the pid of the last process to perform an operation
                  on semaphore number semnum.

However, pid's in general uniquely identify a process only at the time they 
are recorded.  So any pid returned here is necessarily stale -- even if there 
is another process with the pid returned by GETPID, it may actually be a 
different process that has ended up with the same pid.  The longer the gap 
since the last semaphore operation, the more likely (presumably) it is that 
the pid has been recycled.  And on modern systems with thousands of processes 
and high process turn-over (i.e., systems with CGI and other sorts of 
scripting), pid reuse can happen quickly.  Is your use of the pid here 
consistent with the fact that pid's are reused quickly after process exit?  Use of 
pid's in UNIX is often unreliable, and must be combined with other 
synchronizing, such as file locking on a pidfile, to ensure that the pid read 
is valid.  Even then, you can't implement atomic check-pid-and-signal using 
current UNIX APIs, which would require a notion of a process handle (or, in 
the parlance of Mach, a task port).
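
One way to do the pidfile locking mentioned above is an advisory flock() held for the life of the process. A minimal sketch in Python, assuming a platform where the `fcntl` module is available; `try_lock_pidfile` is a name made up for this example:

```python
import fcntl
import os


def try_lock_pidfile(path):
    """Take an exclusive, non-blocking lock on a pidfile.

    Returns the open file object (keep it open to keep the lock), or None
    if some other open file description already holds the lock.
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Lock is held elsewhere: an owner is alive, whatever PID the file says.
        f.close()
        return None
    # We own the lock, so it is safe to overwrite the recorded PID.
    f.truncate(0)
    f.write(f"{os.getpid()}\n")
    f.flush()
    return f
```

The point of the lock is that the kernel drops it when the holder exits, even after a kill -9, so liveness is judged by the lock and not by a possibly recycled PID stored in the file.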

Another thought along these lines -- especially with the proliferation of 
fine-grained access control systems, such as Type Enforcement in SELinux, I 
would be cautious about assuming that two processes being able to manipulate 
the same semaphore implies the ability to exchange signals using the signal 
facility.

Robert N M Watson


Re: semaphore usage "port based"?

From
Robert Watson
Date:
On Mon, 3 Apr 2006, Tom Lane wrote:

> BTW, as long as we're annoying the freebsd-stable list with discussions of 
> workarounds, I'm wondering whether our shared memory code might have similar 
> risks.  Does FBSD 6 also lie about the existence of other-jail processes 
> connected to a shared memory segment --- ie, in shmctl(IPC_STAT)'s result, 
> does shm_nattch count only processes in our own jail?

People are, of course, welcome to read the Jail documentation in order to read 
the warning about not enabling the System V IPC support in Jails, and what the 
possible results of doing so are.

Robert N M Watson


Re: semaphore usage "port based"?

From
Robert Watson
Date:
On Mon, 3 Apr 2006, Stephen Frost wrote:

>> This is why it's disabled by default, and the jail documentation 
>> specifically advises of this possibility.  Excerpt below.
>
> Ah, I see, glad to see it's accurately documented.

As it has been for the last five years, I believe since introduction of the 
setting to allow System V IPC to be used with documented limitations.

> Given the rather significant use of shared memory by Postgres it seems to me 
> that jail'ing it under FBSD is unlikely to get you the kind of isolation 
> between instances that you want (the assumption being that you want to avoid 
> the possibility of a user under one jail impacting a user in another jail). 
> As such, I'd suggest finding something else if you truly need that 
> isolation for Postgres or dropping the jails entirely.
>
> Running the Postgres instances under different uids (as you'd probably 
> expect to do anyway if not using the jails) is probably the right approach. 
> Doing that and using jails would probably work, just don't delude yourself 
> into thinking that you're safe from a malicious user in one jail.

Yes, there seems to be an awful lot of noise being made about the fact that 
the system does, in fact, work exactly as documented, and that the 
configuration being complained about is one that is specifically documented as 
being unsupported and undesirable.

As commented elsewhere in this thread, currently, there is no virtualization 
support for System V IPC in the FreeBSD Jail implementation.  That may change 
if/when someone implements it.  Until it's implemented, it isn't going to be 
there, and the system won't behave as though it's there no matter how much 
jumping up and down is done.

Robert N M Watson


Re: semaphore usage "port based"?

From
Robert Watson
Date:
On Mon, 3 Apr 2006, Tom Lane wrote:

> Robert Watson <rwatson@FreeBSD.org> writes:
>> Any multi-instance application that uses unvirtualized System V IPC must know
>> how to distinguish between those instances.
>
> Sure.
>
>> How is PostgreSQL deciding what semaphores to use?  Can it be instructed to
>> use non-colliding ones by specifying an alternative argument to pass to
>> ftok(), or ID to use directly?
>
> The problem here is not that we don't know how to avoid a collision. The 
> problem is stemming from code that we added to prevent semaphore leakage 
> during failure recoveries.  The code believes that it is deleting a 
> semaphore set left over from a crashed previous instance of the same 
> postmaster.
>
> We don't use ftok() to determine the keys, and I'm disinclined to think that 
> doing so would improve the situation: you could still have key collisions, 
> they'd just be unpredictable and there'd be no convenient mechanism for 
> escaping one if you hit it.

I guess what I'm saying is this: by turning on system V IPC in a jail, 
administrators accept that they are using an unsupported configuration, in 
which the security features of jail, which include hiding the process state of 
other jails, are known to conflict with the System V IPC services.  We 
specifically disable System V IPC in jails because it is known to have 
undesirable properties.  When configuring systems in that state, the 
responsibility falls on the administrator to disambiguate the configuration by 
specifying which resources must be used in order to prevent a conflict, 
because software operating in that environment will not be able to do so 
properly.  The goal of the switch to enable System V IPC is to allow IPC to be 
enabled for a single jail at a time, where it can be used to its full 
capabilities, without violating the security model.  If it is turned on for 
more than one jail, then isolation is not provided for System V IPC.

So my recommendation is, if people want to run Postgres in more than one jail 
at a time, they be provided with a configuration option to disambiguate which 
semaphore to use: they must hard-code that it will not use the same semaphore 
already in use by another Postgres instance in another Jail.  This is no 
different than specifying that if there are multiple Apaches running on a 
single system, that they run on different port/IP combinations.  If they 
aren't configured to do so, one of them will encounter an error when running, 
because the resource is already in use, and you may get unpredictable results 
if the two Apaches are started at the same time, restarted, etc, as they will 
race to acquire the resource.

Whether you pull the resource ID out of a hat, use ftok(), or whatever, I 
really don't mind, and have no strong opinion.  The name space of System V IPC is 
one of the known problems with the IPC model, and sadly, one accepts those 
problems by using those IPC mechanisms.

Robert N M Watson


Re: semaphore usage "port based"?

От
Robert Watson
Дата:
On Mon, 3 Apr 2006, Stephen Frost wrote:

>> So I think the code is pretty bulletproof as long as it's in a system that 
>> is behaving per SysV spec.  The problem in the current FBSD situation is 
>> that the jail mechanism is exposing semaphore sets across jails, but not 
>> exposing the existence of the owning processes.  That behavior is 
>> inconsistent: if process A can affect the state of a sema set that process 
>> B can see, it's surely unreasonable to pretend that A doesn't exist.
>
> This is certainly a problem with FBSD jails...  Not only the inconsistency, 
> but what happens if someone manages to get access to the appropriate uid 
> under one jail and starts sniffing or messing with the semaphores or shared 
> memory segments from other jails?  If that's possible then that's a rather 
> glaring security problem...

This is why it's disabled by default, and the jail documentation specifically 
advises of this possibility.  Excerpt below.

Robert N M Watson

     security.jail.sysvipc_allowed
          This MIB entry determines whether or not processes within a jail
          have access to System V IPC primitives.  In the current jail
          implementation, System V primitives share a single namespace across
          the host and jail environments, meaning that processes within a jail
          would be able to communicate with (and potentially interfere with)
          processes outside of the jail, and in other jails.  As such, this
          functionality is disabled by default, but can be enabled by setting
          this MIB entry to 1.


Re: semaphore usage "port based"?

From
Kris Kennaway
Date:
On Mon, Apr 03, 2006 at 06:51:45PM -0400, Stephen Frost wrote:
> * Robert Watson (rwatson@FreeBSD.org) wrote:
> > On Mon, 3 Apr 2006, Stephen Frost wrote:
> > >This is certainly a problem with FBSD jails...  Not only the
> > >inconsistency, but what happens if someone manages to get access to the
> > >appropriate uid under one jail and starts sniffing or messing with the
> > >semaphores or shared memory segments from other jails?  If that's possible
> > >then that's a rather glaring security problem...
> >
> > This is why it's disabled by default, and the jail documentation
> > specifically advises of this possibility.  Excerpt below.
>
> Ah, I see, glad to see it's accurately documented.  Given the rather
> significant use of shared memory by Postgres it seems to me that
> jail'ing it under FBSD is unlikely to get you the kind of isolation
> between instances that you want (the assumption being that you want to
> avoid the possibility of a user under one jail impacting a user in
> another jail).  As such, I'd suggest finding something else if you
> truly need that isolation for Postgres or dropping the jails entirely.
>
> Running the Postgres instances under different uids (as you'd probably
> expect to do anyway if not using the jails) is probably the right
> approach.  Doing that and using jails would probably work, just don't
> delude yourself into thinking that you're safe from a malicious user in
> one jail.

Yes; however jails are still useful for administrative
compartmentalization even when you have to weaken their security
properties, such as here.

Kris

Re: semaphore usage "port based"?

From
Robert Watson
Date:
On Mon, 3 Apr 2006, Tom Lane wrote:

> Robert Watson <rwatson@FreeBSD.org> writes:
>> Maybe I've misunderstood the problem here -- is the use of the GETPID
>> operation occurring within a coordinated set of server processes, or does it
>> also occur between client and server processes?  I think it's quite reasonable
>> to argue that a coordinated set of server processes should be able to see each
>> other, especially if they're running as the same user, in the same jail,
>> started at the same time.
>
> We use the semaphore sets only within postgres server processes; we could 
> hardly expect client processes to be able to get at them, since in general 
> clients aren't on the same machine.  The issue here, though, is that Marc is 
> trying to start multiple postgres servers in different jails, and in that 
> context the different postgres servers aren't "coordinated" in any real 
> sense.  We'd prefer that they didn't interact at all, but they are 
> interacting because the SysV code isn't restricting IPC to occur only within 
> a jail.
>
> BTW, Marc, it occurs to me that a workaround for you would be to create a 
> separate userid for postgres to run under in each jail; then the regular 
> protection mechanisms would prevent the different postmasters from 
> interfering with each others' semaphore sets.  But I think that workaround 
> just makes it even clearer that the jail mechanism isn't behaving very 
> sanely.

Any multi-instance application that uses unvirtualized System V IPC must know 
how to distinguish between those instances.  This is true of any potential 
communication mechanism used by multi-instance applications -- be it a command 
line argument to specify an alternative configuration file, or a configuration 
file that specifies alternative ports, working directories, mail spool 
directories, etc.  If you install two instances of sendmail, it requires some 
configuration to teach them not to step all over each other, and this is not 
an accident: if they try to use the same mail spools, ports, etc, things will 
go badly.  I can't imagine that PostgreSQL should be any different -- it has 
to be pointed at what resources to use and how to use them -- some of that 
will be a property of how it's written, and some how it's configured. 
Presumably, running multiple instances of PostgreSQL in jails should not be 
all that different from running multiple instances on any UNIX machine: they 
must not overlap where shared resources are concerned.

How is PostgreSQL deciding what semaphores to use?  Can it be instructed to 
use non-colliding ones by specifying an alternative argument to pass to 
ftok(), or ID to use directly?

>> I would, in general, consider the use of System V IPC across jails (as 
>> opposed to in a single jail) unsupported, since it's not consistent with 
>> the security model.
>
> That'd be fine with me --- the problem here is that we've got unwanted 
> communication across jails.  If, say, the jail ID were considered part of 
> semaphore keys, we'd be in fine shape.

Well, I think it's definitely unwanted communications, but until such time as 
FreeBSD supports virtualizing the System V IPC name spaces, the fact that you 
can communicate between jails when System V IPC support is turned on for the 
jail shouldn't be a surprise, and should in fact be considered a feature. 
However, if applications behave incorrectly when treading over each other 
because either they aren't written to support specifying how not to walk over 
each other, or if they are not configured to use that support, then they're 
not going to behave well :-).

Robert N M Watson


Re: semaphore usage "port based"?

From
Kris Kennaway
Date:
On Mon, Apr 03, 2006 at 03:42:51PM -0400, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > That's a fair question, but in the context of the code I believe we are
> > behaving reasonably.  The reason this code exists is to provide some
> > insurance against leaking semaphores when a postmaster process is
> > terminated unexpectedly (ye olde often-recommended-against "kill -9
> > postmaster", for instance).  If the PID returned by GETPID is
>
> Could this be handled sensibly by using SEM_UNDO?  Just a thought.
>
> > So I think the code is pretty bulletproof as long as it's in a system
> > that is behaving per SysV spec.  The problem in the current FBSD
> > situation is that the jail mechanism is exposing semaphore sets across
> > jails, but not exposing the existence of the owning processes.  That
> > behavior is inconsistent: if process A can affect the state of a sema
> > set that process B can see, it's surely unreasonable to pretend that A
> > doesn't exist.
>
> This is certainly a problem with FBSD jails...  Not only the
> inconsistency, but what happens if someone manages to get access to the
> appropriate uid under one jail and starts sniffing or messing with the
> semaphores or shared memory segments from other jails?  If that's
> possible then that's a rather glaring security problem...

This was stated already upthread, but sysv IPC is disabled by default
in jails for precisely this reason.  So yes, when you turn it on it's
a potential security problem if your jails are supposed to be
compartmentalized.

Kris

Re: semaphore usage "port based"?

From
Robert Watson
Date:
On Mon, 3 Apr 2006, Tom Lane wrote:

> That's a fair question, but in the context of the code I believe we are 
> behaving reasonably.  The reason this code exists is to provide some 
> insurance against leaking semaphores when a postmaster process is terminated 
> unexpectedly (ye olde often-recommended-against "kill -9 postmaster", for 
> instance).  If the PID returned by GETPID is nonexistent or belongs to a 
> process not owned by the postgres userid then we assume that the semaphore 
> set can be recycled.  We could get fooled by PID recycling if the PID 
> returned by GETPID belongs to a postgres-owned process that isn't actually 
> the original owner, but the penalty is just that we'll fail to recycle 
> semaphores that could be released.  Not very harmful, and not very probable 
> either, unless you're running postgres under a userid that's used for a lot 
> of other stuff too.  There is not much risk of long-term leakage of many 
> semaphore sets, even if you've got lots of postmaster crashes going on 
> (which I sure hope you don't).  The code is designed to retry the same 
> semaphore keys on each cycle of life, so you'd have to get fooled by chance 
> coincidence of existing PIDs every time over many cycles to have a severe 
> resource-leakage problem.  (BTW, Marc, that's the reason for *not* 
> randomizing the key selection as you suggested.)
>
> So I think the code is pretty bulletproof as long as it's in a system that 
> is behaving per SysV spec.  The problem in the current FBSD situation is 
> that the jail mechanism is exposing semaphore sets across jails, but not 
> exposing the existence of the owning processes.  That behavior is 
> inconsistent: if process A can affect the state of a sema set that process B 
> can see, it's surely unreasonable to pretend that A doesn't exist.
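
A rough sketch of the check described in the quoted text, not the actual PostgreSQL source. The function names are hypothetical, and the key scheme (a fixed key derived from the port, here port * 1000 plus a counter) is an assumption made for illustration; the point is only that the same keys are retried on every restart, and that a set is treated as a leftover when the PID reported by GETPID is no longer visible:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

/*
 * Hypothetical key scheme: deterministic, so a restarted postmaster probes
 * the same keys as its predecessor. The factor 1000 is an assumption.
 */
key_t sema_key_for(int port, int attempt)
{
    return (key_t) (port * 1000 + 1 + attempt);
}

/* Returns 1 if no process with this PID is visible to the caller. */
int pid_looks_dead(pid_t pid)
{
    if (pid <= 0)
        return 0;               /* no usable PID recorded: be conservative */
    if (kill(pid, 0) == 0)
        return 0;               /* signal 0 only tests for existence */
    return errno == ESRCH;      /* ESRCH: no such process visible to us */
}

/*
 * Returns 1 if the last process to operate on the semaphore (as reported
 * by semctl GETPID) is ourselves or is gone, i.e. the set looks like a
 * leftover that may be recycled.
 */
int sem_set_looks_orphaned(int semid, int semnum)
{
    int last_pid = semctl(semid, semnum, GETPID);

    if (last_pid < 0)
        return 0;               /* cannot inspect the set: leave it alone */
    if ((pid_t) last_pid == getpid())
        return 1;
    return pid_looks_dead((pid_t) last_pid);
}
```

If a live postmaster in another jail is invisible to kill(), a check of this shape would see ESRCH and conclude the set is orphaned, which is the inconsistency described above. Two postmasters on different ports would probe different keys and never meet.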

Maybe I've misunderstood the problem here -- is the use of the GETPID 
operation occurring within a coordinated set of server processes, or does it 
also occur between client and server processes?  I think it's quite reasonable 
to argue that a coordinated set of server processes should be able to see each 
other, especially if they're running as the same user, in the same jail, 
started at the same time.  After all, coordinated server applications 
frequently use signals to manage resources and perform asynchronous 
notification (e.g., SIGCHLD, SIGHUP, etc.).  If we're talking about clients and 
servers coordinating using the same System V IPC name space, I find myself 
less sympathetic to the idea that otherwise unrelated processes on either side 
of the IPC mechanism should be using out-of-band process operations to test 
for mutual presence.

There has been occasional investigation of virtualizing the System V IPC name 
space, but as you are no doubt aware, the name space doesn't lend itself to 
virtualization, as it fails to be conveniently hierarchical, etc.  This is 
just another of the ways in which System V IPC offers quite useful IPC 
services in less useful ways.  I would, in general, consider the use of System 
V IPC across jails (as opposed to in a single jail) unsupported, since it's 
not consistent with the security model.  However, I have doubts about the 
behavioral dependency we're talking about above.

Robert N M Watson


Re: semaphore usage "port based"?

From
Bruce Momjian
Date:
[ FreeBSD email list removed.]

I totally agree, and have added the attached documentation patch to
recommend using different users in FreeBSD jails.

---------------------------------------------------------------------------

Stephen Frost wrote:
> * Marc G. Fournier (scrappy@postgresql.org) wrote:
> > On Mon, 3 Apr 2006, Stephen Frost wrote:
> > >Running the Postgres instances under different uids (as you'd probably
> > >expect to do anyway if not using the jails) is probably the right
> > >approach.  Doing that and using jails would probably work, just don't
> > >delude yourself into thinking that you're safe from a malicious user in
> > >one jail.
> >
> > We don't ... we put all our databases on a central database server, even
> > private ones, that nobody has shell access to ... we keep them isolated
> > ...
>
> I guess what I was trying to get at is this:
>
> Running 2 Postgres instances under FreeBSD with (or without really, but
> I guess that's more obvious) jails but with the same UID is a bad idea.
> Even if Postgres could be modified to allow this to work you're going to
> be in a position where the jail isn't really helping much except to give
> a somewhat false (in this case) sense of security.  We probably
> shouldn't encourage it and in fact it's something of a nice feature that
> it breaks.
>
> The reasoning is pretty simple: if someone manages to get control of
> one of the Postgres instances they're going to be able to wreak havoc on
> the other.  With different UIDs, with or without jails, this would be
> much more difficult (need to get root first).
>
> Running 2 Postgres instances under FreeBSD with jails *and* different
> UIDs is *probably* better than w/o jails but since you have to enable
> the single-instance IPC system it might not be that great of a benefit
> over a simple chroot or similar.
>
> Hope that helps...
>
>     Thanks,
>
>         Stephen

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.366
diff -c -c -r1.366 runtime.sgml
*** doc/src/sgml/runtime.sgml    3 Apr 2006 23:35:02 -0000    1.366
--- doc/src/sgml/runtime.sgml    11 Apr 2006 19:23:27 -0000
***************
*** 764,769 ****
--- 764,781 ----
         </para>

         <para>
+         If running in FreeBSD jails by enabling <application>sysctl</>'s
+         <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
+         running in different jails should be run by different operating system
+         users.  This improves security because it prevents one jail from
+         interfering with shared memory or semaphores in another, and it
+         allows the PostgreSQL IPC cleanup code to function properly.
+         (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
+         processes in other jails, preventing the running of postmasters on the
+         same port in different jails.)
+        </para>
+
+        <para>
          <systemitem class="osname">FreeBSD</> versions before 4.0 work like
          <systemitem class="osname">NetBSD</> and <systemitem class="osname">
          OpenBSD</> (see below).

Re: semaphore usage "port based"?

From
Stephen Frost
Date:
* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
>          <para>
> +         If running in FreeBSD jails by enabling <application>sysctl</>'s
> +         <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
> +         running in different jails should be run by different operating system
> +         users.  This improves security because it prevents one jail from
> +         interfering with shared memory or semaphores in another, and it
> +         allows the PostgreSQL IPC cleanup code to function properly.
> +         (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
> +         processes in other jails, preventing the running of postmasters on the
> +         same port in different jails.)
> +        </para>

This looks good, my only comment would be that we don't want people to
believe that using different users somehow makes the sysv spaces
separate between the jails.  It doesn't.  Even when using different
uids, a user who gets root in one jail would be able to mess with the
Postgres instance in the other jail through IPC.

Perhaps change:

"This improves security because it prevents one jail from
interfering with shared memory or semaphores in another"

to:

"This improves security because it prevents the postgres user in one
jail from interfering with shared memory or semaphores owned by a
different user in another jail (with BSD jails, root, or the same
UID, in any jail can see and interfere with the shared memory and
semaphores in any other jail of the same UID, or all if root)"

That's still not great but I think it's a little better...
Thanks,
    Stephen

Re: semaphore usage "port based"?

From
Bruce Momjian
Date:
Stephen Frost wrote:
> * Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> >          <para>
> > +         If running in FreeBSD jails by enabling <application>sysctl</>'s
> > +         <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
> > +         running in different jails should be run by different operating system
> > +         users.  This improves security because it prevents one jail from
> > +         interfering with shared memory or semaphores in another, and it
> > +         allows the PostgreSQL IPC cleanup code to function properly.  
> > +         (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
> > +         processes in other jails, preventing the running of postmasters on the
> > +         same port in different jails.)
> > +        </para>
> 
> This looks good, my only comment would be that we don't want people to
> believe that using different users somehow makes the sysv spaces
> separate between the jails.  It doesn't.  Even when using different
> uids, a user who gets root in one jail would be able to mess with the
> Postgres instance in the other jail through IPC.
> 
> Perhaps change: 
> 
> "This improves security because it prevents one jail from
> interfering with shared memory or semaphores in another"
> 
> to:
> 
> "This improves security because it prevents the postgres user in one
> jail from interfering with shared memory or semaphores owned by a
> different user in another jail (with BSD jails, root, or the same 
> UID, in any jail can see and interfere with the shared memory and 
> semaphores in any other jail of the same UID, or all if root)"
> 
> That's still not great but I think it's a little better...

I updated the wording to say 'non-root users':

        If running in FreeBSD jails by enabling <application>sysctl</>'s
        <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
        running in different jails should be run by different operating system
        users.  This improves security because it prevents non-root users
        from interfering with shared memory or semaphores in a different jail,
        and it allows the PostgreSQL IPC cleanup code to function properly.
        (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
        processes in other jails, preventing the running of postmasters on the
        same port in different jails.)
 

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: semaphore usage "port based"?

From
Stephen Frost
Date:
* Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> I updated the wording to say 'non-root users':
>
>         If running in FreeBSD jails by enabling <application>sysctl</>'s
>         <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
>         running in different jails should be run by different operating system
>         users.  This improves security because it prevents non-root users
>         from interfering with shared memory or semaphores in a different jail,
>         and it allows the PostgreSQL IPC cleanup code to function properly.
>         (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
>         processes in other jails, preventing the running of postmasters on the
>         same port in different jails.)

You're still saying it'll do something that it won't...  It doesn't
prevent non-root users from messing with each other if they're the same
UID, even if they're under different jails...  That's the whole problem
here. :)
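
To make the point concrete, here is a deliberately simplified model of the classic System V owner check, assuming the semaphore set was created with owner-only permissions such as 0600. It is not FreeBSD's kernel code; the function name is hypothetical, and group bits and the creator UID are ignored. What matters is what is missing from the argument list: no jail identifier takes part in the decision.

```c
#include <sys/types.h>

/*
 * Simplified model: returns 1 if caller_uid may operate on an IPC object
 * owned by owner_uid with the given permission bits. Group bits and the
 * creator UID are left out of this sketch.
 */
int sysv_access_allowed(uid_t owner_uid, unsigned mode, uid_t caller_uid)
{
    if (caller_uid == 0)
        return 1;                       /* root bypasses the mode bits */
    if (caller_uid == owner_uid)
        return (mode & 0600) != 0;      /* owner read/alter bits */
    return (mode & 0006) != 0;          /* everyone else: "other" bits */
}
```

With mode 0600, sysv_access_allowed(70, 0600, 70) is 1 whichever jail the caller sits in, sysv_access_allowed(70, 0600, 71) is 0, and sysv_access_allowed(70, 0600, 0) is 1. Different UIDs help against non-root users with a different UID, and nothing else.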
Thanks,
    Stephen

Re: semaphore usage "port based"?

From
Bruce Momjian
Date:
Stephen Frost wrote:
> * Bruce Momjian (pgman@candle.pha.pa.us) wrote:
> > I updated the wording to say 'non-root users':
> > 
> >         If running in FreeBSD jails by enabling <application>sysctl</>'s
> >         <literal>security.jail.sysvipc_allowed</>, <application>postmaster</>s
> >         running in different jails should be run by different operating system
> >         users.  This improves security because it prevents non-root users
> >         from interfering with shared memory or semaphores in a different jail,
> >         and it allows the PostgreSQL IPC cleanup code to function properly.
> >         (In FreeBSD 6.0 and later the IPC cleanup code doesn't properly detect
> >         processes in other jails, preventing the running of postmasters on the
> >         same port in different jails.)
> 
> You're still saying it'll do something that it won't...  It doesn't
> prevent non-root users from messing with each other if they're the same
> UID, even if they're under different jails...  That's the whole problem
> here. :)

Uh, the first part says use different Unix users for different jails,
then it says why to do that (security).  Seems clear to me.

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: semaphore usage "port based"?

From
Max Khon
Date:
Hi!

On Mon, Apr 03, 2006 at 11:56:13PM +0100, Robert Watson wrote:

> >>This is why it's disabled by default, and the jail documentation 
> >>specifically advises of this possibility.  Excerpt below.
> >
> >Ah, I see, glad to see it's accurately documented.
> 
> As it has been for the last five years, I believe since introduction of the 
> setting to allow System V IPC to be used with documented limitations.
> 
> >Given the rather significant use of shared memory by Postgres it seems to 
> >me that jail'ing it under FBSD is unlikely to get you the kind of 
> >isolation between instances that you want (the assumption being that you 
> >want to avoid the possibility of a user under one jail impacting a user in 
> >another jail). As such, I'd suggest finding something else if you truly 
> >need that isolation for Postgres or dropping the jails entirely.
> >
> >Running the Postgres instances under different uids (as you'd probably 
> >expect to do anyway if not using the jails) is probably the right 
> >approach. Doing that and using jails would probably work, just don't 
> >delude yourself into thinking that you're safe from a malicious user in 
> >one jail.
> 
> Yes, there seems to be an awful lot of noise being made about the fact that 
> the system does, in fact, work exactly as documented, and that the 
> configuration being complained about is one that is specifically documented 
> as being unsupported and undesirable.
> 
> As commented elsewhere in this thread, currently, there is no 
> virtualization support for System V IPC in the FreeBSD Jail implementation. 
> That may change if/when someone implements it.  Until it's implemented, it 
> isn't going to be there, and the system won't behave as though it's there 
> no matter how much jumping up and down is done.

sysvipc has been implemented once, but it has been decided that it adds
unnecessary bloat. That's sad.

/fjoe


Re: semaphore usage "port based"?

From
Robert Watson
Date:
On Tue, 9 May 2006, Max Khon wrote:

>> Yes, there seems to be an awful lot of noise being made about the fact that 
>> the system does, in fact, work exactly as documented, and that the 
>> configuration being complained about is one that is specifically documented 
>> as being unsupported and undesirable.
>>
>> As commented elsewhere in this thread, currently, there is no 
>> virtualization support for System V IPC in the FreeBSD Jail implementation. 
>> That may change if/when someone implements it.  Until it's implemented, it 
>> isn't going to be there, and the system won't behave as though it's there 
>> no matter how much jumping up and down is done.
>
> sysvipc has been implemented once, but it has been decided that it adds 
> unnecessary bloat. That's sad.

I'm not sure I follow the reasoning behind this statement.  Could you direct 
me to the implementation, and to the specific claim that it adds unnecessary 
bloat?  As far as I know, no implementation of jail support for System V IPC 
has ever been rejected on the basis that it adds bloat -- all discussion about 
it has centered on the fact that it is, in fact, a very difficult technical 
problem to solve, which requires careful consideration of the approach and 
tradeoffs, and that that careful consideration has not yet been done.

Robert N M Watson