Discussion: Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Patrick Verdon
Date:
Tatsuo, Vadim, Oleg, Scrappy,

Many thanks for the response.

A couple of you weren't convinced that this
is a Postgres problem so let me try to clear
the water a little bit. Maybe the use of 
Apache and mod_perl is confusing the issue -
the point I was trying to make is that if 
there are 49+ concurrent postgres processes
on a normal machine (i.e. where kernel 
parameters are the defaults, etc.) the 
postmaster dies in a nasty way with 
potentially damaging results. 

Here's a case without Apache/mod_perl that
causes exactly the same behaviour. Simply
enter the following 49 times:

kandinsky:patrick> psql template1 &

Note that I tried to automate this without
success: 

perl -e 'for ( 1..49 ) { system("/usr/local/pgsql/bin/psql template1 &"); }'
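
My guess is that each backgrounded psql reads EOF on stdin and exits
immediately, so the sessions never pile up.  A variant that might hold
each connection open (untested; the sleep value is arbitrary) would be:

perl -e 'for ( 1..49 ) { system("sleep 3600 | /usr/local/pgsql/bin/psql template1 &"); }'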

The 49th attempt to initiate a connection 
fails:

Connection to database 'template1' failed.
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.

and the error_log says:

InitPostgres
IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
proc_exit(3) [#0]
shmem_exit(3) [#0]
exit(3)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 1521 exited with status 768
/usr/local/pgsql/bin/postmaster: CleanupProc: sending SIGUSR1 to process 1518
NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally
        and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate
        your database system connection and exit.
        Please reconnect to the database system and repeat your query.

FATAL: s_lock(dfebe065) at spin.c:125, stuck spinlock. Aborting.

FATAL: s_lock(dfebe065) at spin.c:125, stuck spinlock. Aborting.


Even if there is a hard limit there is no way that 
Postgres should die in this spectacular fashion.
I wouldn't have said that it was unreasonable for
some large applications to peak at >48 processes
when using powerful hardware with plenty of RAM.

The other point is that even if one had 1 GB RAM,
Postgres won't scale beyond 48 processes, using
probably less than 100 MB of RAM. Would it be
possible to make the 'MaxBackendId' configurable
for those who have the resources?

I have reproduced this behaviour on both 
FreeBSD 2.2.8 and Intel Solaris 2.6 using
version 6.4.x of PostgreSQL.

I'll try to change some of the parameters
suggested and see how far I get but the bottom 
line is Postgres shouldn't be dying like this.

Let me know if you need any more info.

Cheers.



Patrick

-- 

#===============================#
\  KAN Design & Publishing Ltd  /
/  T: +44 (0)1223 511134        \
\  F: +44 (0)1223 571968        /
/  E: mailto:patrick@kan.co.uk  \ 
\  W: http://www.kan.co.uk      /
#===============================#


Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Hannu Krosing
Date:
Patrick Verdon wrote:
> 
> 
> Even if there is a hard limit there is no way that
> Postgres should die in this spectacular fashion.

[snip]

> I have reproduced this behaviour on both
> FreeBSD 2.2.8 and Intel Solaris 2.6 using
> version 6.4.x of PostgreSQL.
> 
> I'll try to change some of the parameters
> suggested and see how far I get but the bottom
> line is Postgres shouldn't be dying like this.

We definitely need a chapter on tuning postgres in some of the manuals.

It should contain not only the parameters that one can change in
PostgreSQL - for either better response or for taking a larger load -
but also the ways one can tune the underlying OS, be it Linux, *BSD, 
Solaris or whatever.

Even commercial databases (at least Oracle) tend to rebuild the kernel 
during installation (observed with Oracle 7.1 on Solaris).

When I once needed the info about setting shared memory limits on 
Solaris I cried out here and got the example lines (I actually had them 
already copied from a machine where Oracle was running).
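
For reference, the lines in question look something like this in
/etc/system (the tunable names are the standard Solaris ones, but the
values below are only illustrative, and a reboot is needed for them to
take effect):

  set shmsys:shminfo_shmmax=134217728
  set shmsys:shminfo_shmmni=256
  set semsys:seminfo_semmni=256
  set semsys:seminfo_semmns=512
  set semsys:seminfo_semmsl=64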

But the same info, and possibly more (increasing the limits for max 
files per process/globally, shared memory config, ... whatever else 
is needed), seems to be an essential part of setting up a serious DB 
server on any system.

---------------
Hannu


Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Tom Lane
Date:
Patrick Verdon <patrick@kan.co.uk> writes:
> the point I was trying to make is that if there are 49+ concurrent
> postgres processes on a normal machine (i.e. where kernel parameters
> are the defaults, etc.) the postmaster dies in a nasty way with
> potentially damaging results.

Right.  It looks to me like your problem is running out of SysV
semaphores:

> IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600

(read the man page for semget(2):

     [ENOSPC]  A semaphore identifier is to be created, but the
               system-imposed limit on the maximum number of allowed
               semaphore identifiers system wide would be exceeded.

Old bad habit of Unix kernel programmers: re-use closest available error
code, rather than deal with the hassle of inventing a new kernel errno.)
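
To make that concrete, here is a minimal stand-alone sketch (not
PostgreSQL source; key, count and permission are copied from the log
line above) of the semget() call and the errno involved:

  #include <errno.h>
  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/ipc.h>
  #include <sys/sem.h>

  /*
   * Minimal illustration, not PostgreSQL source: the kind of semget() call
   * behind "IpcSemaphoreCreate: semget failed".  Key, count and permission
   * mirror the log line above.
   */
  int
  main(void)
  {
      int semid = semget((key_t) 5432017, 16, IPC_CREAT | 0600);

      if (semid < 0)
      {
          if (errno == ENOSPC)
              fprintf(stderr, "kernel out of SysV semaphores (SEMMNI/SEMMNS)\n");
          else
              perror("semget");
          return 1;
      }
      semctl(semid, 0, IPC_RMID);       /* remove the set we just created */
      return 0;
  }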

You can increase the kernel's number-of-semaphores parameter (on my box,
both SEMMNI and SEMMNS need to be changed), but it'll probably take a
kernel rebuild to do it.

> Even if there is a hard limit there is no way that 
> Postgres should die in this spectacular fashion.

Well, running out of resources is something that it's hard to guarantee
recovery from.  Postgres is designed on the assumption that it's better
to try to prevent corruption of the database than to try to limp along
after a failure --- so the crash recovery behavior is exactly what you
see, mutual mass suicide of all surviving backends.  Restarting all your
clients is a pain in the neck, agreed, but would you rather have
database corruption spreading invisibly?

> The other point is that even if one had 1 GB RAM,
> Postgres won't scale beyond 48 processes, using
> probably less than 100 MB of RAM. Would it be
> possible to make the 'MaxBackendId' configurable
> for those who have the resources?

MaxBackendId is 64 by default, so that's not the limit you're hitting.

It should be easier to configure MaxBackendId --- probably it should be
an option to the configure script.  I've put this on my personal to-do
list.  (I don't think it's a good idea to have *no* upper limit, even
if it were easy to do in the code --- otherwise an unfriendly person
could run you out of memory by starting more and more clients.  If he
stops just short of exhausting swap space, then Postgres is perfectly
happy, but all the rest of your system starts misbehaving ... not cool.)

Another thing we ought to look at is changing the use of semaphores so
that Postgres uses a fixed number of semaphores, not a number that
increases as more and more backends are started.  Kernels are
traditionally configured with very low limits for the SysV IPC
resources, so having a big appetite for semaphores is a Bad Thing.

Right now it looks like we use a sema per backend to support spinlocks.
Perhaps we could just use a single sema that all backends block on when
waiting for a spinlock?  This might be marginally slower, or it might
not, but hopefully one is not blocking on spinlocks too often anyway.
Or, given that the system seems to contain only a small fixed number of
spinlocks, maybe a sema per spinlock would work best.
        regards, tom lane


Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Tatsuo Ishii
Date:
> MaxBackendId is 64 by default, so that's not the limit you're hitting.
> 
> It should be easier to configure MaxBackendId --- probably it should be
> an option to the configure script.  I've put this on my personal to-do
> list.  (I don't think it's a good idea to have *no* upper limit, even

Or even better, MaxBackendId could be set at run time, e.g. as a
postmaster option. Also, it would be nice if we could monitor the number
of backends currently running. Maybe we should have a new protocol for
this kind of purpose?

BTW, as I pointed out before, PostgreSQL will have a serious problem
once it hits MaxBackendId. The patches I proposed for this still seem
to be under discussion. I think we should solve the problem in the
next release one way or another, however.
---
Tatsuo Ishii


Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: The Hermit Hacker
Date:
On Fri, 29 Jan 1999, Patrick Verdon wrote:

> 
> Tatsuo, Vadim, Oleg, Scrappy,
> 
> Many thanks for the response.
> 
> A couple of you weren't convinced that this
> is a Postgres problem so let me try to clear
> the water a little bit. Maybe the use of 
> Apache and mod_perl is confusing the issue -
> the point I was trying to make is that if 
> there are 49+ concurrent postgres processes
> on a normal machine (i.e. where kernel 
> parameters are the defaults, etc.) the 
> postmaster dies in a nasty way with 
> potentially damaging results. 
> 
> Here's a case without Apache/mod_perl that
> causes exactly the same behaviour. Simply
> enter the following 49 times:
> 
> kandinsky:patrick> psql template1 &
> 
> Note that I tried to automate this without
> success: 
> 
> perl -e 'for ( 1..49 ) { system("/usr/local/pgsql/bin/psql template1 &"); }'
> 
> The 49th attempt to initiate a connection 
> fails:
> 
> Connection to database 'template1' failed.
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally before or while processing the request.
> 
> and the error_log says:
> 
> InitPostgres
> IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600


this error indicates that you are out of semaphores...you have enough
configured to allow for 48 processes, but not the 49th...

> I have reproduced this behaviour on both 
> FreeBSD 2.2.8 and Intel Solaris 2.6 using
> version 6.4.x of PostgreSQL.

Both of them have "default" settings for semaphores...I don't recall what
they are, but the error you are seeing from IpcSemaphoreCreate indicates
that you are exceeding those limits...

> I'll try to change some of the parameters
> suggested and see how far I get but the bottom 
> line is Postgres shouldn't be dying like this.

PostgreSQL cannot allocate past what the operating system has hardcoded as
the max...maybe a more graceful exit is in order, though?  Or is
that what you mean?

Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> BTW, as I pointed out before, PostgreSQL will have a serious problem
> once it hits MaxBackendId. The patches I proposed for this still seem
> to be under discussion.

Not sure why that didn't get applied before, but I just put it in,
and verified that you can start exactly MaxBackendId backends
(assuming that you don't hit any kernel resource limits on the way).

BTW, we do recover quite gracefully from hitting MAXUPRC (kernel
limit on processes for one userid) :-).  But that's just because the
postmaster's initial fork() fails.  A failure any later than that
in backend startup will be treated as a backend crash ...

I agree with Hannu Krosing's remark that we really need some
documentation about kernel parameters that have to be checked when
setting up a non-toy database server.  I've personally run into
NFILES limits, for instance, with not all that many backends running.
        regards, tom lane


Reducing sema usage (was Postmaster dies with many child processes)

From: Tom Lane
Date:
I said:
> Another thing we ought to look at is changing the use of semaphores so
> that Postgres uses a fixed number of semaphores, not a number that
> increases as more and more backends are started.  Kernels are
> traditionally configured with very low limits for the SysV IPC
> resources, so having a big appetite for semaphores is a Bad Thing.

I've been looking into this issue today, and it looks possible but messy.

The source of the problem is the lock manager
(src/backend/storage/lmgr/proc.c), which wants to be able to wake up a
specific process that is blocked on a lock.  I had first thought that it
would be OK to wake up any one of the processes waiting for a lock, but
after looking at the lock manager that seems a bad idea --- considerable
thought has gone into the queuing order of waiting processes, and we
don't want to give that up.  So we need to preserve this ability.

The way it's currently done is that each extant backend has its own
SysV-style semaphore, and when you want to wake up a particular backend
you just V() its semaphore.  (BTW, the semaphores get allocated in
chunks of 16, so an out-of-semaphores condition will always occur when
trying to start the 16*N+1'th backend...)  This is simple and reliable
but fails if you want to have more backends than the kernel has SysV
semaphores.  Unfortunately kernels are usually configured with not
very many semaphores --- 64 or so is typical.  Also, running the system
down to nearly zero free semaphores is likely to cause problems for
other subsystems even if Postgres itself doesn't run out.
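
For reference, a toy sketch (not proc.c itself) of what those V() and
P() operations amount to in SysV terms:

  #include <sys/types.h>
  #include <sys/ipc.h>
  #include <sys/sem.h>

  /* Toy sketch, not proc.c itself: V() and P() as single semop() calls. */

  static void
  sema_v(int semid, int semnum)         /* V(): wake whoever sleeps on this sema */
  {
      struct sembuf op;

      op.sem_num = semnum;
      op.sem_op = 1;
      op.sem_flg = 0;
      semop(semid, &op, 1);
  }

  static void
  sema_p(int semid, int semnum)         /* P(): sleep until somebody V()s it */
  {
      struct sembuf op;

      op.sem_num = semnum;
      op.sem_op = -1;
      op.sem_flg = 0;
      semop(semid, &op, 1);
  }

  int
  main(void)
  {
      /* one private set of 16, mimicking the chunk-of-16 allocation above;
       * a freshly created SysV semaphore normally starts at zero */
      int semid = semget(IPC_PRIVATE, 16, IPC_CREAT | 0600);

      if (semid < 0)
          return 1;
      sema_v(semid, 0);                 /* the lock releaser's side */
      sema_p(semid, 0);                 /* the sleeper's side; returns at once here */
      semctl(semid, 0, IPC_RMID);
      return 0;
  }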

What seems practical to do instead is this:
* At postmaster startup, allocate a fixed number of semaphores for
  use by all child backends.  ("Fixed" can really mean "configurable",
  of course, but the point is we won't ask for more later.)
* The semaphores aren't dedicated to use by particular backends.
  Rather, when a backend needs to block, it finds a currently free
  semaphore and grabs it for the duration of its wait.  The number
  of the semaphore a backend is using to wait with would be recorded
  in its PROC struct, and we'd also need an array of per-sema data
  to keep track of free and in-use semaphores (roughly sketched below).
* This works with very little extra overhead until we have more
  simultaneously-blocked backends than we have semaphores.  When that
  happens (which we hope is really seldom), we overload semaphores ---
  that is, we use the same sema to block two or more backends.  Then
  the V() operation by the lock's releaser might wake the wrong backend.
  So, we need an extra field in the LOCK struct to identify the intended
  wake-ee.  When a backend is released in ProcSleep, it has to look at
  the lock it is waiting on to see if it is supposed to be wakened
  right now.  If not, it V()s its shared semaphore a second time (to
  release the intended wakee), then P()s the semaphore again to go
  back to sleep itself.  There probably has to be a delay in here,
  to ensure that the intended wakee gets woken and we don't have its
  bed-mates indefinitely trading wakeups among the wrong processes.
  This is why we don't want this scenario happening often.

I think this could be made to work, but it would be a delicate and
hard-to-test change in what is already pretty subtle code.
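
Roughly, the bookkeeping I have in mind would look like this (all names
invented for illustration, not the real structs):

  /* Invented names, only to illustrate the bookkeeping described above. */

  #define PROC_NSEMS  32                /* the "fixed" (configurable) pool size */

  typedef struct PoolSemaSketch
  {
      int         nSleepers;            /* 0 = free, 1 = normal, >1 = overloaded */
  } PoolSemaSketch;

  typedef struct ProcSketch
  {
      int         waitSemaNum;          /* which pool sema we sleep on, or -1 */
      /* ... plus the existing PROC fields ... */
  } ProcSketch;

  typedef struct LockSketch
  {
      ProcSketch *intendedWakee;        /* new field: the waiter the V() is for */
      /* ... plus the existing LOCK fields ... */
  } LockSketch;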

A considerably more straightforward approach is just to forget about
incremental allocation of semaphores and grab all we could need at
postmaster startup.  ("OK, Mac, you told me to allow up to N backends?
Fine, I'm going to grab N semaphores at startup, and if I can't get them
I won't play.")  This would force the DB admin to either reconfigure the
kernel or reduce MaxBackendId to something the kernel can support right
off the bat, rather than allowing the problem to lurk undetected until
too many clients are started simultaneously.  (Note there are still
potential gotchas with running out of processes, swap space, or file
table slots, so we wouldn't have really guaranteed that N backends can
be started safely.)

If we make MaxBackendId settable from a postmaster command-line switch
then this second approach is probably not too inconvenient, though it
surely isn't pretty.

Any thoughts about which way to jump?  I'm sort of inclined to take
the simpler approach myself...
        regards, tom lane


Re: Reducing sema usage (was Postmaster dies with many child processes)

From: Tom Lane
Date:
I said:
> Any thoughts about which way to jump?  I'm sort of inclined to take
> the simpler approach myself...

A further thought: we could leave the semaphore management as-is,
and instead try to make running out of semaphores a less catastrophic
failure.  I'm thinking that the postmaster could be the one to try
to allocate more semaphores whenever there are none left, just before
trying to fork a new backend.  (The postmaster has access to the same
shared memory as the backends, right?  So no reason it couldn't do this.)
If the allocation fails, it can simply refuse the connection request,
rather than having to proceed as though we'd had a full-fledged backend
crash.  This only works because we can predict the number of semas
needed by an additional backend -- but we can: one.
        regards, tom lane


Re: [HACKERS] Reducing sema usage (was Postmaster dies with many child processes)

From: The Hermit Hacker
Date:
On Sat, 30 Jan 1999, Tom Lane wrote:

> I said:
> > Another thing we ought to look at is changing the use of semaphores so
> > that Postgres uses a fixed number of semaphores, not a number that
> > increases as more and more backends are started.  Kernels are
> > traditionally configured with very low limits for the SysV IPC
> > resources, so having a big appetite for semaphores is a Bad Thing.
> 
> I've been looking into this issue today, and it looks possible but messy.
> 
> The source of the problem is the lock manager
> (src/backend/storage/lmgr/proc.c), which wants to be able to wake up a
> specific process that is blocked on a lock.  I had first thought that it
> would be OK to wake up any one of the processes waiting for a lock, but
> after looking at the lock manager that seems a bad idea --- considerable
> thought has gone into the queuing order of waiting processes, and we
> don't want to give that up.  So we need to preserve this ability.
> 
> The way it's currently done is that each extant backend has its own
> SysV-style semaphore, and when you want to wake up a particular backend
> you just V() its semaphore.  (BTW, the semaphores get allocated in
> chunks of 16, so an out-of-semaphores condition will always occur when
> trying to start the 16*N+1'th backend...)  This is simple and reliable
> but fails if you want to have more backends than the kernel has SysV
> semaphores.  Unfortunately kernels are usually configured with not
> very many semaphores --- 64 or so is typical.  Also, running the system
> down to nearly zero free semaphores is likely to cause problems for
> other subsystems even if Postgres itself doesn't run out.
> 
> What seems practical to do instead is this:
> * At postmaster startup, allocate a fixed number of semaphores for
>   use by all child backends.  ("Fixed" can really mean "configurable",
>   of course, but the point is we won't ask for more later.)
> * The semaphores aren't dedicated to use by particular backends.
>   Rather, when a backend needs to block, it finds a currently free
>   semaphore and grabs it for the duration of its wait.  The number
>   of the semaphore a backend is using to wait with would be recorded
>   in its PROC struct, and we'd also need an array of per-sema data
>   to keep track of free and in-use semaphores.
> * This works with very little extra overhead until we have more
>   simultaneously-blocked backends than we have semaphores.  When that
>   happens (which we hope is really seldom), we overload semaphores ---
>   that is, we use the same sema to block two or more backends.  Then
>   the V() operation by the lock's releaser might wake the wrong backend.
>   So, we need an extra field in the LOCK struct to identify the intended
>   wake-ee.  When a backend is released in ProcSleep, it has to look at
>   the lock it is waiting on to see if it is supposed to be wakened
>   right now.  If not, it V()s its shared semaphore a second time (to
>   release the intended wakee), then P()s the semaphore again to go
>   back to sleep itself.  There probably has to be a delay in here,
>   to ensure that the intended wakee gets woken and we don't have its
>   bed-mates indefinitely trading wakeups among the wrong processes.
>   This is why we don't want this scenario happening often.
> 
> I think this could be made to work, but it would be a delicate and
> hard-to-test change in what is already pretty subtle code.
> 
> A considerably more straightforward approach is just to forget about
> incremental allocation of semaphores and grab all we could need at
> postmaster startup.  ("OK, Mac, you told me to allow up to N backends?
> Fine, I'm going to grab N semaphores at startup, and if I can't get them
> I won't play.")  This would force the DB admin to either reconfigure the
> kernel or reduce MaxBackendId to something the kernel can support right
> off the bat, rather than allowing the problem to lurk undetected until
> too many clients are started simultaneously.  (Note there are still
> potential gotchas with running out of processes, swap space, or file
> table slots, so we wouldn't have really guaranteed that N backends can
> be started safely.)
> 
> If we make MaxBackendId settable from a postmaster command-line switch
> then this second approach is probably not too inconvenient, though it
> surely isn't pretty.
> 
> Any thoughts about which way to jump?  I'm sort of inclined to take
> the simpler approach myself...

I'm inclined to agree...get rid of the 'hard coded' max, make it a
settable option at run time, and 'reserve the semaphores' on startup...

Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Re: Reducing sema usage (was Postmaster dies with many child processes)

From: Bruce Momjian
Date:
> I said:
> > Any thoughts about which way to jump?  I'm sort of inclined to take
> > the simpler approach myself...
> 
> A further thought: we could leave the semaphore management as-is,
> and instead try to make running out of semaphores a less catastrophic
> failure.  I'm thinking that the postmaster could be the one to try
> to allocate more semaphores whenever there are none left, just before
> trying to fork a new backend.  (The postmaster has access to the same
> shared memory as the backends, right?  So no reason it couldn't do this.)
> If the allocation fails, it can simply refuse the connection request,
> rather than having to proceed as though we'd had a full-fledged backend
> crash.  This only works because we can predict the number of semas
> needed by an additional backend -- but we can: one.

If they asked for 64 backends, we better be able to give them to them,
and not crash or fail under a load.  64 semaphores is nothing.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Reducing sema usage (was Postmaster dies with many child processes)

From: Bruce Momjian
Date:
> A considerably more straightforward approach is just to forget about
> incremental allocation of semaphores and grab all we could need at
> postmaster startup.  ("OK, Mac, you told me to allow up to N backends?
> Fine, I'm going to grab N semaphores at startup, and if I can't get them
> I won't play.")  This would force the DB admin to either reconfigure the
> kernel or reduce MaxBackendId to something the kernel can support right
> off the bat, rather than allowing the problem to lurk undetected until
> too many clients are started simultaneously.  (Note there are still
> potential gotchas with running out of processes, swap space, or file
> table slots, so we wouldn't have really guaranteed that N backends can
> be started safely.)
> 
> If we make MaxBackendId settable from a postmaster command-line switch
> then this second approach is probably not too inconvenient, though it
> surely isn't pretty.
> 
> Any thoughts about which way to jump?  I'm sort of inclined to take
> the simpler approach myself...

Semaphores are hard enough without overloading them.  I say just grab
them on startup.  They are cheap.  Many databases use semaphores for
every row/page they lock, and boy, that can be a lot of semaphores.  We
are only getting a few.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Re: Reducing sema usage (was Postmaster dies with many child processes)

From: Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> A further thought: we could leave the semaphore management as-is,
>> and instead try to make running out of semaphores a less catastrophic
>> failure.

> If they asked for 64 backends, we better be able to give them to them,
> and not crash or fail under a load.  64 semaphores is nothing.

That argument would be pretty convincing if pre-grabbing the semaphores
was sufficient to ensure we could start N backends, but of course it's
not sufficient.  The system could also run out of processes or file
descriptors, and I doubt that it's reasonable to grab all of those
instantly at postmaster startup.

The consensus seems clear not to go for the complex solution I described
at first.  But I'm still vacillating whether to do pre-reservation of
semaphores or just fix the postmaster to reject a connection cleanly if
no more can be gotten.  An advantage of the latter is that it would more
readily support on-the-fly changes of the max backend limit.  (Which I
am *not* proposing to support now; I only plan to make it settable at
postmaster startup; but someday we might want to change it on the fly.)
        regards, tom lane


Re: [HACKERS] Re: Reducing sema usage (was Postmaster dies with many child processes)

From: "Oliver Elphick"
Date:
Tom Lane wrote:
>Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> If they asked for 64 backends, we better be able to give them to them,
>> and not crash or fail under a load.  64 semaphores is nothing.
>
>That argument would be pretty convincing if pre-grabbing the semaphores
>was sufficient to ensure we could start N backends, but of course it's
>not sufficient.  The system could also run out of processes or file
>descriptors, and I doubt that it's reasonable to grab all of those
>instantly at postmaster startup.
The major problem at the moment is not that a new backend fails, but
that it brings down everything else with it.  How about having a new
backend set a one-byte flag in shared memory when it has
finished setting itself up? as long as the flag is unset, the
backend is still starting itself up, and a failure will not require
other backends to be brought down.

-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
               PGP key from public servers; key ID 32B8FAA1
                 ========================================
     "Jesus saith unto him, I am the way, the truth, and the
      life; no man cometh unto the Father, but by me."        John 14:6




Re: [HACKERS] Re: Reducing sema usage (was Postmaster dies with many child processes)

From: Tom Lane
Date:
"Oliver Elphick" <olly@lfix.co.uk> writes:
> The major problem at the moment is not that a new backend fails, but
> that it brings down everything else with it.

Agreed.

> How about having a new backend set a one-byte flag in shared memory
> when it has finished setting itself up? as long as the flag is unset,
> the backend is still starting itself up, and a failure will not
> require other backends to be brought down.

Not much win to be had there, I suspect.  The main problem is that as
soon as a new backend starts altering shared memory, you have potential
corruption issues to worry about if it goes down.  And there's not
really very much the new backend can do before it alters shared memory.
In fact, it can't do much of *anything* until it's made an entry for
itself in the lock manager's PROC array, because it cannot find out
anything interesting without locking shared structures.

Hmm.  If that's true, then the failure to get a sema would occur very
early in the new backend's lifetime, before it's had a chance to create
any trouble.  Maybe the very easiest solution to the sema issue is to
make the new backend send a failure report to its client and then
exit(0) instead of exit(1), so that the postmaster considers it a clean
exit rather than a crash...
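
As a stand-alone illustration (not CleanupProc itself) of why that would
be enough --- the postmaster can already tell the two cases apart from
the wait status; the "status 768" in Patrick's log is just exit code 3
shifted up eight bits:

  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  /* Stand-alone sketch, not CleanupProc: distinguishing a clean exit(0)
   * from anything else, which is all the proposed change relies on. */
  int
  main(void)
  {
      int     status;
      pid_t   pid = fork();

      if (pid == 0)
          _exit(0);                     /* the backend that failed gracefully */

      waitpid(pid, &status, 0);
      if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
          printf("clean exit: refuse the connection, no crash recovery\n");
      else
          printf("status %d: treat as a backend crash\n", status);
      return 0;
  }
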
        regards, tom lane


Re: [HACKERS] Reducing sema usage (was Postmaster dies with many child processes)

From: Vadim Mikheev
Date:
Tom Lane wrote:
> 
> I said:
> > Another thing we ought to look at is changing the use of semaphores so
> > that Postgres uses a fixed number of semaphores, not a number that
> > increases as more and more backends are started.  Kernels are
> > traditionally configured with very low limits for the SysV IPC
> > resources, so having a big appetite for semaphores is a Bad Thing.
> 
...
> 
> Any thoughts about which way to jump?  I'm sort of inclined to take
> the simpler approach myself...

Could we use sigpause (or something like this) to block
and some signal to wake up?
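
Something like this toy sketch, perhaps (sigsuspend() being the modern
spelling of sigpause(); not a patch, just the shape of the idea):

  #include <signal.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Toy sketch of the idea, not a patch: sleep in sigsuspend() and have
   * the waker send a signal to the specific process it wants to wake. */

  static volatile sig_atomic_t woken = 0;

  static void
  wakeup_handler(int signo)
  {
      woken = 1;
  }

  int
  main(void)
  {
      sigset_t    blocked, waitmask;

      signal(SIGUSR2, wakeup_handler);

      /* keep SIGUSR2 held except while actually sleeping */
      sigemptyset(&blocked);
      sigaddset(&blocked, SIGUSR2);
      sigprocmask(SIG_BLOCK, &blocked, &waitmask);
      sigdelset(&waitmask, SIGUSR2);

      kill(getpid(), SIGUSR2);          /* stands in for the lock releaser */

      while (!woken)
          sigsuspend(&waitmask);        /* unblock SIGUSR2 and sleep atomically */

      printf("woken\n");
      return 0;
  }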

Vadim


Re: [HACKERS] Re: Reducing sema usage (was Postmaster dies with many child processes)

From: "Thomas G. Lockhart"
Date:
> Hmm.  If that's true, then the failure to get a sema would occur very
> early in the new backend's lifetime, before it's had a chance to 
> create any trouble.  Maybe the very easiest solution to the sema issue 
> is to make the new backend send a failure report to its client and 
> then exit(0) instead of exit(1), so that the postmaster considers it a 
> clean exit rather than a crash...

Sounds like the cleanest solution too. If it pans out, I like it...
                      - Tom