Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

From Alvaro Herrera
Subject Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date
Msg-id 20130621023645.GE4724@eldon.alvh.no-ip.org
In reply to Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)  (Noah Misch <noah@leadboat.com>)
Responses Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Noah Misch wrote:
> On Thu, Jun 20, 2013 at 12:33:25PM -0400, Alvaro Herrera wrote:
> > MauMau wrote:
> > > Here, "reliable" means that the database server is certainly shut
> > > down when pg_ctl returns, not telling a lie that "I shut down the
> > > server processes for you, so you do not have to be worried that some
> > > postgres process might still remain and write to disk".  I suppose
> > > reliable shutdown is crucial especially in HA cluster.  If pg_ctl
> > > stop -mi gets stuck forever when there is an unkillable process (in
> > > what situations does this happen? OS bug, or NFS hard mount?), I
> > > think the DBA has to notice this situation from the unfinished
> > > pg_ctl, investigate the cause, and take corrective action.
> > 
> > So you're suggesting that keeping postmaster up is a useful sign that
> > the shutdown is not going well?  I'm not really sure about this.  What
> > do others think?
> 
> It would be valuable for "pg_ctl -w -m immediate stop" to have the property
> that a subsequent start attempt will not fail due to the presence of some
> backend still attached to shared memory.  (Maybe that's true anyway or can be
> achieved a better way; I have not investigated.)

Well, the only case where a process that's been SIGKILLed does not go
away, as far as I know, is when it is in some uninterruptible sleep due
to in-kernel operations that get stuck.  Personally I have never seen
this happen in any other case than some network filesystem getting
disconnected, or a disk that doesn't respond.  And whenever the
filesystem starts to respond again, the process gets out of its sleep
only to die due to the signal.

So a subsequent start attempt will either find that the filesystem is
not responding, in which case it'll probably fail to work properly
anyway (presumably the filesystem corresponds to part of the data
directory), or that it has revived, in which case the old backends have
already gone away.

If we leave postmaster running after SIGKILLing its children, the only
thing we can do is have it continue to SIGKILL processes continuously
every few seconds until they die (or just sit around doing nothing until
they all die).  I don't think this will have a different effect than
postmaster going away trusting the first SIGKILL to do its job
eventually.
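
To make the comparison concrete, here is a minimal sketch of the "keep
SIGKILLing until they die" loop. It is plain Python rather than postmaster
code, and the names kill_until_gone and spawn_sleepers, the interval, and the
round limit are all made up for illustration. For ordinary children the first
SIGKILL is enough, so the retry rounds add nothing; for a child stuck in an
uninterruptible sleep, extra signals would not help either, since the kill
only takes effect once the kernel operation returns.

```python
import subprocess
import sys
import time


def kill_until_gone(children, interval=0.2, max_rounds=25):
    """Re-send SIGKILL to each child every `interval` seconds until all exit.

    `children` is a list of subprocess.Popen objects. Returns a tuple
    (rounds_used, still_alive); `still_alive` is empty on success.
    """
    remaining = list(children)
    rounds = 0
    while remaining and rounds < max_rounds:
        rounds += 1
        for child in remaining:
            child.kill()  # sends SIGKILL on POSIX systems
        time.sleep(interval)
        # poll() reaps a dead child and returns its exit status;
        # it returns None while the process is still around.
        remaining = [c for c in remaining if c.poll() is None]
    return rounds, remaining


def spawn_sleepers(count):
    """Start `count` throwaway child processes that just sleep (stand-ins for backends)."""
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    return [subprocess.Popen(cmd) for _ in range(count)]
```

Calling kill_until_gone(spawn_sleepers(3)) should come back with an empty
still_alive list; on a healthy system I would expect that after a single
round, which is the case where the loop and a one-shot SIGKILL behave the
same.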

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


