Re: auto removing stale pid for postmaster NT service

From: Tom Lane
Subject: Re: auto removing stale pid for postmaster NT service
Msg-id: 2208.1032211658@sss.pgh.pa.us
In reply to: Re: auto removing stale pid for postmaster NT service  (Andrew Sullivan <andrew@libertyrms.info>)
Responses: Re: auto removing stale pid for postmaster NT service  (Andrew Sullivan <andrew@libertyrms.info>)
List: pgsql-admin

Andrew Sullivan <andrew@libertyrms.info> writes:
> This is because there is a process with the same pid as the
> postmaster.  This will happen in cases where the machine crashes and
> starts up again -- something else happens to get the (former)
> postgres pid at startup, and so when postgres checks for a process
> with that pid, one exists.  And kerplooey.

FYI, sendmail has the same restart failure mode; I imagine a lot of
other Unix daemons do too.

> I seem to recall that someone (maybe Tom Lane?) suggested an
> extension to the current pidfile check, so that it will also check to
> see if the process really is PostgreSQL.  But I don't know if it was
> implemented.

It wasn't yet, mainly because it's not obvious how to tell reliably
whether some other process is a postmaster or not.

I think I had suggested distinguishing EPERM from other kill() errors,
which would tell us whether the other process is under the same userid
as us or not; if not, we could perhaps safely assume that it's not a
postmaster (or at least not one likely to be using our data directory).
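
For illustration only, here is a rough sketch of what distinguishing EPERM
from other kill() errors could look like. It is written in Python rather
than the postmaster's C, it is not PostgreSQL's actual code, and the name
probe_pid and its return strings are made up for this example. It assumes
a Unix-like system, where signal 0 performs the existence and permission
checks without delivering anything. Note that a superuser never gets EPERM,
so the distinction only means something for an unprivileged user.

```python
import errno
import os


def probe_pid(pid: int) -> str:
    """Classify a PID by sending signal 0 (hypothetical helper).

    Returns:
      "no_such_process" - kill() failed with ESRCH; the PID is unused.
      "other_user"      - kill() failed with EPERM; a process exists but
                          we are not allowed to signal it.
      "signalable"      - kill() succeeded; a process exists and we may
                          signal it (typically the same userid).
    """
    if pid <= 0:
        # Zero and negative values address process groups, not one process.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is sent
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return "no_such_process"
        if exc.errno == errno.EPERM:
            return "other_user"
        raise  # anything else is unexpected; let the caller see it
    return "signalable"
```

Under this scheme a lock file whose PID probes as "other_user" might be
treated as stale, while "signalable" would still have to be treated as a
possibly live postmaster.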

Unfortunately, that doesn't really improve the odds very much.  The
typical scenario for this problem is that the PID we get assigned will
wobble around by one or two counts from one boot cycle to the next,
depending on just how fast other startup processes manage to finish.
(If we get the exact same PID as before, there's no problem; the code
is smart enough to notice that case.)  But the PID(s) adjacent to the
postmaster's will likely also belong to the postgres user --- consider
the shell that launched us, for example.  The shell, or whatever it
might launch right after the postmaster, would look enough like a
postmaster to fool this simplistic test.

So I'm at a loss how the postmaster can improve the reliability of this
check, without throwing the baby out with the bathwater by making a
check that might fail to recognize a conflicting postmaster.  The
consequences of that would be *dire*.

The best solution is probably to forcibly unlink the postmaster.pid
file in some startup script --- but it has to be a script that is *only*
run during boot, never anytime later.  The postgres start script is
not the place for this.
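
As a minimal sketch of that idea (an illustration, not a tested
recommendation): the function below unlinks postmaster.pid under a given
data directory. The function name is hypothetical, and how it gets hooked
into a boot-only step depends entirely on the platform's init system,
which is not shown here. It does no liveness check at all, which is
exactly why it must never be run while a postmaster could be up.

```python
import os


def remove_pidfile_at_boot(data_dir: str) -> bool:
    """Unconditionally remove data_dir/postmaster.pid (hypothetical helper).

    Intended to be called only from a step that runs once during boot,
    before any postmaster could have been started. Returns True if a
    file was removed, False if there was none.
    """
    pidfile = os.path.join(data_dir, "postmaster.pid")
    try:
        os.unlink(pidfile)
    except FileNotFoundError:
        return False  # clean shutdown last time; nothing to do
    return True
```

The caller passes the data directory explicitly, so nothing is guessed
about where the installation lives.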

            regards, tom lane
