Re: Better detection of staled postmaster.pid

From: David G. Johnston
Subject: Re: Better detection of staled postmaster.pid
Msg-id: CAKFQuwZyzi+R2BYC2WvX7i2fkNfjtCr1qqFK_0-U+HMDezEqKw@mail.gmail.com
In reply to: Re: Better detection of staled postmaster.pid (Kevin Grittner <kgrittn@ymail.com>)
List: pgsql-hackers
On Mon, Aug 31, 2015 at 10:20 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Pavel Raiskup <praiskup@redhat.com> wrote:
>
>> It's been reported [1] that postmaster fails to start against a stale
>> postmaster.pid after (e.g.) a power outage on Fedora. It's due to init system
>> parallelism: "some" other newly started process can already have been allocated
>> the same PID as the old postmaster had, and in this case postmaster refuses
>> to delete the stale pidfile (which is expected, as we need to be really
>> careful).
>>
>> Do you see some other possible check we could implement to guarantee that
>> the PID mentioned in postmaster.pid does not belong to a concurrent postmaster?

Most of this can be gleaned from the linked bug report...

> Was the other newly started process another PostgreSQL cluster?

Yes.

> Was it launched under the same OS user?  (Those are the only
> conditions under which I've seen this.)  I think it is wise to use
> a separate OS user for each cluster.

Yes.  Does the PID check verify that the owner of the pid file matches the owner of the process?  While it seems like good advice, I'm not sure how it would prevent this scenario, likely due to a lack of knowledge on my part.
 

> If it's not a matter of multiple clusters running under the same OS
> user, please provide more details, like the specific version and
> copy/paste of error messages and relevant log entries.

See the report.  I understand not linking to transient data in these kinds of postings, but the supplied description and the official downstream project bug report seem like sufficient data to work from, even if only in a preliminary fashion.

The only obvious solution is to stop using (pid) as a primary key of sorts and use (pid, timecreated) instead.  After a restart/reboot the timecreated would be guaranteed to have changed, and no guessing would be involved.  That seems invasive, though proper, for a problem largely limited to an uncommon distribution-specific setup that requires an unclean shutdown to occur.

David J.


