Tom Lane <tgl@sss.pgh.pa.us> writes:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> On 2021-May-03, Andres Freund wrote:
>>> The issue turns out to be that postgres was in a container, with pid
>>> namespaces enabled. Because postgres was run directly in the container,
>>> without a parent process inside, it thus becomes pid 1. Which mostly
>>> works without a problem. Until, as the case here with the archive
>>> command, a sub-sub process exits while it still has a child. Then that
>>> child gets re-parented to postmaster (as init).
>
>> Hah .. interesting. I think we should definitely make this work, since
>> containerized stuff is going to become more and more prevalent.
>
> How would we make it "work"? The postmaster can't possibly be expected
> to know the right thing to do with unexpected children.
>
>> I guess we can do that in older releases, but do we really need it? As
>> I understand, the only thing we need to do is verify that the dying PID
>> is a backend PID, and not cause a crash cycle if it isn't.
> Maybe we should put in a startup-time check, analogous to the
> can't-run-as-root test, that the postmaster mustn't be PID 1.
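
For illustration, the kind of startup-time check suggested above might
look roughly like the sketch below. The function name and the message
text are hypothetical, not existing PostgreSQL code; the only
assumption is that refusing to start when getpid() returns 1 is the
desired behaviour.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not an existing PostgreSQL function: report whether
 * the given PID is acceptable for the postmaster.  PID 1 is rejected because
 * the kernel re-parents orphaned processes to it.
 */
bool
pid_is_acceptable_for_postmaster(pid_t pid)
{
    if (pid == 1)
    {
        fprintf(stderr,
                "refusing to run as PID 1; start the server under a minimal init instead\n");
        return false;
    }
    return true;
}
```

A caller would do something like
`if (!pid_is_acceptable_for_postmaster(getpid())) exit(1);` early in
startup, next to the existing can't-run-as-root test.
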
Given that a number of minimal `init`s already exist specifically for
the case of running a single application in a container, I don't think
Postgres should reinvent that wheel. A quick eyeball of the output of
`apt search container init` on a Debian Bullseye system reveals at
least four:
- https://github.com/Yelp/dumb-init
- https://github.com/krallin/tini
- https://github.com/fpco/pid1
- https://github.com/openSUSE/catatonit
The first one also explains why there's more to being PID 1 than just
handling reparented children.
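
To give an idea of the part that is about reparented children, here is
a rough sketch of the reaping loop such an init runs. It is not taken
from any of the projects above; `run_and_reap` is a made-up name, and
the sketch deliberately leaves out signal forwarding and everything
else a real init has to deal with.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start argv[0] as a child, then reap every child that
 * exits (including orphans re-parented to us when we are PID 1) until the
 * main child itself is gone.  Returns a shell-style exit code.
 */
int
run_and_reap(char *const argv[])
{
    pid_t main_child = fork();

    if (main_child < 0)
        return 127;             /* fork failed */

    if (main_child == 0)
    {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    for (;;)
    {
        int   status;
        pid_t pid = waitpid(-1, &status, 0);   /* wait for any child */

        if (pid < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return 127;         /* ECHILD or other error: nothing left to wait for */
        }

        if (pid != main_child)
            continue;           /* an adopted orphan: reaped, nothing more to do */

        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return 128 + WTERMSIG(status);
    }
}
```

With something like this as PID 1 and the postmaster as its only direct
child, orphaned grandchildren of the archive command get reaped by the
init and the postmaster never sees them.
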
- ilmari
--
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
the consequences of." -- Skud's Meta-Law