Thread: Redesigning postmaster death handling

Redesigning postmaster death handling

From
Thomas Munro
Date:
Hi,

Here's an experimental patch to fix our shutdown strategy on
postmaster death, as discussed in a nearby report[1].

Maybe it's possible to switch to _exit() without also switching to
preemptive handling, but it seems fragile and painful for no gain.

Following that line of thinking, we might as well just ask the kernel
to hit our existing SIGQUIT handler at parent exit, on Linux/FreeBSD.
Job done.

For systems lacking that facility, the idea I'm trying out here is
that backends that detect the condition in WaitEventSetWait() should
themselves blast all backends with SIGQUIT, in a sense taking over the
role of the departed postmaster.  I didn't really want any
consensus/negotiation over who's going to do that, so... they all do.

Most of the patch is just removing hundreds of lines of errors and
conditions and comments that were now unreachable.

Better ideas, glaring holes in the plan, etc, welcome.

[1] https://www.postgresql.org/message-id/flat/B3C69B86-7F82-4111-B97F-0005497BB745%40yandex-team.ru
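
For readers unfamiliar with the kernel facility mentioned above, here is a minimal sketch of how a child process might ask for a signal at parent exit on Linux and FreeBSD. The helper names `request_parent_death_signal` and `current_parent_death_signal` are hypothetical and are not taken from the patch; the underlying calls are Linux `prctl(PR_SET_PDEATHSIG)` and FreeBSD `procctl(PROC_PDEATHSIG_CTL)`. A parent that has already exited before the call will not trigger the signal, so real code would presumably re-check parent liveness afterwards.

```c
#include <assert.h>
#include <signal.h>

#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/procctl.h>
#include <unistd.h>
#endif

/*
 * Ask the kernel to deliver `signo` to the calling process when its parent
 * exits.  Returns 0 on success, -1 on failure or when the platform offers
 * no such facility.  (Hypothetical helper, for illustration only.)
 */
int
request_parent_death_signal(int signo)
{
    assert(signo > 0);          /* 0 would clear the setting instead */
#if defined(__linux__)
    return prctl(PR_SET_PDEATHSIG, (unsigned long) signo, 0, 0, 0);
#elif defined(__FreeBSD__)
    return procctl(P_PID, 0, PROC_PDEATHSIG_CTL, &signo);
#else
    (void) signo;
    return -1;
#endif
}

/*
 * Report which signal is currently armed for parent exit: 0 if none,
 * -1 on error or on an unsupported platform.
 */
int
current_parent_death_signal(void)
{
    int signo = 0;

#if defined(__linux__)
    if (prctl(PR_GET_PDEATHSIG, &signo, 0, 0, 0) != 0)
        return -1;
    return signo;
#elif defined(__FreeBSD__)
    if (procctl(P_PID, 0, PROC_PDEATHSIG_STATUS, &signo) != 0)
        return -1;
    return signo;
#else
    return -1;
#endif
}
```

A child would call `request_parent_death_signal(SIGQUIT)` once, early in startup, after its SIGQUIT handler is installed.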

Re: Redesigning postmaster death handling

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Here's an experimental patch to fix our shutdown strategy on
> postmaster death, as discussed in a nearby report[1].

Thanks for tackling this topic.

> For systems lacking that facility, the idea I'm trying out here is
> that backends that detect the condition in WaitEventSetWait() should
> themselves blast all backends with SIGQUIT, in a sense taking over the
> role of the departed postmaster.

Hmm.  Up to now, we have not had an assumption that postmaster
children are aware of every other postmaster child.  In particular,
not all postmaster children have PGPROC entries.  How much does
this matter?  What happens if the shared PGPROC array is corrupt?

> I didn't really want any
> consensus/negotiation over who's going to do that, so... they all do.

Agreed on that point.

> Most of the patch is just removing hundreds of lines of errors and
> conditions and comments that were now unreachable.

The patch would likely be a lot more readable if you split out the
"delete unreachable code" part into a separate step.

            regards, tom lane



Re: Redesigning postmaster death handling

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Following that line of thinking, we might as well just ask the kernel
> to hit our existing SIGQUIT handler at parent exit, on Linux/FreeBSD.
> Job done.

One other thought here: do we *really* want such a critical-and-hard-
to-test aspect of our behavior to be handled completely differently
on different platforms?  I'd lean to ignoring the Linux/FreeBSD
facilities, because otherwise we're basically doubling our testing
problems in exchange for not much.

            regards, tom lane



Re: Redesigning postmaster death handling

From
Thomas Munro
Date:
On Thu, Aug 21, 2025 at 5:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm.  Up to now, we have not had an assumption that postmaster
> children are aware of every other postmaster child.  In particular,
> not all postmaster children have PGPROC entries.  How much does
> this matter?  What happens if the shared PGPROC array is corrupt?

It's also how we set latches, but yeah it's certainly an issue.

Other ideas:

1.  My other patch that used O_ASYNC (= ask the kernel to send SIGIO
when the pipe becomes readable) worked, but required a pipe or socket
pair per backend and is not actually in any standard.  I think it is
available almost everywhere anyway.  I could rejuvenate that just to
try out again.

2.  I wonder if we could make better use of session IDs.  I understand
that we use them to signal eg archiver + its children, but I wonder if
we could use a different granularity.  postmaster's sid for most
stuff, and per-backend sids when really needed, and then you just have
to signal a small number of sessions, perhaps more than one but not
much more.  We pretend that setsid is optional but it's old POSIX and
everywhere.  I also know that Windows has a similar thing, I just
haven't looked into it.
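
Idea 1 above can be sketched roughly as follows: the child arms the read end of a pipe so that the kernel raises SIGIO when the descriptor becomes readable. This is only an illustration under assumptions, not the earlier patch: the names `arm_sigio_on_fd`, `sigio_was_delivered`, and `demo_arm_sigio` are hypothetical, and whether a hangup on the write side reliably raises SIGIO varies by platform, since `O_ASYNC` is not standardized.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* O_ASYNC and F_SETOWN on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio = 0;

/* Async-signal-safe handler: just record that SIGIO arrived. */
static void
sigio_handler(int signo)
{
    (void) signo;
    got_sigio = 1;
}

/* Lets the main loop check whether SIGIO has been seen. */
int
sigio_was_delivered(void)
{
    return got_sigio;
}

/*
 * Install a SIGIO handler and ask the kernel to send SIGIO to this
 * process when `fd` becomes readable.  Returns 0 on success, -1 on error.
 */
int
arm_sigio_on_fd(int fd)
{
    struct sigaction sa;
    int         flags;

    assert(fd >= 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigio_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) != 0)
        return -1;

    /* Direct the signal at this process, then switch on async mode. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}

/* Usage example: arm a fresh pipe and confirm that the flag stuck. */
int
demo_arm_sigio(void)
{
    int         fds[2];
    int         ok;

    if (pipe(fds) != 0)
        return -1;
    ok = arm_sigio_on_fd(fds[0]) == 0 &&
        (fcntl(fds[0], F_GETFL) & O_ASYNC) != 0;
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

The cost noted in the message is visible here: every backend needs its own descriptor to arm, hence a pipe or socket pair per backend.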

> > Most of the patch is just removing hundreds of lines of errors and
> > conditions and comments that were now unreachable.
>
> The patch would likely be a lot more readable if you split out the
> "delete unreachable code" part into a separate step.

Will do.



Re: Redesigning postmaster death handling

From
Thomas Munro
Date:
On Thu, Aug 21, 2025 at 5:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One other thought here: do we *really* want such a critical-and-hard-
> to-test aspect of our behavior to be handled completely differently
> on different platforms?  I'd lean to ignoring the Linux/FreeBSD
> facilities, because otherwise we're basically doubling our testing
> problems in exchange for not much.

Yeah.  The attraction is that it's extremely simple and reliable:
set-and-forget, adding one line that sends you into well tested
immediate shutdown code.  Combined with the fact that most of our user
base has it, that seemed attractive.  The reliability aspects I was
thinking of are: (1) the kernel's knowledge of the process tree is
infallible by definition, (2) it's handled asynchronously on
postmaster exit, not after a POLLHUP, EVFILT_PROC, or process
HANDLE event that must be consumed synchronously by at least one
child.
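
The synchronous path contrasted in (2) can be illustrated with a plain pipe: the parent holds the write end open and never writes, so the read end only becomes ready once every writer has gone away. This is a minimal sketch and not PostgreSQL's WaitEventSet code; `parent_is_gone` and `demo_pipe_death_detection` are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Returns 1 if every write end of the pipe has been closed (the parent is
 * gone), 0 if it is still open, -1 on error.  `timeout_ms` is handed to
 * poll(); 0 means "check and return immediately".
 */
int
parent_is_gone(int death_watch_fd, int timeout_ms)
{
    struct pollfd pfd;
    int         rc;

    assert(death_watch_fd >= 0);

    pfd.fd = death_watch_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;               /* timed out: a writer still exists */

    /* Nothing is ever written, so readiness can only mean hangup or EOF. */
    if (pfd.revents & (POLLHUP | POLLIN))
        return 1;
    return -1;                  /* POLLERR or POLLNVAL */
}

/* Usage example: closing the write end stands in for the parent exiting. */
int
demo_pipe_death_detection(void)
{
    int         fds[2];
    int         before;
    int         after;

    if (pipe(fds) != 0)
        return -1;
    before = parent_is_gone(fds[0], 0);
    close(fds[1]);
    after = parent_is_gone(fds[0], 0);
    close(fds[0]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

The weakness discussed in the message follows directly: a process only learns of the death when it actually calls `poll()`, so one blocked elsewhere never notices.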

For (2), in practice I think it's close to 100% certain that one
backend will currently or very soon be in WaitEventSetWait() and thus
drive the cleanup operation, and I think it's probably good enough.
For example, even if your backends are all busy, there's basically
always a bunch of "launchers" and other auxiliary processes ready and
waiting to deal with it.  But it's possible to dream up extreme
theoretical scenarios where that bet fails: imagine if every single
backend except for one is currently waiting for a lock in sem_wait()
(let's say it's the same lock for simplicity).  I previously said in
some throwaway comment that they can't all be blocked in sem_wait() or
you already have a deadlock (a programming bug that isn't this
system's fault), but if the postmaster AND the backend that holds the
lock are killed by the OOM killer, you lose.  Those backends would
need to be cleaned up manually by an administrator in all released
versions of PostgreSQL, and it'd be no better with the v1 patch on
Windows and macOS.  They'd all eat SIGQUIT on a Linux or FreeBSD
system with the v1 patch, so on paper at least it's more hole-proof.

I agree that it would be nice to have just one system though, and of
course to make it completely reliable everywhere without complicated
theories.

One argument I thought of against PROC_PDEATHSIG_CTL is that its
simplicity also takes away some possibilities.  Yesterday I wrote
"taking over the role of the departed Postmaster", and realised it's
not the whole enchilada: do we also want the "issuing SIGKILL to
recalcitrant children" bit?  I don't want this system to be
complicated, rather the opposite, but I wonder if there is a nice way
to make it run *literally* the same code as the postmaster.  We'd need
bulletproof data structure sharing, or preferably, no sharing of
modifiable data at all.  Some ideas I'm looking into: better use of
process groups, or maybe doing the bookkeeping in memory that is not
even mapped into children until they need it.  Or something.
Researching...
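
As an illustration of the process-group direction, the sketch below signals a whole group with one `kill()` call, which needs no shared, modifiable list of child PIDs. It is an assumption-laden toy, not a proposal from the thread: `quit_process_group` and `demo_group_quit` are hypothetical names, and it does not cover the SIGKILL escalation for recalcitrant children.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Send SIGQUIT to every member of process group `pgid`.
 * Returns 0 on success, -1 on error.
 */
int
quit_process_group(pid_t pgid)
{
    /* kill(0, ...) and kill(-1, ...) mean something far broader. */
    assert(pgid > 1);
    return kill(-pgid, SIGQUIT);
}

/*
 * Usage example: fork a child into its own process group, signal the
 * group, and confirm the child was terminated by SIGQUIT.
 */
int
demo_group_quit(void)
{
    pid_t       child = fork();
    int         status = 0;

    if (child < 0)
        return -1;
    if (child == 0)
    {
        /* Child: become a group leader, then wait to be signalled. */
        setpgid(0, 0);
        for (;;)
            pause();
    }

    /* Parent sets the group too, so it exists whichever side runs first. */
    if (setpgid(child, child) != 0 || quit_process_group(child) != 0)
    {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }
    if (waitpid(child, &status, 0) != child)
        return -1;
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGQUIT) ? 0 : -1;
}
```

The group ID is the only state the sender needs, which is what makes this attractive if shared memory may be corrupt.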