Re: We shouldn't signal process groups with SIGQUIT

From Thomas Munro
Subject Re: We shouldn't signal process groups with SIGQUIT
Date
Msg-id CA+hUKGJvK0Py8BJar+HVfPUUcERLCJpnYhztpRz6cKhq0svp+w@mail.gmail.com
In response to Re: We shouldn't signal process groups with SIGQUIT  (Michael Paquier <michael@paquier.xyz>)
Responses Re: We shouldn't signal process groups with SIGQUIT
List pgsql-hackers
On Tue, Feb 28, 2023 at 5:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Just naively hacking this behaviour change into the current code, would yield
> > sending SIGQUIT to postgres, and then SIGTERM to the whole process
> > group. Which seems like a reasonable order?  quickdie() should _exit()
> > immediately in the signal handler, so we shouldn't get to processing the
> > SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> > with SIGTERM being processed first, the SIGQUIT handler should be executed
> > long before the next CFI().
>
> I have been poking a bit at that, and did a change as simple as this
> one in signal_child():
>  #ifdef HAVE_SETSID
> +   if (signal == SIGQUIT)
> +       signal = SIGTERM;
>
> From what I can see, SIGTERM is actually received by the backends
> before SIGQUIT, and I can also see that the backends have enough room
> to process CFIs in some cases, especially short queries, even before
> reaching quickdie() and its exit().  So the window between SIGTERM and
> SIGQUIT is not as long as one would think.

Pop quiz: in what order do signal handlers run, if SIGQUIT and SIGTERM
are both pending when a process wakes up or unblocks?  I *think* the
answer on all typical implementations that follow conventions going
back to ancient Unix (but not standardised, so you can't count on
it!*), is that pending signals are delivered in order of the bits in
the pending signals bitmap from lowest to highest, and SIGQUIT <
SIGTERM (again: tradition, not standard), and then:

1.  If the handlers block each other via their sa_mask so that they
are serialised (note: ours don't) then you'll see the SIGQUIT handler
run and then the SIGTERM handler, for example if you do kill(self,
SIGTERM), kill(self, SIGQUIT), sigprocmask(SIG_SETMASK, &unblock_all,
NULL).

2.  If the handlers don't block each other (our case), then their
stack frames will be set up in that order (you might say they start in
that order but are immediately interrupted by the next one before they
can do anything), so they then run in the reverse order, SIGTERM
first.  I guess that is what you saw?

In theory you could straighten this out by asking what else is pending
so that we imposed our own priority, if that were a problem, but there
is something I don't understand: you said we could handle SIGTERM and
then make it all the way to CFI() (= non-signal handler code) before
handling a SIGQUIT that was sent first.  Huh... what am I missing?  I
thought the only risk was handlers running in the opposite of send
order because they 'overlapped', not non-handler code being allowed to
run in between.

*The standard explicitly says that delivery order is unspecified,
except for realtime signals, which we aren't using.


