Re: We shouldn't signal process groups with SIGQUIT

From	Thomas Munro
Subject	Re: We shouldn't signal process groups with SIGQUIT
Date	2 March 2023, 02:29:28
Msg-id	CA+hUKGJvK0Py8BJar+HVfPUUcERLCJpnYhztpRz6cKhq0svp+w@mail.gmail.com
In reply to	Re: We shouldn't signal process groups with SIGQUIT (Michael Paquier <michael@paquier.xyz>)
Responses	Re: We shouldn't signal process groups with SIGQUIT
List	pgsql-hackers

On Tue, Feb 28, 2023 at 5:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> > Just naively hacking this behaviour change into the current code, would yield
> > sending SIGQUIT to postgres, and then SIGTERM to the whole process
> > group. Which seems like a reasonable order?  quickdie() should _exit()
> > immediately in the signal handler, so we shouldn't get to processing the
> > SIGTERM.  Even if both signals are "reacted to" at the same time, possibly
> > with SIGTERM being processed first, the SIGQUIT handler should be executed
> > long before the next CFI().
>
> I have been poking a bit at that, and did a change as simple as this
> one in signal_child():
>  #ifdef HAVE_SETSID
> +   if (signal == SIGQUIT)
> +       signal = SIGTERM;
>
> From what I can see, SIGTERM is actually received by the backends
> before SIGQUIT, and I can also see that the backends have enough room
> to process CFIs in some cases, especially short queries, even before
> reaching quickdie() and its exit().  So the window between SIGTERM and
> SIGQUIT is not as long as one would think.

Pop quiz: in what order do signal handlers run, if SIGQUIT and SIGTERM
are both pending when a process wakes up or unblocks?  I *think* the
answer on all typical implementations that follow conventions going
back to ancient Unix (but not standardised, so you can't count on
it!*), is that pending signals are delivered in order of the bits in
the pending signals bitmap from lowest to highest, and SIGQUIT <
SIGTERM (again: tradition, not standard), and then:

1.  If the handlers block each other via their sa_mask so that they
are serialised (note: ours don't) then you'll see the SIGQUIT handler
run and then the SIGTERM handler, for example if you do kill(self,
SIGTERM), kill(self, SIGQUIT), sigprocmask(SIG_SETMASK, &unblock_all,
NULL).

2.  If the handlers don't block each other (our case), then their
stack frames will be set up in that order (you might say they start in
that order but are immediately interrupted by the next one before they
can do anything), so they then run in the reverse order, SIGTERM
first.  I guess that is what you saw?
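
The experiment in points 1 and 2 can be sketched as a small standalone
program.  This is only an illustration under stated assumptions: the names
first_handler_to_run() and record_handler() are made up for this example
and are not PostgreSQL code, and the process is assumed to be
single-threaded so that kill(getpid(), ...) targets the calling thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Signal numbers in the order the handler bodies actually ran. */
static volatile sig_atomic_t run_order[2];
static volatile sig_atomic_t n_run;

static void
record_handler(int signo)
{
    if (n_run < 2)
        run_order[n_run++] = signo;
}

/*
 * Make SIGTERM and SIGQUIT pending while blocked, then unblock both at once.
 * With serialise != 0 each handler masks the other signal via sa_mask
 * (case 1); otherwise the handlers may interrupt each other (case 2).
 * Returns the signal whose handler body ran first, or -1 on failure.
 */
int
first_handler_to_run(int serialise)
{
    struct sigaction sa;
    sigset_t    both, pending;

    sigemptyset(&both);
    sigaddset(&both, SIGQUIT);
    sigaddset(&both, SIGTERM);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = record_handler;
    if (serialise)
        sa.sa_mask = both;
    else
        sigemptyset(&sa.sa_mask);
    if (sigaction(SIGQUIT, &sa, NULL) != 0 ||
        sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    n_run = 0;
    sigprocmask(SIG_BLOCK, &both, NULL);
    kill(getpid(), SIGTERM);    /* sent first ... */
    kill(getpid(), SIGQUIT);    /* ... sent second */

    /* Both must still be pending, since both are blocked. */
    sigpending(&pending);
    assert(sigismember(&pending, SIGQUIT) == 1 &&
           sigismember(&pending, SIGTERM) == 1);

    /* Pending signals are delivered as they become unblocked here. */
    sigprocmask(SIG_UNBLOCK, &both, NULL);

    return n_run > 0 ? run_order[0] : -1;
}
```

Going by the reasoning above, one would expect first_handler_to_run(1) to
return SIGQUIT and first_handler_to_run(0) to return SIGTERM on a
traditional implementation, but since neither the signal numbering nor the
delivery order is standardised, that is something to check on a given
platform rather than rely on.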

In theory you could straighten this out by asking what else is pending
so that we imposed our own priority, if that were a problem, but there
is something I don't understand: you said we could handle SIGTERM and
then make it all the way to CFI() (= non-signal handler code) before
handling a SIGQUIT that was sent first.  Huh... what am I missing?  I
thought the only risk was handlers running in the opposite of send
order because they 'overlapped', not non-handler code being allowed to
run in between.
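
One possible reading of "asking what else is pending" is sketched below.
It is an assumption-laden illustration, not a proposal: the priority list
and the names pick_by_priority() and next_pending_by_priority() are
hypothetical.  Note that sigpending() only reports signals that have not
yet been delivered, so a check like this is only meaningful while the
signals of interest are still blocked.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical priority list: earlier entries win over later ones. */
static const int shutdown_priority[] = {SIGQUIT, SIGTERM, SIGINT};

/* Return the highest-priority signal present in *pending, or 0 if none. */
int
pick_by_priority(const sigset_t *pending)
{
    size_t      n = sizeof(shutdown_priority) / sizeof(shutdown_priority[0]);

    assert(pending != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (sigismember(pending, shutdown_priority[i]) == 1)
            return shutdown_priority[i];
    }
    return 0;
}

/*
 * Ask the kernel which signals are pending right now and pick one by our
 * own priority rather than by signal number.  Returns -1 on failure.
 */
int
next_pending_by_priority(void)
{
    sigset_t    pending;

    if (sigpending(&pending) != 0)
        return -1;
    return pick_by_priority(&pending);
}
```

With both SIGTERM and SIGQUIT in the set, pick_by_priority() returns
SIGQUIT regardless of which was sent first or how the platform numbers
them.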

*The standard explicitly says that delivery order is unspecified,
except for realtime signals, which we aren't using.
