Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
Дата
Msg-id 20180720150323.v2uschqaedovm7ep@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket  (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы Re: [HACKERS] possible self-deadlock window after badProcessStartupPacket
Список pgsql-hackers
On 2018-07-20 11:53:32 +0300, Heikki Linnakangas wrote:
> ISTM that no-one has any great ideas on what to do about the ereport() in
> quickdie().

I think we actually have some decent ideas how to make that less
problematic in a number of cases. Avoid translation (thereby avoiding
direct malloc()) and rely on the fact that the error message evaluation
happens in a memory context with pre-allocated memory.  That still
leaves openssl, but is better than nothing.  If we wanted, we
additionally could force ssl allocations to happen through another
context with preallocated memory.


> But I think we have consensus on replacing the exit(2) calls
> with _exit(2). If we do just that, it would be better than the status quo,
> even if it doesn't completely fix the problem. This would prevent the case
> that Asim reported, for starters.

Right. You're planning to backpatch?


> With _exit(), I think we wouldn't really need the "PG_SETMASK(&BlockSig);"
> calls in the aux process *_quickdie handlers that don't do anything else
> than call _exit(). But I didn't dare to remove them yet.

I think we should remove it at least in master.


> diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
> index 960d3de204..010d53a6ce 100644
> --- a/src/backend/postmaster/bgwriter.c
> +++ b/src/backend/postmaster/bgwriter.c
> @@ -404,22 +404,21 @@ bg_quickdie(SIGNAL_ARGS)
>      /*
>       * We DO NOT want to run proc_exit() callbacks -- we're here because
>       * shared memory may be corrupted, so we don't want to try to clean up our
> -     * transaction.  Just nail the windows shut and get out of town.  Now that
> -     * there's an atexit callback to prevent third-party code from breaking
> -     * things by calling exit() directly, we have to reset the callbacks
> -     * explicitly to make this work as intended.
> -     */
> -    on_exit_reset();
> -
> -    /*
> -     * Note we do exit(2) not exit(0).  This is to force the postmaster into a
> -     * system reset cycle if some idiot DBA sends a manual SIGQUIT to a random
> -     * backend.  This is necessary precisely because we don't clean up our
> -     * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
> -     * should ensure the postmaster sees this as a crash, too, but no harm in
> -     * being doubly sure.)
> +     * transaction.  Just nail the windows shut and get out of town.
> +     *
> +     * There's an atexit callback to prevent third-party code from breaking
> +     * things by calling exit() directly.  We don't want to trigger that, so
> +     * we use _exit(), which doesn't run atexit callbacks, rather than exit().
> +     * And exit() wouldn't be safe to run from a signal handler, anyway.

It's much less the exit() that's unsafe, than the callbacks themselves,
right?  Perhaps just restate that we wouldn't want to trigger atexit
processing due to signal safety?

> +     * Note we use _exit(2) not _exit(0).  This is to force the postmaster
> +     * into a system reset cycle if some idiot DBA sends a manual SIGQUIT to a
> +     * random backend.  This is necessary precisely because we don't clean up
> +     * our shared memory state.  (The "dead man switch" mechanism in
> +     * pmsignal.c should ensure the postmaster sees this as a crash, too, but
> +     * no harm in being doubly sure.)
>       */
> -    exit(2);
> +    _exit(2);
>  }

Greetings,

Andres Freund


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] logical decoding of two-phase transactions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Non-portable shell code in pg_upgrade tap tests