Re: Use of signal-unsafe functions from signal handlers

Поиск
Список
Период
Сортировка
От Mats Kindahl
Тема Re: Use of signal-unsafe functions from signal handlers
Дата
Msg-id CA+14426Zpdi6=jr9Z+C_tzS36pVa_+BbCHSeriqd=sLw4X6_iw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Use of signal-unsafe functions from signal handlers  (Julien Rouhaud <rjuju123@gmail.com>)
Ответы Re: Use of signal-unsafe functions from signal handlers  (Julien Rouhaud <rjuju123@gmail.com>)
Список pgsql-bugs

On Tue, May 24, 2022 at 12:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Hi,

On Tue, May 24, 2022 at 11:42:35AM +0200, Mats Kindahl wrote:
>
> Typically, signal-unsafe functions should not be called from signal
> handlers. In particular, calling malloc() directly or indirectly can cause
> deadlocks, making PostgreSQL unresponsive to signals.
>
> Unless I am missing something, bgworker_die uses ereport, which indirectly
> calls printf-like functions, which are not signal-safe since they use
> malloc(). In rare cases, this can lead to deadlocks with stacks that look
> like this (from https://github.com/timescale/timescaledb/issues/4200):
>
> #0  0x00007f0e4d1040eb in __lll_lock_wait_private () from
> target:/lib/x86_64-linux-gnu/libc.so.6
> [...]
> #3  malloc (size=53)
> [...]
> #7  0x000055b9212235b1 in errmsg ()
> #8  0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at
> /build/timescaledb/src/bgw/scheduler.c:841
> #9  <signal handler called>
> [...]
> #13 free (ptr=<optimized out>)
> #14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from
> target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
> [...]

As far as I can see the problem comes from your handle_sigterm:

static void handle_sigterm(SIGNAL_ARGS)
{
        /*
         * do not use a level >= ERROR because we don't want to exit here but
         * rather only during CHECK_FOR_INTERRUPTS
         */
        ereport(LOG,
                        (errcode(ERRCODE_ADMIN_SHUTDOWN),
                         errmsg("terminating TimescaleDB job scheduler due to administrator command")));
        die(postgres_signal_arg);
}

As mentioned in the topmost comment of elog.c, there's an escape hatch for
out of memory situations, to make sure that a small enough message can be
displayed without trying to allocate memory.  But this is only for ERROR or
higher levels.  Using an ereport(LOG, ...) level in a sigterm handler indeed
isn't safe.

Yes, and we have already fixed this in the TimescaleDB code, so this is not an issue for us (https://github.com/timescale/timescaledb/pull/4323). (We actually replace the bgworker_die with just die as signal handler.)

However, bgworker_die()---which is part of PostgreSQL and is the default signal handler for background workers---is using ereport() and I think this should be a problem for all error levels since the problem is in the usage of the formatting function to format the error message, not where you write it. This would mean that any existing background workers would have a window for triggering this issue on shutdown.

В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #17494: High demand for displacement sort
Следующее
От: Julien Rouhaud
Дата:
Сообщение: Re: Use of signal-unsafe functions from signal handlers