Hi all,
Typically, signal-unsafe functions should not be called from signal handlers. In particular, calling malloc() directly or indirectly can cause deadlocks, making PostgreSQL unresponsive to signals.
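Unless I am missing something, bgworker_die calls ereport, which indirectly calls printf-like functions. These are not async-signal-safe because they can call malloc(). In rare cases this leads to a deadlock: if the signal arrives while the interrupted code is inside malloc() or free() and holds the allocator lock, the handler blocks forever waiting for that same lock. The resulting stack looks like this (from https://github.com/timescale/timescaledb/issues/4200):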
#0 0x00007f0e4d1040eb in __lll_lock_wait_private () from target:/lib/x86_64-linux-gnu/libc.so.6
[...]
#3 malloc (size=53)
[...]
#7 0x000055b9212235b1 in errmsg ()
#8 0x00007f0e27bf27a8 in handle_sigterm (postgres_signal_arg=15) at /build/timescaledb/src/bgw/scheduler.c:841
#9 <signal handler called>
[...]
#13 free (ptr=<optimized out>)
#14 0x00007f0e4db12cb4 in OPENSSL_LH_free () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
[...]
A simple fix for this is to introduce a signal-safe version of write_stderr and use that from bgworker_die, which the attached patch does. Am I missing something or is this a bug?
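To illustrate the idea (this is a minimal sketch, not the code from the attached patch), a signal-safe writer can restrict itself to write(2), which POSIX lists as async-signal-safe, and avoid any formatting or allocation. The names write_fd_signal_safe and example_die_handler, and the message text, are hypothetical and exist only for this example:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a constant string to a file descriptor using
 * only write(2). No printf-style formatting, no stdio buffering, no malloc().
 * Returns 0 on success, -1 on error. errno is restored before returning so
 * the interrupted code does not see a changed value.
 */
static int
write_fd_signal_safe(int fd, const char *msg)
{
    int         saved_errno = errno;
    size_t      len = 0;
    int         rc = 0;

    /* Compute the length by hand to avoid relying on any library call. */
    while (msg[len] != '\0')
        len++;

    while (len > 0)
    {
        ssize_t     n = write(fd, msg, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing, retry */
            rc = -1;
            break;
        }
        msg += n;               /* handle short writes */
        len -= (size_t) n;
    }

    errno = saved_errno;
    return rc;
}

/*
 * Hypothetical handler showing the intended usage: emit a fixed message and
 * exit with _exit(2), which is also async-signal-safe. The real bgworker_die
 * does more than this; the sketch only shows the signal-safe part.
 */
static void
example_die_handler(int signo)
{
    (void) signo;
    write_fd_signal_safe(STDERR_FILENO,
                         "terminating background worker due to administrator command\n");
    _exit(1);
}
```

The trade-off is that the message has to be a fixed string, since anything that formats into a dynamically allocated buffer would bring malloc() back into the handler.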
Best wishes,
Mats Kindahl (Timescale)