On 28.02.2011 14:04, Nikhil Sontakke wrote:
> I believe we have a case where not holding off interrupts while doing a
> malloc() can cause a deadlock due to system or libc level locking. In this
> case, a pg_ctl stop in fast mode was resorted to and that caused a backend
> to handle the interrupt when it was inside the malloc call. Now as part of
> the abort processing, in the subtransaction cleanup code path, this same
> backend tried to clear memory contexts, leading to an eventual free() call.
> The free() call tried to take the same lock which was already held by
> malloc() earlier, resulting in a deadlock!
Our signal handlers shouldn't try to do anything that complicated.
die(), which handles the SIGTERM sent to backends by a fast shutdown,
doesn't do abort processing itself. It just sets a global variable,
unless ImmediateInterruptOK is set, and that is only set around a few
blocking system calls where it is safe to do so. (Checks...) Actually,
md5_crypt_verify() looks suspicious: it sets "ImmediateInterruptOK =
true" and then calls palloc() and pfree().
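
To illustrate the pattern described above, here is a minimal sketch. It is not PostgreSQL's actual code. The handler only records the request in a flag, and the flag is checked at a point where no malloc() or free() is in progress. The names shutdown_requested, handle_sigterm, install_sigterm_handler, check_for_shutdown and do_work_step are hypothetical, and plain signal() is used only to keep the example short.

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a backend's "interrupt pending" flag. */
static volatile sig_atomic_t shutdown_requested = 0;

/*
 * Signal handler: does nothing except set the flag. It must not call
 * malloc(), free(), or anything else that might take a libc lock.
 */
static void
handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int
install_sigterm_handler(void)
{
    return (signal(SIGTERM, handle_sigterm) == SIG_ERR) ? -1 : 0;
}

/*
 * Called only at safe points in normal code. Returns 1 and clears the
 * flag if a shutdown was requested, otherwise returns 0.
 */
static int
check_for_shutdown(void)
{
    if (shutdown_requested)
    {
        shutdown_requested = 0;
        return 1;               /* caller may now run cleanup safely */
    }
    return 0;
}

/*
 * One unit of work. A signal that arrives during malloc() or free()
 * only sets the flag; it is acted upon after the allocator calls have
 * returned. Returns -1 on allocation failure, else the result of
 * check_for_shutdown().
 */
static int
do_work_step(void)
{
    char *buf = malloc(64);

    if (buf == NULL)
        return -1;
    memset(buf, 0, 64);
    free(buf);

    return check_for_shutdown();
}
```

With this structure, cleanup that ends up in free() runs only from the caller of do_work_step(), never from inside the handler, so it cannot re-enter an allocator call that the signal interrupted.
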
> Will try to get the call stack if needed.
Yes, please.
-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com