pgsql-bugs@counterstorm.com writes:
> We saw the problem with async_notify again (See thread with subject
> "Postgres 7.4.6 hang in async_notify" in first message to this list
> dated "Mon, 25 Apr 2005 15:42:35 -0400") in a production setting.
> Since our last instance, we converted to compiling postgres with
> debugging, so we have a stack trace. Looking at it, the problem
> appears at first blush like it might be pretty obvious: an ill-timed
> signal which arrives during a malloc while malloc has some
> data-structure locked, and one of the extensive operations that
> Async_NotifyHandler did probably involved getting the same lock.

So it would seem. The Async_NotifyHandler mechanism was designed at a
time when ReadCommand didn't call anything of interest except read(),
and so the assumption is that it's OK for PostgresMain to do this
(oversimplified a bit):

    EnableNotifyInterrupt();
    firstchar = ReadCommand(&input_message);
    DisableNotifyInterrupt();

Clearly, if SSL is going to be messing about with malloc() then this
assumption is no longer safe. Looking at the code, I think we have
introduced some other risks of the same ilk ourselves, but SSL is
doubtless the largest variable. This probably explains a number of
other irreproducible failures besides your hangup :-(

I think we're going to have to push the enable/disable interrupt
operations down closer to the actual read(). This doesn't seem to
be any big deal for the non-SSL case, but it's not clear to me what
we have to do to get between SSL and the socket. Anyone know offhand?
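
For the non-SSL case, pushing the enable/disable calls down to the
read() might be sketched roughly as follows. This is only an
illustration, not the actual backend code: every name in it
(notify_handler, guarded_read, and so on) is hypothetical, SIGUSR2 is
an arbitrary choice of signal, and process_notify() is a stub standing
in for the real work. The SSL path is not covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* All names below are hypothetical stand-ins, not the PostgreSQL source. */
static volatile sig_atomic_t notify_enabled = 0; /* may the handler do work now? */
static volatile sig_atomic_t notify_pending = 0; /* signal arrived while disabled */
static volatile sig_atomic_t notify_handled = 0; /* number of interrupts serviced */

/* Stub for the real work; it only touches sig_atomic_t flags. */
static void process_notify(void)
{
    notify_pending = 0;
    notify_handled++;
}

/* Signal handler: act immediately only inside the enabled window,
 * otherwise just remember that something arrived. */
static void notify_handler(int signo)
{
    (void) signo;
    if (notify_enabled)
        process_notify();
    else
        notify_pending = 1;
}

/* Open the window, first servicing anything deferred while it was closed. */
static void enable_notify_interrupt(void)
{
    for (;;)
    {
        notify_enabled = 1;
        if (!notify_pending)
            break;
        notify_enabled = 0;     /* keep the handler out while we do the work */
        process_notify();
    }
}

static void disable_notify_interrupt(void)
{
    notify_enabled = 0;
}

/* Install the handler; SA_RESTART lets an interrupted read() resume. */
static int install_notify_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = notify_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, NULL);
}

/* The window now covers only the read() itself, so the handler can
 * never interrupt malloc() or other non-reentrant code in the caller. */
static ssize_t guarded_read(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    enable_notify_interrupt();
    n = read(fd, buf, len);
    disable_notify_interrupt();
    return n;
}
```

With this arrangement a signal that arrives while the caller is inside
malloc() or similar only sets notify_pending; the work is deferred to
the next guarded_read() call, where read() is the only thing that can
be interrupted.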
> For the record, while this postgres should be (of two) generating
> notifies out of triggers, we do not believe it should be listening for
> any, and indeed examination of pg_listener suggests it does not.

Doesn't matter --- 7.4 uses the same mechanism for SI messaging catchup
interrupts. A backend that sits idle long enough *will* get one of
these interrupts. Apparently you've managed to set up a situation where
the client starts doing something after just-the-right-delay with better
than nil probability.

			regards, tom lane