Re: Postgres 7.4.7 hang in async_notify

From: Tom Lane
Subject: Re: Postgres 7.4.7 hang in async_notify
Msg-id: 26109.1117736485@sss.pgh.pa.us
In response to: Postgres 7.4.7 hang in async_notify (pgsql-bugs@counterstorm.com)
List: pgsql-bugs

pgsql-bugs@counterstorm.com writes:
> We saw the problem with async_notify again (See thread with subject
> "Postgres 7.4.6 hang in async_notify" in first message to this list
> dated "Mon, 25 Apr 2005 15:42:35 -0400") in a production setting.
> Since our last instance, we converted to compiling postgres with
> debugging, so we have a stack trace.  Looking at it, the problem
> appears at first blush like it might be pretty obvious: an ill-timed
> signal which arrives during a malloc while malloc has some
> data-structure locked, and one of the extensive operations that
> Async_NotifyHandler did probably involved getting the same lock.

So it would seem.  The Async_NotifyHandler mechanism was designed at a
time when ReadCommand didn't call anything of interest except read(),
and so the assumption is that it's OK for PostgresMain to do this
(oversimplified a bit):

        EnableNotifyInterrupt();

        firstchar = ReadCommand(&input_message);

        DisableNotifyInterrupt();

Clearly, if SSL is going to be messing about with malloc() then this
assumption is no longer safe.  Looking at the code, I think we have
introduced some other risks of the same ilk ourselves, but SSL is
doubtless the largest variable.  This probably explains a number of
other irreproducible failures besides your hangup :-(
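
To make the hazard concrete, here is a minimal sketch of the enable/disable pattern, not the actual backend code. All names (interrupt_enabled, interrupt_pending, process_notify, and so on) are hypothetical stand-ins. The handler does its work on the spot only while the flag is set; otherwise it just records that a signal arrived. The problem described above lives in the "enabled" branch: if the interrupted main-line code was inside malloc(), the handler's own malloc() can block on the same lock.

```c
#include <signal.h>
#include <stdlib.h>

/* Hypothetical flags; the real backend uses its own names. */
static volatile sig_atomic_t interrupt_enabled = 0;
static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t work_done = 0;

/* Stand-in for the "extensive operations": it allocates memory,
 * so it is not async-signal-safe. */
static void process_notify(void)
{
    void *p = malloc(64);

    free(p);
    work_done++;
}

/* Signal handler: work immediately if enabled, else just remember. */
static void notify_handler(int signo)
{
    (void) signo;
    if (interrupt_enabled)
    {
        interrupt_enabled = 0;  /* guard against re-entry */
        process_notify();       /* UNSAFE if main line was inside malloc() */
        interrupt_enabled = 1;
    }
    else
        interrupt_pending = 1;
}

/* Open the window; service anything that arrived while it was closed. */
static void enable_interrupt(void)
{
    interrupt_enabled = 1;
    if (interrupt_pending)
    {
        interrupt_enabled = 0;
        interrupt_pending = 0;
        process_notify();       /* safe here: we are not inside a handler */
        interrupt_enabled = 1;
    }
}

/* Close the window; later signals are only recorded. */
static void disable_interrupt(void)
{
    interrupt_enabled = 0;
}
```

The wider the region between enable_interrupt() and disable_interrupt(), the more non-reentrant library code can be caught mid-call by the handler.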

I think we're going to have to push the enable/disable interrupt
operations down closer to the actual read().  This doesn't seem to
be any big deal for the non-SSL case, but it's not clear to me what
we have to do to get between SSL and the socket.  Anyone know offhand?
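
For the non-SSL case, the idea might look roughly like the sketch below. This is an illustration under stated assumptions, not a patch: guarded_read, service_interrupt, and the flag names are all hypothetical, and it says nothing about how to get the same effect underneath an SSL library. The point is only that the window in which the handler may do real work covers the read() call and nothing else.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flags, as in any enable/disable scheme. */
static volatile sig_atomic_t interrupt_enabled = 0;
static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t interrupts_serviced = 0;

/* Stand-in for the real interrupt processing. */
static void service_interrupt(void)
{
    interrupts_serviced++;
}

/* Handler: act only while the window is open, otherwise defer. */
static void notify_handler(int signo)
{
    (void) signo;
    if (interrupt_enabled)
    {
        interrupt_enabled = 0;
        service_interrupt();
        interrupt_enabled = 1;
    }
    else
        interrupt_pending = 1;
}

/* Read with the interrupt window open around read() only.
 * EINTR handling is left to the caller. */
static ssize_t guarded_read(int fd, void *buf, size_t len)
{
    ssize_t n;
    int     save_errno;

    /* Open the window and service anything deferred earlier. */
    interrupt_enabled = 1;
    if (interrupt_pending)
    {
        interrupt_enabled = 0;
        interrupt_pending = 0;
        service_interrupt();
        interrupt_enabled = 1;
    }

    n = read(fd, buf, len);     /* the only call inside the window */
    save_errno = errno;

    /* Close the window before returning to buffer management etc. */
    interrupt_enabled = 0;
    errno = save_errno;
    return n;
}
```

With this shape, any allocation or buffer shuffling done by the caller happens with the window closed, so a signal arriving then is only recorded and gets serviced on the next call.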

> For the record, while this postgres should be (of two) generating
> notifies out of triggers, we do not believe it should be listening for
> any, and indeed examination of pg_listener suggests it does not.

Doesn't matter --- 7.4 uses the same mechanism for SI messaging catchup
interrupts.  A backend that sits idle long enough *will* get one of
these interrupts.  Apparently you've managed to set up a situation where
the client starts doing something after just-the-right-delay with better
than nil probability.

            regards, tom lane
