Re: [HACKERS] postmaster disappears

From: Tatsuo Ishii
Subject: Re: [HACKERS] postmaster disappears
Msg-id: 199909220449.NAA26668@srapc451.sra.co.jp
In reply to: Re: [HACKERS] postmaster disappears  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] postmaster disappears  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
>> Not sure. reaper() may be called while reaper() is executing if a new
>> SIGCHLD is raised. How do you handle this case?
>
>No, because the signal is disabled when the trap is taken, and then not
>re-enabled until reaper() does pqsignal() just before exiting.  We don't

You are correct. I had the wrong impression about signal handling.
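Tom's description appears to match the older signal() semantics, where the handler is reset when the trap is taken and must be re-armed with pqsignal(). With POSIX sigaction(), the analogous guarantee is that the signal being handled is blocked until its handler returns (unless SA_NODEFER is set), so a second SIGCHLD stays pending instead of re-entering the handler. The C sketch below checks this from inside a handler. It assumes a POSIX system, and sigchld_blocked_during_handler() and probe_handler() are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler: 1 if SIGCHLD was blocked while its own handler ran. */
static volatile sig_atomic_t blocked_inside_handler = -1;

static void probe_handler(int signo)
{
    sigset_t cur;

    (void) signo;
    /* With a NULL new set, sigprocmask() only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0)
        blocked_inside_handler = (sigismember(&cur, SIGCHLD) == 1);
}

/* Hypothetical helper: returns 1 if SIGCHLD was masked during its handler,
   0 if it was not, -1 if the handler could not be installed. */
int sigchld_blocked_during_handler(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                  /* no SA_NODEFER */
    if (sigaction(SIGCHLD, &sa, &old) < 0)
        return -1;

    blocked_inside_handler = -1;
    raise(SIGCHLD);                   /* handler runs before raise() returns */
    sigaction(SIGCHLD, &old, NULL);   /* restore the previous disposition */
    return blocked_inside_handler;
}
```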

>>> Moreover, you're not actually checking what the select() did unless
>>> you do it that way.
>
>> Sorry, I don't understand this. Can you explain, please?
>
>If you don't have the signal routine save/restore errno, then (when this
>problem occurs) you are not seeing the errno returned by the select(),
>but one left over from reaper()'s activity.  If the select() failed, you
>won't know it.

Oh, I see your point.
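The errno save/restore Tom describes would look roughly like this in C. This is a sketch under assumptions. It uses sigaction() and waitpid() with WNOHANG, and this reaper() is a simplified stand-in that only reaps children and omits the CleanupProc() call. errno_after_reaper() is a hypothetical helper showing that an errno value set before the signal survives the handler, even though waitpid() fails with ECHILD inside it when there are no children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* SIGCHLD handler that reaps children without clobbering the errno
   that the interrupted code (for example, the caller of select()) is
   about to inspect. */
static void reaper(int signo)
{
    int save_errno = errno;     /* preserve the interrupted call's errno */

    (void) signo;
    /* When no children remain, waitpid() returns -1 with errno = ECHILD.
       Without the restore below, that value would leak into the main loop. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;

    errno = save_errno;
}

/* Hypothetical helper: sets errno to 'before', delivers SIGCHLD to
   reaper(), and returns the errno value seen afterwards. */
int errno_after_reaper(int before)
{
    struct sigaction sa, old;
    int seen;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reaper;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old) < 0)
        return -1;

    errno = before;
    raise(SIGCHLD);             /* reaper() runs before raise() returns */
    seen = errno;
    sigaction(SIGCHLD, &old, NULL);
    return seen;
}
```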

>>> Curious that this sort of problem is not seen more often --- I wonder
>>> if most Unixes arrange to save/restore errno around a signal handler
>>> for you?
>
>> Maybe because the situation I have pointed out is relatively rare.
>
>Well, the window for trouble is awfully tiny in this particular code of
>ours, but it might be larger in other programs.

Though it seems rare, we have certainly been getting reports of this
kind from users for a while. Since a disappearing postmaster is a
really bad thing, I would love to see a solution for this.

>Yet I don't think I've
>ever heard a programming recommendation to save/restore errno in signal
>handlers...

Agreed. I don't like that approach.

I asked a Unix guru and got the suggestion that we do not need to
call wait() (and CleanupProc()) inside the signal handler at all.
Instead, we could have a null signal handler for SIGCHLD (it just
calls pqsignal()). If select() returns EINTR, we then call wait()
and CleanupProc(). Moreover, this would eliminate the sigprocmask()
or sigblock() calls currently made to avoid race conditions before
entering the critical region. Of course, we still have to call
wait() and CleanupProc() before select() to make sure that no
children are left waiting to be reaped.
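The suggested restructuring could look roughly like this in C. This is a minimal sketch under assumptions. It uses sigaction() instead of pqsignal() and waitpid() with WNOHANG instead of wait(). cleanup_proc(), reap_children(), install_null_sigchld_handler(), and wait_for_input() are hypothetical names, with cleanup_proc() standing in for CleanupProc(). The handler only interrupts select(). All reaping happens in the main loop, where select()'s errno can be trusted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder for the postmaster's CleanupProc(). */
static void cleanup_proc(pid_t pid, int status)
{
    (void) pid;
    (void) status;
}

/* The "null" handler: its only effect is to make select() return EINTR.
   Installed with sigaction() (no SA_RESETHAND), it stays in place, so it
   does not need to re-arm itself the way a pqsignal() handler might. */
static void null_sigchld_handler(int signo)
{
    (void) signo;
}

int install_null_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = null_sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* leave SA_RESTART off: we want to see EINTR */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Reap every exited child outside signal context; returns how many. */
int reap_children(void)
{
    int n = 0, status;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        cleanup_proc(pid, status);
        n++;
    }
    return n;
}

/* Block until fd is readable (returns 1) or select() really fails (-1). */
int wait_for_input(int fd)
{
    for (;;) {
        fd_set rmask;
        int nsel;

        reap_children();        /* nothing may be left waiting before we block */
        FD_ZERO(&rmask);
        FD_SET(fd, &rmask);
        nsel = select(fd + 1, &rmask, NULL, NULL, NULL);
        if (nsel > 0)
            return 1;
        if (nsel < 0 && errno != EINTR)
            return -1;          /* genuine failure; errno is select()'s own */
        /* EINTR: a child has probably exited, so loop around and reap it. */
    }
}
```

One property to keep in mind: a child that exits after the pre-select() reap but before select() actually blocks does not interrupt select(), so it is only reaped the next time select() returns for some other reason.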

Another way would be to block SIGCHLD before calling select(). In
this case, though, an appropriate timeout for select() is necessary.
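A sketch of this second alternative, assuming sigprocmask() is available. SIGCHLD stays blocked for the whole main loop, so no handler can run between select() and the errno check, and children are reaped synchronously on every pass. The timeout bounds how long an exited child can remain unreaped while no client activity wakes select(). block_sigchld(), reap_children(), and poll_once() are hypothetical names, and the timeout value is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Keep SIGCHLD blocked; exited children simply remain zombies until the
   main loop reaps them, and no handler ever interrupts select(). */
int block_sigchld(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Reap every exited child synchronously; returns how many. */
int reap_children(void)
{
    int n = 0;

    while (waitpid(-1, NULL, WNOHANG) > 0)
        n++;
    return n;
}

/* One pass of the main loop: returns 1 if fd is readable, 0 on timeout,
   -1 on error. timeout_sec bounds the delay before dead children are reaped. */
int poll_once(int fd, long timeout_sec)
{
    fd_set rmask;
    struct timeval tv;
    int nsel;

    reap_children();
    FD_ZERO(&rmask);
    FD_SET(fd, &rmask);
    tv.tv_sec = timeout_sec;    /* select() may modify tv, so reset it each pass */
    tv.tv_usec = 0;
    nsel = select(fd + 1, &rmask, NULL, NULL, &tv);
    if (nsel < 0)
        return -1;              /* cannot be an EINTR caused by SIGCHLD */
    return nsel > 0 ? 1 : 0;
}
```

A caller would call block_sigchld() once at startup and then call poll_once() in a loop. The trade-off against the null-handler approach is that reaping latency is bounded by the timeout rather than by the arrival of the signal.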
--
Tatsuo Ishii