Re: Archiver not exiting upon crash

From Tom Lane
Subject Re: Archiver not exiting upon crash
Date
Msg-id 10932.1337803835@sss.pgh.pa.us
In reply to Re: Archiver not exiting upon crash  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Archiver not exiting upon crash  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Archiver not exiting upon crash  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
Jeff Janes <jeff.janes@gmail.com> writes:
> It looks to me like the SIGQUIT from the postmaster is simply getting
> lost.  And from what little I understand of signal handling, this is a
> known race with system(3).  The archive_command, child of archiver,
> exits before it can receive the signal sent to the entire archiver
> process group, so it doesn't set its exit status to show it was
> signalled.  But the signal sent directly to the archiver reaches it
> while it is still ignoring SIGQUITs.

Ugh.

> If the SIGQUIT is getting lost in a race, could it just be blocked
> during the system(3) call?
> I don't know what happens if you call system(3) with SIGQUIT being blocked.

On my machine, man system(3) saith:
    system() ignores the SIGINT and SIGQUIT signals, and blocks the
    SIGCHLD signal, while waiting for the command to terminate.  If this
    might cause the application to miss a signal that would have killed
    it, the application should examine the return value from system() and
    take whatever action is appropriate to the application if the command
    terminated due to receipt of a signal.

Now, the code that directly calls system(), namely pgarch_archiveXlog(),
knows this perfectly well, as per the comment at lines 590ff in HEAD.
However, the code that *calls* it did not get the memo :-(, and appears
to be willing to retry regardless.

> Or maybe the postmaster should not be infinitely patient, but send
> another round of signals after a brief delay.

If the first one was ignored, later ones might be too.

I'm inclined to think that we should change pgarch_archiveXlog to
detect these specific signal conditions and just directly exit(),
rather than giving its caller a chance to blow the decision.
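
For illustration only, here is a minimal sketch of that idea. It is not the code in pgarch.c; the function names command_died_from_signal() and run_archive_command() are hypothetical, and the "exit status above 128" test relies on the usual shell convention for reporting a command that died on a signal. The point is that the check and the exit happen in the same place that calls system(), so a caller cannot turn a signal death into an ordinary retry.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: does a wait status returned by system() suggest
 * that the command was ended by a signal?  system() runs the command via
 * "sh -c", so a signal death can show up either directly (WIFSIGNALED)
 * or as a shell exit status above 128.
 */
bool
command_died_from_signal(int rc)
{
    if (rc == -1)
        return false;           /* system() itself failed; not a signal */
    if (WIFSIGNALED(rc))
        return true;
    if (WIFEXITED(rc) && WEXITSTATUS(rc) > 128)
        return true;
    return false;
}

/*
 * Hypothetical stand-in for the archiving step: run the command and, if a
 * signal appears to have ended it, exit right here rather than returning
 * a plain failure that the caller might decide to retry.
 */
bool
run_archive_command(const char *cmd)
{
    int         rc = system(cmd);

    if (command_died_from_signal(rc))
    {
        fprintf(stderr, "archive command was terminated by a signal, exiting\n");
        exit(1);
    }
    return rc == 0;             /* true only on a clean, successful run */
}
```

An ordinary nonzero exit (say, exit status 1) still comes back as a plain failure, so the caller's existing retry logic would apply only to that case.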

			regards, tom lane

