Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea
Дата
Msg-id 23807.979344113@sss.pgh.pa.us
обсуждение исходный текст
Ответ на RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea  ("Mikheev, Vadim" <vmikheev@SECTORBASE.COM>)
Список pgsql-hackers
>> I think we'd be lots better off to abandon the notion that we can exit
>> directly from the SIGTERM interrupt handler, and instead treat SIGTERM
>> the same way we treat QueryCancel: set a flag that is inspected at
>> specific places where we know we are in a good state.
>> 
>> Comments?

> This will be much cleaner.

OK, here's a more detailed proposal.

We'll eliminate elog() directly from the signal handlers, except in the
extremely limited case where the main line is waiting for a lock (the
existing "cancel wait for lock" mechanism for QueryCancel can be used
for SIGTERM as well, I think).  Otherwise, both die() and
QueryCancelHandler() will just set flags and return.  handle_warn()
(the SIGQUIT handler) should probably go away entirely; it does nothing
that's not done better by QueryCancel.  I'm inclined to make SIGQUIT
invoke die() the same as SIGTERM does, unless someone has a better idea
what to do with that signal.

I believe we should keep the "critical section count" mechanism that
Vadim already created, even though it is no longer needed to discourage
the signal handlers themselves from aborting processing.  With the
critical section counter, it is OK to have flag checks inside subroutines
that are safe to abort from in some contexts and not others --- the
contexts that don't want an abort just have to do START/END_CRIT_SECTION
to ensure that QueryCancel/ProcDie won't happen in routines they call.
So the basic flag checks are like "if (InterruptFlagSet &&
CritSectionCounter == 0) elog(...);".

Having done that, the $64 question is where to test the flags.

Obviously we can check for interrupts at all the places where
QueryCancel is tested now, which is primarily the outer loops.
I suggest we also check for interrupts in s_lock's wait loop (ie, we can
cancel/die if we haven't yet got the lock AND we are not in a crit
section), as well as in END_CRIT_SECTION.

I intend to devise a macro CHECK_FOR_INTERRUPTS() that embodies
all the test code, rather than duplicating these if-tests in
many places.

Note I am assuming that it's always reasonable to check for QueryCancel
and ProcDie at the same places.  I do not see any hole in that theory,
but if someone finds one, we could introduce additional flags/counters
to distinguish safe-to-cancel from safe-to-die states.

Comments?
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hiroshi Inoue"
Дата:
Сообщение: RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea
Следующее
От: Tom Lane
Дата:
Сообщение: Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea