Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From Hiroshi Inoue
Subject Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea
Date
Msg-id 3A626702.7DD48F11@tpf.co.jp
In reply to RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> >>>> I've thought that the main purpose of CRIT_SECTION is to
> >>>> force redo recovery for any errors during the CRIT_SECTION
> >>>> to complete the critical operation e.g. bt_split().
> >>
> >> How could it force redo?
> 
> > Doesn't proc_exit(non-zero) force shutdown recovery ?
> 
> It forces a shutdown and restart, but that does not do anything good
> that I can see.  The WAL log entry hasn't been made, typically, so there
> is nothing to redo.  If there *were* a log entry, and the redo failed
> again (pretty likely), then we'd have an infinite crash/try to
> restart/crash cycle, which is just about the worst possible behavior.
> So I'm not seeing what the point is.
> 

That seems to be the nature of the 7.1 recovery scheme.
Once a WAL log entry is made, recovery should
complete the logged operation regardless of the cause
of the recovery (elog, a system error such as SEGV, etc.).

I've wondered why no one has asked how we could
recover from a recovery failure. Unfortunately,
I don't know the answer. A recovery failure seems
veeeeery serious because the postmaster cannot
start if the startup recovery fails.
In addition, I have another worry: I don't know
how robust WAL is against general bugs not
directly related to WAL.
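
To make the two points above concrete, here is a rough toy model in C.
It is only a sketch under my own assumptions, not PostgreSQL code:
ToyLogRecord, CrashCause and toy_startup_recovery() are hypothetical
names invented for illustration. It shows (a) that replay does not look
at why the crash happened, and (b) that a single record that cannot be
redone makes the whole startup fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT PostgreSQL structures. */
typedef enum { CAUSE_ELOG, CAUSE_SEGV } CrashCause;

typedef struct {
    int  target;     /* index of the toy "page" the record changes */
    int  new_value;  /* value the logged operation should leave there */
    bool redoable;   /* false simulates a redo routine that fails */
} ToyLogRecord;

/*
 * Replay every record in the toy log, in order.
 * The crash cause is accepted only to show that it is ignored:
 * once a record is in the log, it is replayed no matter what.
 * Returns true if startup may proceed, false if redo failed.
 */
bool toy_startup_recovery(const ToyLogRecord *log, size_t nrecords,
                          int *pages, size_t npages, CrashCause cause)
{
    (void) cause;  /* deliberately unused */
    assert(log != NULL || nrecords == 0);

    for (size_t i = 0; i < nrecords; i++) {
        bool in_range = log[i].target >= 0 &&
                        (size_t) log[i].target < npages;

        if (!log[i].redoable || !in_range)
            return false;  /* recovery failure: startup cannot continue */

        pages[log[i].target] = log[i].new_value;
    }
    return true;
}
```

In this model a caller that simply retried toy_startup_recovery() after
a false return would get false again every time, since the log has not
changed. That is the crash/restart cycle Tom describes, and the model
offers no answer to it either.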

Regards.
Hiroshi Inoue

