Re: Unintended restart after recovery error

From Robert Haas
Subject Re: Unintended restart after recovery error
Msg-id CA+TgmoYi7DwEP+EhaMW-sYfNLu2B0Bh-yz1PeWkNV2s7_0w8bA@mail.gmail.com
In response to Re: Unintended restart after recovery error  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Thu, Nov 13, 2014 at 10:59 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> 442231d7f71764b8c628044e7ce2225f9aa43b6 introduced the latter rule
> for hot-standby case. Maybe *during crash recovery* (i.e., hot standby
> should not be enabled) it's better to treat the crash of startup process as
> a catastrophic crash.

Maybe, but why, specifically?  If the startup process failed
internally, it's probably because it hit an error during the replay of
some WAL record.  So if we restart it, it will back up to the previous
checkpoint or restartpoint, replay the same WAL records as before, and
die again in the same spot.  We don't want it to sit there and do that
forever in an infinite loop, so it makes sense to kill the whole
server.

But if the startup process was killed off because the checkpointer
croaked, that logic doesn't necessarily apply.  There's no reason to
assume that the replay of a particular WAL record was what killed the
checkpointer; in fact, it seems like the odds are against it.  So it
seems right to fall back to our general principle of restarting the
server and hoping that's enough to get things back on line.
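
To make the distinction concrete, here is a minimal sketch of the rule argued for above. It is not the actual postmaster code: the names `Action`, `after_startup_exit`, and `killed_by_postmaster` are all hypothetical, and the real decision involves more state than a single flag.

```python
from enum import Enum


class Action(Enum):
    SHUT_DOWN = "shut down the whole server"
    RESTART = "reinitialize and retry recovery"


def after_startup_exit(killed_by_postmaster: bool) -> Action:
    """Hypothetical decision rule; not the real postmaster logic."""
    if killed_by_postmaster:
        # The startup process was a bystander: some other process
        # (e.g. the checkpointer) crashed and everything was told to
        # stop. Nothing suggests that WAL replay itself is at fault,
        # so a restart has a fair chance of succeeding.
        return Action.RESTART
    # The startup process failed on its own. Replaying the same WAL
    # from the same checkpoint would most likely die at the same
    # record, so retrying would just loop forever.
    return Action.SHUT_DOWN
```

With this sketch, `after_startup_exit(False)` gives `Action.SHUT_DOWN` (replay itself failed), while `after_startup_exit(True)` gives `Action.RESTART` (the startup process was only collateral damage).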

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


