Re: Hot standby, recovery infra

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Hot standby, recovery infra
Дата
Msg-id 49A6E1A9.5020901@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Hot standby, recovery infra  (Fujii Masao <masao.fujii@gmail.com>)
Ответы Re: Hot standby, recovery infra
Re: Hot standby, recovery infra
Список pgsql-hackers
Fujii Masao wrote:
> On Fri, Jan 30, 2009 at 7:47 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> That whole area was something I was leaving until last, since immediate
>> shutdown doesn't work either, even in HEAD. (Fujii-san and I discussed
>> this before Christmas, briefly).
> 
> This problem remains in current HEAD. I mean, immediate shutdown
> may be unable to kill the startup process because system() which
> executes restore_command ignores SIGQUIT while waiting.
> When I tried immediate shutdown during recovery, only the startup
> process survived. This is undesirable behavior, I think.

Yeah, we need to fix that.

> The following code should be added into RestoreArchivedFile()?
> 
> ----
> if (WTERMSIG(rc) == SIGQUIT)
>        exit(2);
> ----

I don't see how that helps, as we already have this in there:
signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
ereport(signaled ? FATAL : DEBUG2,    (errmsg("could not restore file \"%s\" from archive: return code %d",
xlogfname,rc)));
 

which means we already ereport(FATAL) if the restore command dies with 
SIGQUIT.

I think the real problem here is that pg_standby traps SIGQUIT. The 
startup process doesn't receive the SIGQUIT because it's in system(), 
and pg_standby doesn't propagate it to the startup process either 
because it traps it.

I think we should simply remove the signal handler for SIGQUIT from 
pg_standby. Or will that lead to core dump by default? In that case, we 
need pg_standby to exit(128) or similar, so that RestoreArchivedFile 
understands that the command was killed by a signal.

Another approach is to check that the postmaster is still alive, like we  do in walwriter and bgwriter:
    /*     * Emergency bailout if postmaster has died.  This is to avoid the     * necessity for manual cleanup of all
postmasterchildren.     */    if (!PostmasterIsAlive(true))        exit(1);
 

However, I'm afraid there's a race condition with that. If we do that 
right after system(), postmaster might've signaled us but not exited 
yet. We could check that in the main loop, but if we wrongly interpret 
the exit of the recovery command as a "file not found - go ahead and 
start up", the damage might be done by the time we notice that the 
postmaster is gone.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: xpath processing brain dead
Следующее
От: Robert Haas
Дата:
Сообщение: Re: xpath processing brain dead