Re: recovery starting when backup_label exists, but not recovery.signal

From: Masahiko Sawada
Subject: Re: recovery starting when backup_label exists, but not recovery.signal
Msg-id: CAD21AoD-Pp7+hjKcKT5jZ10kV_53_Zw18oaOVEYYu_bdNLx5kw@mail.gmail.com
In reply to: Re: recovery starting when backup_label exists, but not recovery.signal  (David Steele <david@pgmasters.net>)
Responses: Re: recovery starting when backup_label exists, but not recovery.signal  (Fujii Masao <masao.fujii@gmail.com>)
List: pgsql-hackers
On Fri, Sep 27, 2019 at 3:36 AM David Steele <david@pgmasters.net> wrote:
>
> On 9/24/19 1:25 AM, Fujii Masao wrote:
> >
> > When backup_label exists, the startup process enters archive recovery mode
> > even if recovery.signal file doesn't exist. In this case, the startup process
> > tries to retrieve WAL files by using restore_command. Then, at the beginning
> > of the archive recovery, the contents of backup_label are copied to pg_control
> > and backup_label file is removed. This would be an intentional behavior.
>
> > But I think the problem is that, if the server shuts down during that
> > archive recovery, the restart of the server may cause the recovery to fail
> > because neither backup_label nor recovery.signal exist and the server
> > doesn't enter an archive recovery mode. Is this intentional, too? Seems No.
> >
> > So the problematic scenario is;
> >
> > 1. the server starts with backup_label, but not recovery.signal.
> > 2. the startup process enters an archive recovery mode because
> >     backup_label exists.
> > 3. the contents of backup_label are copied to pg_control and
> >     backup_label is deleted.
>
> Do you mean deleted or renamed to backup_label.old?
>
> > 4. the server shuts down..
>
> This happens after the cluster has reached consistency?
>
> > 5. the server is restarted. neither backup_label nor recovery.signal exist.
> > 6. the startup process starts just crash recovery because neither backup_label
> >     nor recovery.signal exist. Since it cannot retrieve WAL files from archival
> >     area, it may fail.
>
> I tried a few ways to reproduce this but was not successful without
> manually removing WAL.

Hmm, me too. I think that since we enter crash recovery at step #6, we
don't retrieve WAL files from the archival area.

But I did reproduce the problem Fujii-san mentioned: restarting the
server during archive recovery leads to crash recovery instead of
resuming the archive recovery. This differs from the behavior in
version 11 and earlier, and I personally think it made the behavior
worse.
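
To make the sequence concrete, here is a toy model of steps 1-6 in
Python. It is only a sketch of the decision as described in this thread,
not PostgreSQL's startup code: DataDir, startup_mode,
begin_archive_recovery and simulate_restart_during_recovery are
hypothetical names, and the model assumes that the presence of either
file is all that selects archive recovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataDir:
    """Hypothetical stand-in for the two marker files discussed in this thread."""
    backup_label: bool
    recovery_signal: bool


def startup_mode(d: DataDir) -> str:
    # As described in the thread: either file makes the startup process
    # enter archive recovery; with neither, only crash recovery runs.
    if d.backup_label or d.recovery_signal:
        return "archive recovery"
    return "crash recovery"


def begin_archive_recovery(d: DataDir) -> None:
    # Step 3: backup_label's contents are copied to pg_control and the
    # file is no longer present under that name.
    d.backup_label = False


def simulate_restart_during_recovery(recovery_signal: bool = False) -> List[str]:
    d = DataDir(backup_label=True, recovery_signal=recovery_signal)
    first = startup_mode(d)          # steps 1-2: first start
    if first == "archive recovery":
        begin_archive_recovery(d)    # step 3
    # steps 4-5: the server shuts down and is restarted
    second = startup_mode(d)         # step 6: mode chosen after the restart
    return [first, second]
```

In this model, simulate_restart_during_recovery() gives archive recovery
on the first start and crash recovery after the restart, which is the
behavior I reproduced. With recovery_signal=True, both starts choose
archive recovery.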

Regards,

--
Masahiko Sawada


