Re: recovery starting when backup_label exists, but not recovery.signal

From: David Steele
Subject: Re: recovery starting when backup_label exists, but not recovery.signal
Msg-id: 49b8f446-06ed-8e91-5dd6-fa4dfee1ee83@pgmasters.net
In reply to: recovery starting when backup_label exists, but not recovery.signal (Fujii Masao <masao.fujii@gmail.com>)
Replies: Re: recovery starting when backup_label exists, but not recovery.signal (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: recovery starting when backup_label exists, but not recovery.signal (Fujii Masao <masao.fujii@gmail.com>)
List: pgsql-hackers
On 9/24/19 1:25 AM, Fujii Masao wrote:
> 
> When backup_label exists, the startup process enters archive recovery mode
> even if the recovery.signal file doesn't exist. In this case, the startup process
> tries to retrieve WAL files using restore_command. Then, at the beginning
> of archive recovery, the contents of backup_label are copied to pg_control
> and the backup_label file is removed. This is presumably intentional behavior.

> But I think the problem is that, if the server shuts down during that
> archive recovery, restarting the server may cause recovery to fail
> because neither backup_label nor recovery.signal exists, so the server
> doesn't enter archive recovery mode. Is this intentional, too? It seems not.
> 
> So the problematic scenario is:
> 
> 1. the server starts with backup_label, but not recovery.signal.
> 2. the startup process enters an archive recovery mode because
>     backup_label exists.
> 3. the contents of backup_label are copied to pg_control and
>     backup_label is deleted.

Do you mean deleted or renamed to backup_label.old?

> 4. the server shuts down.

This happens after the cluster has reached consistency?

> 5. the server is restarted. Neither backup_label nor recovery.signal exists.
> 6. the startup process performs only crash recovery because neither backup_label
>     nor recovery.signal exists. Since it cannot retrieve WAL files from the archive,
>     it may fail.
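The scenario above can be reduced to a small model of the startup decision. This is a simplified sketch, not PostgreSQL's code: the class and function names are hypothetical, and it only captures the rule described in the report (archive recovery if backup_label or recovery.signal is present, otherwise crash recovery from pg_wal alone).

```python
from dataclasses import dataclass


# Hypothetical model of the files the startup process checks at startup.
# Names are illustrative only; this is not PostgreSQL's implementation.
@dataclass
class DataDir:
    backup_label: bool = False
    recovery_signal: bool = False


def recovery_mode(d: DataDir) -> str:
    """Return the recovery mode chosen under the behavior described above."""
    if d.backup_label or d.recovery_signal:
        return "archive"  # WAL may be fetched with restore_command
    return "crash"        # only WAL already present in pg_wal is replayed


def replay_scenario() -> list:
    d = DataDir(backup_label=True)   # step 1: backup_label, no recovery.signal
    modes = [recovery_mode(d)]       # step 2: archive recovery
    d.backup_label = False           # step 3: label consumed (removed or renamed)
    # steps 4-5: the server shuts down mid-recovery and is restarted
    modes.append(recovery_mode(d))   # step 6: only crash recovery now
    return modes


print(replay_scenario())  # ['archive', 'crash']
```

The second start no longer sees any trigger for archive recovery, which is why WAL that exists only in the archive becomes unreachable.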

I tried a few ways to reproduce this but was not successful without
manually removing WAL.  I probably just needed a much larger set of WAL.

I assume you have a repro?  Can you give more details?

> One idea to fix this issue is to make step #3 above record in pg_control
> that backup_label existed. Then the subsequent recovery should enter
> archive recovery mode if pg_control indicates this, even if neither
> backup_label nor recovery.signal exists. Thoughts?
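The proposed fix can be sketched the same way. This is a hypothetical model only: `backup_label_seen` is an invented field standing in for "pg_control remembers that backup_label existed", and clearing it when recovery completes is an assumption about how such a flag would be retired, not something stated in the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class ControlFile:
    # Hypothetical flag: pg_control has no such field; it models the proposal.
    backup_label_seen: bool = False


@dataclass
class DataDir:
    backup_label: bool = False
    recovery_signal: bool = False
    control: ControlFile = field(default_factory=ControlFile)


def start_recovery(d: DataDir) -> str:
    if d.backup_label:
        # Step #3: copy the label into pg_control and remember that it existed.
        d.control.backup_label_seen = True
        d.backup_label = False
    if d.recovery_signal or d.control.backup_label_seen:
        return "archive"
    return "crash"


def finish_recovery(d: DataDir) -> None:
    # Assumption: the flag is cleared once archive recovery completes.
    d.control.backup_label_seen = False


def simulate() -> list:
    d = DataDir(backup_label=True)
    modes = [start_recovery(d)]      # archive; label consumed, flag set
    modes.append(start_recovery(d))  # restart mid-recovery: still archive
    finish_recovery(d)
    modes.append(start_recovery(d))  # later restart: no archive recovery
    return modes


print(simulate())  # ['archive', 'archive', 'crash']
```

The cost, as noted in the reply, is a new piece of persistent state in pg_control that every startup path must honor.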

That seems pretty invasive to me at this stage.  I'd like to reproduce
it and see if there are alternatives.

Also, are you sure this is a new behavior?  I've been finding that some
behaviors that have existed for a long time are suddenly more apparent
or easier to hit with the new mechanism.  Examples of that are in [1].

-- 
-David
david@pgmasters.net

[1]
https://www.postgresql.org/message-id/5e6537c7-d10e-6a67-4813-bbd7455cfaf5%40pgmasters.net
