Re: [GENERAL] pg_rewind - restore new slave failed to startup during recovery

From	Michael Paquier
Subject	Re: [GENERAL] pg_rewind - restore new slave failed to startup during recovery
Date	August 23, 2017, 07:53:08
Msg-id	CAB7nPqS6iRmp-zL-W6DxbekJ89Q5ifrWecD-Q9EVev_bqg6SGQ@mail.gmail.com
In reply to	Re: [GENERAL] pg_rewind - restore new slave failed to startup during recovery (Magnus Hagander <magnus@hagander.net>)
List	pgsql-general

On Tue, Aug 22, 2017 at 11:39 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Aug 22, 2017 at 3:06 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>> That flow looks correct to me. No I think that you should trigger
>> manually a checkpoint after step 2 on the promoted standby so as its
>> control file gets forcibly updated correctly with its new timeline
>> number. This is a small but critical point people usually miss. The
>> documentation of pg_rewind does not mention this point when using a
>> live source server, and many people have fallen into this trap up to
>> now... We should really mention that in the docs. What do others
>> think?
>
> If the documentation is missing such a clearly critical step, then I would
> say that's a definite documentation bug and it needs to be fixed. We can't
> really fault people for missing a small detail if we didn't document the
> small detail...

What do you think about the attached? I would recommend a back-patch
down to 9.5 to get the documentation right everywhere, but I also think
that this may not be enough. We could also document an example of a
full-fledged failover flow in the Notes, in short:
1) Promote a standby.
2) Stop the old master cleanly. If it has been killed atrociously,
make it finish recovery once and then stop it so that its WAL data is
ahead of the point where WAL forked after the promotion (the shutdown
checkpoint record is at least there).
3) Prepare source server for the rewind.
3-1) Using file copy, stop the source server (promoted standby) cleanly first.
3-2) Using SQL, issue a checkpoint on the source server to update its
control file and make sure that the timeline number is up-to-date on
disk.
4) Perform the actual rewind. This will need WAL segments on the
target from the point where WAL forked up to the shutdown checkpoint
record created at step 2).
5) Create recovery.conf on the target and point it to the source for
streaming, or archives. Then let it perform recovery.
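
As a rough illustration of the flow above (not part of the attached
patch), here is a minimal Python sketch that only builds the command
lines for steps 1) to 4) and the recovery.conf text for step 5); it
executes nothing. The function names, the data directory paths and the
connection string are hypothetical placeholders. The crash case of
step 2) (start the old master once so it finishes recovery, then stop
it) is left out, and "-m fast" is just one way to get a clean shutdown.

```python
import shlex


def rewind_plan(target_pgdata, source_pgdata, source_conninfo=None):
    """Return the failover commands as a list of argv lists (dry run only).

    target_pgdata: data directory of the old master (rewind target).
    source_pgdata: data directory of the promoted standby (rewind source).
    source_conninfo: if given, the source stays online (step 3-2);
    if None, the source is stopped and read from disk (step 3-1).
    """
    plan = [
        # 1) Promote the standby.
        ["pg_ctl", "promote", "-D", source_pgdata],
        # 2) Stop the old master cleanly.
        ["pg_ctl", "stop", "-D", target_pgdata, "-m", "fast"],
    ]
    if source_conninfo is None:
        # 3-1) File copy: stop the promoted standby cleanly first.
        plan.append(["pg_ctl", "stop", "-D", source_pgdata, "-m", "fast"])
        source_arg = "--source-pgdata=" + source_pgdata
    else:
        # 3-2) Live source: force a checkpoint so the control file on
        # disk carries the new timeline number.
        plan.append(["psql", "-d", source_conninfo, "-c", "CHECKPOINT"])
        source_arg = "--source-server=" + source_conninfo
    # 4) Perform the actual rewind on the old master.
    plan.append(["pg_rewind", "--target-pgdata=" + target_pgdata, source_arg])
    return plan


def recovery_conf(primary_conninfo):
    """Return recovery.conf text for step 5), streaming from the source."""
    # Single quotes inside a recovery.conf value are written doubled.
    quoted = primary_conninfo.replace("'", "''")
    return (
        "standby_mode = 'on'\n"
        "primary_conninfo = '" + quoted + "'\n"
        "recovery_target_timeline = 'latest'\n"
    )


if __name__ == "__main__":
    conninfo = "host=standby.example port=5432 dbname=postgres"
    for argv in rewind_plan("/path/to/old_master", "/path/to/standby", conninfo):
        print(" ".join(shlex.quote(arg) for arg in argv))
    print(recovery_conf(conninfo), end="")
```

Keeping this as a dry run makes the ordering easy to review before
anything touches a data directory; the point to notice is that the
CHECKPOINT lands after the promotion and before pg_rewind.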
--
Michael

Attachments

rewind-checkpoint-doc.patch
