Re: [GENERAL] pg_rewind - restore new slave failed to startup during recovery

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [GENERAL] pg_rewind - restore new slave failed to startup during recovery
Дата
Msg-id CAB7nPqS6iRmp-zL-W6DxbekJ89Q5ifrWecD-Q9EVev_bqg6SGQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [GENERAL] pg_rewind - restore new slave failed to startup during recovery  (Magnus Hagander <magnus@hagander.net>)
Список pgsql-general
On Tue, Aug 22, 2017 at 11:39 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Aug 22, 2017 at 3:06 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>> That flow looks correct to me. No I think that you should trigger
>> manually a checkpoint after step 2 on the promoted standby so as its
>> control file gets forcibly updated correctly with its new timeline
>> number. This is a small but critical point people usually miss. The
>> documentation of pg_rewind does not mention this point when using a
>> live source server, and many people have fallen into this trap up to
>> now... We should really mention that in the docs. What do others
>> think?
>
> If the documentation is missing such a clearly critical step, then I would
> say that's a definite documentation bug and it needs to be fixed. We can't
> really fault people for missing a small detail if we didn't document the
> small detail...

What do you think about the attached? I would recommend a back-patch
down to 9.5 to get the documentation right everywhere but I think as
well that this may not be enough. We could document as well an example
of a full-fledged failover flow in the Notes, in short:
1) Promote a standby.
2) Stop the old master cleanly. If it has been killed atrociously,
make it finish recovery once and then stop it so as its WAL data is
ahead of the point WAL has fork after the promotion (shutdown
checkpoint record is at least here).
3) Prepare source server for the rewind.
3-1) Using file copy, stop the source server (promoted standby) cleanly first.
3-2) Using SQL, issue a checkpoint on the source server to update its
control file and making sure that the timeline number is up-do-date on
disk.
4) Perform the actual rewind. This will need WAL segments on the
target from the point WAL has forked to the shutdown checkpoint record
created at step 2).
5) Create recovery.conf on the target and point it to the source for
streaming, or archives. Then let it perform recovery.
--
Michael

Вложения

В списке pgsql-general по дате отправления:

Предыдущее
От: Igor Korot
Дата:
Сообщение: Re: [GENERAL] Retrieving query results
Следующее
От: Condor
Дата:
Сообщение: Re: [GENERAL] PG and database encryption