Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

From Michael Paquier
Subject Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date
Msg-id CAB7nPqTzr3fySUdTNmcOUQxAJk7m7V9eOXqfcvBYvYoGiErsUg@mail.gmail.com
In response to BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (feikesteenbergen@gmail.com)
Responses Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-bugs
On Thu, May 28, 2015 at 7:07 PM,  <feikesteenbergen@gmail.com> wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13368
> Logged by:          Feike Steenbergen
> Email address:      feikesteenbergen@gmail.com
> PostgreSQL version: 9.4.2
> Operating system:   Debian 8.0 x86_64
> Description:
>
> We sometimes see a standby server promoting itself to master immediately.
>
> Analysis shows us that the master still has a promote file in the PGDATA
> directory. We assume the presence of the promote file (which is copied
> by pg_basebackup) is triggering the promotion.

If there is a promote file in PGDATA when a standby starts up,
promotion will be triggered.
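
As a stopgap on the backup side, a leftover promote file could be removed
from the freshly copied data directory before the standby is started. A
minimal sketch, assuming the trigger file is named "promote" and sits
directly in PGDATA; the helper name is hypothetical and not part of any
PostgreSQL tool:

```python
import os


def remove_stale_promote_file(pgdata):
    """Delete a leftover 'promote' file from a restored data directory.

    Hypothetical helper: run it after pg_basebackup and before starting
    the standby. Returns True if a file was removed, False if none existed.
    """
    promote_path = os.path.join(pgdata, "promote")
    try:
        os.unlink(promote_path)
    except FileNotFoundError:
        return False
    return True
```

This only cleans up the copy; it does not address why the file was left
behind on the master.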

> The master itself previously was a standby server. The promotion was done
> using pg_ctl promote. Analysis of our logs shows that we sent pg_ctl promote
> twice to this cluster. This is also reflected in the server log, where
> "server promoting" shows up twice.

In this case promotion is triggered by CheckForStandbyTrigger(), where
the promote file is unlinked.

> Some testing shows us that in some cases, when pg_ctl promote is called
> multiple times, a promote file is left in the PGDATA directory, even though
> the cluster has been successfully promoted and is accepting read/write
> queries.

This is not surprising: pg_ctl decides whether a node can be promoted
based on whether recovery.conf exists, and there is an interval of time
between the moment promotion is actually triggered and the moment
recovery.conf is removed, so you can create a promote file even after
SIGUSR1 has been sent to the standby's postmaster.
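
To illustrate that window, here is a toy model of the sequence in Python.
It only imitates the file operations described above (the recovery.conf
check, the promote file, the unlink in CheckForStandbyTrigger(), and the
rename of recovery.conf); signals and the real server code are left out,
and the Python function names are made up for the illustration:

```python
import os
import tempfile


def pg_ctl_promote(pgdata):
    """Toy pg_ctl promote: refuse unless recovery.conf exists."""
    if not os.path.exists(os.path.join(pgdata, "recovery.conf")):
        return False
    open(os.path.join(pgdata, "promote"), "w").close()
    return True  # the real tool would now send SIGUSR1 to the postmaster


def check_for_standby_trigger(pgdata):
    """Toy trigger check: consume the promote file if it is present."""
    path = os.path.join(pgdata, "promote")
    if os.path.exists(path):
        os.unlink(path)
        return True
    return False


def finish_recovery(pgdata):
    """Toy end of recovery: recovery.conf becomes recovery.done."""
    os.rename(os.path.join(pgdata, "recovery.conf"),
              os.path.join(pgdata, "recovery.done"))


def simulate_double_promote():
    """Return True if a promote file survives the promotion."""
    with tempfile.TemporaryDirectory() as pgdata:
        open(os.path.join(pgdata, "recovery.conf"), "w").close()
        pg_ctl_promote(pgdata)             # first request
        check_for_standby_trigger(pgdata)  # promotion triggered, file unlinked
        pg_ctl_promote(pgdata)             # second request: recovery.conf still there
        finish_recovery(pgdata)            # nothing checks the promote file again
        return os.path.exists(os.path.join(pgdata, "promote"))
```

In this model the second request lands after the trigger check but before
recovery.conf is renamed, so simulate_double_promote() reports a leftover
promote file.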

> We will try to work around this issue by ensuring we do not send multiple
> promote requests using pg_ctl to the same cluster.

Well, we could for example have the server rename promote to
promote_done in CheckForStandbyTrigger() and then unlink it when
recovery.conf is renamed to recovery.done. Opinions are welcome on the
matter.
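
A rough sketch of those two steps, again as a Python toy model over plain
files rather than server code. It covers only the rename and the later
unlink; how pg_ctl would react to an existing promote_done file is left
open here, and the Python function names are illustrative:

```python
import os


def check_for_standby_trigger(pgdata):
    """Toy trigger check: rename promote to promote_done instead of unlinking."""
    promote = os.path.join(pgdata, "promote")
    promote_done = os.path.join(pgdata, "promote_done")
    try:
        os.rename(promote, promote_done)
    except FileNotFoundError:
        return False  # no promotion requested
    return True


def finish_recovery(pgdata):
    """Toy end of recovery: rename recovery.conf, then drop promote_done."""
    os.rename(os.path.join(pgdata, "recovery.conf"),
              os.path.join(pgdata, "recovery.done"))
    try:
        os.unlink(os.path.join(pgdata, "promote_done"))
    except FileNotFoundError:
        pass  # promotion was not requested through the promote file
```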
--
Michael
