Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

From Michael Paquier
Subject Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date
Msg-id CAB7nPqRFfP_sVDKWfSkg9rywFZs+Kq6D2hRVK2iOWaZn8FYjTw@mail.gmail.com
In response to Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-bugs
On Fri, Jun 5, 2015 at 11:06 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jun 5, 2015 at 3:01 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>
>>
>> On Wed, Jun 3, 2015 at 1:04 AM, Fujii Masao wrote:
>>>
>>> On Mon, Jun 1, 2015 at 5:19 PM, Michael Paquier
>>> >> Some testing shows us that in some cases, when pg_ctl promote is called
>>> >> multiple
>>> >> times, a promote file is left in the PGDATA directory, even though the
>>> >> cluster
>>> >> has been successfully promoted and is accepting read/write queries.
>>> >
>>> > This is not surprising: pg_ctl decides whether a node can be promoted
>>> > based on whether recovery.conf exists, and there is an interval of
>>> > time between the moment the promotion is actually triggered and the
>>> > moment recovery.conf is removed, so you can create a promote file even
>>> > after sending SIGUSR1 to the standby's postmaster.
>>> >
>>> >> We will try to work around this issue by ensuring we do not send
>>> >> multiple
>>> >> promote requests using pg_ctl to the same cluster.
>>> >
>>> > Well, we could for example have the server switch promote to
>>> > promote_done in CheckForStandbyTrigger() and then unlink it when
>>> > recovery.conf is switched to .done. Opinions are welcome on the
>>> > matter.
>>>
>>> Or we can just always remove the signal file at the end of recovery.
>>> That filename switch seems unnecessary.
>>
>>
>> Well, by doing so, in the event of a crash during recovery the promote
>> signal file would be present in PGDATA, and this would enforce a promotion
>> at the next startup of the node. I don't think that this is a good idea. In
>> the case of a promoted node crash a user may want to see the node come back
>> in a recovery state.
>
> You mean the case of a crash that occurs before CheckForStandbyTrigger()
> removes the signal file after pg_ctl promote is executed? If yes, even if
> we rename the file to the intermediate one, the signal file would remain.
>
> If we want to address the above corner case, we can additionally remove
> the file always at the beginning of recovery. This idea can completely avoid
> an unexpected promotion by the surviving signal file.

Then what about the case where a promote file is left by the user on
purpose to trigger a promotion on restart?
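
To make the trade-off concrete, here is a toy model in Python. It is not PostgreSQL code: begin_recovery() is a hypothetical stand-in for the start of recovery, and only the signal file name "promote" comes from this thread.

```python
import os
import tempfile

PROMOTE_SIGNAL = "promote"  # signal file created by pg_ctl promote


def begin_recovery(pgdata, remove_at_start):
    """Toy model of recovery startup (hypothetical, not server code).

    Returns True if a promotion would still be triggered by the signal file.
    """
    path = os.path.join(pgdata, PROMOTE_SIGNAL)
    if remove_at_start and os.path.exists(path):
        os.unlink(path)
    return os.path.exists(path)


def demo():
    """Run both policies against a data directory that contains the file."""
    results = {}
    for remove_at_start in (False, True):
        with tempfile.TemporaryDirectory() as pgdata:
            # The file may be stale (left behind by a crash) or placed on
            # purpose; nothing in the file itself tells the two cases apart.
            open(os.path.join(pgdata, PROMOTE_SIGNAL), "w").close()
            results[remove_at_start] = begin_recovery(pgdata, remove_at_start)
    return results
```

With removal at startup a stale file can no longer force a promotion, but a file left on purpose is discarded in exactly the same way.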

>> Also, this intermediate promote file, let's say promote.detected, would be
>> useful for external tools to let them know that the promotion has been
>> acknowledged (you can already know it if your tool knows that a promote has
>> been triggered, that promote has been removed by the server and if
>> recovery.conf is still present). That's not something you would want on back
>> branches btw as this changes how promotion behaves as seen from an external
>> point of view. But that would be a patch simple enough (got a WIP for people
>> wondering).
>>
>> An open question would be what to do with pg_ctl promote if a promote file
>> already exists. I think that we should ignore the creation of the promote
>> file but still send the SIGUSR1 signal.
>>
>>>
>>> In addition to that change, we should make pg_basebackup skip
>>> the signal file?
>>
>>
>> Well, yes, and it would be just fine for the case reported by Feike to
>> just ignore promote and fallback_promote in a base backup, as the problem
>> reported was about a standby that contained the promote signal file after
>> pg_basebackup. And I think that we would be fine by doing that as well in
>> the back-branches. trigger_file is not exposed out of xlog.c in the startup
>> process, but I can live with the fact that it is not ignored.
>>
>> In short, I guess that the patch attached would be fine.
>> Opinions?
>
> I have no strong objection to that change, but it seems half-baked.
> That is, that idea doesn't address at all the case where a base backup
> is taken by something other than pg_basebackup.

That's the same problem as with, for example, postmaster.pid,
postmaster.opts or similar when taking a FS-level backup.
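
As an illustration of what a file-system-level copy has to do by itself, here is a minimal Python sketch that skips those files at the top level of the data directory. The helper name copy_data_dir() and the exact skip list are assumptions made for this example. It only shows the exclusion step and is not a complete backup procedure.

```python
import os
import shutil

# Files named in this thread that should not end up in a copied data directory.
SKIP_AT_TOP_LEVEL = {
    "promote",
    "fallback_promote",
    "postmaster.pid",
    "postmaster.opts",
}


def copy_data_dir(src, dst):
    """Copy src to dst (which must not exist yet), skipping the files above.

    Hypothetical helper: only entries directly under src are filtered, so a
    file that merely shares one of these names deeper in the tree is kept.
    """
    src = os.path.abspath(src)

    def ignore(directory, names):
        # shutil.copytree calls this once per visited directory.
        if os.path.abspath(directory) == src:
            return [name for name in names if name in SKIP_AT_TOP_LEVEL]
        return []

    shutil.copytree(src, dst, ignore=ignore)
```

Any tool other than pg_basebackup needs an equivalent exclusion list of its own.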
--
Michael
