Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

Поиск
Список
Период
Сортировка
От Fujii Masao
Тема Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Дата
Msg-id CAHGQGwFngo2F9FvRQ1vYrRbNn5xaUwp=zL773_UNCEVWBi_8pQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-bugs
On Tue, Jun 16, 2015 at 4:42 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Jun 12, 2015 at 2:43 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sat, Jun 6, 2015 at 8:36 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Fri, Jun 5, 2015 at 11:06 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> On Fri, Jun 5, 2015 at 3:01 PM, Michael Paquier
>>>> <michael.paquier@gmail.com> wrote:
>>>>>
>>>>>
>>>>> On Wed, Jun 3, 2015 at 1:04 AM, Fujii Masao wrote:
>>>>>>
>>>>>> On Mon, Jun 1, 2015 at 5:19 PM, Michael Paquier
>>>>>> >> Some testing shows us that in some cases, when pg_ctl promote is called
>>>>>> >> multiple
>>>>>> >> times, a promote file is left in the PGDATA directory, even though the
>>>>>> >> cluster
>>>>>> >> has been succesfully promoted and is accepting read/write queries.
>>>>>> >
>>>>>> > This is not surprising, pg_ctl bases its analysis that a node needs to
>>>>>> > be promoted if recovery.conf exists or not, and there is an interval
>>>>>> > of time between which recovery.conf is removed and the promotion is
>>>>>> > actually triggered, so you can create a promote file even after even
>>>>>> > sending SIGUSR1 to the standby's postmaster
>>>>>> >
>>>>>> >> We will try to workaround this issue by ensuring we do not send
>>>>>> >> multiple
>>>>>> >> promote request using pg_ctl to the same cluster.
>>>>>> >
>>>>>> > Well, we could for example have the server switch promote to
>>>>>> > promote_done in CheckForStandbyTrigger() and then unlink it when
>>>>>> > recovery.conf is switched to .done. Opinions are welcome on the
>>>>>> > matter.
>>>>>>
>>>>>> Or we can just always remove the signal file at the end of recovery.
>>>>>> That filename switch seems unnecessary.
>>>>>
>>>>>
>>>>> Well, by doing so, in the event of a crash during recovery the promote
>>>>> signal file would be present in PGDATA, and this would enforce a promotion
>>>>> at the next startup of the node. I don't think that this is a good idea. In
>>>>> the case of a promoted node crash a user may want to look at his node back
>>>>> in a recovery state.
>>>>
>>>> You meant the case of crash which occurs before CheckForStandbyTrigger()
>>>> removes the signal file after pg_ctl promote is executed? If yes, even if
>>>> we rename the file to the intermediate one, the signal file would remain.
>>>>
>>>> If we want to address the above corner case, we can additionally remove
>>>> the file always at the beginning of recovery. This idea can completely avoid
>>>> an unexpected promotion by the surviving signal file.
>>>
>>> Then what about the case where a promote file is let by user on
>>> purpose to trigger a promotion on restart?
>>
>> Sounds like non-realistic case. If the promote file exists at
>> the beginning of recovery, the recovery might try to end before
>> reaching the consistency point, and then it would fail.
>> Why do people want to make the recovery fail in that way?
>
> Hm. OK.
>
>>>>> Also, this intermediate promote file, let's say promote.detected, would be
>>>>> useful for external tools to let them know that the promotion has been
>>>>> acknowledged (you can already know it if your tool knows that a promote has
>>>>> been triggered, that promote has been removed by the server and if
>>>>> recovery.conf is still present). That's not something you would want on back
>>>>> branches btw as this changes how promotion behaves seen from an external
>>>>> point of view. But that would be a patch simple enough (got a WIP for people
>>>>> wondering).
>>>>>
>>>>> An open question would be what to do with pg_ctl promote if a promote file
>>>>> already exists. I think that we should ignore the creation of the promote
>>>>> file but still kick the signal SIGUSR1.
>>>>>
>>>>>>
>>>>>> In addition to that change, we should make pg_basebackup skip
>>>>>> the signal file?
>>>>>
>>>>>
>>>>> Well, yes, and it we would be just fine for the case reported by Feike to
>>>>> just ignore promote and fallback_promote in a base backup, as the problem
>>>>> reported was about a standby that contained the signal promote file after
>>>>> pg_basebackup. And I think that we would be fine by doing that as well in
>>>>> the back-branches. trigger_file is not exposed out of xlog.c in the startup
>>>>> process, but I can live with the fact that it is not ignored.
>>>>>
>>>>> In short, I guess that the patch attached would be fine.
>>>>> Opinions?
>>>>
>>>> I have no strong objection to that change, but it seems half-baked.
>>>> That is, that idea doesn't address the case where a base backup is
>>>> taken by other than pg_basebackup at all.
>>>
>>> That's the same problem with for example postmaster.pid,
>>> postmaster.opts or similar when taking a FS-level backup.
>>
>> Yes, but even if such bogus postmaster.pid exists in $PGDATA,
>> the server can handle it and restart almost harmlessly. But
>> the promote file not.
>
> Then what about an idea like the attached.

Thanks for the patch!

> At the beginning of
> StartupXLOG() before parsing recovery.conf we could check for the
> existence of promotion trigger files and unlink them unconditionally.

There seems to be still race condition: postmaster can receive SIGUSR1
before the startup process removes the promotion trigger file. Then
the variable promote_triggered can be set to true unexpectedly.

So, what about making postmaster remove the trigger file unconditionally
before the startup process is forked, instead? For example, just after
PostmasterMain() calls RemovePgTempFiles()?

Regards,

--
Fujii Masao

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: BUG #13126: table constraint loses its comment
Следующее
От: joao.antunes@tagus.ist.utl.pt
Дата:
Сообщение: BUG #13480: RE: #11732 and also possibly #12538 - Possible serializable isolation lepessimistic locking problem