Re: Timeline issue if StartupXLOG() is interrupted right before end-of-recovery record is done

From Andrey Borodin
Subject Re: Timeline issue if StartupXLOG() is interrupted right before end-of-recovery record is done
Msg-id A950518B-4116-492B-8773-C9A5CE1620AF@yandex-team.ru
List pgsql-hackers

> On 21 Jan 2025, at 16:47, Roman Eskin <r.eskin@arenadata.io> wrote:
>
>>
>> Persisting recovery signal file for some _timeout_ seems super dangerous to me.
>> In distributed systems every extra _timeout_ is a source of complexity,
>> uncertainty and despair.
>
> The approach is not about persisting the signal files for some timeout.
> Currently the files are removed in StartupXLOG() before writeTimeLineHistory()
> and PerformRecoveryXLogAction() are called. The suggestion is to move the
> file removal after PerformRecoveryXLogAction() inside StartupXLOG().

Sending a node into a repeated promote-fail cycle without resolving the root cause seems like an even less appealing idea.
If something prevented promotion, why should we retry by this particular method?

Even in the case of a transient failure like the one you described (power loss), it does not sound like a very good
idea to retry promotion after the node comes back online. The user will get an unexpected split-brain.


Best regards, Andrey Borodin.

