Re: Crash by targetted recovery

From Fujii Masao
Subject Re: Crash by targetted recovery
Date
Msg-id ab80bd8c-1a3c-4ef2-e846-28258875f37a@oss.nttdata.com
In reply to Re: Crash by targetted recovery  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Crash by targetted recovery  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers

On 2020/02/27 15:23, Kyotaro Horiguchi wrote:
> At Thu, 27 Feb 2020 14:40:55 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>
>>
>> On 2020/02/27 12:48, Kyotaro Horiguchi wrote:
>>> Hello.
>>> We found that targeted promotion can cause an assertion failure.  The
>>> attached TAP test reproduces it.
>>>
>>>> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12078)
>>> After recovery target is reached, StartupXLOG turns off standby mode
>>> then refetches the last record. If the last record starts from the
>>> previous WAL segment, the assertion failure is triggered.
>>
>> Good catch!
>>
>>> The wrong point is that StartupXLOG does random access fetching while
>>> WaitForWALToBecomeAvailable is thinking it is still in streaming.  I
>>> think if it is called with random access mode,
>>> WaitForWALToBecomeAvailable should move to XLOG_FROM_ARCHIVE even
>>> though it is thinking that it is still reading from stream.
>>
>> I failed to understand why random access while reading from
>> the stream is a bad idea. Could you elaborate?
> 
> It seems to me the word "streaming" suggests that WAL record should be
> read sequentially. Random access, which means reading from arbitrary
> location, breaks a stream.  (But the patch doesn't try to stop wal
> sender if randAccess.)
> 
>> Isn't it sufficient to set currentSource to 0 when disabling
>> StandbyMode?
> 
> I thought of that and it should work, but I hesitated to manipulate
> currentSource in StartupXLOG. currentSource is basically a private
> state of WaitForWALToBecomeAvailable. ReadRecord modifies it, but I
> think it's not good to modify it outside the logic in
> WaitForWALToBecomeAvailable.

If so, what about adding the following at the top of
WaitForWALToBecomeAvailable()?

     if (!StandbyMode && currentSource == XLOG_FROM_STREAM)
          currentSource = 0;
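
As a rough, standalone sketch of that idea (not the actual xlog.c code): the
type and function names below are made up for illustration, the source enum is
a simplified stand-in for the XLOG_FROM_* values, and the "fall back to the
archive when no source is set" step is an assumption about how the function
would continue after the reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the WAL source values; 0 means "no source chosen". */
typedef enum
{
    SRC_NONE = 0,
    SRC_ARCHIVE,
    SRC_PG_WAL,
    SRC_STREAM
} SketchSource;

/* Hypothetical container for the two pieces of state discussed above. */
typedef struct
{
    bool         standby_mode;    /* models StandbyMode */
    SketchSource current_source;  /* models currentSource */
} SketchState;

/* The proposed guard: forget a stale streaming source once standby mode is off. */
static void
reset_stale_stream_source(SketchState *st)
{
    if (!st->standby_mode && st->current_source == SRC_STREAM)
        st->current_source = SRC_NONE;
}

/*
 * Heavily reduced model of the top of WaitForWALToBecomeAvailable().
 * Returns the source the caller would try next.
 */
static SketchSource
wait_for_wal_sketch(SketchState *st)
{
    reset_stale_stream_source(st);

    /* Assumption: with no source selected, start over from the archive. */
    if (st->current_source == SRC_NONE)
        st->current_source = SRC_ARCHIVE;

    /* Models the assertion from the report; it can no longer fire here. */
    if (st->current_source == SRC_STREAM)
        assert(st->standby_mode);

    return st->current_source;
}
```

With the guard in place, a call made after standby mode has been turned off
never reaches the streaming branch, while a call made in standby mode keeps
reading from the stream as before.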

> Come to think of it, I now think the following part in ReadRecord
> should use randAccess instead.
> 
> xlog.c:4384
>>      /*
> -      * Before we retry, reset lastSourceFailed and currentSource
> -      * so that we will check the archive next.
> +      * Streaming has broken, we retry from the same LSN.
>>       */
>>      lastSourceFailed = false;
> -     currentSource = 0;
> +     private->randAccess = true;

Sorry, I failed to understand why this change is necessary...
At least the comment that you added seems incorrect
because WAL streaming should not have started yet when
we reach the above point.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters


