Re: [BUG?] lag of minRecoveryPont in archive recovery

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: [BUG?] lag of minRecoveryPont in archive recovery
Дата
Msg-id 50C7612E.5060008@vmware.com
обсуждение исходный текст
Ответ на Re: [BUG?] lag of minRecoveryPont in archive recovery  (Fujii Masao <masao.fujii@gmail.com>)
Список pgsql-hackers
On 11.12.2012 17:04, Fujii Masao wrote:
> On Tue, Dec 11, 2012 at 6:14 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com>  wrote:
>> On 11.12.2012 08:07, Kyotaro HORIGUCHI wrote:
>>>>
>>>> The cause is that CheckRecoveryConsistency() is called before rm_redo(),
>>>> as Horiguchi-san suggested upthead. Imagine the case where
>>>> minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE
>>>> record. When restarting the server with that minRecoveryPoint,
>>>> the followings would happen, and then PANIC occurs.
>>>>
>>>> 1. XLOG_SMGR_TRUNCATE record is read.
>>>> 2. CheckRecoveryConsistency() is called, and database is marked as
>>>>       consistent since we've reached minRecoveryPoint (i.e., the location
>>>>       of XLOG_SMGR_TRUNCATE).
>>>> 3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is
>>>>       found.
>>>> 4. Since the database has already been marked as consistent, an invalid
>>>>       page leads to PANIC.
>>>
>>>
>>> Exactly.
>>>
>>> In smgr_redo, EndRecPtr which is pointing the record next to
>>> SMGR_TRUNCATE, is used as the new minRecoveryPoint. On the other
>>> hand, during the second startup of the standby,
>>> CheckRecoveryConsistency checks for consistency by
>>> XLByteLE(minRecoveryPoint, EndRecPtr) which should be true at
>>> just BEFORE the SMGR_TRUNCATE record is applied. So
>>> reachedConsistency becomes true just before the SMGR_TRUNCATE
>>> record will be applied. Bang!
>>
>>
>> Ah, I see. I thought moving CheckRecoveryConsistency was just a nice-to-have
>> thing, so that we'd notice that we're consistent earlier, so didn't pay
>> attention to that part.
>>
>> I think the real issue here is that CheckRecoveryConsistency should not be
>> comparing minRecoveryPoint against recoveryLastRecPtr, not EndRecPtr as it
>> currently does. EndRecPtr points to the end of the last record *read*, while
>> recoveryLastRecPtr points to the end of the last record *replayed*. The
>> latter is what CheckRecoveryConsistency should be concerned with.
>>
>> BTW, it occurs to me that we have two variables in the shared struct that
>> are almost the same thing: recoveryLastRecPtr and replayEndRecPtr. The
>> former points to the end of the last record successfully replayed, while
>> replayEndRecPtr points to the end of the last record successfully replayed,
>> or begin replayed at the moment. I find that confusing, so I suggest that we
>> rename recoveryLastRecPtr to make that more clear. Included in the attached
>> patch.
>
> The patch looks good. I confirmed that the PANIC error didn't happen,
> with the patch.

Thanks. After thinking about this more, I think it's still not quite 
right. Now that we fix the check in CheckRecoveryConsistency, we have to 
move the call to CheckRecoveryConsistency to after the rm_redo call. 
Otherwise you don't enter hot standby mode after replaying the last 
record, the one ending at minRecoveryPoint. You have to wait for one 
more record to be read (but not replayed), so that 
CheckRecoveryConsistency gets called and we let backends in.

As the code stands, the bad placement of CheckRecoveryConsistency is 
compensated by the bug that we open up for hot standby one record too 
early. The net result is that we open up for hot standby just before 
replaying the last record that makes us consistent. The window for 
someone to see the inconsistent state is really small, postmaster will 
have to receive the signal, start accepting connections, and you have 
connect and run a query, all before the startup process gets around to 
replay the record its already read into memory.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Re: [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [BUG?] lag of minRecoveryPont in archive recovery