Re: ThisTimeLineID in checkpointer and bgwriter processes

From Heikki Linnakangas
Subject Re: ThisTimeLineID in checkpointer and bgwriter processes
Msg-id 50D41744.2020006@vmware.com
In response to Re: ThisTimeLineID in checkpointer and bgwriter processes  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers

On 21.12.2012 08:18, Amit Kapila wrote:
> On Thursday, December 20, 2012 11:15 PM Heikki Linnakangas wrote:
>> On 20.12.2012 18:19, Fujii Masao wrote:
>>> InstallXLogFileSegment() also uses ThisTimeLineID. But your recent commit
>>> doesn't take care of it and prevents the standby from recycling the WAL
>>> files properly. Specifically, the standby recycles the WAL file to wrong
>>> name.
>>
>> A-ha, good catch. So that's actually a live bug in 9.1 and 9.2 as well:
>> after the recovery target timeline has changed, restartpoints will
>> continue to preallocate/recycle WAL files for the old timeline. That's
>> otherwise harmless, but the useless WAL files waste space, and
>> walreceiver will have to always create new files.
>>
>> So instead of always running with ThisTimeLineID = 0 in the checkpointer
>> process, I guess we'll have to update it to the timeline being replayed,
>> when creating a restartpoint.
>
> Shouldn't there be a check if(RecoveryInProgress), before assigning
> RecoveryTargetTLI to ThisTimeLineID in CreateRestartPoint()?

Hmm, I don't think so. You're not supposed to get that far in 
CreateRestartPoint() if recovery has already ended, or is just being ended. 
The startup process "ends recovery", i.e. makes RecoveryInProgress() 
return false, only after writing the end-of-recovery checkpoint. And 
after the end-of-recovery checkpoint has been written, 
CreateRestartPoint() will do nothing, because the end-of-recovery 
checkpoint is later than the last potential restartpoint. I'm talking 
about this check in CreateRestartPoint():

>     if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
>         XLByteLE(lastCheckPoint.redo, ControlFile->checkPointCopy.redo))
>     {
>         ereport(DEBUG2,
>                 (errmsg("skipping restartpoint, already performed at %X/%X",
>                         (uint32) (lastCheckPoint.redo >> 32), (uint32) lastCheckPoint.redo)));
>         ...
>         return false;
>     }
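
To make the reasoning concrete, here is a minimal standalone sketch of that early-exit test. It is not the actual xlog.c code: the WalPos typedef, the function name and the sample positions are all made up for illustration, and WAL positions are modelled as plain 64-bit integers with 0 meaning "invalid".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for XLogRecPtr (assumption): 64-bit position, 0 = invalid. */
typedef uint64_t WalPos;

/*
 * Hypothetical helper mirroring the quoted check.
 *   last_checkpoint_rec  - location of the last checkpoint record replayed
 *   last_checkpoint_redo - redo pointer of that checkpoint
 *   control_redo         - redo pointer of the checkpoint recorded in pg_control
 * Returns true when a restartpoint would be skipped.
 */
bool
restartpoint_should_skip(WalPos last_checkpoint_rec,
                         WalPos last_checkpoint_redo,
                         WalPos control_redo)
{
    if (last_checkpoint_rec == 0)
        return true;            /* no checkpoint record has been replayed yet */

    /* Same comparison as XLByteLE(lastCheckPoint.redo, checkPointCopy.redo). */
    return last_checkpoint_redo <= control_redo;
}

/* Example scenarios with invented positions. */
void
restartpoint_skip_examples(void)
{
    /* pg_control is behind the replayed checkpoint: restartpoint proceeds. */
    assert(!restartpoint_should_skip(500, 300, 200));

    /*
     * The end-of-recovery checkpoint has advanced pg_control past every
     * candidate restartpoint: the function bails out early.
     */
    assert(restartpoint_should_skip(500, 300, 900));
}
```

The second scenario is the case discussed here: once the end-of-recovery checkpoint is in pg_control, no older restartpoint candidate can pass the comparison.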

However, there's this just before we recycle WAL segments:

>     /*
>      * Update pg_control, using current time.  Check that it still shows
>      * IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
>      * this is a quick hack to make sure nothing really bad happens if somehow
>      * we get here after the end-of-recovery checkpoint.
>      */
>     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>     if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
>         XLByteLT(ControlFile->checkPointCopy.redo, lastCheckPoint.redo))
>     {
>         ...

but I believe that "quick hack" is just paranoia. You should not get 
that far after the end-of-recovery checkpoint.

In any case, if you somehow get there anyway, the worst that will happen 
is that some old WAL segments are recycled/preallocated on the old 
timeline, wasting some space until the next checkpoint.
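
To illustrate why a segment recycled under the old timeline is just wasted space, here is a small sketch of how the timeline ID ends up in a WAL segment file name. It assumes the usual 24-hex-digit naming (timeline, log ID, segment, 8 digits each); the helper name and the sample values are hypothetical, not the real XLogFileName() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN 24        /* 3 fields x 8 hex digits */

/*
 * Hypothetical helper: format a WAL segment file name from a timeline ID
 * and a (log, seg) pair. buf must hold at least WAL_FNAME_LEN + 1 bytes.
 */
void
wal_file_name(char *buf, size_t buflen,
              uint32_t tli, uint32_t log, uint32_t seg)
{
    assert(buflen > WAL_FNAME_LEN);
    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned) tli, (unsigned) log, (unsigned) seg);
}
```

With these made-up values, the same segment on timeline 1 and timeline 2 gets the names 000000010000000000000005 and 000000020000000000000005. A file recycled under the first name is never the one that replay on timeline 2 will look for, so it only takes up space until it is removed or recycled again.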

- Heikki


