Re: [BUG] Archive recovery failure on 9.3+.

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: [BUG] Archive recovery failure on 9.3+.
Дата
Msg-id 52CEE7FC.70702@vmware.com
обсуждение исходный текст
Ответ на [BUG] Archive recovery failure on 9.3+.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Ответы Re: [BUG] Archive recovery failure on 9.3+.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On 12/12/2013 04:00 AM, Kyotaro HORIGUCHI wrote:
> Hello, we happened to see server crash on archive recovery under
> some condition.
>
> After TLI was incremented, there should be the case that the WAL
> file for older timeline is archived but not for that of the same
> segment id but for newer timeline. Archive recovery should fail
> for the case with PANIC error like follows,
>
> | PANIC: record with zero length at 0/1820D40
>
> Replay script is attached. This issue occured for 9.4dev, 9.3.2,
> and not for 9.2.6 and 9.1.11. The latter search pg_xlog for the
> TLI before trying archive for older TLIs.
>
> This occurrs during fetching checkpoint redo record in archive
> recovery.
>
>> if (checkPoint.redo < RecPtr)
>> {
>>     /* back up to find the record */
>>     record = ReadRecord(xlogreader, checkPoint.redo, PANIC, false);

Hmm. After running the script, pg_controldata says:

Latest checkpoint location:           0/2000090
Prior checkpoint location:            0/2000090
Latest checkpoint's REDO location:    0/2000090

The PANIC is actually coming from here:

>     /*
>      * Re-fetch the last valid or last applied record, so we can identify the
>      * exact endpoint of what we consider the valid portion of WAL.
>      */
>     record = ReadRecord(xlogreader, LastRec, PANIC, false);

If recovery started from an online checkpoint, then I think you'd get a 
similar PANIC from the code you quoted.

> And this is caused by that the segment file for older timeline in
> archive directory is preferred to that for newer timeline in
> pg_xlog.

Yep.

> Looking into pg_xlog before trying the older TLIs in archive like
> 9.2- fixes this issue. The attached patch is one possible
> solution for 9.4dev.
>
> Attached files are,
>
>   - recvtest.sh: Replay script. Step 1 and 2 makes the condition
>     and step 3 causes the issue.
>
>   - archrecvfix_20131212.patch: The patch fixes the issue. Archive
>     recovery reads pg_xlog before trying older TLI in archive
>     similarly to 9.1- by this patch.

Hmm, that seems reasonable at a quick glance. I think it also needs a 
change to the state transition code earlier in the loop, to not move 
from XLOG_FROM_ARCHIVE to XLOG_FROM_PG_XLOG, if we've already tried 
reading pg_xlog.

Another way to look at this is that it's wrong that we immediately PANIC 
if we successfully restore a file from archive, but then fail to read 
the record we're looking for. Instead, we should advance to next state 
in the state machine, ie. try reading the same record from pg_xlog, and 
only give up if the read fails from all sources.

Yet another way to look at this is that we shouldn't even try to read 
the file with TLI 1 from the archive, when we know the checkpoint record 
is on timeline 2. It can't possibly work.

I've been thinking for some time that we should stop doing the scan of 
all possible TLIs, and only try to read the exact file that contains the 
record we're reading. We have that information now, in the timeline 
history file, and we already do that during streaming replication. I was 
afraid of changing the behavior of archive recovery in 9.4, but maybe 
that's the way to go in the long term.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: [PATCH] Relocation of tablespaces in pg_basebackup
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Turning off HOT/Cleanup sometimes