Re: Failing start-up archive recovery at Standby mode in PG9.2.4

From Mitsumasa KONDO
Subject Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Date
Msg-id CADupcHWjBsozhZFZctpvxQryA=ikKL84m2th+B0wgomS3GpMBQ@mail.gmail.com
In response to Failing start-up archive recovery at Standby mode in PG9.2.4  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List pgsql-hackers
Let me explain this problem in more detail.

The problem occurs because a restartpoint creates an illegal WAL file during archive recovery. However, I have not been able to work out why CreateRestartPoint() creates the illegal WAL file. My attached patch is really plain…

In the failing case, the first check in XLogFileReadAnyTLI() does not get an fd, because the proper WAL file does not exist in the archive directory.

XLogFileReadAnyTLI()
>     if (sources & XLOG_FROM_ARCHIVE)
>     {
>        fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_ARCHIVE, true);
>        if (fd != -1)
>        {
>           elog(DEBUG1, "got WAL segment from archive");
>           return fd;
>        }
>     }

Next, it searches for the WAL file in pg_xlog. There is an illegal WAL file in pg_xlog, so the fd of that illegal WAL file is returned.

XLogFileReadAnyTLI()
>      if (sources & XLOG_FROM_PG_XLOG)
>      {
>         fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_PG_XLOG, true);
>         if (fd != -1)
>            return fd;
>      }

The returned fd becomes the value of readFile. Of course, that value is 0 or greater, so we break out of the for-loop.

XLogPageRead()
>              readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
>                                            sources);
>              switched_segment = true;
>              if (readFile >= 0)
>                 break;
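
To make the lookup order concrete, here is a minimal, self-contained sketch in C. It is not PostgreSQL code: SegmentState, open_from(), read_any_source(), the SRC_* flags and the fake fd numbers are all hypothetical stand-ins. It only illustrates that the fallback to pg_xlog succeeds whenever a file with the right name can be opened, whether or not its contents are valid WAL.

```c
#include <stdbool.h>

/* Hypothetical source flags; the values are for illustration only. */
#define SRC_ARCHIVE  (1 << 0)
#define SRC_PG_XLOG  (1 << 1)

/* Hypothetical description of where one WAL segment can be found. */
typedef struct
{
    bool in_archive;     /* segment can be restored from the archive */
    bool in_pg_xlog;     /* a file with the segment's name exists in pg_xlog */
    bool pg_xlog_valid;  /* that file actually contains valid WAL */
} SegmentState;

/* Pretend to open the segment from one source; returns a fake fd or -1. */
int
open_from(const SegmentState *seg, int source)
{
    if (source == SRC_ARCHIVE && seg->in_archive)
        return 3;
    /* Note: pg_xlog_valid is deliberately NOT consulted here. */
    if (source == SRC_PG_XLOG && seg->in_pg_xlog)
        return 4;
    return -1;
}

/* Try the archive first, then pg_xlog, mirroring the order quoted above. */
int
read_any_source(const SegmentState *seg, int sources)
{
    int fd;

    if (sources & SRC_ARCHIVE)
    {
        fd = open_from(seg, SRC_ARCHIVE);
        if (fd != -1)
            return fd;
    }
    if (sources & SRC_PG_XLOG)
    {
        fd = open_from(seg, SRC_PG_XLOG);
        if (fd != -1)
            return fd;
    }
    return -1;
}
```

With a segment that is missing from the archive but present (and invalid) in pg_xlog, read_any_source() still returns a non-negative fd, so a caller that only checks `fd >= 0` leaves its retry loop, just as in the case described here.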

Next comes the point where the problem shows up: the illegal WAL file is read, and an error occurs.

XLogPageRead()
>   if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
>   {
>      ereport(emode_for_corrupt_record(emode, *RecPtr),
>            (errcode_for_file_access(),
>             errmsg("could not seek in log file %u, segment %u to offset %u: %m",
>                    readId, readSeg, readOff)));
>      goto next_record_is_invalid;
>   }
>   if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
>   {
>      ereport(emode_for_corrupt_record(emode, *RecPtr),
>            (errcode_for_file_access(),
>             errmsg("could not read from log file %u, segment %u, offset %u: %m",
>                    readId, readSeg, readOff)));
>      goto next_record_is_invalid;
>   }
>   if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
>      goto next_record_is_invalid;


I think the point Horiguchi discovered comes after this one.
We must fix CreateRestartPoint() so that it does not create an illegal WAL file.

Best regards,

--
Mitsumasa KONDO 
