Re: Failing start-up archive recovery at Standby mode in PG9.2.4

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Дата
Msg-id 20130424085947.GB4536@awork2.anarazel.de
обсуждение исходный текст
Ответ на Failing start-up archive recovery at Standby mode in PG9.2.4  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Ответы Re: Failing start-up archive recovery at Standby mode in PG9.2.4  (Kyotaro HORIGUCHI <kyota.horiguchi@gmail.com>)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Список pgsql-hackers
ello,

On 2013-04-24 17:43:39 +0900, KONDO Mitsumasa wrote:
> Hi,
>
> I find problem about failing start-up achive recovery at Standby mode in PG9.2.4 streaming replication.
> I test same problem in PG9.2.3. But it is not occerd...

> > cp: cannot stat `../arc/000000030000000000000013': そのようなファイルやディレクトリはありません
> > [Standby] 2013-04-22 01:27:25 EDTLOG:  00000: restored log file "000000020000000000000013" from archive

I can't read the error message here, but this looks suspicious. The
recovery command seems to be returning success although it prints such
an error message?

> > cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
> [* This point is illegal work]> [Standby] 2013-04-22 01:27:25 EDTLOG:  00000: unexpected pageaddr 0/6000000 in log
file0, segment 20, offset 0 
> > [Standby] 2013-04-22 01:27:25 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
> > cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
> > cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
> > [Master] 2013-04-22 01:27:25 EDTDEBUG:  00000: received replication command: IDENTIFY_SYSTEM
> > [Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, walsender.c:449
> > [Master] 2013-04-22 01:27:25 EDTDEBUG:  00000: received replication command: START_REPLICATION 0/14000000
> > [Master] 2013-04-22 01:27:25 EDTLOCATION:  HandleReplicationCommand, walsender.c:449
> > [Master] 2013-04-22 01:27:25 EDTFATAL:  58P01: requested WAL segment 000000030000000000000014 has already been
removed
> > [Master] 2013-04-22 01:27:25 EDTLOCATION:  XLogRead, walsender.c:1023
> > [Standby] 2013-04-22 01:27:25 EDTLOG:  00000: streaming replication successfully connected to primary
> > [Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
> > [Standby] 2013-04-22 01:27:25 EDTFATAL:  XX000: could not receive data from WAL stream: FATAL:  requested WAL
segment000000030000000000000014 has already been removed 
> >
> > [Standby] 2013-04-22 01:27:25 EDTLOCATION:  libpqrcv_receive, libpqwalreceiver.c:389
> >
> > [mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
> > [Standby] 2013-04-22 01:27:30 EDTDEBUG:  00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0
> > [Standby] 2013-04-22 01:27:30 EDTLOCATION:  ValidXLOGHeader, xlog.c:4395
> > cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません
> > cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません
> > [Master] 2013-04-22 01:27:30 EDTDEBUG:  00000: received replication command: IDENTIFY_SYSTEM
> > [Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, walsender.c:449
> > [Master] 2013-04-22 01:27:30 EDTDEBUG:  00000: received replication command: START_REPLICATION 0/14000000
> > [Master] 2013-04-22 01:27:30 EDTLOCATION:  HandleReplicationCommand, walsender.c:449
> > [Master] 2013-04-22 01:27:30 EDTFATAL:  58P01: requested WAL segment 000000030000000000000014 has already been
removed
> > [Master] 2013-04-22 01:27:30 EDTLOCATION:  XLogRead, walsender.c:1023
> > [Standby] 2013-04-22 01:27:30 EDTLOG:  00000: streaming replication successfully connected to primary
> > [Standby] 2013-04-22 01:27:30 EDTLOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
> > [Standby] 2013-04-22 01:27:30 EDTFATAL:  XX000: could not receive data from WAL stream: FATAL:  requested WAL
segment000000030000000000000014 has already been removed 

And this seems to be the second problem. You probably need to configure
a higher wal_keep_segments on the primary or you need to fix your
recovery_command.


> We try to fix this problem with my company's colleaguea,
> but it is too difficult and complicated to WAL contorol and validation method...
>
> This problem is occerd in xlog.c:10692.
> > 10692         if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
> > 10693                 goto next_record_is_invalid;
>
> And problem is occerd this checking in ValidXLOGHeader().
> >  4390         if (!XLByteEQ(hdr->xlp_pageaddr, recaddr))
> >  4391         {
> >  4392                 ereport(emode_for_corrupt_record(emode, recaddr),
> >  4393                                 (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u",
> >  4394                                                 hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff,
> >  4395                                                 readId, readSeg, readOff)));
> >  4396                 return false;
> >  4397         }
>
> We think hdr->xlp_pageaddr.xrecoff has wrong value. This is very rare cace!
> We cannot analyze why this value is uncorrect!

I think the tests is just noticing that we don't have valid WAL - we are
probably reading old preallocated wal files from before starting the
node as a standby. Normally it will get the correct data from the
primary after that (you can see the START_REPLICATION commands in the
log), but that failed.


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Fast promotion, loose ends
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Fast promotion, loose ends