Re: Failing start-up archive recovery at Standby mode in PG9.2.4
От | Andres Freund |
---|---|
Тема | Re: Failing start-up archive recovery at Standby mode in PG9.2.4 |
Дата | |
Msg-id | 20130424085947.GB4536@awork2.anarazel.de обсуждение исходный текст |
Ответ на | Failing start-up archive recovery at Standby mode in PG9.2.4 (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>) |
Ответы |
Re: Failing start-up archive recovery at Standby mode in PG9.2.4
(Kyotaro HORIGUCHI <kyota.horiguchi@gmail.com>)
Re: Failing start-up archive recovery at Standby mode in PG9.2.4 (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>) |
Список | pgsql-hackers |
ello, On 2013-04-24 17:43:39 +0900, KONDO Mitsumasa wrote: > Hi, > > I find problem about failing start-up achive recovery at Standby mode in PG9.2.4 streaming replication. > I test same problem in PG9.2.3. But it is not occerd... > > cp: cannot stat `../arc/000000030000000000000013': そのようなファイルやディレクトリはありません > > [Standby] 2013-04-22 01:27:25 EDTLOG: 00000: restored log file "000000020000000000000013" from archive I can't read the error message here, but this looks suspicious. The recovery command seems to be returning success although it prints such an error message? > > cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません > [* This point is illegal work]> [Standby] 2013-04-22 01:27:25 EDTLOG: 00000: unexpected pageaddr 0/6000000 in log file0, segment 20, offset 0 > > [Standby] 2013-04-22 01:27:25 EDTLOCATION: ValidXLOGHeader, xlog.c:4395 > > cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません > > cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません > > [Master] 2013-04-22 01:27:25 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM > > [Master] 2013-04-22 01:27:25 EDTLOCATION: HandleReplicationCommand, walsender.c:449 > > [Master] 2013-04-22 01:27:25 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000 > > [Master] 2013-04-22 01:27:25 EDTLOCATION: HandleReplicationCommand, walsender.c:449 > > [Master] 2013-04-22 01:27:25 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed > > [Master] 2013-04-22 01:27:25 EDTLOCATION: XLogRead, walsender.c:1023 > > [Standby] 2013-04-22 01:27:25 EDTLOG: 00000: streaming replication successfully connected to primary > > [Standby] 2013-04-22 01:27:25 EDTLOCATION: libpqrcv_connect, libpqwalreceiver.c:171 > > [Standby] 2013-04-22 01:27:25 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment000000030000000000000014 has already been removed > > > > [Standby] 2013-04-22 01:27:25 EDTLOCATION: libpqrcv_receive, libpqwalreceiver.c:389 > > > > [mitsu-ko@localhost postgresql-9.2.4]$ bincp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません > > [Standby] 2013-04-22 01:27:30 EDTDEBUG: 00000: unexpected pageaddr 0/6000000 in log file 0, segment 20, offset 0 > > [Standby] 2013-04-22 01:27:30 EDTLOCATION: ValidXLOGHeader, xlog.c:4395 > > cp: cannot stat `../arc/000000030000000000000014': そのようなファイルやディレクトリはありません > > cp: cannot stat `../arc/00000004.history': そのようなファイルやディレクトリはありません > > [Master] 2013-04-22 01:27:30 EDTDEBUG: 00000: received replication command: IDENTIFY_SYSTEM > > [Master] 2013-04-22 01:27:30 EDTLOCATION: HandleReplicationCommand, walsender.c:449 > > [Master] 2013-04-22 01:27:30 EDTDEBUG: 00000: received replication command: START_REPLICATION 0/14000000 > > [Master] 2013-04-22 01:27:30 EDTLOCATION: HandleReplicationCommand, walsender.c:449 > > [Master] 2013-04-22 01:27:30 EDTFATAL: 58P01: requested WAL segment 000000030000000000000014 has already been removed > > [Master] 2013-04-22 01:27:30 EDTLOCATION: XLogRead, walsender.c:1023 > > [Standby] 2013-04-22 01:27:30 EDTLOG: 00000: streaming replication successfully connected to primary > > [Standby] 2013-04-22 01:27:30 EDTLOCATION: libpqrcv_connect, libpqwalreceiver.c:171 > > [Standby] 2013-04-22 01:27:30 EDTFATAL: XX000: could not receive data from WAL stream: FATAL: requested WAL segment000000030000000000000014 has already been removed And this seems to be the second problem. You probably need to configure a higher wal_keep_segments on the primary or you need to fix your recovery_command. > We try to fix this problem with my company's colleaguea, > but it is too difficult and complicated to WAL contorol and validation method... > > This problem is occerd in xlog.c:10692. > > 10692 if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false)) > > 10693 goto next_record_is_invalid; > > And problem is occerd this checking in ValidXLOGHeader(). > > 4390 if (!XLByteEQ(hdr->xlp_pageaddr, recaddr)) > > 4391 { > > 4392 ereport(emode_for_corrupt_record(emode, recaddr), > > 4393 (errmsg("unexpected pageaddr %X/%X in log file %u, segment %u, offset %u", > > 4394 hdr->xlp_pageaddr.xlogid, hdr->xlp_pageaddr.xrecoff, > > 4395 readId, readSeg, readOff))); > > 4396 return false; > > 4397 } > > We think hdr->xlp_pageaddr.xrecoff has wrong value. This is very rare cace! > We cannot analyze why this value is uncorrect! I think the tests is just noticing that we don't have valid WAL - we are probably reading old preallocated wal files from before starting the node as a standby. Normally it will get the correct data from the primary after that (you can see the START_REPLICATION commands in the log), but that failed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления: