Re: [BUG?] lag of minRecoveryPont in archive recovery
От | Fujii Masao |
---|---|
Тема | Re: [BUG?] lag of minRecoveryPont in archive recovery |
Дата | |
Msg-id | CAHGQGwHB-ToOv1W52YRBX9+5vW2B5BvHad0XjfW6FfYS=5F-=Q@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [BUG?] lag of minRecoveryPont in archive recovery (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Ответы |
Re: [BUG?] lag of minRecoveryPont in archive recovery
Re: [BUG?] lag of minRecoveryPont in archive recovery |
Список | pgsql-hackers |
On Tue, Dec 11, 2012 at 6:14 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 11.12.2012 08:07, Kyotaro HORIGUCHI wrote: >>> >>> The cause is that CheckRecoveryConsistency() is called before rm_redo(), >>> as Horiguchi-san suggested upthead. Imagine the case where >>> minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE >>> record. When restarting the server with that minRecoveryPoint, >>> the followings would happen, and then PANIC occurs. >>> >>> 1. XLOG_SMGR_TRUNCATE record is read. >>> 2. CheckRecoveryConsistency() is called, and database is marked as >>> consistent since we've reached minRecoveryPoint (i.e., the location >>> of XLOG_SMGR_TRUNCATE). >>> 3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is >>> found. >>> 4. Since the database has already been marked as consistent, an invalid >>> page leads to PANIC. >> >> >> Exactly. >> >> In smgr_redo, EndRecPtr which is pointing the record next to >> SMGR_TRUNCATE, is used as the new minRecoveryPoint. On the other >> hand, during the second startup of the standby, >> CheckRecoveryConsistency checks for consistency by >> XLByteLE(minRecoveryPoint, EndRecPtr) which should be true at >> just BEFORE the SMGR_TRUNCATE record is applied. So >> reachedConsistency becomes true just before the SMGR_TRUNCATE >> record will be applied. Bang! > > > Ah, I see. I thought moving CheckRecoveryConsistency was just a nice-to-have > thing, so that we'd notice that we're consistent earlier, so didn't pay > attention to that part. > > I think the real issue here is that CheckRecoveryConsistency should not be > comparing minRecoveryPoint against recoveryLastRecPtr, not EndRecPtr as it > currently does. EndRecPtr points to the end of the last record *read*, while > recoveryLastRecPtr points to the end of the last record *replayed*. The > latter is what CheckRecoveryConsistency should be concerned with. > > BTW, it occurs to me that we have two variables in the shared struct that > are almost the same thing: recoveryLastRecPtr and replayEndRecPtr. The > former points to the end of the last record successfully replayed, while > replayEndRecPtr points to the end of the last record successfully replayed, > or begin replayed at the moment. I find that confusing, so I suggest that we > rename recoveryLastRecPtr to make that more clear. Included in the attached > patch. The patch looks good. I confirmed that the PANIC error didn't happen, with the patch. xlog.c > * Initialize shared replayEndRecPtr, recoveryLastRecPtr, and s/recoveryLastRecPtr/lastReplayedEndRecPtr Regards, -- Fujii Masao
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Andres FreundДата:
Сообщение: Re: pg_dump cosmetic problem while dumping/restoring rules