On 1 December 2012 22:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> tarvip@gmail.com writes:
>> [ txid_current can show a bogus value near XID wraparound ]
>> This happens only if wal_level=hot_standby.
>
> I believe what is happening here is
>
> (1) CreateCheckPoint sets up checkPoint.nextXid and
> checkPoint.nextXidEpoch, near xlog.c line 7070 in HEAD. At this point,
> nextXid is still a bit less than the wrap point.
>
> (2) After performing the checkpoint, at line 7113, CreateCheckPoint
> calls LogStandbySnapshot() which "helpfully" updates checkPoint.nextXid
> to the latest value. Which by now has wrapped around. But it doesn't
> fix checkPoint.nextXidEpoch, so the checkpoint that gets written out has
> effectively lost the epoch bump that should have happened.
>
> While we could add some more logic to try to correct the epoch value
> in this scenario, I think it's a much better idea to just stop having
> LogStandbySnapshot update the nextXid. That seems to me to be useless
> complication. I also quite dislike the fact that we're effectively
> redefining the checkpoint nextXid from being taken before the main
> body of the checkpoint to being taken afterwards, but *only* in
> XLogStandbyInfoActive mode. If that inconsistency isn't already causing
> bugs (besides this one) today, it'll probably cause them in the future.
I agree that the coding looks weird and agree it shouldn't be there.
The meaning of the checkpoint values should not differ because
wal_level has changed.
> So barring objections, I'm going to remove LogStandbySnapshot's behavior
> of returning the updated nextXid.
Removing it may cause other bugs, but if so, those other bugs need to
be solved in the right way, not by having a too-far-forwards nextxid
on the checkpoint record. Having said that, I can't see any bugs that
would be caused by this.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services