On 13.02.2012 01:04, Jeff Janes wrote:
> Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD.
> I have no idea if I did it correctly, in particular if calling
> GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side
> effects that make that a bad thing to do. I'm not proposing it as the
> real fix, I just wanted to get around this problem in order to do more
> testing.
Thanks. That's basically the right approach. The attached patch contains a
cleaned-up version of that.
> It does get rid of the "there is no contrecord flag" errors, but
> recovery still does not work.
>
> Now the count of tuples in the table is always correct (I never
> provoke a crash during the initial table load), but sometimes updates
> to those tuples that were reported to have been committed are lost.
>
> This is more subtle, it does not happen on every crash.
>
> It seems that when recovery ends on "record with zero length at...",
> that recovery is correct.
>
> But when it ends on "invalid magic number 0000 in log file.." then the
> recovery is screwed up.
Can you write a self-contained test case for that? I've been trying to
reproduce that by running the regression tests and pgbench with a
streaming replication standby, which should be pretty much the same as
crash recovery. No luck so far.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com