Hi hackers,
I believe I've discovered a race condition between the startup and
checkpointer processes that can cause a CRC mismatch in the pg_control
file. If a cluster crashes at the right time, the following error
appears when you attempt to restart it:

    FATAL: incorrect checksum in control file

This appears to be caused by some code paths in xlog_redo() that
update ControlFile without taking the ControlFileLock. The attached
patch seems to be sufficient to prevent the CRC mismatch in the
control file, but perhaps this is a symptom of a bigger problem with
concurrent modifications of ControlFile->checkPointCopy.nextFullXid.
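
To illustrate the interleaving I have in mind, here is a small
standalone sketch. It is not PostgreSQL code and not the attached
patch: DemoControlFile, demo_checksum() and replay_interleaving() are
hypothetical names, and FNV-1a stands in for the real CRC. It replays,
in a single thread, what I assume happens when an unlocked update
lands between the checksum computation and the write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the control file; not ControlFileData. */
typedef struct
{
    uint64_t next_full_xid;
    uint64_t next_oid;
    uint32_t crc;               /* checksum over the fields above */
} DemoControlFile;

/* FNV-1a over every byte before the crc field (stand-in for the CRC). */
static uint32_t
demo_checksum(const DemoControlFile *cf)
{
    const unsigned char *p = (const unsigned char *) cf;
    uint32_t    h = 2166136261u;

    assert(cf != NULL);
    for (size_t i = 0; i < offsetof(DemoControlFile, crc); i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Replay one control file update.  Returns true if the image that was
 * "written" still passes checksum verification on the next startup.
 */
bool
replay_interleaving(bool unlocked_update_in_between)
{
    DemoControlFile shared = {.next_full_xid = 100, .next_oid = 5000, .crc = 0};
    DemoControlFile on_disk;

    /* Writer: compute the checksum over the in-memory struct. */
    shared.crc = demo_checksum(&shared);

    /* Other process: modify a field without holding the lock. */
    if (unlocked_update_in_between)
        shared.next_full_xid++;

    /* Writer: copy the struct out (stands in for the file write). */
    on_disk = shared;

    /* Next startup: verify the stored checksum. */
    return demo_checksum(&on_disk) == on_disk.crc;
}
```

replay_interleaving(false) returns true and replay_interleaving(true)
returns false, which corresponds to the FATAL above. If every update
is made while holding the same lock the writer holds, the middle step
cannot occur between the checksum computation and the write.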
Nathan