Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
>
> > I think we need to think about when these CRCs would be read and
> > written. It would be written when it hits the disk, hopefully by the
> > background writer, and I think after a server crash, all pages would
> > have to be read and checked. The good news is that both of these are
> > non-critical paths.
>
> If you're protecting against torn pages then yes, if the system is shut down
> uncleanly by a system crash or power failure you would in theory have to scan
> every page of every table and index before starting up.
>
> But if the system was shut down uncleanly as the result of a Postgres crash or
> fast shutdown of Postgres then that isn't an issue. And many users may prefer
> to bring the system up as soon as possible as long as they know any corrupt
> pages will be spotted and throw errors as soon as they're seen.

I don't think we should start up a system and only detect the errors
later.

> So I think you need a mode that only checks checksums when a page is read from
> disk. That would protect against torn pages (but not necessarily before
> bringing up the system) and against bad i/o hardware.
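
For what it's worth, a check-on-read mode could look roughly like the
sketch below. It is only an illustration, not anything in the tree: the
8192-byte page, the CRC kept in the first four bytes, and the names
page_crc, stamp_page, verify_page and read_page are all assumptions made
up for the example, and zlib's CRC-32 stands in for whatever checksum
would really be used.

```python
import struct
import zlib

PAGE_SIZE = 8192   # assumed page size for this sketch
CRC_LEN = 4        # hypothetical layout: CRC-32 stored in the first 4 bytes


def page_crc(page):
    """CRC-32 over everything after the stored checksum field."""
    return zlib.crc32(bytes(page[CRC_LEN:])) & 0xFFFFFFFF


def stamp_page(page):
    """Write the checksum into the page just before it goes to disk."""
    struct.pack_into("<I", page, 0, page_crc(page))


def verify_page(page):
    """Return True if the stored checksum matches the page contents."""
    (stored,) = struct.unpack_from("<I", page, 0)
    return stored == page_crc(page)


def read_page(raw):
    """Check the page as it comes off disk; raise instead of returning bad data."""
    if not verify_page(raw):
        raise ValueError("page checksum mismatch")
    return raw
```

A torn page, where half of the block is the new version and half is the
old one, fails verify_page in this sketch, so read_page would raise the
first time the page is read rather than at startup.
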
>
> Unfortunately memory errors are far more common than disk errors and it
> would be much harder to protect against them. You can't check it when someone
> may be writing to the buffer, which limits you to checking it only when you
> acquire some form of lock on the buffer. It also means you would have to write
> it before you release a lock if you've made any changes.
>
> Worse, I'm not sure how to handle hint bits though. We currently don't require
> any lock at all to set hint bits which means someone may think they can check
> a checksum while or after you've fiddled some bits.

Yep, a problem.

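
A small sketch of why the hint bits are awkward, under the same kind of
made-up assumptions as any toy example: an 8192-byte page, a CRC-32 in
the first four bytes, a hypothetical HINT_BYTE offset, and helper names
invented for illustration. It shows a page that was stamped correctly
failing its check after a single bit is set without restamping, which is
what an unlocked hint-bit update would amount to.

```python
import struct
import zlib

PAGE_SIZE = 8192   # assumed page size for this sketch
CRC_LEN = 4        # hypothetical layout: CRC-32 stored in the first 4 bytes
HINT_BYTE = 100    # hypothetical offset of a byte that carries hint bits


def page_crc(page):
    """CRC-32 over everything after the stored checksum field."""
    return zlib.crc32(bytes(page[CRC_LEN:])) & 0xFFFFFFFF


def stamp_page(page):
    """Store the checksum for the page's current contents."""
    struct.pack_into("<I", page, 0, page_crc(page))


def verify_page(page):
    """Return True if the stored checksum matches the page contents."""
    (stored,) = struct.unpack_from("<I", page, 0)
    return stored == page_crc(page)


def set_hint_bit(page, bit=0):
    """Flip on a hint bit in place, without updating the checksum."""
    page[HINT_BYTE] |= 1 << bit


def hint_bit_false_alarm():
    """Return (valid_before, valid_after) around an unstamped hint-bit update."""
    page = bytearray(PAGE_SIZE)
    stamp_page(page)
    before = verify_page(page)
    set_hint_bit(page)
    after = verify_page(page)
    return before, after
```

In this sketch hint_bit_false_alarm() returns (True, False): the page is
perfectly good, but anyone checking the checksum after the bit was set
would see a mismatch.
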
--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +