Re: Enabling Checksums
From | Jeff Davis |
---|---|
Subject | Re: Enabling Checksums |
Date | |
Msg-id | 1353361583.1102.18.camel@sussancws0025 |
In reply to | Re: Enabling Checksums (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |
On Mon, 2012-11-19 at 10:35 -0800, Jeff Davis wrote:
> Yes, the blocks written *after* the checkpoint might have a bad checksum
> that will be fixed during recovery. But the blocks written *before* the
> checkpoint should have a valid checksum, but if they don't, then
> recovery doesn't know about them.
>
> So, we can't verify the checksums in the base backup because it's
> expected that some blocks will fail the check, and they can be fixed
> during recovery. That gives us no protection for blocks that were truly
> corrupted and written long before the last checkpoint.
>
> I suppose if we could somehow differentiate the blocks, that might work.
> Maybe look at the LSN and only validate blocks written before the
> checkpoint? But of course, that's a problem because a corrupt block
> might have the wrong LSN (in fact, it's likely, because garbage is more
> likely to make the LSN too high than too low).

It might be good enough here to simply retry the checksum verification if it fails for any block. Postgres shouldn't be issuing write()s for the same block very frequently, and they shouldn't take very long, so the chances of failing several times seems vanishingly small unless it's a real failure.

Through a suitably complex mechanism, I think we can be more sure. The external program could wait for a checkpoint (or force one manually), and then recalculate the checksum for that page. If checksum is the same as the last time, then we know the block is bad (because the checkpoint would have waited for any writes in progress). If the checksum does change, then we assume postgres must have modified it since the backup started, so we can assume that we have a full page image to fix it. (A checkpoint is a blunt tool here, because all we need to do is wait for the write() call to finish, but it suffices.)

That complexity is probably not required, and simply retrying a few times is probably much more practical. But it still bothers me a little to think that the external tool could falsely indicate a checksum failure, however remote that chance.

Regards,
Jeff Davis
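To make the two ideas in this message concrete, here is a minimal sketch of what an external verifier might do: first the simple retry, then the checkpoint-based confirmation. Everything in it is an assumption made for illustration: the function names, the toy additive checksum, the page layout (checksum stored in the first two bytes), and the `read_page` and `wait_for_checkpoint` callables, which the caller would have to supply. It does not reproduce any actual PostgreSQL checksum algorithm or on-disk format.

```python
import time
from typing import Callable

BLOCK_SIZE = 8192  # assumed page size for this sketch


def page_checksum(page: bytes) -> int:
    """Toy additive checksum over everything after the 2-byte checksum field.

    Stand-in only: this is NOT the checksum PostgreSQL computes.
    """
    return sum(page[2:]) & 0xFFFF


def page_is_valid(page: bytes) -> bool:
    """Compare the stored checksum (assumed to sit in bytes 0-1) with a fresh one."""
    stored = int.from_bytes(page[:2], "little")
    return stored == page_checksum(page)


def verify_with_retry(read_page: Callable[[], bytes],
                      attempts: int = 3, delay: float = 0.01) -> bool:
    """Re-read and re-verify a block a few times before reporting a failure."""
    for attempt in range(attempts):
        if page_is_valid(read_page()):
            return True
        if attempt + 1 < attempts:
            time.sleep(delay)  # give an in-flight write() time to finish
    return False


def confirm_after_checkpoint(read_page: Callable[[], bytes],
                             wait_for_checkpoint: Callable[[], None]) -> str:
    """Classify a block by recomputing its checksum across a checkpoint.

    Returns "ok" if the first read verifies, "corrupt" if the recomputed
    checksum is unchanged after the checkpoint, and "modified" if it changed
    (in which case recovery is assumed to have a full page image for it).
    """
    first = read_page()
    if page_is_valid(first):
        return "ok"
    wait_for_checkpoint()  # hypothetical hook: wait for or force a checkpoint
    second = read_page()
    if page_checksum(second) == page_checksum(first):
        return "corrupt"
    return "modified"
```

A caller would presumably run `verify_with_retry` on every block and fall back to `confirm_after_checkpoint` only for blocks that still fail, since waiting for a checkpoint is the expensive step.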