On 21.01.2017 19:37, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> Because I see having checksums as, frankly, something we always should
>>> have had (as most other databases do, for good reason...) and because
>>> they will hopefully prevent data loss. I'm willing to give us a fair
>>> bit to minimize the risk of losing data.
>>
>> To be perfectly blunt, that's just magical thinking. Checksums don't
>> prevent data loss in any way, shape, or form. In fact, they can *cause*
>> data loss, or at least make it harder for you to retrieve your data,
>> in the event of bugs causing false-positive checksum failures.
>
> This is not a new argument, at least to me, and I don't agree with it.
I don't agree either. Yes, statistically, checksums make data loss more
likely: there is more I/O, so the disk has more work to do and fails
sooner.
But the same is true for RAID: adding more disks increases the odds of a
disk failure.
So: yes. If you use checksums on a single disk, they are more likely to
cause problems. But if you manage it right (as ZFS does, for example), it
is an overall gain.
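
To put rough numbers on the RAID comparison, here is a back-of-the-envelope
sketch. It assumes disks fail independently and ignores rebuilds; the 3%
failure probability and the function names are made up for illustration,
not measured or taken from any tool.

```python
def p_any_failure(p_disk: float, n_disks: int) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1.0 - (1.0 - p_disk) ** n_disks


def p_all_fail(p_disk: float, n_disks: int) -> float:
    """Probability that every one of n independent mirrored disks fails."""
    return p_disk ** n_disks


if __name__ == "__main__":
    p = 0.03  # hypothetical per-disk failure probability over some period
    for n in (1, 2, 4):
        print(n, p_any_failure(p, n), p_all_fail(p, n))
```

With two mirrored disks, the chance of seeing some disk fail roughly
doubles, but the chance of losing both copies drops from p to p squared.
More hardware means more individual failures and still less data loss,
provided the redundancy is managed properly.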
>> What checksums can do for you, perhaps, is notify you in a reasonably
>> timely fashion if you've already lost data due to storage-subsystem
>> problems. But in a pretty high percentage of cases, that fact would
>> be extremely obvious anyway, because of visible data corruption.
>
> Exactly, and that awareness will allow a user to prevent further data
> loss or corruption. Slow corruption over time is a very much known and
> accepted real-world case that people do experience, as well as bit
> flipping enough for someone to write a not-that-old blog post about
> them:
>
> https://blogs.oracle.com/ksplice/entry/attack_of_the_cosmic_rays1
>
> A really nice property of checksums on pages is that they also tell you
> what data you *didn't* lose, which can be extremely valuable.
Indeed!
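
A minimal sketch of that property, assuming one stored checksum per page.
zlib.crc32 is only a stand-in here (PostgreSQL's real page checksum is a
different algorithm), and PAGE_SIZE and the function names are hypothetical.

```python
import zlib

PAGE_SIZE = 8192  # assumed page size, for illustration only


def checksum_pages(data: bytes) -> list[int]:
    """Compute one CRC32 per fixed-size page of the data."""
    return [zlib.crc32(data[i:i + PAGE_SIZE])
            for i in range(0, len(data), PAGE_SIZE)]


def damaged_pages(data: bytes, stored: list[int]) -> list[int]:
    """Return the page numbers whose current checksum differs from the stored one."""
    current = checksum_pages(data)
    return [n for n, (new, old) in enumerate(zip(current, stored)) if new != old]


if __name__ == "__main__":
    original = bytes(PAGE_SIZE * 4)           # four all-zero pages
    stored = checksum_pages(original)
    corrupted = bytearray(original)
    corrupted[PAGE_SIZE * 2 + 5] ^= 0x01      # flip one bit inside page 2
    print(damaged_pages(bytes(corrupted), stored))
```

Only the page with the flipped bit is reported; every other page verifies,
so you know it is intact and do not have to treat the whole file as suspect.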
Greetings,
Torsten