Re: Enabling Checksums

From: Tom Lane
Subject: Re: Enabling Checksums
Date:
Msg-id: 14584.1363480867@sss.pgh.pa.us
In reply to: Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 15 March 2013 13:08, Andres Freund <andres@2ndquadrant.com> wrote:
>> I commented on this before, I personally think this property makes fletcher a
>> not so good fit for this. It's not uncommon for parts of a block being all-zero
>> and many disk corruptions actually change whole runs of bytes.

> I think you're right to pick up on this point, and Ants has done a
> great job of explaining the issue more clearly.

> My perspective, after some thought, is that this doesn't matter to the
> overall effectiveness of this feature.

> PG blocks do have large runs of 0x00 in them, though that is in the
> hole in the centre of the block. If we don't detect problems there,
> it's not such a big deal. Most other data we store doesn't consist of
> large runs of 0x00 or 0xFF as data. Most data is more complex than
> that, so any runs of 0s or 1s written to the block will be detected.

Meh.  I don't think that argument holds a lot of water.  The point of
having checksums is not so much to notice corruption as to be able to
point the finger at flaky hardware.  If we have an 8K page with only
1K of data in it, and we fail to notice that the hardware dropped a lot
of bits in the other 7K, we're not doing our job; and that's not really
something to write off, because it would be a lot better if we complain
*before* the hardware manages to corrupt something valuable.

So I think we'd be best off to pick an algorithm whose failure modes
don't line up so nicely with probable hardware failure modes.  It's
worth noting that one of the reasons that CRCs are so popular is
precisely that they were designed to detect burst errors with high
probability.
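
To make the concern concrete, here is a minimal sketch in Python. It
assumes the textbook Fletcher-16 (two running sums, each taken modulo
255), which is not necessarily the variant in the proposed patch, and it
uses a made-up page layout: 1K of arbitrary data followed by 7K of
zeros. The simulated fault turns a 512-byte run of 0x00 in the empty
area into 0xFF. Since 0xFF is congruent to 0 modulo 255, neither
Fletcher sum changes, while CRC-32 (here zlib's) does change.

```python
import zlib


def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16: two running sums, each reduced modulo 255.

    This is an illustrative stand-in, not the checksum from the patch.
    """
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1


PAGE_SIZE = 8192  # assumed page size, matching the 8K page in the text

# Hypothetical page: 1K of non-trivial data followed by 7K of zeros.
page = bytes((i * 31 + 7) % 251 for i in range(1024)) + bytes(PAGE_SIZE - 1024)

# Simulated hardware fault: a 512-byte run in the zero area becomes 0xFF.
corrupted = bytearray(page)
corrupted[4096:4608] = b"\xff" * 512
damaged = bytes(corrupted)

# A checksum "detects" the fault if its value differs between the two pages.
fletcher_detects = fletcher16(page) != fletcher16(damaged)
crc_detects = zlib.crc32(page) != zlib.crc32(damaged)

print("Fletcher-16 detects the damage:", fletcher_detects)
print("CRC-32 detects the damage:", crc_detects)
```

The blind spot shown here is specific to the modulo-255 arithmetic, so
other Fletcher variants may behave differently. The sketch only
illustrates how a checksum's failure mode can coincide with a plausible
hardware failure mode.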

> What I think we could do here is to allow people to set their checksum
> algorithm with a plugin.

Please, no.  What happens when their plugin goes missing?  Or they
install the wrong one on their multi-terabyte database?  This feature is
already on the hairy edge of being impossible to manage; we do *not*
need to add still more complication.
        regards, tom lane
