Re: Enabling Checksums
От | Andres Freund |
---|---|
Тема | Re: Enabling Checksums |
Дата | |
Msg-id | 20130413152626.GD10556@awork2.anarazel.de обсуждение исходный текст |
Ответ на | Re: Enabling Checksums (Ants Aasma <ants@cybertec.at>) |
Ответы |
Re: Enabling Checksums
(Ants Aasma <ants@cybertec.at>)
|
Список | pgsql-hackers |
On 2013-04-13 18:14:28 +0300, Ants Aasma wrote: > On Sat, Apr 13, 2013 at 5:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > >> On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote: > >>> As I understand it, SIMD is just a CPU-optimized method for producing a > >>> CRC checksum. Is that right? Does it produce the same result as a > >>> non-CPU-optimized CRC calculation? > > > >> No we are talking about a different algorithm that results in different > >> results, thats why its important to choose now since we can't change it > >> later without breaking pg_upgrade in further releases. > >> http://en.wikipedia.org/wiki/SIMD_%28hash_function%29 > > > > [ squint... ] We're talking about a *cryptographic* hash function? > > Why in the world was this considered a good idea for page checksums? > > > > In the first place, it's probably not very fast compared to some > > alternatives, and in the second place, the criteria by which people > > would consider it a good crypto hash function have approximately nothing > > to do with what we need for a checksum function. What we want for a > > checksum function is high probability of detection of common hardware > > failure modes, such as burst errors and all-zeroes. This is > > particularly critical when we're going with only a 16-bit checksum --- > > the probabilities need to be skewed in the right direction, or it's not > > going to be all that terribly useful. > > > > CRCs are known to be good for that sort of thing; it's what they were > > designed for. I'd like to see some evidence that any substitute > > algorithm has similar properties. Without that, I'm going to vote > > against this idea. > > Sorry for creating confusion here by playing fast and loose with the > terminology. We are not talking about that hash function at all. What > we are talking about here is Fowler-Noll-Vo-ish > (http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function) > hash function that is restructured to be parallelisable with SIMD > instructions with the explicit goal of being as fast as possible. The > resulting hash function is roughly two orders of magnitude faster than > 1-byte-at-a-time CRC32 currently in use. Performance is about > comparable with optimized fixed size memcpy running in cache. Gah, one shouldn't look to quick for a reference, sorry. > Based on current analysis, it is particularly good at detecting single > bit errors, as good at detecting burst errors as can be expected from > 16 bits and not horrible at detecting burst writes of zeroes. It is > quite bad at detecting multiple uncorrelated single bit errors and > extremely bad at detecting repeating patterns of errors in low order > bits. > All in all I would say that the performance is worth the loss in > detection capability as we are not talking about using the checksum to > prove correctness. Is it actually a loss compared to our 16bit flavor of crc32 we now use? I didn't think so far from the properties you described? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления:
Следующее
От: Tomas VondraДата:
Сообщение: Re: PROPOSAL: tracking aggregated numbers from pg_stat_database