Re: Enabling Checksums

Поиск
Список
Период
Сортировка
От Ants Aasma
Тема Re: Enabling Checksums
Дата
Msg-id CA+CSw_vfBoVbnBq=gjiBXUj5kxRCtQSFb4iu6mrN6Ve3v3m_Yg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Ответы Re: Enabling Checksums  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 9 April 2013 03:35, Ants Aasma <ants@cybertec.at> wrote:
>> On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma <ants@cybertec.at> wrote:
>>> Unless somebody tells me not to waste my time I'll go ahead and come
>>> up with a workable patch by Monday.
>>
>> And here you go. I decided to be verbose with the comments as it's
>> easier to delete a comment to write one. I also left in a huge jumble
>> of macros to calculate the contents of a helper var during compile
>> time. This can easily be replaced with the calculated values once we
>> settle on specific parameters.
>>
>> Currently only x86-64 is implemented. 32bit x86 would be mostly a
>> copy-and-paste job, replacing 64bit pointer registers with 32bit ones.
>> For other platforms the simplest way would be to use a vectorizing
>> compiler on the generic variant. -funroll-loops -ftree-vectorize is
>> enough on gcc.
>>
>> Quick bench results on the worst case workload:
>> master no checksums: tps = 15.561848
>> master with checksums: tps = 1.695450
>> simd checksums: tps = 14.602698
>
> Numbers look very good on this. Well done.
>
> I support the direction of this, but I don't think I'm sufficiently
> well qualified to verify that the code does what it should and/or fix
> it if it breaks. If others want to see this happen you'll need to
> pitch in.
>
> My only review comments are to ask for some explanation of the magic numbers...
> #define CSUM_PRIME1 0x49
> #define CSUM_PRIME2 0x986b
> #define CSUM_TRUNC 65521
>
> Where magic means a level of technology far above my own
> understanding, and yet no (or not enough) code comments to assist me.

The specific values used are mostly magic to me too. As mentioned in a
short sentence in the patch, the values are experimentally chosen,
guided by some intuition about what good values should look like.

Basically the methodology for the choice was that I took all the pages
from a 2.8GB test database, and then for each page introduced a bunch
of common errors and observed how many errors were undetected. The
main observations were: 1) the exact value of the primes doesn't
really matter for detection efficiency.
2) values with a non-uniform distribution of zeroes and ones seem to
work slightly better.

I'll write up a readme of why the values are how they are and with
some more explanation of the algorithm.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [GENERAL] currval and DISCARD ALL
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [GENERAL] currval and DISCARD ALL