Re: Enabling Checksums

Поиск
Список
Период
Сортировка
От Florian Pflug
Тема Re: Enabling Checksums
Дата
Msg-id 3A16C439-AC0F-4842-A4DD-BD25526462BF@phlo.org
обсуждение исходный текст
Ответ на Re: Enabling Checksums  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Apr18, 2013, at 00:32 , Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ants Aasma <ants@cybertec.at> writes:
>> I was thinking about something similar too. The big issue here is that
>> the parallel checksums already hide each other latencies effectively
>> executing one each of movdqu/pmullw/paddw each cycle, that's why the
>> N_SUMS adds up to 128 bytes not 16 bytes.
>
> The more I read of this thread, the more unhappy I get.  It appears that
> the entire design process is being driven by micro-optimization for CPUs
> being built by Intel in 2013.  That ought to be, at best, a fifth-order
> consideration, with full recognition that it'll be obsolete in two years,
> and is already irrelevant to anyone not running one of those CPUs.

Micro-optimization for particular CPUs yes, but general performance
considerations, no. For example, 2^n is probably one of the worst modulus
you can pick for a hash function - any prime would work much better.
But doing the computations modulo 2^16 or 2^32 carries zero performance
overhead, whereas picking another modulus requires some renormalization
after every operation. That, however, is *not* a given - it stems from
the fact nearly all CPUs in existence operated on binary integers. This
fact must thus enter into the design phase very early, and makes
2^16 or 2^32 a sensible choice for a modulus *despite* it's shortcomings,
simply because it allows for fast implementations.

> I would like to ban all discussion of assembly-language optimizations
> until after 9.3 is out, so that we can concentrate on what actually
> matters.  Which IMO is mostly the error detection rate and the probable
> nature of false successes.  I'm glad to see that you're paying at least
> some attention to that, but the priorities in this discussion are
> completely backwards.

I'd say lots of attention is paid to that, but there's *also* attention
paid to speed. Which I good, because ideally we want to end up with
a checksum with both has good error-detection properties *and* good
performance. If performance is of no concern to us, then there's little
reason not to use CRC…

> And I reiterate that there is theory out there about the error detection
> capabilities of CRCs.  I'm not seeing any theory here, which leaves me
> with very little confidence that we know what we're doing.

If you've got any pointers to literature on error-detection capabilities
of CPU-friendly checksum functions, please share. I am aware of the vast
literature on CRC, and also on some other algebraic approaches, but
none of those even come close to the speed of FNV+shift (unless there's
a special CRC instruction, that is). And there's also a ton of stuff on
cryptographic hashing, but those are optimized for a completely different
use-case...

best regards,
Florian Pflug




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: [PATCH] Add \ns command to psql
Следующее
От: Florian Pflug
Дата:
Сообщение: Re: Enabling Checksums