Re: [HACKERS] Hash Functions

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [HACKERS] Hash Functions
Дата
Msg-id CA+TgmoYRYrBk8+RU1sd=GGirvtSM1-Su1VuB4zsuJy0h-dDH9w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Hash Functions  (Andres Freund <andres@anarazel.de>)
Ответы Re: [HACKERS] Hash Functions  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Thu, Jun 1, 2017 at 2:25 PM, Andres Freund <andres@anarazel.de> wrote:
> Just to clarify: I don't think it's a problem to do so for integers and
> most other simple scalar types. There's plenty hash algorithms that are
> endianess independent, and the rest is just a bit of care.

Do you have any feeling for which of those endianness-independent hash
functions might be a reasonable choice for us?

https://github.com/markokr/pghashlib implements various hash functions
for PostgreSQL, and claims that, of those implemented, crc32, Jenkins,
lookup3be and lookup3le, md5, and siphash24 are endian-independent.

An interesting point here is that Jeff Davis asserted in the original
post on this thread that our existing hash_any() wasn't portable, but
our current hash_any seems to be the Jenkins algorithm -- so I'm
confused.  Part of the problem seems to be that, according to
https://en.wikipedia.org/wiki/Jenkins_hash_function there are 4 of
those.  I don't know whether the one in pghashlib is the same one
we've implemented.

Kennel Marshall suggested xxhash as an endian-independent algorithm
upthread.  Code for that is available under a 2-clause BSD license.

PostgreSQL page checksums use an algorithm based on, but not exactly,
FNV-1a.  See storage/checksum_impl.h.  The comments there say this
algorithm was chosen with speed in mind.  Our version is not
endian-independent because it folds in 4-byte integers rather than
1-byte integers, but plain old FNV-1a *is* endian-independent and
could be used.

We also have an implementation of CRC32C in core - see port/pg_crc32.h
and port/pg_crc32c_sb8.c.  It's not clear to me whether this is
Endian-independent or not, although there is stuff that depends on
WORDS_BIGENDIAN, so, uh, maybe?

Some other possibly-interesting links:
https://research.neustar.biz/2011/12/29/choosing-a-good-hash-function-part-2/
http://greenrobot.org/essentials/features/performant-hash-functions-for-java/comparison-of-hash-functions/
https://www.strchr.com/hash_functions

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: [HACKERS] PL_stashcache, or, what's our minimum Perl version?
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative