Re: Hash support for arrays

Поиск
Список
Период
Сортировка
От Ron Mayer
Тема Re: Hash support for arrays
Дата
Msg-id 4CD0E390.4080708@cheapcomplexdevices.com
обсуждение исходный текст
Ответ на Re: Hash support for arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Tom Lane wrote:
> It's possible that the multiply-by-31 method is as good as the
> rotate-and-xor method by that measure, or even better; but it's far from
> obvious that it's better.  And I'm not convinced that the multiply
> method has a pedigree that should encourage me to take it on faith.

Short summary: * some history of it's trivia follows * (nothing here suggests it's better - just old and common and
cheap)


Longer - some trivia regarding its pedigree:

It (or at least a *31 variant) seems to have a history of advocacy
going back to Chris Torek in 1990:
http://groups.google.com/group/comp.lang.c/browse_thread/thread/28c2095282f0c1b5/193be99e9836791b?q=#193be99e9836791b
 X#define        HASH(str, h, p) \ X       for (p = str, h = 0; *p;) h = (h << 5) - h + *p++

and gets referred to in Usenet papers in the early 90's as well:
http://www.usenix.com/publications/library/proceedings/sa92/salz.pdf


Regarding "why the magic number 31 [or 33 which also often comes
up]" apparently the only thing magic about it is that it's an
odd number != 1.    The rest of the odd numbers work about as well
according to this guy who tried to explain it:

http://svn.eu.apache.org/repos/asf/apr/apr/trunk/tables/apr_hash.c    * The magic of number 33, i.e. why it works
betterthan many other    * constants, prime or not, has never been adequately explained by    * anyone. So I try an
explanation:if one experimentally tests all    * multipliers between 1 and 256 (as I did while writing a low-level    *
datastructure library some time ago) one detects that even    * numbers are not useable at all. The remaining 128 odd
numbers   * (except for the number 1) work more or less all equally well.    * They all distribute in an acceptable way
andthis way fill a hash    * table with an average percent of approx. 86%.    *    * If one compares the chi^2 values
ofthe variants (see    * Bob Jenkins ``Hashing Frequently Asked Questions'' at    *
http://burtleburtle.net/bob/hash/hashfaq.htmlfor a description    * of chi^2), the number 33 not even has the best
value.But the    * number 33 and a few other equally good numbers like 17, 31, 63,    * 127 and 129 have nevertheless a
greatadvantage to the remaining    * numbers in the large set of possible multipliers: their multiply    * operation
canbe replaced by a faster operation based on just one    * shift plus either a single addition or subtraction
operation.And    * because a hash function has to both distribute good _and_ has to    * be very fast to compute, those
fewnumbers should be preferred.    ...
 


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [PATCH] V3: Idle in transaction cancellation
Следующее
От: Marc Cousin
Дата:
Сообщение: Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up