Re: Hash support for arrays

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: Hash support for arrays
Дата
Msg-id 4CD0217202000025000371A2@gw.wicourts.gov
обсуждение исходный текст
Ответ на Re: Hash support for arrays  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Hash support for arrays  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> The goal is to make those hard to predict, though.
>>
>> Really?  I think "I don't understand when this fails" isn't
>> obviously better than being able to predict when it fails ...
> 
> Isn't that the whole point of hash functions?  Collisions are
> inevitable, but you want them to be unpredictable.
Are you sure you're not confusing attributes of a good random number
generator with hash generation?  (They have much in common, but I've
only seen concern with "hard to predict" when working on random
number algorithms, as for financial audits or jury selection.)
> If you want a hash function with predictable collision behavior,
> just has the first element.  It'll be highly predictable AND
> wicked fast.
You're confusing "unpredictable" with "widely distributed", I think.
There's no reason that the hash value of an integer of the same size
as the hash shouldn't be the integer itself, for example.  It's hard
to get more predictable than that, yet it works well for hash
lookups.
I know that for a hash of a character string, the total = (total *
31) + nextcharacter has been shown to be effective.  That's hardly
random or hard to predict, but it does tend to spread out the hash
values well.  Whether it is as good for arrays, I don't know.  It
seems like a reasonable place to start, though, since a character
string is rather like an array of characters.
>> What concerns me about that is that it tends to push the bits to
>> the left --- I think the effects of the earlier inputs are going
>> to overflow out as you incorporate a lot of newer inputs.  What
>> you want is a scheme where every input item affects all the bits
>> of the final result about equally, and my gut feeling is this
>> doesn't provide that.
> 
> I don't think so.  That would be a problem if you multiplied by an
> even number.  You can see that you don't get dups if you just add
> in the same value over and over:
Yeah, that 1 in the low order position ensures that the impact of
any value which is ever added into the total is never entirely lost.
-Kevin


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Hash support for arrays
Следующее
От: Robert Haas
Дата:
Сообщение: Re: ALTER TYPE recursion to typed tables