Re: Hash index todo list item

Поиск
Список
Период
Сортировка
От Kenneth Marshall
Тема Re: Hash index todo list item
Дата
Msg-id 20070907145507.GJ19403@it.is.rice.edu
обсуждение исходный текст
Ответ на Re: Hash index todo list item  (Brian Hurt <bhurt@janestcapital.com>)
Ответы Re: Hash index todo list item  (Brian Hurt <bhurt@janestcapital.com>)
Список pgsql-hackers
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>> I understand that a hash value is a many-to-one mapping. That is the
>> point of the flag in the index. The flag means that there is only one
>> item in the heap corresponding to that hash value. In this case we
>> know that the value in the heap is the correct one and a possibly
>> very expensive string comparison can be skipped. Given that the hash
>> function is doing its job, almost every string comparison can be skipped.
>> How long would it take to compare 1-32K of data? How much CPU usage?
>> With this field in place, you only need to check tuple visibility.
>>  
>
> How likely is it that you will get a hash collision, two strings that are 
> different that will hash to the same value?  To avoid this requires a very 
> large hash key (128 bits, minimum)- otherwise you get into birthday attack 
> problems.  With a 32-bit hash, the likelyhood is greater than 50% that two 
> strings in a collection of 100,000 will hash to the same value.  With a 
> 64-bit hash, the likelyhood is greater than 50% that two strings in a 
> collection of 10 billion will has to same value.  10 billion is a large 
> number, but not an unreasonable number, of strings to want to put into a 
> hash table- and it's exactly this case where the O(1) cost of hashtables 
> starts being a real win.
>
> Brian
>
Yes, there is a non-negligible chance of collision (In a DB is there
any chance that is non-negligible? :) ) and the values must be checked
against the actual. The win is the collapse of the index size and only
needed to check a small fraction of the actual tuples.

Regards,
Ken


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Markus Schiltknecht
Дата:
Сообщение: Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Следующее
От: Markus Schiltknecht
Дата:
Сообщение: terms for database replication: synchronous vs eager