Re: Hash index todo list item
| От | Brian Hurt |
|---|---|
| Тема | Re: Hash index todo list item |
| Дата | |
| Msg-id | 46E161F9.8070705@janestcapital.com обсуждение исходный текст |
| Ответ на | Re: Hash index todo list item (Kenneth Marshall <ktm@rice.edu>) |
| Ответы |
Re: Hash index todo list item
Re: Hash index todo list item |
| Список | pgsql-hackers |
Kenneth Marshall wrote: >I understand that a hash value is a many-to-one mapping. That is the >point of the flag in the index. The flag means that there is only one >item in the heap corresponding to that hash value. In this case we >know that the value in the heap is the correct one and a possibly >very expensive string comparison can be skipped. Given that the hash >function is doing its job, almost every string comparison can be skipped. >How long would it take to compare 1-32K of data? How much CPU usage? >With this field in place, you only need to check tuple visibility. > > How likely is it that you will get a hash collision, two strings that are different that will hash to the same value? To avoid this requires a very large hash key (128 bits, minimum)- otherwise you get into birthday attack problems. With a 32-bit hash, the likelyhood is greater than 50% that two strings in a collection of 100,000 will hash to the same value. With a 64-bit hash, the likelyhood is greater than 50% that two strings in a collection of 10 billion will has to same value. 10 billion is a large number, but not an unreasonable number, of strings to want to put into a hash table- and it's exactly this case where the O(1) cost of hashtables starts being a real win. Brian
В списке pgsql-hackers по дате отправления: