hashtext & collisions

From	Leon Mergen
Subject	hashtext & collisions
Date	12 April 2007, 03:59:26
Msg-id	5eaaef180704111546j742c802et7619dd180ce68a71@mail.gmail.com
List	pgsql-general

Hello,

I am having trouble working out how to store, as efficiently as
possible, a database that will contain a couple of huge tables (think
5 billion+ rows). Each of these tables has a bigint id and a character
varying value. I am currently partitioning these tables on
hashtext(value) % 1000 to determine which subtable a given value
should be stored in.
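
As a rough illustration of the routing described above, here is a
minimal Python sketch. It does not call PostgreSQL: stand_in_hash() is
a hypothetical substitute built on CRC-32 that only mimics the signed
32-bit range of hashtext(), and the values_part_NNN naming scheme is an
assumption made for the example. One detail worth checking in a real
setup: hashtext() returns a signed integer, and the SQL % operator
keeps the sign of the dividend, so hashtext(value) % 1000 can come out
negative unless it is normalised.

```python
import zlib

NUM_PARTITIONS = 1000  # from the post: hashtext(value) % 1000


def stand_in_hash(value: str) -> int:
    """Stand-in for hashtext(): CRC-32 reinterpreted as a signed 32-bit int.

    This is NOT PostgreSQL's hash function; it only mimics its value range.
    """
    unsigned = zlib.crc32(value.encode("utf-8"))  # 0 .. 2**32 - 1
    return unsigned - 2**32 if unsigned >= 2**31 else unsigned


def partition_for_hash(h: int) -> int:
    """Map a signed 32-bit hash to a partition number in 0 .. 999."""
    # Python's % yields a non-negative result for a positive modulus,
    # so negative hashes are folded into the same 0 .. 999 range.
    return h % NUM_PARTITIONS


def subtable_name(value: str, base: str = "values_part") -> str:
    """Hypothetical naming scheme: values_part_000 .. values_part_999."""
    return f"{base}_{partition_for_hash(stand_in_hash(value)):03d}"
```

The same value always maps to the same subtable name, which is the
property the partitioning relies on.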

However, I often also need to find the value for a given id. Instead
of using the sequential numbering that a BIGSERIAL would provide, I am
thinking: wouldn't it make sense to use hashtext(value) as the id?
Then, when I need the value that belongs to a certain id, I can just
take id % 1000 and know which subtable the value is stored in,
reducing the number of tables to search by a factor of 500.
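
To make the idea concrete, here is a toy in-memory model in Python,
offered only as a sketch. The HashKeyedStore class and stand_in_hash()
are hypothetical names; the hash is CRC-32 reinterpreted as a signed
32-bit integer, not PostgreSQL's hashtext(). It shows that a lookup by
id touches a single subtable, and also where the scheme breaks: two
different values that hash to the same id cannot both be stored.

```python
import zlib
from typing import Optional

NUM_PARTITIONS = 1000


def stand_in_hash(value: str) -> int:
    """Stand-in for hashtext(); CRC-32 reinterpreted as a signed 32-bit int."""
    unsigned = zlib.crc32(value.encode("utf-8"))
    return unsigned - 2**32 if unsigned >= 2**31 else unsigned


class HashKeyedStore:
    """Toy model: one dict per subtable, keyed by the hash-derived id."""

    def __init__(self) -> None:
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def insert(self, value: str) -> int:
        row_id = stand_in_hash(value)
        table = self.partitions[row_id % NUM_PARTITIONS]
        existing = table.get(row_id)
        if existing is not None and existing != value:
            # Two different values produced the same id: the scheme fails here.
            raise ValueError(f"id collision: {existing!r} vs {value!r}")
        table[row_id] = value
        return row_id

    def lookup(self, row_id: int) -> Optional[str]:
        # Only one subtable is consulted, chosen from the id alone.
        return self.partitions[row_id % NUM_PARTITIONS].get(row_id)
```

A real schema would need some rule for the collision branch (for
example, rejecting the row or falling back to another key), which is
exactly what the question below is about.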

Now, my question is: how likely is a collision between hashes? I
noticed that the function only returns a 32-bit number, so I figure
a collision must happen at least once in every 4 billion values. If
this approach (using hashes as keys) is not recommended, are there
any other suggestions on how to make the subtable name derivable from
an identification number?
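
For a rough sense of scale, the standard birthday approximation can be
evaluated for a 32-bit hash space. The sketch below assumes hash
outputs are uniformly distributed, which is an idealisation and not a
documented property of hashtext(); the function names are made up for
the example. Under that assumption the chance of at least one
collision reaches about 50% at around 77,000 distinct values, and with
more distinct values than possible hashes (as with 5 billion+ rows) a
collision is certain.

```python
import math

HASH_SPACE = 2**32  # the post notes that hashtext() returns a 32-bit number


def collision_probability(n: int, space: int = HASH_SPACE) -> float:
    """Approximate P(at least one collision) among n distinct values.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * space)),
    assuming uniformly distributed hash outputs.
    """
    if n > space:
        return 1.0  # pigeonhole: more values than possible hashes
    return -math.expm1(-n * (n - 1) / (2 * space))


def values_for_probability(p: float, space: int = HASH_SPACE) -> int:
    """Approximate number of values at which P(collision) reaches p."""
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))
```

Calling collision_probability(5_000_000_000) returns 1.0, since that
row count exceeds the 2**32 available hash values.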

--
Leon Mergen
http://www.solatis.com
