Re: Hash indexes

Поиск

Список

Период

Сортировка

От	Greg Stark
Тема	Re: Hash indexes
Дата	1 августа 2006 г. 21:08:16
Msg-id	873bcgxbv3.fsf@stark.xeocode.com обсуждение исходный текст
Ответ на	Re: Hash indexes (Hannu Krosing <hannu@skype.net>)
Список	pgsql-hackers

Дерево обсуждения

Hannu Krosing <hannu@skype.net> writes:

> Ãhel kenal pÃ¤eval, T, 2006-08-01 kell 10:54, kirjutas Andrew Dunstan:
> > Gregory Stark wrote:
> > > 
> > > I looked a while back and was suspicious about the actual hash functions too.
> > > It seemed like a lot of them were vastly suboptimal. That would mean we're
> > > often dealing with mostly empty and mostly full buckets instead of well
> > > distributed hash tables.
> > >
> > >
> > >   
> > 
> > This is now sounding like a lot of low hanging fruit ... highly 
> > performant hash indexed tables could possibly be a very big win.
> > 
> 
> Are you sure about the badness of our hash functions ?
> 
> I just tested and hashtext(text) has about 1.4% of collisions on about
> 120M distinct texts, which is not bad considering thet total space for
> hashes is 4G, meaning that 120M covers itself already 3% of possible
> hash space.

I don't recall exactly what triggered my suspicions, but I think it had more
to do with things like how int4 and int8 were mapped to hash codes and what
the hash function looks like that compresses the hash codes into the actual
bucket. IIRC it's just the low order bits for int8 and it's the actual int4
which then only takes the low order bits. I seem to recall wondering what
would happen if you had, say, a list of /24 network addresses for example.

I didn't actually do any tests though so it's quite possible I missed
something that mitigated the issues I feared.

-- 
greg

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Hash indexes