Re: Hash indexes

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Hash indexes
Дата
Msg-id 873bcgxbv3.fsf@stark.xeocode.com
обсуждение исходный текст
Ответ на Re: Hash indexes  (Hannu Krosing <hannu@skype.net>)
Список pgsql-hackers
Hannu Krosing <hannu@skype.net> writes:

> Ühel kenal päeval, T, 2006-08-01 kell 10:54, kirjutas Andrew Dunstan:
> > Gregory Stark wrote:
> > > 
> > > I looked a while back and was suspicious about the actual hash functions too.
> > > It seemed like a lot of them were vastly suboptimal. That would mean we're
> > > often dealing with mostly empty and mostly full buckets instead of well
> > > distributed hash tables.
> > >
> > >
> > >   
> > 
> > This is now sounding like a lot of low hanging fruit ... highly 
> > performant hash indexed tables could possibly be a very big win.
> > 
> 
> Are you sure about the badness of our hash functions ?
> 
> I just tested and hashtext(text) has about 1.4% of collisions on about
> 120M distinct texts, which is not bad considering thet total space for
> hashes is 4G, meaning that 120M covers itself already 3% of possible
> hash space.

I don't recall exactly what triggered my suspicions, but I think it had more
to do with things like how int4 and int8 were mapped to hash codes and what
the hash function looks like that compresses the hash codes into the actual
bucket. IIRC it's just the low order bits for int8 and it's the actual int4
which then only takes the low order bits. I seem to recall wondering what
would happen if you had, say, a list of /24 network addresses for example.

I didn't actually do any tests though so it's quite possible I missed
something that mitigated the issues I feared.


-- 
greg



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: [PATCHES] Replication Documentation
Следующее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: [PATCHES] Replication Documentation