Re: Hash index todo list item

Поиск

Список

Период

Сортировка

От	Kenneth Marshall
Тема	Re: Hash index todo list item
Дата	25 сентября 2007 г. 13:26:14
Msg-id	20070925162604.GO14440@it.is.rice.edu обсуждение
Ответ на	Re: Hash index todo list item (Gregory Stark <stark@enterprisedb.com>)
Ответы	Re: Hash index todo list item Re: Hash index todo list item
Список	pgsql-hackers

Дерево обсуждения

On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
> "Kenneth Marshall" <ktm@rice.edu> writes:
> 
> > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
> >
> >> Using our implementation, build times and index sizes are
> >> comparable with btree index build times and index sizes.
> >...
> > That is super! (and timely)
> 
> It is pretty super. I have a few comments to raise but don't take it to be too
> negative, it sounds like this is a big step forward towards making hash
> indexes valuable.
> 
> Firstly in the graphs it seems the index size graph has an exponential x-axis
> but a linear y-axis. This makes it hard to see what I would expect to be
> pretty much linear growth. The curves look exponential which would mean linear
> growth but of course it's hard to tell.
> 
> Also, the growth in the time chart looks pretty much linear. That seems weird
> since I would expect there would be a visible extra component since sort times
> are n-log(n). Perhaps you need to test still larger tables to see that though.
> 
> In any case it's clear from the results you have there that the change is a
> positive one and fixes a fundamental problem with the hash index build code.
> 
> Something else you should perhaps test is indexing something which is
> substantially larger than hash function output. A hash function is going to
> make the most sense when indexing something like strings for which you want to
> avoid the long comparison costs. Especially consider testing this on a UTF8
> locale with expensive comparisons (like a CJK locale for example).
> 
> Note that the bottom line for the problems with hash indexes is that the
> current implementation doesn't offer any advantages over btree indexes. Hash
> indexes need to be not only as good of a choice as btree indexes but
> significantly better a significantly better choice at least for some specific
> circumstances.
> 
> Also, if you're going to submit a patch you should check out a copy of the CVS
> HEAD and work from that. I don't think there are huge differences in the area
> of hash indexes though. But in most other areas you would be spending quite a
> bit of time dealing details which have changed since.
> 
> Finally note that we're in the final throes of the 8.3 feature freeze.
> Normally any patch submitted now would be held until 8.3 is released and
> development on 8.4 is started. I could imagine an exception being made for
> hash indexes since they're so moribund currently but probably not. The flip
> side of that argument is that there's not much point in making an exception
> for something which will only be really useful once further work is done in
> the same area.
> 

Although I am very excited about this patch, I do not see any real value
in including it in 8.3. As you mention, we need to to have a hash index
implementation that outperforms btree in some problem regime and that is
currently not the case. I have just recently started the process of
gathering ideas and having discussions on various approaches to making
hash indexes more performant and we have a number of ideas on which to
start our investigation. I do think that this patch will make the testing
and evaluation, that will be needed to truly improve the hash index, much
much easier.

Regards,
Ken

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Hash index todo list item