Re: Hash index todo list item
От | Kenneth Marshall |
---|---|
Тема | Re: Hash index todo list item |
Дата | |
Msg-id | 20070925162604.GO14440@it.is.rice.edu обсуждение исходный текст |
Ответ на | Re: Hash index todo list item (Gregory Stark <stark@enterprisedb.com>) |
Ответы |
Re: Hash index todo list item
Re: Hash index todo list item |
Список | pgsql-hackers |
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote: > "Kenneth Marshall" <ktm@rice.edu> writes: > > > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: > > > >> Using our implementation, build times and index sizes are > >> comparable with btree index build times and index sizes. > >... > > That is super! (and timely) > > It is pretty super. I have a few comments to raise but don't take it to be too > negative, it sounds like this is a big step forward towards making hash > indexes valuable. > > Firstly in the graphs it seems the index size graph has an exponential x-axis > but a linear y-axis. This makes it hard to see what I would expect to be > pretty much linear growth. The curves look exponential which would mean linear > growth but of course it's hard to tell. > > Also, the growth in the time chart looks pretty much linear. That seems weird > since I would expect there would be a visible extra component since sort times > are n-log(n). Perhaps you need to test still larger tables to see that though. > > In any case it's clear from the results you have there that the change is a > positive one and fixes a fundamental problem with the hash index build code. > > Something else you should perhaps test is indexing something which is > substantially larger than hash function output. A hash function is going to > make the most sense when indexing something like strings for which you want to > avoid the long comparison costs. Especially consider testing this on a UTF8 > locale with expensive comparisons (like a CJK locale for example). > > Note that the bottom line for the problems with hash indexes is that the > current implementation doesn't offer any advantages over btree indexes. Hash > indexes need to be not only as good of a choice as btree indexes but > significantly better a significantly better choice at least for some specific > circumstances. > > Also, if you're going to submit a patch you should check out a copy of the CVS > HEAD and work from that. I don't think there are huge differences in the area > of hash indexes though. But in most other areas you would be spending quite a > bit of time dealing details which have changed since. > > Finally note that we're in the final throes of the 8.3 feature freeze. > Normally any patch submitted now would be held until 8.3 is released and > development on 8.4 is started. I could imagine an exception being made for > hash indexes since they're so moribund currently but probably not. The flip > side of that argument is that there's not much point in making an exception > for something which will only be really useful once further work is done in > the same area. > Although I am very excited about this patch, I do not see any real value in including it in 8.3. As you mention, we need to to have a hash index implementation that outperforms btree in some problem regime and that is currently not the case. I have just recently started the process of gathering ideas and having discussions on various approaches to making hash indexes more performant and we have a number of ideas on which to start our investigation. I do think that this patch will make the testing and evaluation, that will be needed to truly improve the hash index, much much easier. Regards, Ken
В списке pgsql-hackers по дате отправления: