Re: Hash index todo list item

Поиск
Список
Период
Сортировка
От Kenneth Marshall
Тема Re: Hash index todo list item
Дата
Msg-id 20071018004331.GI27320@it.is.rice.edu
обсуждение исходный текст
Ответ на Hash index todo list item  (Kenneth Marshall <ktm@rice.edu>)
Ответы Re: Hash index todo list item  (Tom Raney <twraney@comcast.net>)
Список pgsql-hackers
Tom,

That is great. I am looking forward to your patch. After the
issues that you needed to address, I think that it would be
reasonable to add a few more user settings for the hash index.
Fill-factor is too course a knob. The others that I have been
considering are:

maxtuples - Not really the maximum, but a target value to use for setting up the initial buckets. This would allow you
toset it for data loads and avoid the "split-n-copy" trauma that you are trying to avoid with your new hash build
process.

multiplicity - Try to capture use cases that would require many overflow pages. In particular, if we discard the normal
indexpage layout we can skip the space overhead of the page pointer and generate a more compact index. Then you could
usea few more hash bits to lookup the index entry in the page. How many bits would be determined by this factor. 8-bits
wouldgive you 256 sub-pieces that could each hold about 3 entries using the current 4-byte hash, or 2 entries using an
8-bytehash.
 

What do you think?

Cheers,
Ken

On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:
> Kenneth,
>
> Great!
>
> Yes, we did update the code to use the estimate.  I will post the patch 
> with this update.  We only saw a very small difference in index build time, 
> but you may when you add many columns to the base relation. 
> With 1 billion tuples, you should start to see the hash index outperform 
> the btree for some equality probes, I would imagine.  With a 90% fill 
> factor, the btree would require 4 levels to index that many tuples.  If the 
> top two were in memory, there would be 3 IOs needed.  I don't think PG 
> supports index only scans, so it will take the extra IO to probe the base 
> relation.  The hash may take up to 2 IOs and maybe even less (or maybe more 
> depending on how many overflow buckets there are).  It might be interesting 
> to fiddle with the fill factors of each index - hash pages (buckets) 
> default to 75% full. 
> -Tom
>> Tom,
>>
>> I am getting ready to stitch in an updated, simplified version
>> of Neil Conway's hash-only hash index patch. Did you have a
>> chance to update your sizing function to use the planner-like
>> estimate and not a full table scan? I would like to be able
>> to use that when my test table start to have 10^9 entries.
>> If you have not had a chance, I will try and add it myself.
>>
>> Regards,
>> Ken
>>
>>   
>
>


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: tsearch2api project
Следующее
От: Tom Lane
Дата:
Сообщение: ts_rewrite aggregate API seems mighty ugly