Re: Hash index todo list item

Поиск
Список
Период
Сортировка
От Mark Mielke
Тема Re: Hash index todo list item
Дата
Msg-id 46E5566C.3050106@mark.mielke.cc
обсуждение исходный текст
Ответ на Re: Hash index todo list item  (Kenneth Marshall <ktm@rice.edu>)
Список pgsql-hackers
Kenneth Marshall wrote: <blockquote cite="mid:20070910024258.GC5679@it.is.rice.edu" type="cite"><pre wrap="">On Sun,
Sep02, 2007 at 10:41:22PM -0400, Tom Lane wrote: </pre><blockquote type="cite"><pre wrap="">Kenneth Marshall <a
class="moz-txt-link-rfc2396E"href="mailto:ktm@rice.edu"><ktm@rice.edu></a> writes:   </pre><blockquote
type="cite"><prewrap="">... This is the rough plan. Does anyone see anything critical that
 
is missing at this point?     </pre></blockquote><pre wrap="">Sounds pretty good.  Let me brain-dump one item on you:
onething that
 
hash currently has over btree is the ability to handle index items up
to a full page.  Now, if you go with a scheme that only stores hash
codes and not the underlying data, you can not only handle that but
improve on it; but if you reduce the bucket size and don't remove the
data, it'd be a step backward.  The idea I had about dealing with that
was to only reduce the size of primary buckets --- if it's necessary to
add overflow space to a bucket, the overflow units are still full pages.
So an index tuple up to a page in size can always be accommodated by
adding an overflow page to the bucket.

Just a thought, but AFAIR it's not in the archives anywhere.
        regards, tom lane
   </pre></blockquote><pre wrap="">I was thinking about this some more, and it strikes me that we can
keep the page size = bucket size = overflow size in the new scheme
of storing the hash value in the index and still reduce the effective
bucket size. Let's make the new size the (page size / 2^n) where n
is chosen appropriately. Then we we store the value in the page we
simply use n bits of the hash to determine which sub-piece of the
page to use to actually store the value. We may need to add a multiplier
to adjust the decision to split the page based on the mini-page. This
should allow us to much more densely pack the pages and approach the
1 item per bucket. This could easily shrink the size of the index by
a factor of 2^n. Let me know what you think. </pre></blockquote> My personal opinion is that something like this is
requiredto take best advantage of hashes. I didn't respond immediately because I don't have advice on what the best
implementationwould look like.<br /><br /> I have also been wondering about the effect of a hash index that includes
multiplevalues to the same key (either a non-unique key, or different tuples from the same key due to table updates). I
startedby thinking that the maximum number of hash entries per hash bucket should be kept to 2 or 3 to prevent
reductionin performance to that of btree, but I think this doesn't work if multiple tuples can have the same key.
Unless- the maps is hash value =1:1> index page =1:1> hash bucket =1:N> hash value =1:M=> tuples. Guarantee
thanN is small (either <= 2 or <=4 depending on performance evaluation) by re-hashing if N ever becomes > 2 or
>4. Choose a sparse harsh. Let 1:M be indefinite? Also, optimize the 1:M=1:1 case, such that the value can be
inline?<br/><br /> For most cases, I would think the above model would make it cheap to determine if the key does not
exist,as well as the 1:1 (hash value:key) case requiring a single page lookup. Update performance would suffer, but
lookupshould be faster than btree in these cases, as btree often requires > 1 index page scan.<br /><br />
Cheers,<br/> mark<br /><br /><pre class="moz-signature" cols="72">-- 
 
Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a>
</pre>

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Oleg Bartunov
Дата:
Сообщение: Re: Include Lists for Text Search
Следующее
От: Tom Lane
Дата:
Сообщение: Re: integrated tsearch doesn't work with non utf8 database