[GSoC08]some detail plan of improving hash index
От | Xiao Meng |
---|---|
Тема | [GSoC08]some detail plan of improving hash index |
Дата | |
Msg-id | ded849dd0805151942j5b66d480u3f06aceae2b464af@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: [GSoC08]some detail plan of improving hash index
(Kenneth Marshall <ktm@rice.edu>)
Re: [GSoC08]some detail plan of improving hash index (Josh Berkus <josh@agliodbs.com>) |
Список | pgsql-hackers |
Hi, hackers.<br /><br />I'm reading the source codes of hash and reviewing Neil's old patch of improving hash index.<br />Hereis some detail plan. I'm trying to adjust Neil's patch to the current version of PostgreSQL first. I'm not quite familarwith the code yet, so please make some comment.<br /><br />* Phase 1. Just store hash value instead of hash keys<br/><br />First, define a macro to make it optional.<br /><br />Second, add a new function _hash_form_item to constructIndexTuple with hash code to replace index_form_tuple uaed in hash access method. It seems easy since we did'ntneed to deal with TOAST.<br /><br />Third, modify _hash_checkqual. We can first compare the hash value; if it's thesame, we compare the real key value.<br />Also, HashScanOpaqueData adds an element hashso_sk_hash to hold scan key's hashvalue to support scan function.<br /><br />Finally, it seems the system catalog pg_amop also need to be modified.<br/>In Neil's patch, he set the amopreqcheck to be True. <br />In the documents, it means index hit must be recheckedin the document. But I'm not so clear. Does it just means we need to recheck the value when use _hash_chechqual? <br /><br /><br />* Phase 2. Sort the hash value when insert into the bucket and use binary search whenscan<br />Add a function _hash_binsearch to support binary search in a bucket;<br />It involved in all functions whenwe need to search, insert and delete. <br /><br />* Phase 3. When it's necessary, store the real value.<br />When weinsert a new index tuple , for example tp , to a bucket, we can check whether there's the same hash code.<br />If there'salready an index tuple with such a hash code, we store both the hash code and real key of tp.<br /> Alternatively,we add a flag to represent the tuple stores a real value or just hash code. It seems a little complex.<br /><br/>Phase 1 seems extremely easy. I'm trying to do it first. <br />Additionally, I need a benchmark to test the performance.It seems there's some tools list in <a href="http://wiki.postgresql.org/wiki/Performances_QA_testing">http://wiki.postgresql.org/wiki/Performances_QA_testing</a> .Any advice?<br clear="all" /><br />-- <br />Have a good day;-)<br />Best Regards,<br />Xiao Meng<br /><br />━━━━━━━━━━━━━━━━━━━━━━━<br/>Data and Knowledge Engineering Research Center<br />Harbin Institute of Technology, China<br/>Gtalk: <a href="mailto:mx.cogito@gmail.com" target="_blank">mx.cogito@gmail.com</a><br /> MSN: <a href="mailto:cnEnder@live.com"target="_blank">cnEnder@live.com</a><br /><a href="http://xiaomeng.yo2.cn" target="_blank"></a>
В списке pgsql-hackers по дате отправления: