Re: Hash Indexes
От | Amit Kapila |
---|---|
Тема | Re: Hash Indexes |
Дата | |
Msg-id | CAA4eK1+k-ZA_7zrSvxLDk+pUVKnm7CVhU+4OWao62BA16-w3RA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Hash Indexes (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On Fri, Nov 4, 2016 at 9:37 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Nov 1, 2016 at 9:09 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> [ new patches ] >> >> I looked over parts of this today, mostly the hashinsert.c changes. > > Some more review... > > @@ -656,6 +678,10 @@ _hash_squeezebucket(Relation rel, > IndexTuple itup; > Size itemsz; > > + /* skip dead tuples */ > + if (ItemIdIsDead(PageGetItemId(rpage, roffnum))) > + continue; > > Is this an optimization independent of the rest of the patch, or is > there something in this patch that necessitates it? > This specific case is independent of rest of patch, but the same optimization is used in function _hash_splitbucket_guts() which is mandatory, because otherwise it will make a copy of that tuple without copying dead flag. > i.e. Could this > small change be committed independently? Both the places _hash_squeezebucket() and _hash_splitbucket can use this optimization irrespective of rest of the patch. I will prepare a separate patch for these and send along with main patch after some testing. > If not, then I think it > needs a better comment explaining why it is now mandatory. > > - * Caller must hold exclusive lock on the target bucket. This allows > + * Caller must hold cleanup lock on the target bucket. This allows > * us to safely lock multiple pages in the bucket. > > The notion of a lock on a bucket no longer really exists; with this > patch, we'll now properly speak of a lock on a primary bucket page. > Also, I think the bit about safely locking multiple pages is bizarre > -- that's not the issue at all: the problem is that reorganizing a > bucket might confuse concurrent scans into returning wrong answers. > > I've included a broader updating of that comment, and some other > comment changes, in the attached incremental patch, which also > refactors your changes to _hash_freeovflpage() a bit to avoid some > code duplication. Please consider this for inclusion in your next > version. > Your modifications looks good to me, so will include it in next version. > In hashutil.c, I think that _hash_msb() is just a reimplementation of > fls(), which you can rely on being present because we have our own > implementation in src/port. It's quite similar to yours but slightly > shorter. :-) Also, some systems have a builtin fls() function which > actually optimizes down to a single machine instruction, and which is > therefore much faster than either version. > Agreed, will change as per suggestion. > I don't like the fact that _hash_get_newblk() and _hash_get_oldblk() > are working out the bucket number by using the HashOpaque structure > within the bucket page they're examining. First, it seems weird to > pass the whole structure when you only need the bucket number out of > it. More importantly, the caller really ought to know what bucket > they care about without having to discover it. > > For example, in _hash_doinsert(), we figure out the bucket into which > we need to insert, and we store that in a variable called "bucket". > Then from there we work out the primary bucket page's block number, > which we store in "blkno". We read the page into "buf" and put a > pointer to that buffer's contents into "page" from which we discover > the HashOpaque, a pointer to which we store into "pageopaque". Then > we pass that to _hash_get_newblk() which will go look into that > structure to find the bucket number ... but couldn't we have just > passed "bucket" instead? > Yes, it can be done. However, note that pageopaque is not only retrieved for passing to _hash_get_newblk(), rather it is used in below code as well, so we can't remove that. > Similarly, _hash_expandtable() has the value > available in the variable "old_bucket". > > The only caller of _hash_get_oldblk() is _hash_first(), which has the > bucket number available in a variable called "bucket". > > So it seems to me that these functions could be simplified to take the > bucket number as an argument directly instead of the HashOpaque. > Okay, I agree that it is better to use bucket number in both the API's, so will change it accordingly. > Generally, this pattern recurs throughout the patch. Every time you > use the data in the page to figure something out which the caller > already knew, you're introducing a risk of bugs: what if the answers > don't match? I think you should try to root out as much of that from > this code as you can. > Okay, I will review the patch once with this angle and see if I can improve it. > As you may be able to tell, I'm working my way into this patch > gradually, starting with peripheral parts and working toward the core > of it. Generally, I think it's in pretty good shape, but I still have > quite a bit left to study. > Thanks. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: