Re: Hash Indexes

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: Hash Indexes
Дата	4 ноября 2016 г. 22:41:21
Msg-id	CAA4eK1+k-ZA_7zrSvxLDk+pUVKnm7CVhU+4OWao62BA16-w3RA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Hash Indexes (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Fri, Nov 4, 2016 at 9:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Nov 1, 2016 at 9:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> [ new patches ]
>>
>> I looked over parts of this today, mostly the hashinsert.c changes.
>
> Some more review...
>
> @@ -656,6 +678,10 @@ _hash_squeezebucket(Relation rel,
>              IndexTuple  itup;
>              Size        itemsz;
>
> +            /* skip dead tuples */
> +            if (ItemIdIsDead(PageGetItemId(rpage, roffnum)))
> +                continue;
>
> Is this an optimization independent of the rest of the patch, or is
> there something in this patch that necessitates it?
>

This specific case is independent of rest of patch, but the same
optimization is used in function _hash_splitbucket_guts() which is
mandatory, because otherwise it will make a copy of that tuple without
copying dead flag.

>  i.e. Could this
> small change be committed independently?

Both the places _hash_squeezebucket() and  _hash_splitbucket can use
this optimization irrespective of rest of the patch.  I will prepare a
separate patch for these and send along with main patch after some
testing.

>  If not, then I think it
> needs a better comment explaining why it is now mandatory.
>
> - *  Caller must hold exclusive lock on the target bucket.  This allows
> + *  Caller must hold cleanup lock on the target bucket.  This allows
>   *  us to safely lock multiple pages in the bucket.
>
> The notion of a lock on a bucket no longer really exists; with this
> patch, we'll now properly speak of a lock on a primary bucket page.
> Also, I think the bit about safely locking multiple pages is bizarre
> -- that's not the issue at all: the problem is that reorganizing a
> bucket might confuse concurrent scans into returning wrong answers.
>
> I've included a broader updating of that comment, and some other
> comment changes, in the attached incremental patch, which also
> refactors your changes to _hash_freeovflpage() a bit to avoid some
> code duplication.  Please consider this for inclusion in your next
> version.
>

Your modifications looks good to me, so will include it in next version.

> In hashutil.c, I think that _hash_msb() is just a reimplementation of
> fls(), which you can rely on being present because we have our own
> implementation in src/port.  It's quite similar to yours but slightly
> shorter.  :-)   Also, some systems have a builtin fls() function which
> actually optimizes down to a single machine instruction, and which is
> therefore much faster than either version.
>

Agreed, will change as per suggestion.

> I don't like the fact that _hash_get_newblk() and _hash_get_oldblk()
> are working out the bucket number by using the HashOpaque structure
> within the bucket page they're examining.  First, it seems weird to
> pass the whole structure when you only need the bucket number out of
> it.  More importantly, the caller really ought to know what bucket
> they care about without having to discover it.
>
> For example, in _hash_doinsert(), we figure out the bucket into which
> we need to insert, and we store that in a variable called "bucket".
> Then from there we work out the primary bucket page's block number,
> which we store in "blkno".  We read the page into "buf" and put a
> pointer to that buffer's contents into "page" from which we discover
> the HashOpaque, a pointer to which we store into "pageopaque".  Then
> we pass that to _hash_get_newblk() which will go look into that
> structure to find the bucket number ... but couldn't we have just
> passed "bucket" instead?
>

Yes, it can be done.  However, note that pageopaque is not only
retrieved for passing to _hash_get_newblk(), rather it is used in
below code as well, so we can't remove that.

>  Similarly, _hash_expandtable() has the value
> available in the variable "old_bucket".
>
> The only caller of _hash_get_oldblk() is _hash_first(), which has the
> bucket number available in a variable called "bucket".
>
> So it seems to me that these functions could be simplified to take the
> bucket number as an argument directly instead of the HashOpaque.
>

Okay, I agree that it is better to use bucket number in both the
API's, so will change it accordingly.

> Generally, this pattern recurs throughout the patch.  Every time you
> use the data in the page to figure something out which the caller
> already knew, you're introducing a risk of bugs: what if the answers
> don't match?   I think you should try to root out as much of that from
> this code as you can.
>

Okay, I will review the patch once with this angle and see if I can improve it.

> As you may be able to tell, I'm working my way into this patch
> gradually, starting with peripheral parts and working toward the core
> of it.  Generally, I think it's in pretty good shape, but I still have
> quite a bit left to study.
>

Thanks.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Thomas Munro
Дата: 04 ноября 2016 г., 22:36:29
Сообщение: Re: Gather Merge

Следующее

От: Thomas Munro
Дата: 04 ноября 2016 г., 22:42:54
Сообщение: Re: Gather Merge

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Hash Indexes

Предыдущее

Следующее