Re: Hash Indexes

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Hash Indexes
Дата
Msg-id CA+TgmoZa4RZXooXL3WUF8NpB+TRT4fh0gsYx7wVJ8k7bW0qQNQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hash Indexes  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Hash Indexes
Re: Hash Indexes
Список pgsql-hackers
On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> [ new patches ]

I looked over parts of this today, mostly the hashinsert.c changes.

+    /*
+     * Copy bucket mapping info now;  The comment in _hash_expandtable where
+     * we copy this information and calls _hash_splitbucket explains why this
+     * is OK.
+     */

So, I went and tried to find the comments to which this comment is
referring and didn't have much luck.  At the point this code is
running, we have a pin but no lock on the metapage, so this is only
safe if changing any of these fields requires a cleanup lock on the
metapage.  If that's true, it seems like you could just make the
comment say that; if it's false, you've got problems.

This code seems rather pointless anyway, the way it's written.  All of
these local variables are used in exactly one place, which is here:

+            _hash_finish_split(rel, metabuf, buf, nbuf, maxbucket,
+                               highmask, lowmask);

But you hold the same locks at the point where you copy those values
into local variables and the point where that code runs.  So if the
code is safe as written, you could instead just pass
metap->hashm_maxbucket, metap->hashm_highmask, and
metap->hashm_lowmask to that function instead of having these local
variables.  Or, for that matter, you could just let that function read
the data itself: it's got metabuf, after all.

+     * In future, if we want to finish the splits during insertion in new
+     * bucket, we must ensure the locking order such that old bucket is locked
+     * before new bucket.

Not if the locks are conditional anyway.

+        nblkno = _hash_get_newblk(rel, pageopaque);

I think this is not a great name for this function.  It's not clear
what "new blocks" refers to, exactly.  I suggest
FIND_SPLIT_BUCKET(metap, bucket) or OLD_BUCKET_TO_NEW_BUCKET(metap,
bucket) returning a new bucket number.  I think that macro can be
defined as something like this: bucket + (1 <<
(fls(metap->hashm_maxbucket) - 1)). Then do nblkno =
BUCKET_TO_BLKNO(metap, newbucket) to get the block number.  That'd all
be considerably simpler than what you have for hash_get_newblk().

Here's some test code I wrote, which seems to work:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <assert.h>

int
newbucket(int bucket, int nbuckets)
{
    assert(bucket < nbuckets);
    return bucket + (1 << (fls(nbuckets) - 1));
}

int
main(int argc, char **argv)
{
    int    nbuckets = 1;
    int restartat = 1;
    int    splitbucket = 0;

    while (splitbucket < 32)
    {
        printf("old bucket %d splits to new bucket %d\n", splitbucket,
               newbucket(splitbucket, nbuckets));
        if (++splitbucket >= restartat)
        {
            splitbucket = 0;
            restartat *= 2;
        }
        ++nbuckets;
    }

    exit(0);
}

Moving on ...

             /*
              * ovfl page exists; go get it.  if it doesn't have room, we'll
-             * find out next pass through the loop test above.
+             * find out next pass through the loop test above.  Retain the
+             * pin, if it is a primary bucket page.
              */
-            _hash_relbuf(rel, buf);
+            if (pageopaque->hasho_flag & LH_BUCKET_PAGE)
+                _hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK);
+            else
+                _hash_relbuf(rel, buf);

It seems like it would be cheaper, safer, and clearer to test whether
buf != bucket_buf here, rather than examining the page opaque data.
That's what you do down at the bottom of the function when you ensure
that the pin on the primary bucket page gets released, and it seems
like it should work up here, too.

+            bool        retain_pin = false;
+
+            /* page flags must be accessed before releasing lock on a page. */
+            retain_pin = pageopaque->hasho_flag & LH_BUCKET_PAGE;

Similarly.

I have also attached a patch with some suggested comment changes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kouhei Kaigai
Дата:
Сообщение: Re: Proposal: scan key push down to heap [WIP]
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Patch to implement pg_current_logfile() function