Re: patch: avoid heavyweight locking on hash metapage

From: Jeff Janes
Subject: Re: patch: avoid heavyweight locking on hash metapage
Date:
Msg-id: CAMkU=1zsjKXLian2OCqX0wt6+VUMuSTcyNDBKB_ZVaFj0SA6xg@mail.gmail.com
In response to: patch: avoid heavyweight locking on hash metapage  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: patch: avoid heavyweight locking on hash metapage  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Wed, May 30, 2012 at 3:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I developed the attached patch to avoid taking a heavyweight lock on
> the metapage of a hash index.  Instead, an exclusive buffer content
> lock is viewed as sufficient permission to modify the metapage, and a
> shared buffer content lock is used when such modifications need to be
> prevented.  For the most part this is a trivial change, because we
> were already taking these locks: we were just taking the heavyweight
> locks in addition.  The only sticking point is that, when we're
> searching or inserting, we previously locked the bucket before
> releasing the heavyweight metapage lock, which is unworkable when
> holding only a buffer content lock because (1) we might deadlock and
> (2) buffer content locks can't be held for long periods of time even
> when there's no deadlock risk.  To fix this, I implemented a simple
> loop-and-retry system: we release the metapage content lock, acquire
> the heavyweight lock on the target bucket, and then reacquire the
> metapage content lock and check that the bucket mapping has not
> changed.   Normally it hasn't, and we're done.  But if by chance it
> has, we simply unlock the metapage, release the heavyweight lock we
> acquired previously, lock the new bucket, and loop around again.  Even
> in the worst case we cannot loop very many times here, since we don't
> split the same bucket again until we've split all the other buckets,
> and 2^N gets big pretty fast.

Do we need the retry flag (this applies in two places)?  If oldblkno is
still InvalidBlockNumber then it can't equal blkno.
I think an explicit flag variable might be clearer than the magic
value, but since we already have the magic value, do we want both a
flag variable and a magic value?

+       if (retry)
+       {
+           if (oldblkno == blkno)
+               break;
+           _hash_droplock(rel, oldblkno, HASH_SHARE);
+       }
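
i.e., if we drop the flag, the check could read something like
(untested sketch):

+       if (oldblkno != InvalidBlockNumber)
+       {
+           if (oldblkno == blkno)
+               break;
+           _hash_droplock(rel, oldblkno, HASH_SHARE);
+       }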


In the README, the pseudocode probably needs to mention re-locking
the metapage in the loop.

Also, "page" is used to mean either the disk page or the shared buffer
currently holding that page, depending on context, which is confusing.
Maybe we should say "lock the metapage buffer" explicitly.  Of course
this gripe precedes your patch and applies to other parts of the code
as well, but since we mingle lightweight locks (on buffers) and
heavyweight locks (on names of disk pages), it might make sense to be
more meticulous here.

"exclusive-lock page 0 (assert the right to begin a split)" is no
longer true, nor is "release X-lock on page 0"


Also in the README, section "To prevent deadlock we enforce these
coding rules:" would need to be changed as those rules are being
changed.  But, should we change them at all?

In _hash_expandtable, the claim "But since we are only trylocking here
it should be OK" doesn't completely satisfy me.  Even a conditional
heavyweight lock acquisition still takes a transient, non-conditional
LWLock on the lock manager partition.  Unless there is a global rule
that no one can take a buffer content lock while holding a lock
manager partition lock, this seems dangerous.  Could this be redone to
follow the pattern of taking the heavyweight lock with no content lock
held, then reacquiring the buffer content lock to check that we locked
the right thing?  I don't know if it would be better to loop, or just
give up, if the metapage changed underneath us.
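
The reordering I have in mind would be roughly the following (untested
sketch using the existing hashpage.c primitives; old_maxbucket is a
hypothetical saved copy of the metapage field, and I haven't checked
which content-lock level _hash_expandtable holds at this point):

        /* release the metapage content lock before the heavy lock */
        _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
        _hash_getlock(rel, start_nblkno, HASH_EXCLUSIVE);
        /* re-take the content lock and verify nothing moved under us */
        _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
        if (metap->hashm_maxbucket != old_maxbucket)
        {
            /* concurrent split happened: drop the lock, loop or give up */
            _hash_droplock(rel, start_nblkno, HASH_EXCLUSIVE);
        }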

Cheers,

Jeff
