Re: GSoC on WAL-logging hash indexes

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: GSoC on WAL-logging hash indexes
Дата
Msg-id CA+TgmoacpdE5x3epiwtww_jM3SL5gzV-o6XRG6o0q+W7XrfgzQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GSoC on WAL-logging hash indexes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: GSoC on WAL-logging hash indexes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: GSoC on WAL-logging hash indexes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
[ I just sent a reply to Greg Stark's email which touches on some of
the same points you mentioned here; that was mostly written last
night, and finished this morning before seeing this.  ]

On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Hmm. You suggested ensuring that a scan always has at least a pin, and split
> takes a vacuum-lock. That ought to work. There's no need for the more
> complicated maneuvers you described, ISTM that you can just replace the
> heavy-weight share lock with holding a pin on the primary page of the
> bucket, and an exclusive lock with a vacuum-lock. Note that
> _hash_expandtable already takes the exclusive lock conditionally, ie. if it
> doesn't get the lock immediately it just gives up. We could do the same with
> the cleanup lock.

We could try that.  I assume you mean do *just* what you describe
here, without the split-in-progress or moved-by-split flags I
suggested.  The only issue I see with that is that instead of everyone
piling up on the heavyweight lock, a wait which is interruptible,
they'd all pile up on the buffer content lwlock, a wait which isn't.
And splitting a bucket can involve an arbitrary number of I/O
operations, so that's kind of unappealing.  Even checkpoints would be
blocked until the bucket split completed, which seems unfortunate.

But I like the idea of teaching each scan to retain a pin on the
primary buffer page.  If we then do the split the way I proposed
before, we can implement the "outwait concurrent scans" step by taking
a cleanup lock on the primary buffer page.  In this design, we only
need to acquire and release the cleanup lock.  Once we get the cleanup
lock on the primary buffer page, even for an instant, we know that
there are no remaining scans in the bucket that need the pre-split
tuples that have now been moved to the new bucket.  We can then remove
tuples with a lesser lock (or keep the stronger lock if we wish to
re-pack).

> Vacuum could also be enhanced. It currently takes an exclusive lock on the
> bucket, then removes any dead tuples and finally "squeezes" the bucket by
> moving tuples to earlier pages. But you only really need the exclusive lock
> for the squeeze-phase. You could do the dead-tuple removal without the
> bucket-lock, and only grab for the squeeze-phase. And the squeezing is
> optional, so you could just skip that if you can't get the lock. But that's
> a separate patch as well.

Yeah, I think this would be a useful improvement.

> One more thing we could do to make hash indexes more scalable, independent
> of the above: Cache the information in the metapage in backend-private
> memory. Then you (usually) wouldn't need to access the metapage at all when
> scanning. Store a copy of the bitmask for that bucket in the primary page,
> and when scanning, check that it matches the cached value. If not, refresh
> the cached metapage and try again.

I don't know whether this would be a useful improvement.

> So, there seems to be a few fairly simple and independent improvements to be
> made:
>
> 1. Replace the heavy-weight lock with pin & vacuum-lock.
> 2. Make it crash-safe, by adding WAL-logging
> 3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for
> the squeeze phase.
> 4. Cache the metapage.
>
> We still don't know if it's going to be any better than B-tree after all
> that's done, but the only way to find out is to go ahead and implement it.

I'm of the opinion that we ought to start by making splits and
vacuuming more concurrent, and then do that stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: pg_ctl status with nonexistent data directory
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: GSoC on WAL-logging hash indexes