Обсуждение: B-tree descend for insertion locking

Поиск
Список
Период
Сортировка

B-tree descend for insertion locking

От
Heikki Linnakangas
Дата:
When inserting into a B-tree index, all the pages are read-locked when
descending the tree. When we reach the leaf page, the read-lock is
exchanged for a write-lock.

There's nothing wrong with that, but why don't we just directly grab a
write-lock on the leaf page? When descending, we know the level we're
on, and what level the child page is. The only downside I can see is
that we would unnecessarily hold a write-lock when a read-lock would
suffice, if the page was just split and we have to move right. But that
seems like a really bad bet - hitting the page when it was just split is
highly unlikely.

Grabbing the write-lock directly would obviously save one buffer
unlock+lock sequence, for what it's worth, but I think it would also
slightly simplify the code. Am I missing something?

See attached patch. The new contract of _bt_getroot is a bit weird: it
locks the returned page in the requested lock-mode, shared or exclusive,
if the root page was also the leaf page. Otherwise it's locked in shared
mode regardless off the requested lock mode. But OTOH, the new contract
for _bt_search() is more clear now: it actually locks the returned page
in the requested mode, where it used to only use the access parameter to
decide whether to create a root page if the index was empty.

- Heikki

Вложения

Re: B-tree descend for insertion locking

От
Peter Geoghegan
Дата:
On Tue, Mar 18, 2014 at 4:12 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> See attached patch. The new contract of _bt_getroot is a bit weird: it locks
> the returned page in the requested lock-mode, shared or exclusive, if the
> root page was also the leaf page. Otherwise it's locked in shared mode
> regardless off the requested lock mode. But OTOH, the new contract for
> _bt_search() is more clear now: it actually locks the returned page in the
> requested mode, where it used to only use the access parameter to decide
> whether to create a root page if the index was empty.

I had considered this myself, and am under the impression that the
only reason things work this way is because it makes the interface of
_bt_getroot() clearer. I think what you propose is logical, and you
should pursue it, but fwiw I'm not convinced that you've really
simplified _bt_search(). However, you have kind of made _bt_search()
into something that succinctly represents what is useful about Lehman
and Yao as compared to earlier concurrent locking techniques, and
that's a good thing. I would probably have remarked on that in the
comments if I were you. I certainly would not have left out some
reference to L&Y. You've removed an existing one where the lock
escalation occurs, emphasizing that it's safe, and I'd like to see you
at least compensate for that.


-- 
Peter Geoghegan



Re: B-tree descend for insertion locking

От
Amit Kapila
Дата:
On Tue, Mar 18, 2014 at 4:42 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> When inserting into a B-tree index, all the pages are read-locked when
> descending the tree. When we reach the leaf page, the read-lock is exchanged
> for a write-lock.
>
> There's nothing wrong with that, but why don't we just directly grab a
> write-lock on the leaf page? When descending, we know the level we're on,
> and what level the child page is. The only downside I can see is that we
> would unnecessarily hold a write-lock when a read-lock would suffice, if the
> page was just split and we have to move right. But that seems like a really
> bad bet - hitting the page when it was just split is highly unlikely.

Another case could be when the page is half dead or deleted, but again
chances of same are relatively less.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: B-tree descend for insertion locking

От
Simon Riggs
Дата:
On 18 March 2014 11:12, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> When inserting into a B-tree index, all the pages are read-locked when
> descending the tree. When we reach the leaf page, the read-lock is exchanged
> for a write-lock.
>
> There's nothing wrong with that, but why don't we just directly grab a
> write-lock on the leaf page? When descending, we know the level we're on,
> and what level the child page is. The only downside I can see is that we
> would unnecessarily hold a write-lock when a read-lock would suffice, if the
> page was just split and we have to move right. But that seems like a really
> bad bet - hitting the page when it was just split is highly unlikely.

Sounds good.

Grabbing write lock directly will reduce contention on the buffer, not
just reduce the code path.

If we have a considerable number of duplicates we would normally step
thru until we found a place to insert. Presumably that will happen
with write lock enabled, rather than read lock. Would this slow down
the insertion of highly duplicate keys under concurrent load? i.e. is
this a benefit for nearly-unique but not for other cases?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: B-tree descend for insertion locking

От
Peter Geoghegan
Дата:
On Tue, Mar 18, 2014 at 4:12 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> When inserting into a B-tree index, all the pages are read-locked when
> descending the tree. When we reach the leaf page, the read-lock is exchanged
> for a write-lock.
>
> There's nothing wrong with that, but why don't we just directly grab a
> write-lock on the leaf page?


Whatever happened to this idea?

-- 
Peter Geoghegan