Re: Stating the significance of Lehman & Yao in the nbtree README

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Stating the significance of Lehman & Yao in the nbtree README
Дата
Msg-id CAM3SWZRYk65rkPGGuwaWP+CBiDUXX8t2bZaUXwuuwiPq5q30Ew@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Stating the significance of Lehman & Yao in the nbtree README  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Stating the significance of Lehman & Yao in the nbtree README  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Stating the significance of Lehman & Yao in the nbtree README  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> There is a mention about the race condition where it needs to move right
> in another caller (_bt_search) of _bt_moveright() as well.
>
> /*
> * Race -- the page we just grabbed may have split since we read its
> * pointer in the parent (or metapage).  If it has, we may need to
> * move right to its new sibling.  Do that.
> ..
>
> Do you think there is more to what is already mentioned on top of second
> caller which we should add or you think if it is true for both, then it
> should
> be on top of _bt_moveright()?

Well, maybe it is justified to mention it multiple times, so as to not
break the reader's train of thought. I'm not sure.

> In general, I agree with you that we should mention about any advantage
> of the algorithm we are using and especially if it is significant.  I think
> it
> will be better if can also mention how that advantage or use is realized
> in our implementation as we are already doing in README of nbtree.

Right. It seems like the nbtree README is very shy about telling us
what the point of all this extra work is. IMV that should be stated
very prominently to aid understanding. A lot of people believe that we
have to do lock coupling/"crabbing" when descending the tree. We do
not. The locks acquired when descending a B-Tree in Postgres are
pretty granular. One read buffer lock is held at a time in the process
of servicing index scans. There are times during the descent of the
tree when no buffer locks are held whatsoever. Moreover, (with some
caveats) it doesn't really matter if a stale view of the page is seen
during a descent (as it happens I've been trying to think of ways to
further take advantage of this). That's pretty cool. If you only know
one thing about how the nbtree code works, that should probably be it.

> The above indicates 2 things:
> a. L & Y doesn't need to hold read locks concurrently.
> b. Advantage of right-links at all levels and "high-key".
>
> As per my understanding, we are not following point (a) in our code,
> so what is the benefit we get by having a reference of same in README?

The major reason that we don't completely avoid read locks, is, I
suppose, the need for self-consistent pages (but also because it would
break page deletion - I'm pretty sure that L&Y don't consider page
deletion, and the page deletion logic is entirely based on the Lanin
and Shasha paper and original research, but I didn't check). I think
that the sentence "Lehman and Yao don't require read locks, but assume
that in-memory copies of tree pages are unshared" is intended to
convey the point on the self-consistency of pages. Of course, Lehman
and Yao must assume that the B-Tree is in some sense in shared memory.
Otherwise, there wouldn't be much point to their elaborate locking
protocol.  :-)

> Isn't it better if we mention how the point (b) is used in our code and
> it's advantage together rather than keeping it at top of README?

Maybe it deserves more prominent mention in the code too.

> Already README mentions in brief about right-link and how it is used
> as below:
> ".. The scan must remember the page's right-link at the time it was scanned,
> since that is the page to move right to; if we move right to the current
> right-link then we'd re-scan any items moved by a page split. ..."

This is talking about how index scans interlock against VACUUM while
going to the heap, by keeping a leaf page pinned (this prevents "super
exclusive lock" acquisition). This is after the tree has been
descended. This is really just a detail (albeit one that follows
similar principles, since pages split right and it similarly exploits
that fact), whereas the use of right links and high keys while
descending the tree, and in particular the fact that the "move right"
L&Y technique obviates the prior need for "lock coupling" is pretty
much the whole point of L&Y.

In more concrete terms, _bt_search() releases and only then acquires
read locks during a descent of the tree (by calling
_bt_relandgetbuf()), and, perhaps counterintuitively, that's just
fine.
-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Stating the significance of Lehman & Yao in the nbtree README