Re: Add a berief general comment on BTScanInsertData's nextkey and backward
| От | Yugo Nagata |
|---|---|
| Тема | Re: Add a berief general comment on BTScanInsertData's nextkey and backward |
| Дата | |
| Msg-id | 20260324192537.089a69fd6729326e6276441f@sraoss.co.jp обсуждение исходный текст |
| Ответ на | Re: Add a berief general comment on BTScanInsertData's nextkey and backward (Yugo Nagata <nagata@sraoss.co.jp>) |
| Список | pgsql-hackers |
Hi Peter, On Wed, 31 Dec 2025 19:21:24 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote: > > I wonder if we also do something about these existing _bt_binsrch comments: > > > > * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber > > * of the last key < given scankey, or last key <= given scankey if nextkey > > * is true. (Since _bt_compare treats the first data key of such a page as > > * minus infinity, there will be at least one key < scankey, so the result > > * always points at one of the keys on the page.) > > > > Here we describe what happens on an internal page. This is correct, > > but *seems* to contradict the higher level comments at the end of > > _bt_first. There is actually no contradiction between the two -- not > > when you understand that _bt_first describes the whole end-to-end > > process of calling _bt_search and then calling _bt_binsrch on the leaf > > page (not of calling _bt_binsrch once, against an internal page). > > > > One could think of this _bt_binsrch internal page behavior as > > compensating for the fact that internal pages have pivot tuples that > > consist of a separator key (except for the first key, which is just > > has a -inf key/no key), plus a downlink that points to the *next* page > > after that separator key one level down (except for the internal page > > high key, which has no downlink at all). Might be useful to say > > something like that instead. > > > > This is hard to explain without an example. Right now, an internal > > page might have pivot tuples that look like this: > > > > Separator: -inf, Downlink: 1 > > Separator: 'a', Downlink: 2 > > Separator: 'c', Downlink: 3 > > Separator: 'f', Downlink: (none, this is the high key) > > > > But _bt_binsrch "pretends" that our internal page actually contains: > > > > Downlink: 1 > > Separator: 'a' > > Downlink: 2 > > Separator: 'c' > > Downlink: 3 > > Separator: 'f' > > > > So if our = scan key is (say) 'c', we should descend using the > > downlink that points to block 2 (not the one that points to block 3, > > as might be expected from looking at the real physical representation > > consisting of pivot tuples). That is what the scan needs to do to get > > to the page one level down whose high key is also 'c' -- that's where > > the scan needs to look. (If the next level down is the leaf level, > > then the leaf page we descend to might also contain a *non*-pivot > > tuple with the key value 'c', "right before" the high key with 'c', > > which the scan will need to return in _bt_readpage. Lehman & Yao allow > > the key before a leaf page's high key to be equal to the high key, but > > otherwise forbid all duplicate keys.) > > > > I find it very hard to know what explanation will be the least > > confusing to other people, at least in this area. Since you're > > interested in this area, I thought I'd ask what you think. > > I do not see any contradiction between the comment on _bt_binsrch and > the comments at the end of _bt_first. The former explicitly states that > it refers to internal (non-leaf) pages, and I understand the latter to > describe loading data from a leaf page. > > That said, we could make this clearer by explicitly mentioning the leaf > page in the first sentence, for example: > > * Now load data from the first leaf page of the scan (usually the > *page currently in so->currPos.buf). I've added a follow-up patch to clarify the _bt_binsrch comments a bit more, distinguishing internal and leaf page behavior. Does this help reduce your concern? Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Вложения
В списке pgsql-hackers по дате отправления: