Re: Add a berief general comment on BTScanInsertData's nextkey and backward
| От | Yugo Nagata |
|---|---|
| Тема | Re: Add a berief general comment on BTScanInsertData's nextkey and backward |
| Дата | |
| Msg-id | 20251231192124.9aeda68f5518d78483bd84e7@sraoss.co.jp обсуждение исходный текст |
| Ответ на | Re: Add a berief general comment on BTScanInsertData's nextkey and backward (Peter Geoghegan <pg@bowt.ie>) |
| Список | pgsql-hackers |
On Tue, 18 Nov 2025 20:15:40 -0500
Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Nov 18, 2025 at 2:28 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > I've attached a patch that adds the following comment:
> >
> > + * nextkey determines how the scankey's boundary is interpreted, and backward
> > + * indicates a backward scan. See comments in _bt_first for a more detailed
> > + * explanation of these fields.
> >
> > What do think?
>
> Seems reasonable to me.
Thank you for your review.
>
> I wonder if we also do something about these existing _bt_binsrch comments:
>
> * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber
> * of the last key < given scankey, or last key <= given scankey if nextkey
> * is true. (Since _bt_compare treats the first data key of such a page as
> * minus infinity, there will be at least one key < scankey, so the result
> * always points at one of the keys on the page.)
>
> Here we describe what happens on an internal page. This is correct,
> but *seems* to contradict the higher level comments at the end of
> _bt_first. There is actually no contradiction between the two -- not
> when you understand that _bt_first describes the whole end-to-end
> process of calling _bt_search and then calling _bt_binsrch on the leaf
> page (not of calling _bt_binsrch once, against an internal page).
>
> One could think of this _bt_binsrch internal page behavior as
> compensating for the fact that internal pages have pivot tuples that
> consist of a separator key (except for the first key, which is just
> has a -inf key/no key), plus a downlink that points to the *next* page
> after that separator key one level down (except for the internal page
> high key, which has no downlink at all). Might be useful to say
> something like that instead.
>
> This is hard to explain without an example. Right now, an internal
> page might have pivot tuples that look like this:
>
> Separator: -inf, Downlink: 1
> Separator: 'a', Downlink: 2
> Separator: 'c', Downlink: 3
> Separator: 'f', Downlink: (none, this is the high key)
>
> But _bt_binsrch "pretends" that our internal page actually contains:
>
> Downlink: 1
> Separator: 'a'
> Downlink: 2
> Separator: 'c'
> Downlink: 3
> Separator: 'f'
>
> So if our = scan key is (say) 'c', we should descend using the
> downlink that points to block 2 (not the one that points to block 3,
> as might be expected from looking at the real physical representation
> consisting of pivot tuples). That is what the scan needs to do to get
> to the page one level down whose high key is also 'c' -- that's where
> the scan needs to look. (If the next level down is the leaf level,
> then the leaf page we descend to might also contain a *non*-pivot
> tuple with the key value 'c', "right before" the high key with 'c',
> which the scan will need to return in _bt_readpage. Lehman & Yao allow
> the key before a leaf page's high key to be equal to the high key, but
> otherwise forbid all duplicate keys.)
>
> I find it very hard to know what explanation will be the least
> confusing to other people, at least in this area. Since you're
> interested in this area, I thought I'd ask what you think.
I do not see any contradiction between the comment on _bt_binsrch and
the comments at the end of _bt_first. The former explicitly states that
it refers to internal (non-leaf) pages, and I understand the latter to
describe loading data from a leaf page.
That said, we could make this clearer by explicitly mentioning the leaf
page in the first sentence, for example:
* Now load data from the first leaf page of the scan (usually the
*page currently in so->currPos.buf).
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
В списке pgsql-hackers по дате отправления: