Re: Add a berief general comment on BTScanInsertData's nextkey and backward

Поиск
Список
Период
Сортировка
От Yugo Nagata
Тема Re: Add a berief general comment on BTScanInsertData's nextkey and backward
Дата
Msg-id 20251231192124.9aeda68f5518d78483bd84e7@sraoss.co.jp
обсуждение исходный текст
Ответ на Re: Add a berief general comment on BTScanInsertData's nextkey and backward  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Tue, 18 Nov 2025 20:15:40 -0500
Peter Geoghegan <pg@bowt.ie> wrote:

> On Tue, Nov 18, 2025 at 2:28 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> > I've attached a patch that adds the following comment:
> >
> > + * nextkey determines how the scankey's boundary is interpreted, and backward
> > + * indicates a backward scan.  See comments in _bt_first for a more detailed
> > + * explanation of these fields.
> >
> > What do think?
> 
> Seems reasonable to me.

Thank you for your review.

> 
> I wonder if we also do something about these existing _bt_binsrch comments:
> 
>  * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber
>  * of the last key < given scankey, or last key <= given scankey if nextkey
>  * is true.  (Since _bt_compare treats the first data key of such a page as
>  * minus infinity, there will be at least one key < scankey, so the result
>  * always points at one of the keys on the page.)
> 
> Here we describe what happens on an internal page. This is correct,
> but *seems* to contradict the higher level comments at the end of
> _bt_first. There is actually no contradiction between the two -- not
> when you understand that _bt_first describes the whole end-to-end
> process of calling _bt_search and then calling _bt_binsrch on the leaf
> page (not of calling _bt_binsrch once, against an internal page).
> 
> One could think of this _bt_binsrch internal page behavior as
> compensating for the fact that internal pages have pivot tuples that
> consist of a separator key (except for the first key, which is just
> has a -inf key/no key), plus a downlink that points to the *next* page
> after that separator key one level down (except for the internal page
> high key, which has no downlink at all). Might be useful to say
> something like that instead.
> 
> This is hard to explain without an example. Right now, an internal
> page might have pivot tuples that look like this:
> 
> Separator: -inf, Downlink: 1
> Separator: 'a', Downlink: 2
> Separator: 'c', Downlink: 3
> Separator: 'f', Downlink: (none, this is the high key)
> 
> But _bt_binsrch "pretends" that our internal page actually contains:
> 
> Downlink: 1
> Separator: 'a'
> Downlink: 2
> Separator: 'c'
> Downlink: 3
> Separator: 'f'
> 
> So if our = scan key is (say) 'c', we should descend using the
> downlink that points to block 2 (not the one that points to block 3,
> as might be expected from looking at the real physical representation
> consisting of pivot tuples). That is what the scan needs to do to get
> to the page one level down whose high key is also 'c' -- that's where
> the scan needs to look. (If the next level down is the leaf level,
> then the leaf page we descend to might also contain a *non*-pivot
> tuple with the key value 'c', "right before" the high key with 'c',
> which the scan will need to return in _bt_readpage. Lehman & Yao allow
> the key before a leaf page's high key to be equal to the high key, but
> otherwise forbid all duplicate keys.)
> 
> I find it very hard to know what explanation will be the least
> confusing to other people, at least in this area. Since you're
> interested in this area, I thought I'd ask what you think.

I do not see any contradiction between the comment on _bt_binsrch and
the comments at the end of _bt_first. The former explicitly states that
it refers to internal (non-leaf) pages, and I understand the latter to
describe loading data from a leaf page.

That said, we could make this clearer by explicitly mentioning the leaf
page in the first sentence, for example:

      * Now load data from the first leaf page of the scan (usually the
      *page currently in so->currPos.buf).

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>



В списке pgsql-hackers по дате отправления: