Обсуждение: Removing BTScanPosUnpinIfPinned idiom from nbtree, simplifying mark/restore support

Поиск
Список
Период
Сортировка
As already discussed on another nearby thread [1], I think that we
should get rid of nbtree's BTScanPosUnpinIfPinned idiom (which
originated in 2015 commit 2ed5b87f), since it is rather brittle.
Recent experience from today's bugfix commit 7c319f54 made that pretty
clear.

Attached patch gets rid of nbtree's use of BTScanPosUnpinIfPinned.
Rather than deciding whether or not to unpin a saved position's buffer
on the basis of it having a buffer set at all (at any point where we
invalidate the buffer), the patch has us condition everything on the
scan-level so->dropPin field. This so->dropPin field was added by my
recent commit e6eed40e; you can think of this patch as finishing off
the work started in that commit.

I'll add this patch to the Postgres 19 cycle's first commitfest.

More on the underlying motivation:

Removing BTScanPosUnpinIfPinned allows me to significantly simplify
the management of buffer pins with mark/restore. The patch also gets
rid of all of the calls to IncrBufferRefCount made from
nbtree, since it's no longer necessary to support a so->markPos
representation of a mark that needs its own pin, independent of the
pin held by/for so->currPos (if so->markPos needs its own pin it'll be
because it has its own page).

In other words, with the patch, there is exactly one way of
representing that there's a mark held on the current page. We still
need multiple pins (one on currPos, another on markPos) during
!so->dropPin scans, but now that only happens when they are on
different pages. In other other words, the only valid representation
of a mark held on the page currently in so->currPos is the
"so->markItemIndex" field (we're now consistent about that).

I also believe that this simplified approach to holding buffer pins in
mark/restore will facilitate better nbtree designs. Having "exactly
one way of representing that there's a mark held on the current page"
helps with designs that do more heap prefetching by reading multiple
leaf pages, before earlier pages have performed all of their required
heap page accesses.

I'm thinking of new designs that maintain multiple leaf page
details/returned items within one big currPos-like position (the
current so->currPos just maintains a single leaf page). The main thing
that makes that difficult right now is the complex rules around buffer
pins, particularly with mark/restore, which is generally a very hard
code path to test.

[1] https://postgr.es/m/CAH2-Wzm3a8i31aBXi6oQt9uG7m1-xpX-MXjMMYoDJH=sBj1jnA@mail.gmail.com
-- 
Peter Geoghegan

Вложения

Re: Removing BTScanPosUnpinIfPinned idiom from nbtree, simplifying mark/restore support

От
Peter Geoghegan
Дата:
On Wed, Jun 11, 2025 at 1:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Removing BTScanPosUnpinIfPinned allows me to significantly simplify
> the management of buffer pins with mark/restore. The patch also gets
> rid of all of the calls to IncrBufferRefCount made from
> nbtree, since it's no longer necessary to support a so->markPos
> representation of a mark that needs its own pin, independent of the
> pin held by/for so->currPos (if so->markPos needs its own pin it'll be
> because it has its own page).

I'm going to withdraw this patch.

The rationale for the patch still makes perfect sense. However, the
patch has now been superseded by the ongoing work on adding a new
amgetbatch interface. That other work (which is tied up in work on I/O
prefetching for index scans) does everything that this patch does as a
stepping stone to allowing nbtree index scans to return batches
(so->currPos style positions that contain at least one matching item
to return to the scan) that are controlled by the caller/the table AM
that manages the index scan.

Independently implementing these changes without any of the much more
extensive changes required for amgetbatch was a useful exercise. But
there's no reason to keep the patch in the next CF anymore.

--
Peter Geoghegan