Re: index prefetching

From: Peter Geoghegan
Subject: Re: index prefetching
Date:
Msg-id: CAH2-Wz=UL7Zi+a1qtJp8Rp370z4rpOPgvJJfkGSToPuMGpaYFQ@mail.gmail.com
In reply to: Re: index prefetching (Tomas Vondra <tomas@vondra.me>)
List: pgsql-hackers
On Sun, Jul 13, 2025 at 5:57 PM Tomas Vondra <tomas@vondra.me> wrote:
> Thank you! I'll take a look next week, but these numbers suggest you
> simplified it a lot..

Right.

I'm still not done removing code from nbtree here. For example, I still
haven't generalized _bt_killitems across all index AMs. That can largely
(though not entirely) work the same way across all index AMs, including
the stuff about checking the page LSN/not dropping pins to avoid blocking
VACUUM. That part is already totally index-AM-agnostic, even though the
avoid-blocking-VACUUM behavior happens to be nbtree-only right now.
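
To make that last point concrete, here's the rough shape of the pin-free
kill-items path (a from-memory sketch, not the actual patch -- the function
name and arguments are invented, and the real _bt_killitems does more, e.g.
it re-matches heap TIDs; the LSN check is the part that matters here):

#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/*
 * Hypothetical AM-agnostic kill-items helper (invented name/signature).
 * The scan deliberately dropped its pin on the leaf page after reading it,
 * so VACUUM was never blocked.  The page LSN saved at read time tells us
 * whether it's still safe to set LP_DEAD hints for the remembered items.
 */
static void
index_kill_items_sketch(Relation indexrel, BlockNumber blkno,
                        XLogRecPtr lsn_when_read,
                        OffsetNumber *deadoffsets, int ndead)
{
    Buffer      buf;
    Page        page;

    /* Re-pin and share-lock the page that the matching tuples came from */
    buf = ReadBuffer(indexrel, blkno);
    LockBuffer(buf, BUFFER_LOCK_SHARE);
    page = BufferGetPage(buf);

    /*
     * If the page was modified since we read it, VACUUM might have removed
     * index tuples and allowed their TIDs to be recycled.  Give up rather
     * than risk marking the wrong items dead.
     */
    if (BufferGetLSNAtomic(buf) != lsn_when_read)
    {
        UnlockReleaseBuffer(buf);
        return;
    }

    /* Page is unchanged, so the remembered offsets are still good */
    for (int i = 0; i < ndead; i++)
        ItemIdMarkDead(PageGetItemId(page, deadoffsets[i]));

    /* LP_DEAD bits are hints; dirty the buffer without WAL-logging */
    MarkBufferDirtyHint(buf, true);
    UnlockReleaseBuffer(buf);
}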

> Another thing is hardware. I've been testing on local NVMe drives, and
> those don't seem to need very long queues (it's diminishing returns).
> Maybe the results would be different on systems with more I/O latency
> (e.g. because the storage is not local).

That seems likely. Cloud storage with 1ms latency is going to have
very different performance characteristics. The benefit of reading
multiple leaf pages will also only be seen with certain workloads.

Another thing is that leaf pages are typically much denser and more
likely to be cached than heap pages. And the potential to combine heap
I/Os for TIDs that appear on adjacent index leaf pages seems like an
interesting avenue.

> I don't remember the array key details, I'll need to swap the context
> back in. But I think the thing I've been concerned about the most is the
> coordination of advancing to the next leaf page vs. the next array key
> (and then perhaps having to go back when the scan direction changes).

But we don't require anything like that. That's just not how it works.

The scan can change direction, and the array keys will automatically
be maintained correctly; _bt_advance_array_keys will be called as
needed, taking care of everything. This all happens in a way that code
in nbtree.c and nbtsearch.c knows nothing about (obviously that means
that your patch won't need to, either).

We do need to be careful about the scan direction changing while the
so->needPrimScan flag is set, but that won't affect your
patch/indexam.c, either. It also isn't very complicated; we only have
to be sure to *unset* the flag when we detect a *change* in direction
at the point where we're stepping off a page/pos. We don't need to
modify the array keys themselves at this point -- the next call to
_bt_advance_array_keys will just take care of that for us
automatically (we lean on _bt_advance_array_keys like this in a number
of places).
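
In other words, the whole thing boils down to something like this (a
sketch of the pattern only, not the exact nbtree code; I'm writing the
BTScanOpaque/BTScanPos field names from memory):

#include "postgres.h"

#include "access/nbtree.h"
#include "access/relscan.h"

/*
 * Sketch only: called at the point where the scan steps off the current
 * page/pos, with the caller's (possibly new) scan direction.
 */
static void
maybe_cancel_primscan_sketch(IndexScanDesc scan, ScanDirection dir)
{
    BTScanOpaque so = (BTScanOpaque) scan->opaque;

    /*
     * so->needPrimScan was set by _bt_advance_array_keys when it decided
     * that another primitive index scan is needed to reach tuples matching
     * the scan's now-current array keys.  If the scan direction has changed
     * since then, that decision no longer applies, so just unset the flag.
     * No need to touch the array keys themselves -- the next call to
     * _bt_advance_array_keys fixes them up for us.
     */
    if (so->needPrimScan && dir != so->currPos.dir)
        so->needPrimScan = false;
}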

The only thing that my revised version of your "complex" patch set does
in indexam.c that is in any way related to nbtree arrays is the call
to amrestrpos. But you'd never be able to tell -- the amrestrpos call
is nothing new. It just so happens that the only reason we still
need the amrestrpos call/the whole concept of amrestrpos
(having completely moved mark/restore out of nbtree and into
indexam.c) is so that the index AM (nbtree) gets a signal that we
(indexam.c) are going to restore *some* mark. That's because nbtree
*will* need to reset its array keys (if any) at that point. But that's it.

We don't need to tell the index AM any specific details about the
mark, and indexam.c is blissfully unaware of why it is that an index
AM might need this. So it's a total non-issue, from a layering
cleanliness point of view. There is no mutable state involved at *any*
layer.
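
Just to spell out the division of labor (a rough sketch with made-up
function bodies -- only the control flow is the point):

#include "postgres.h"

#include "access/amapi.h"
#include "access/nbtree.h"
#include "access/relscan.h"
#include "utils/rel.h"

/*
 * indexam.c side (sketch): restore a previously saved position using state
 * that indexam.c owns.  The amrestrpos call passes no details about the
 * mark -- it's purely a "some mark is being restored" signal.
 */
static void
restore_marked_position_sketch(IndexScanDesc scan)
{
    /* ... restore indexam.c's own saved batch/position state here ... */

    if (scan->indexRelation->rd_indam->amrestrpos != NULL)
        scan->indexRelation->rd_indam->amrestrpos(scan);
}

/*
 * nbtree side (sketch): with mark/restore state gone from nbtree, all that
 * btrestrpos still has to do is put the array keys (if any) back to their
 * initial positions for the current scan direction.
 */
static void
btrestrpos_sketch(IndexScanDesc scan)
{
    BTScanOpaque so = (BTScanOpaque) scan->opaque;

    if (so->numArrayKeys)
        _bt_start_array_keys(scan, so->currPos.dir);
}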

(FWIW, even when we restore a mark like this, nbtree is still mostly
leaning on _bt_advance_array_keys to advance the array keys properly
later on. If you're interested in why we need the remaining hard reset
of the arrays within amrestrpos/btrestrpos, let me know and I'll
explain.)

--
Peter Geoghegan


