Re: index prefetching
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | CAH2-WzkgkvbN_GqR+pfE7uKwhWxQ6h4jst7Rpjgrt68Vc1=FDA@mail.gmail.com |
In reply to | Re: index prefetching (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: index prefetching
Re: index prefetching |
List | pgsql-hackers |
On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
> If this same mechanism remembered (say) the last 2 heap blocks it
> requested, that might be enough to totally fix this particular
> problem. This isn't a serious proposal, but it'll be simple enough to
> implement. Hopefully when I do that (which I plan to soon) it'll fully
> validate your theory.

I spoke too soon. It isn't going to be so easy, since
heapam_index_fetch_tuple wants to consume buffers as a simple stream.
There's no way that index_scan_stream_read_next can just suppress
duplicate block number requests (in a way that's more sophisticated
than the current trivial approach that stores the very last block
number in IndexScanBatchState.lastBlock) without it breaking the whole
concept of a stream of buffers.

> > We can optimize that by deferring the StartBufferIO() if we're encountering a
> > buffer that is undergoing IO, at the cost of some complexity. I'm not sure
> > real-world queries will often encounter the pattern of the same block being
> > read in by a read stream multiple times in close proximity sufficiently often
> > to make that worth it.
>
> We definitely need to be prepared for duplicate prefetch requests in
> the context of index scans.

Can you (or anybody else) think of a quick and dirty way of working
around the problem on the read stream side? I would like to prioritize
getting the patch into a state where its overall performance profile
"feels right". From there we can iterate on fixing the underlying
issues in more principled ways.

FWIW it wouldn't be that hard to require the callback (in our case
index_scan_stream_read_next) to explicitly point out that it knows that
the block number it's requesting has to be a duplicate. It might make
sense to at least place that much of the burden on the callback/client
side.

-- 
Peter Geoghegan