Re: index prefetching
From | Tomas Vondra |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | dfb34cd5-9e99-41aa-b76f-15d449fbd3d2@vondra.me |
In response to | Re: index prefetching (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: index prefetching |
List | pgsql-hackers |
On 8/13/25 23:57, Peter Geoghegan wrote:
> On Wed, Aug 13, 2025 at 5:19 PM Tomas Vondra <tomas@vondra.me> wrote:
>> It's also not very surprising this happens with backwards scans more.
>> The I/O is apparently much slower (due to missing OS prefetch), so we're
>> much more likely to hit the I/O limits (max_ios and various other limits
>> in read_stream_start_pending_read).
>
> But there's no OS prefetch with direct I/O. At most, there might be
> some kind of readahead implemented in the SSD's firmware.
>

Good point, I keep forgetting direct I/O means no OS read-ahead. Not
sure if there's a good way to determine if the SSD can do something like
that (and how well). I wonder if there's a way to do backward sequential
scans in fio ..

> Even assuming that the SSD issue is relevant, I can't help but suspect
> that something is off here. To recap from yesterday, the forwards scan
> showed "I/O Timings: shared read=45.313" and "Execution Time: 330.379
> ms" on my system, while the equivalent backwards scan showed "I/O
> Timings: shared read=194.774" and "Execution Time: 1236.655 ms". Does
> that kind of disparity *really* make sense with a modern NVME SSD such
> as this (I use a Samsung 980 pro), in the context of a scan that can
> use aggressive prefetching? Are we really, truly operating at the
> limits of what is possible with this hardware, for this backwards
> scan?
>

Hard to say. Would be interesting to get some numbers using fio. I'll
try to do that for my devices.

On my Ryzen machine (which has a RAID0 of 4 Samsung 990 Pro drives), I
see these stats:

1) Q1 ASC

   Buffers: shared hit=4545 read=52801
   I/O Timings: shared read=127.700
   Execution Time: 432.266 ms

2) Q1 DESC

   Buffers: shared hit=7406 read=52801
   I/O Timings: shared read=306.676
   Execution Time: 769.246 ms

3) Q2 ASC

   Buffers: shared hit=32605 read=52801
   I/O Timings: shared read=127.610
   Execution Time: 1047.333 ms

4) Q2 DESC

   Buffers: shared hit=36105 read=52801
   I/O Timings: shared read=157.667
   Execution Time: 1140.286 ms

Those timings are much better (more stable) than the numbers I shared
yesterday (those were from my laptop). All of this is with direct I/O
and 12 workers.

> What if I use a ramdisk for this? That'll be much faster, no matter
> the scan order. Should I expect this step to make the effect with
> duplicates being produced by read_stream_look_ahead to just go away,
> regardless of the scan direction in use?
>

How's that different from just running with buffered I/O and not
dropping the page cache?

regards

--
Tomas Vondra
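
To make the four runs above easier to compare, the DESC/ASC ratios can be computed directly from the reported numbers. The following is only a rough sketch: the helper names (`RUNS`, `desc_over_asc`, `io_share`) are made up for illustration, and it assumes the "I/O Timings" values are comparable across runs, which may not hold exactly with 12 I/O workers.

```python
# Timings copied from the four runs above:
# (shared read I/O time in ms, execution time in ms).
RUNS = {
    ("Q1", "ASC"): (127.700, 432.266),
    ("Q1", "DESC"): (306.676, 769.246),
    ("Q2", "ASC"): (127.610, 1047.333),
    ("Q2", "DESC"): (157.667, 1140.286),
}


def desc_over_asc(query):
    """Return (I/O ratio, execution-time ratio) of the DESC run over the ASC run."""
    io_asc, exec_asc = RUNS[(query, "ASC")]
    io_desc, exec_desc = RUNS[(query, "DESC")]
    return io_desc / io_asc, exec_desc / exec_asc


def io_share(query, direction):
    """Fraction of the execution time accounted for by shared read I/O time."""
    io_ms, exec_ms = RUNS[(query, direction)]
    return io_ms / exec_ms


if __name__ == "__main__":
    for query in ("Q1", "Q2"):
        io_ratio, exec_ratio = desc_over_asc(query)
        print(f"{query}: DESC/ASC I/O = {io_ratio:.2f}x, "
              f"execution = {exec_ratio:.2f}x")
        for direction in ("ASC", "DESC"):
            share = io_share(query, direction)
            print(f"  {direction}: I/O share of execution time = {share:.1%}")
```

On these numbers the backwards scan spends roughly 2.4x as long in reads as the forwards scan for Q1, but only about 1.24x for Q2, while all four runs read the same 52801 buffers.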