Re: index prefetching
From: Tomas Vondra
Subject: Re: index prefetching
Msg-id: 0dd33755-cab8-49c8-b1ed-698732577fbb@vondra.me
In reply to: Re: index prefetching (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 8/26/25 01:48, Andres Freund wrote:
> Hi,
>
> On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
>> Thanks. Based on the testing so far, the patch seems to be a substantial
>> improvement. What's needed to make this prototype committable?
>
> Mainly some testing infrastructure that can trigger this kind of stream. The
> logic is too finnicky for me to commit it without that.
>

So, what would that look like?

The "naive" approach to testing is to simply generate a table/index,
producing the right sequence of blocks. That shouldn't be too hard, it'd be
enough to have an index that

- has ~2-3 rows per value, on different heap pages
- the values "overlap", e.g. like this (value,page)

  (A,1), (A,2), (A,3), (B,2), (B,3), (B,4), ...

(A rough SQL sketch of this is at the end of this message.)

Another approach would be to test this at C level, sidestepping the query
execution entirely. We'd have a "stream generator" that just generates a
sequence of blocks of our own choosing (could be hard-coded, some pattern,
read from a file ...), and feed it into a read stream.

But how would we measure success for these tests? I don't think we want to
look at query duration, that's very volatile.

>
>> I assume this is PG19+ improvement, right? It probably affects PG18 too,
>> but it's harder to hit / the impact is not as bad as on PG19.
>
> Yea. It does apply to 18 too, but I can't come up with realistic scenarios
> where it's a real issue. I can repro a slowdown when using many parallel
> seqscans with debug_io_direct=data - but that's even slower in 17...
>

Makes sense.

>
>> On a related note, my test that generates random datasets / queries, and
>> compares index prefetching with different io_method values found a
>> pretty massive difference between worker and io_uring. I wonder if this
>> might be some issue in io_method=worker.
>
>> while with index prefetching (with the aio prototype patch), it looks
>> like this:
>>
>>                               QUERY PLAN
>> ----------------------------------------------------------------------
>>  Index Scan using idx on t (actual rows=9048576.00 loops=1)
>>    Index Cond: ((a >= 16150) AND (a <= 4540437))
>>    Index Searches: 1
>>    Prefetch Distance: 2.032
>>    Prefetch Count: 868165
>>    Prefetch Stalls: 2140228
>>    Prefetch Skips: 6039906
>>    Prefetch Resets: 0
>>    Stream Ungets: 0
>>    Stream Forwarded: 4
>>    Prefetch Histogram: [2,4) => 855753, [4,8) => 12412
>>    Buffers: shared hit=2577599 read=455610
>>  Planning:
>>    Buffers: shared hit=78 read=26 dirtied=1
>>  Planning Time: 1.032 ms
>>  Execution Time: 3150.578 ms
>> (16 rows)
>>
>> So it's about 2x slower. The prefetch distance collapses, because
>> there's a lot of cache hits (about 50% of requests seem to be hits of
>> already visited blocks). I think that's a problem with how we adjust the
>> distance, but I'll post about that separately.
>>
>> Let's try to simply set io_method=io_uring:
>>
>>                               QUERY PLAN
>> ----------------------------------------------------------------------
>>  Index Scan using idx on t (actual rows=9048576.00 loops=1)
>>    Index Cond: ((a >= 16150) AND (a <= 4540437))
>>    Index Searches: 1
>>    Prefetch Distance: 2.032
>>    Prefetch Count: 868165
>>    Prefetch Stalls: 2140228
>>    Prefetch Skips: 6039906
>>    Prefetch Resets: 0
>>    Stream Ungets: 0
>>    Stream Forwarded: 4
>>    Prefetch Histogram: [2,4) => 855753, [4,8) => 12412
>>    Buffers: shared hit=2577599 read=455610
>>  Planning:
>>    Buffers: shared hit=78 read=26
>>  Planning Time: 2.212 ms
>>  Execution Time: 1837.615 ms
>> (16 rows)
>>
>> That's much closer to master (and the difference could be mostly noise).
>>
>> I'm not sure what's causing this, but almost all regressions my script
>> is finding look like this - always io_method=worker, with distance close
>> to 2.0. Is this some inherent io_method=worker overhead?
>
> I think what you might be observing might be the inherent IPC / latency
> overhead of the worker based approach. This is particularly pronounced if the
> workers are idle (and the CPU they get scheduled on is clocked down). The
> latency impact of that is small, but if you never actually get to do much
> readahead it can be visible.
>

Yeah, that's quite possible. If I understand the mechanics correctly, this can
behave in a rather unexpected way - lowering the load (i.e. issuing fewer I/O
requests) can make the workers "more idle" and therefore more likely to get
suspended ...

Is there a good way to measure whether this is what's happening, and how big
the impact is? For example, it'd be interesting to know how long it took for a
submitted I/O request to get picked up by a worker, and what % of its time a
worker spent handling I/O.


regards

--
Tomas Vondra
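
PS: For the "naive" dataset sketched above, something along these lines might
be enough. This is an untested sketch - the table/column names and sizes are
arbitrary, and the filler width is only there so that roughly 3 rows fit on
each 8kB heap page:

  -- heap page p gets one row for each of the values p, p+1 and p+2, so
  -- every value ends up with ~3 rows on 3 neighboring heap pages, and
  -- adjacent values overlap like (A,1),(A,2),(A,3),(B,2),(B,3),(B,4)
  CREATE TABLE t (a int, filler text);

  -- keep the filler inline and uncompressed, so only ~3 rows fit per page
  ALTER TABLE t ALTER COLUMN filler SET STORAGE PLAIN;

  INSERT INTO t
  SELECT p + o, repeat('x', 2500)
    FROM generate_series(0, 99999) AS p,   -- one 3-row group per heap page
         generate_series(0, 2)     AS o    -- the overlap
   ORDER BY p, o;

  CREATE INDEX idx ON t (a);
  VACUUM ANALYZE t;

The ORDER BY should make the heap order match the generated order, so a range
scan on idx ought to produce the overlapping block sequence described above.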