Re: index prefetching

From: Tomas Vondra
Subject: Re: index prefetching
Date:
Msg-id: 0dd33755-cab8-49c8-b1ed-698732577fbb@vondra.me
In reply to: Re: index prefetching  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On 8/26/25 01:48, Andres Freund wrote:
> Hi,
> 
> On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
>> Thanks. Based on the testing so far, the patch seems to be a substantial
>> improvement. What's needed to make this prototype committable?
> 
> Mainly some testing infrastructure that can trigger this kind of stream. The
> logic is too finicky for me to commit it without that.
> 

So, what would that look like? The "naive" approach to testing is to
simply generate a table/index producing the right sequence of blocks.
That shouldn't be too hard; as sketched below, it'd be enough to have
an index that

- has ~2-3 rows per value, on different heap pages
- the values "overlap", e.g. like this (value,page)

   (A,1), (A,2), (A,3), (B,2), (B,3), (B,4), ...
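
Something like this might work (a rough, untested sketch - the table/
column names and the filler width are arbitrary; the idea is ~3 wide
rows per 8kB heap page, with page "p" holding one row for each of the
values p, p+1, p+2, giving the overlap shown above):

  CREATE TABLE t (a int, filler text);
  -- keep the filler uncompressed, so only ~3 rows fit on a heap page
  ALTER TABLE t ALTER COLUMN filler SET STORAGE PLAIN;
  -- heap page p gets values p, p+1, p+2, so value v lands on pages v-2 .. v
  INSERT INTO t
  SELECT p + off, repeat('x', 2500)
    FROM generate_series(0, 99999) AS p,
         generate_series(0, 2) AS off
   ORDER BY p, off;
  CREATE INDEX idx ON t (a);
  VACUUM ANALYZE t;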

Another approach would be to test this at the C level, sidestepping
query execution entirely. We'd have a "stream generator" that just
produces a sequence of blocks of our own choosing (hard-coded, some
pattern, read from a file, ...) and feeds it into a read stream.
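
For the C-level variant, a minimal sketch might look like the following
(the struct/function names are made up, and the read stream API calls
are from memory, so the signatures may need adjusting):

  #include "postgres.h"

  #include "storage/bufmgr.h"
  #include "storage/read_stream.h"
  #include "utils/rel.h"

  /* Replay a canned block sequence through a read stream, with no
   * query execution involved. */
  typedef struct CannedBlocks
  {
      BlockNumber *blocks;    /* blocks to replay (hard-coded, generated,
                               * or read from a file) */
      int          nblocks;
      int          next;
  } CannedBlocks;

  /* read stream callback: hand out the next block from the sequence */
  static BlockNumber
  canned_next_block(ReadStream *stream, void *callback_private_data,
                    void *per_buffer_data)
  {
      CannedBlocks *cb = (CannedBlocks *) callback_private_data;

      if (cb->next >= cb->nblocks)
          return InvalidBlockNumber;  /* end of stream */
      return cb->blocks[cb->next++];
  }

  /* pull all the buffers through the stream, as an index scan would */
  static void
  consume_canned_stream(Relation rel, CannedBlocks *cb)
  {
      ReadStream *stream;
      Buffer      buf;

      stream = read_stream_begin_relation(READ_STREAM_DEFAULT, NULL, rel,
                                          MAIN_FORKNUM, canned_next_block,
                                          cb, 0);

      while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
          ReleaseBuffer(buf);

      read_stream_end(stream);
  }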

But how would we measure success for these tests? I don't think we want
to look at query duration; that's very volatile.

> 
>> I assume this is PG19+ improvement, right? It probably affects PG18 too,
>> but it's harder to hit / the impact is not as bad as on PG19.
> 
> Yea. It does apply to 18 too, but I can't come up with realistic scenarios
> where it's a real issue. I can repro a slowdown when using many parallel
> seqscans with debug_io_direct=data - but that's even slower in 17...
> 

Makes sense.

> 
>> On a related note, my test that generates random datasets / queries, and
>> compares index prefetching with different io_method values found a
>> pretty massive difference between worker and io_uring. I wonder if this
>> might be some issue in io_method=worker.
> 
>> while with index prefetching (with the aio prototype patch), it looks
>> like this:
>>
>>                                 QUERY PLAN
>>   ----------------------------------------------------------------------
>>    Index Scan using idx on t (actual rows=9048576.00 loops=1)
>>      Index Cond: ((a >= 16150) AND (a <= 4540437))
>>      Index Searches: 1
>>      Prefetch Distance: 2.032
>>      Prefetch Count: 868165
>>      Prefetch Stalls: 2140228
>>      Prefetch Skips: 6039906
>>      Prefetch Resets: 0
>>      Stream Ungets: 0
>>      Stream Forwarded: 4
>>      Prefetch Histogram: [2,4) => 855753, [4,8) => 12412
>>      Buffers: shared hit=2577599 read=455610
>>    Planning:
>>      Buffers: shared hit=78 read=26 dirtied=1
>>    Planning Time: 1.032 ms
>>    Execution Time: 3150.578 ms
>>   (16 rows)
>>
>> So it's about 2x slower. The prefetch distance collapses, because
>> there's a lot of cache hits (about 50% of requests seem to be hits of
>> already visited blocks). I think that's a problem with how we adjust the
>> distance, but I'll post about that separately.
>>
>> Let's try to simply set io_method=io_uring:
>>
>>                                 QUERY PLAN
>>   ----------------------------------------------------------------------
>>    Index Scan using idx on t  (actual rows=9048576.00 loops=1)
>>      Index Cond: ((a >= 16150) AND (a <= 4540437))
>>      Index Searches: 1
>>      Prefetch Distance: 2.032
>>      Prefetch Count: 868165
>>      Prefetch Stalls: 2140228
>>      Prefetch Skips: 6039906
>>      Prefetch Resets: 0
>>      Stream Ungets: 0
>>      Stream Forwarded: 4
>>      Prefetch Histogram: [2,4) => 855753, [4,8) => 12412
>>      Buffers: shared hit=2577599 read=455610
>>    Planning:
>>      Buffers: shared hit=78 read=26
>>    Planning Time: 2.212 ms
>>    Execution Time: 1837.615 ms
>>   (16 rows)
>>
>> That's much closer to master (and the difference could be mostly noise).
>>
>> I'm not sure what's causing this, but almost all regressions my script
>> is finding look like this - always io_method=worker, with distance close
>> to 2.0. Is this some inherent io_method=worker overhead?
> 
> I think what you might be observing might be the inherent IPC / latency
> overhead of the worker based approach. This is particularly pronounced if the
> workers are idle (and the CPU they get scheduled on is clocked down). The
> latency impact of that is small, but if you never actually get to do much
> readahead it can be visible.
> 

Yeah, that's quite possible. If I understand the mechanics correctly,
this can behave in a rather unexpected way - lowering the load (i.e.
issuing fewer I/O requests) can make the workers "more idle" and
therefore more likely to get suspended ...

Is there a good way to measure whether this is what's happening, and how
big the impact is? For example, it'd be interesting to know how long it
takes for a submitted I/O request to get picked up by a worker, and what
fraction of its time a worker spends handling I/O.


regards

-- 
Tomas Vondra



