Обсуждение: pgstattuple: Use streaming read API in pgstatindex functions

Поиск
Список
Период
Сортировка

pgstattuple: Use streaming read API in pgstatindex functions

От
Xuneng Zhou
Дата:
Hi hackers,

While reading the code related to streaming reads and their current
use cases, I noticed that pgstatindex could potentially benefit from
adopting the streaming read API. The required change is relatively
simple—similar to what has already been implemented in the pg_warm and
pg_visibility extensions. I also ran some performance tests on an
experimental patch to validate the improvement.

Summary
Cold cache performance (the typical use case for diagnostic tools):
  - Medium indexes (~21MB): 1.21x - 1.79x faster (20-44% speedup)
  - Large indexes (~214MB): 1.50x - 1.90x faster (30-47% speedup)
  - Xlarge indexes (~1351MB):1.4x–1.9x gains. (29–47% speedup)

Hardware: AX162-R from hetzner

Test matrix:
  - Index types: Primary key, timestamp, float, composite (3 columns)
  - Sizes: Medium (1M rows, ~21MB), Large (10M rows, ~214MB), XLarge
(50M rows, ~ 1351MB))
  - Layouts: Unfragmented (sequential) and Fragmented (random insert order)
  - Cache states: Cold (dropped OS cache) and Warm (pg_prewarm)

Xlarge fragmented example:
==> Creating secondary indexes on test_xlarge
     Created 3 secondary indexes: created_at, score, composite
     Created test_xlarge_pkey: 1351 MB
     Fragmentation stats (random insert order):
 leaf_frag_pct | avg_density_pct | leaf_pages |  size
---------------+-----------------+------------+---------
          49.9 |            71.5 |     172272 | 1351 MB
(1 row)

configuration:
  - shared_buffers = 16GB
  - effective_io_concurrency = 500
  - io_combine_limit = 16
  - autovacuum = off
  - checkpoint_timeout = 1h
  - bgwriter_delay = 10000ms (minimize background writes)
  - jit = off
  - max_parallel_workers_per_gather = 0

Unfragmented Indexes (Cold Cache)

Index Type    Size  Baseline  Patched  Speedup
Primary Key    Medium  31.5 ms  19.6 ms  1.58×
Primary Key    Large  184.0 ms  119.0 ms  1.54×
Timestamp     Medium  13.4 ms  10.5 ms  1.28×
Timestamp     Large  85.0 ms  45.6 ms  1.86×
Float (score)   Medium  13.7 ms  11.4 ms  1.21×
Float (score)   Large  84.0 ms  45.0 ms  1.86×
Composite (3 col) Medium  26.7 ms  17.2 ms  1.56×
Composite (3 col) Large  89.8 ms  51.2 ms  1.75×

⸻

Fragmented Indexes (Cold Cache)

To address concerns about filesystem fragmentation, I tested indexes built
with random inserts (ORDER BY random()) to trigger page splits and create
fragmented indexes:

Index Type    Size  Baseline  Patched  Speedup
Primary Key    Medium  41.9 ms  23.5 ms  1.79×
Primary Key    Large  236.0 ms  148.0 ms  1.58×
Primary Key    XLarge  953.4 ms  663.1 ms  1.43×
Timestamp     Medium  32.1 ms  18.8 ms  1.70×
Timestamp     Large  188.0 ms  117.0 ms  1.59×
Timestamp     XLarge  493.0 ms  518.6 ms  0.95×
Float (score)   Medium  14.0 ms  10.9 ms  1.28×
Float (score)   Large  85.8 ms  45.2 ms  1.89×
Float (score)   XLarge  263.2 ms  176.5 ms  1.49×
Composite (3 col) Medium  42.3 ms  24.1 ms  1.75×
Composite (3 col) Large  245.0 ms  162.0 ms  1.51×
Composite (3 col) XLarge  1052.5 ms  716.5 ms  1.46×

Summary: Fragmentation generally does not hurt streaming reads; most
fragmented cases still see 1.4×–1.9× gains. One outlier (XLarge
Timestamp) shows a slight regression (0.95×).

⸻

Warm Cache Results
When indexes are fully cached in shared_buffers:
Unfragmented:  infrequent little regression for small to medium size
index(single digit ms variance, barely noticeable); small gains for
large size index
Fragmented:   infrequent little regression for small to medium size
index(single digit ms variance, barely noticeable); small gains for
large size index

Best,
Xuneng

Вложения

Re: pgstattuple: Use streaming read API in pgstatindex functions

От
Xuneng Zhou
Дата:
Hi,

On Mon, Oct 13, 2025 at 10:07 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi hackers,
>
> While reading the code related to streaming reads and their current
> use cases, I noticed that pgstatindex could potentially benefit from
> adopting the streaming read API. The required change is relatively
> simple—similar to what has already been implemented in the pg_warm and
> pg_visibility extensions. I also ran some performance tests on an
> experimental patch to validate the improvement.
>
> Summary
> Cold cache performance (the typical use case for diagnostic tools):
>   - Medium indexes (~21MB): 1.21x - 1.79x faster (20-44% speedup)
>   - Large indexes (~214MB): 1.50x - 1.90x faster (30-47% speedup)
>   - Xlarge indexes (~1351MB):1.4x–1.9x gains. (29–47% speedup)
>
> Hardware: AX162-R from hetzner
>
> Test matrix:
>   - Index types: Primary key, timestamp, float, composite (3 columns)
>   - Sizes: Medium (1M rows, ~21MB), Large (10M rows, ~214MB), XLarge
> (50M rows, ~ 1351MB))
>   - Layouts: Unfragmented (sequential) and Fragmented (random insert order)
>   - Cache states: Cold (dropped OS cache) and Warm (pg_prewarm)
>
> Xlarge fragmented example:
> ==> Creating secondary indexes on test_xlarge
>      Created 3 secondary indexes: created_at, score, composite
>      Created test_xlarge_pkey: 1351 MB
>      Fragmentation stats (random insert order):
>  leaf_frag_pct | avg_density_pct | leaf_pages |  size
> ---------------+-----------------+------------+---------
>           49.9 |            71.5 |     172272 | 1351 MB
> (1 row)
>
> configuration:
>   - shared_buffers = 16GB
>   - effective_io_concurrency = 500
>   - io_combine_limit = 16
>   - autovacuum = off
>   - checkpoint_timeout = 1h
>   - bgwriter_delay = 10000ms (minimize background writes)
>   - jit = off
>   - max_parallel_workers_per_gather = 0
>
> Unfragmented Indexes (Cold Cache)
>
> Index Type    Size  Baseline  Patched  Speedup
> Primary Key    Medium  31.5 ms  19.6 ms  1.58×
> Primary Key    Large  184.0 ms  119.0 ms  1.54×
> Timestamp     Medium  13.4 ms  10.5 ms  1.28×
> Timestamp     Large  85.0 ms  45.6 ms  1.86×
> Float (score)   Medium  13.7 ms  11.4 ms  1.21×
> Float (score)   Large  84.0 ms  45.0 ms  1.86×
> Composite (3 col) Medium  26.7 ms  17.2 ms  1.56×
> Composite (3 col) Large  89.8 ms  51.2 ms  1.75×
>
> ⸻
>
> Fragmented Indexes (Cold Cache)
>
> To address concerns about filesystem fragmentation, I tested indexes built
> with random inserts (ORDER BY random()) to trigger page splits and create
> fragmented indexes:
>
> Index Type    Size  Baseline  Patched  Speedup
> Primary Key    Medium  41.9 ms  23.5 ms  1.79×
> Primary Key    Large  236.0 ms  148.0 ms  1.58×
> Primary Key    XLarge  953.4 ms  663.1 ms  1.43×
> Timestamp     Medium  32.1 ms  18.8 ms  1.70×
> Timestamp     Large  188.0 ms  117.0 ms  1.59×
> Timestamp     XLarge  493.0 ms  518.6 ms  0.95×
> Float (score)   Medium  14.0 ms  10.9 ms  1.28×
> Float (score)   Large  85.8 ms  45.2 ms  1.89×
> Float (score)   XLarge  263.2 ms  176.5 ms  1.49×
> Composite (3 col) Medium  42.3 ms  24.1 ms  1.75×
> Composite (3 col) Large  245.0 ms  162.0 ms  1.51×
> Composite (3 col) XLarge  1052.5 ms  716.5 ms  1.46×
>
> Summary: Fragmentation generally does not hurt streaming reads; most
> fragmented cases still see 1.4×–1.9× gains. One outlier (XLarge
> Timestamp) shows a slight regression (0.95×).
>
> ⸻
>
> Warm Cache Results
> When indexes are fully cached in shared_buffers:
> Unfragmented:  infrequent little regression for small to medium size
> index(single digit ms variance, barely noticeable); small gains for
> large size index
> Fragmented:   infrequent little regression for small to medium size
> index(single digit ms variance, barely noticeable); small gains for
> large size index
>

Fix indentation issue in v1.

Best,
Xuneng

Вложения

Re: pgstattuple: Use streaming read API in pgstatindex functions

От
Nazir Bilal Yavuz
Дата:
Hi,

Thank you for working on this!

On Mon, 13 Oct 2025 at 06:20, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Fix indentation issue in v1.

I did not look at the benchmarks, so here are my code comments.

- I would avoid creating a new scope for the streaming read. While it
makes the streaming code easier to interpret, it introduces a large
diff due to indentation changes.

- I suggest adding the following comment about streaming batching, as
it is used in other places:

        /*
         * It is safe to use batchmode as block_range_read_stream_cb takes no
         * locks.
         */

- For the '/* Scan all blocks except the metapage using streaming
reads */' comments, it might be helpful to clarify that the 0th page
is the metapage. Something like: '/* Scan all blocks except the
metapage (0th page) using streaming reads */'.

Other than these comments, the code looks good to me.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: pgstattuple: Use streaming read API in pgstatindex functions

От
Xuneng Zhou
Дата:
Hi Bilal,

Thanks for looking into this.

On Mon, Oct 13, 2025 at 3:00 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Hi,
>
> Thank you for working on this!
>
> On Mon, 13 Oct 2025 at 06:20, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Fix indentation issue in v1.
>
> I did not look at the benchmarks, so here are my code comments.
>
> - I would avoid creating a new scope for the streaming read. While it
> makes the streaming code easier to interpret, it introduces a large
> diff due to indentation changes.
>
> - I suggest adding the following comment about streaming batching, as
> it is used in other places:
>
>         /*
>          * It is safe to use batchmode as block_range_read_stream_cb takes no
>          * locks.
>          */
>
> - For the '/* Scan all blocks except the metapage using streaming
> reads */' comments, it might be helpful to clarify that the 0th page
> is the metapage. Something like: '/* Scan all blocks except the
> metapage (0th page) using streaming reads */'.
>
> Other than these comments, the code looks good to me.
>

Here is patch v3. The comments have been added, and the extra scope
({}) has been removed as suggested.

Best,
Xuneng

Вложения

Re: pgstattuple: Use streaming read API in pgstatindex functions

От
wenhui qiu
Дата:
Hi Xuneng Zhou
   

> - /* Unlock and release buffer */
> - LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> - ReleaseBuffer(buffer);
> + UnlockReleaseBuffer(buffer);
>  }
 Thanks for your patch! Just to nitpick a bit — I think this comment is worth keeping, even though the function name already conveys its meaning.

On Mon, Oct 13, 2025 at 4:42 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Hi Bilal,

Thanks for looking into this.

On Mon, Oct 13, 2025 at 3:00 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Hi,
>
> Thank you for working on this!
>
> On Mon, 13 Oct 2025 at 06:20, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Fix indentation issue in v1.
>
> I did not look at the benchmarks, so here are my code comments.
>
> - I would avoid creating a new scope for the streaming read. While it
> makes the streaming code easier to interpret, it introduces a large
> diff due to indentation changes.
>
> - I suggest adding the following comment about streaming batching, as
> it is used in other places:
>
>         /*
>          * It is safe to use batchmode as block_range_read_stream_cb takes no
>          * locks.
>          */
>
> - For the '/* Scan all blocks except the metapage using streaming
> reads */' comments, it might be helpful to clarify that the 0th page
> is the metapage. Something like: '/* Scan all blocks except the
> metapage (0th page) using streaming reads */'.
>
> Other than these comments, the code looks good to me.
>

Here is patch v3. The comments have been added, and the extra scope
({}) has been removed as suggested.

Best,
Xuneng

Re: pgstattuple: Use streaming read API in pgstatindex functions

От
Nazir Bilal Yavuz
Дата:
Hi,

On Mon, 13 Oct 2025 at 11:42, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Here is patch v3. The comments have been added, and the extra scope
> ({}) has been removed as suggested.

Thanks, the code looks good to me!

The benchmark results are nice. I have a quick question about the
setup, if you are benchmarking with the default io_method (worker),
then the results might show the combined benefit of async I/O and
streaming I/O, rather than just streaming I/O alone. If that is the
case, I would suggest also running a benchmark with io_method=sync to
isolate the performance impact of streaming I/O. You might get
interesting results.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: pgstattuple: Use streaming read API in pgstatindex functions

От
Xuneng Zhou
Дата:
Hi Wenhui,

Thanks for looking into this.

On Mon, Oct 13, 2025 at 5:41 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
>
> Hi Xuneng Zhou
>
>
> > - /* Unlock and release buffer */
> > - LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> > - ReleaseBuffer(buffer);
> > + UnlockReleaseBuffer(buffer);
> >  }
>  Thanks for your patch! Just to nitpick a bit — I think this comment is worth keeping, even though the function name
alreadyconveys its meaning. 
>

It makes sense. I'll keep it.

Best,
Xuneng



Re: pgstattuple: Use streaming read API in pgstatindex functions

От
Xuneng Zhou
Дата:
Hi,

On Mon, Oct 13, 2025 at 5:42 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Hi,
>
> On Mon, 13 Oct 2025 at 11:42, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Here is patch v3. The comments have been added, and the extra scope
> > ({}) has been removed as suggested.
>
> Thanks, the code looks good to me!
>
> The benchmark results are nice. I have a quick question about the
> setup, if you are benchmarking with the default io_method (worker),
> then the results might show the combined benefit of async I/O and
> streaming I/O, rather than just streaming I/O alone. If that is the
> case, I would suggest also running a benchmark with io_method=sync to
> isolate the performance impact of streaming I/O. You might get
> interesting results.

The previous benchmark was conducted using the default
io_method=worker. The following results are from a new benchmark run
with io_method=sync. The performance gains from streaming reads are
still present, though smaller than those observed when asynchronous
I/O and streaming I/O are combined — which is expected. Thanks for the
suggestion!

Fragmented Indexes (Cold Cache)

Index Type    Size  Baseline  Patched  Speedup
Primary Key    Medium  43.80 ms  38.61 ms  1.13×
Primary Key    Large  238.24 ms  202.47 ms  1.17×
Primary Key    XLarge  962.90 ms  793.57 ms  1.21×
Timestamp     Medium  33.34 ms  29.98 ms  1.11×
Timestamp     Large  190.41 ms  161.34 ms  1.18×
Timestamp     XLarge  794.52 ms  647.82 ms  1.22×
Float (score)   Medium  14.46 ms  13.51 ms  1.06×
Float (score)   Large  87.38 ms  77.22 ms  1.13×
Float (score)   XLarge  278.47 ms  233.22 ms  1.19×
Composite (3 col) Medium  44.49 ms  40.10 ms  1.10×
Composite (3 col) Large  244.86 ms  211.01 ms  1.16×
Composite (3 col) XLarge  1073.32 ms  872.42 ms  1.23×

________________________________

Fragmented Indexes (Warm Cache)

Index Type    Size  Baseline  Patched  Speedup
Primary Key    Medium  7.91 ms  7.88 ms  1.00×
Primary Key    Large  35.58 ms  36.41 ms  0.97×
Primary Key    XLarge  126.29 ms  126.95 ms  0.99×
Timestamp     Medium  5.14 ms  6.82 ms  0.75×
Timestamp     Large  22.96 ms  29.55 ms  0.77×
Timestamp     XLarge  104.26 ms  106.01 ms  0.98×
Float (score)   Medium  3.76 ms  4.18 ms  0.90×
Float (score)   Large  13.65 ms  13.01 ms  1.04×
Float (score)   XLarge  40.58 ms  41.28 ms  0.98×
Composite (3 col) Medium  8.23 ms  8.25 ms  0.99×
Composite (3 col) Large  37.10 ms  37.59 ms  0.98×
Composite (3 col) XLarge  139.89 ms  138.21 ms  1.01×

Best,
Xuneng

Вложения

Re: pgstattuple: Use streaming read API in pgstatindex functions

От
Xuneng Zhou
Дата:
Hi,

Here’s the updated summary report(cold cache, fragmented index), now including results for the streaming I/O + io_uring configuration.

image.png


Best,
Xuneng

Вложения