Discussion: Streamify more code paths

Streamify more code paths

From: Xuneng Zhou
Date:
Hi Hackers,

I noticed several additional paths in contrib modules, beyond [1],
that are potentially suitable for streamification:

1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()

The following patches streamify those code paths. No benchmarks have
been run yet.
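
For reference, each conversion follows the usual pattern of replacing a
block-at-a-time ReadBufferExtended() loop with a read stream driven by
block_range_read_stream_cb. Roughly (an illustrative sketch, not code
taken from the patches; rel and nblocks stand for whatever the caller
already has):

    /* Before: one synchronous read per block. */
    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        Buffer      buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
                                             RBM_NORMAL, NULL);

        /* ... examine the page ... */
        ReleaseBuffer(buf);
    }

    /* After: let the read stream batch and prefetch the same range. */
    BlockRangeReadStreamPrivate p;
    ReadStream *stream;
    Buffer      buf;

    p.current_blocknum = 0;
    p.last_exclusive = nblocks;
    stream = read_stream_begin_relation(READ_STREAM_FULL,
                                        NULL,   /* or a bulk-read strategy */
                                        rel,
                                        MAIN_FORKNUM,
                                        block_range_read_stream_cb,
                                        &p,
                                        0);
    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* ... examine the page ... */
        ReleaseBuffer(buf);
    }
    read_stream_end(stream);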

[1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com

Feedback welcome.

--
Best,
Xuneng

Attachments

Re: Streamify more code paths

From: Xuneng Zhou
Date:
Hi,

On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> I noticed several additional paths in contrib modules, beyond [1],
> that are potentially suitable for streamification:
>
> 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
>
> [...]

One more in ginvacuumcleanup().

--
Best,
Xuneng

Attachments

Re: Streamify more code paths

From: Nazir Bilal Yavuz
Date:
Hi,

Thank you for working on this!

On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> [...]
>
> One more in ginvacuumcleanup().

0001, 0002 and 0004 LGTM.

0003:

+        buf = read_stream_next_buffer(stream, NULL);
+        if (buf == InvalidBuffer)
+            break;

I think we are loosening the check here. Previously we were sure that
there were no InvalidBuffers until nblocks; the streamified version
does not have that guarantee, since it exits the loop the first time
it sees an InvalidBuffer, which may be wrong. You might want to add
'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
have a similar check.
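
Roughly (sketch only; I am assuming p is the callback's private state,
whose current_blocknum counts up to nblocks):

    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* ... per-page work ... */
        ReleaseBuffer(buf);
    }

    /* The stream must have handed back every block up to nblocks. */
    Assert(p.current_blocknum == nblocks);
    read_stream_end(stream);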

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Streamify more code paths

From: Xuneng Zhou
Date:
Hi Bilal,

Thanks for your review!

On Fri, Dec 26, 2025 at 6:59 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> [...]
>
> 0001, 0002 and 0004 LGTM.
>
> 0003:
>
> +        buf = read_stream_next_buffer(stream, NULL);
> +        if (buf == InvalidBuffer)
> +            break;
>
> I think we are loosening the check here. Previously we were sure that
> there were no InvalidBuffers until nblocks; the streamified version
> does not have that guarantee, since it exits the loop the first time
> it sees an InvalidBuffer, which may be wrong. You might want to add
> 'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
> have a similar check.
>

Agreed. The check has been added in v2 per your suggestion.

--
Best,
Xuneng

Attachments

Re: Streamify more code paths

From: Xuneng Zhou
Date:
Hi,

On Sat, Dec 27, 2025 at 12:41 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> [...]
>
> Agreed. The check has been added in v2 per your suggestion.
>

Two more to go:
patch 5: Streamify log_newpage_range() WAL logging path
patch 6: Streamify hash index VACUUM primary bucket page reads

Benchmarks will be conducted soon.


--
Best,
Xuneng

Attachments

Re: Streamify more code paths

From: Xuneng Zhou
Date:
Hi,

On Sun, Dec 28, 2025 at 7:41 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> [...]
>
> Two more to go:
> patch 5: Streamify log_newpage_range() WAL logging path
> patch 6: Streamify hash index VACUUM primary bucket page reads
>
> Benchmarks will be conducted soon.
>

v6 in the last message has a problem and has not been updated. Attaching
the right one again. Sorry for the noise.

--
Best,
Xuneng

Attachments

Re: Streamify more code paths

From: Nazir Bilal Yavuz
Date:
Hi,

On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
> >
> > Two more to go:
> > patch 5: Streamify log_newpage_range() WAL logging path
> > patch 6: Streamify hash index VACUUM primary bucket page reads
> >
> > Benchmarks will be conducted soon.
> >
>
> v6 in the last message has a problem and has not been updated. Attaching
> the right one again. Sorry for the noise.

0003 and 0006:

You need to add 'StatApproxReadStreamPrivate' and
'HashBulkDeleteStreamPrivate' to the typedefs.list.

0005:

@@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
         nbufs = 0;
         while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
         {
-            Buffer        buf = ReadBufferExtended(rel, forknum, blkno,
-                                                 RBM_NORMAL, NULL);
+            Buffer        buf = read_stream_next_buffer(stream, NULL);
+
+            if (!BufferIsValid(buf))
+                break;

We are loosening a check here; there should not be an invalid buffer
in the stream until endblk. I think you can remove this BufferIsValid()
check, so that we will notice if something goes wrong.
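
That is, something like (sketch of the resulting loop body only):

        while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
        {
            /*
             * The stream covers exactly [startblk, endblk) and the loop
             * condition already stops at endblk, so the buffer returned
             * here can be used directly; an invalid buffer would then fail
             * loudly below rather than silently truncating the range.
             */
            Buffer      buf = read_stream_next_buffer(stream, NULL);

            /* ... lock the page and collect it into the batch as before ... */
        }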

0006:

You can use read_stream_reset() instead of read_stream_end(); then you
can reuse the same stream with different variables. I believe this is
the preferred way.
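
The reuse would look roughly like this (sketch; priv, next_start and
next_end are illustrative names, not the ones in the patch):

    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* ... first pass ... */
        ReleaseBuffer(buf);
    }

    /* Point the callback's private state at the next range. */
    priv.current_blocknum = next_start;
    priv.last_exclusive = next_end;

    /* Rearm the exhausted stream instead of ending and rebuilding it. */
    read_stream_reset(stream);

    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* ... second pass, same stream ... */
        ReleaseBuffer(buf);
    }

    read_stream_end(stream);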

Rest LGTM!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Streamify more code paths

From: Xuneng Zhou
Date:
Hi,

Thanks for looking into this.

On Mon, Dec 29, 2025 at 6:58 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> [...]
>
> 0003 and 0006:
>
> You need to add 'StatApproxReadStreamPrivate' and
> 'HashBulkDeleteStreamPrivate' to the typedefs.list.

Done.

> 0005:
>
> @@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
>          nbufs = 0;
>          while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
>          {
> -            Buffer        buf = ReadBufferExtended(rel, forknum, blkno,
> -                                                 RBM_NORMAL, NULL);
> +            Buffer        buf = read_stream_next_buffer(stream, NULL);
> +
> +            if (!BufferIsValid(buf))
> +                break;
>
> We are loosening a check here; there should not be an invalid buffer
> in the stream until endblk. I think you can remove this BufferIsValid()
> check, so that we will notice if something goes wrong.

My earlier concern about not adding an assert at the end of the stream
was the potential early break here:

/* Nothing more to do if all remaining blocks were empty. */
if (nbufs == 0)
    break;

After looking more closely, that turned out to be a misunderstanding of
the logic on my part.

> 0006:
>
> You can use read_stream_reset() instead of read_stream_end(); then you
> can reuse the same stream with different variables. I believe this is
> the preferred way.
>
> Rest LGTM!
>

Yeah, read_stream_reset() seems the more appropriate approach here.

--
Best,
Xuneng

Attachments

Re: Streamify more code paths

From: Xuneng Zhou
Date:
Hi,

On Tue, Dec 30, 2025 at 9:51 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> [...]
>
> > You need to add 'StatApproxReadStreamPrivate' and
> > 'HashBulkDeleteStreamPrivate' to the typedefs.list.
>
> Done.
>

Ran pgindent using the updated typedefs.list.

--
Best,
Xuneng

Attachments

Re: Streamify more code paths

From: Xuneng Zhou
Date:
Hi,

On Tue, Dec 30, 2025 at 10:43 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> [...]
>

I've completed benchmarking of the v4 streaming read patches across
three I/O methods (io_uring, sync, worker). Tests were run with cold
cache on large datasets.

--- Settings ---

shared_buffers = '8GB'
effective_io_concurrency = 200
io_method = $IO_METHOD
io_workers = $IO_WORKERS
io_max_concurrency = $IO_MAX_CONCURRENCY
track_io_timing = on
autovacuum = off
checkpoint_timeout = 1h
max_wal_size = 10GB
max_parallel_workers_per_gather = 0

--- Machine ---
CPU: 48-core
RAM: 256 GB DDR5
Disk: 2 x 1.92 TB NVMe SSD

--- Executive Summary ---

The patches provide significant benefits for I/O-bound sequential
operations, with the greatest improvements seen when using
asynchronous I/O methods (io_uring and worker). The synchronous I/O
mode shows reduced but still meaningful gains.

--- Results by I/O Method ---

Best Results: io_method=worker

bloom_scan: 4.14x (75.9% faster); 93% fewer reads
pgstattuple: 1.59x (37.1% faster); 94% fewer reads
hash_vacuum: 1.05x (4.4% faster); 80% fewer reads
gin_vacuum: 1.06x (5.6% faster); 15% fewer reads
bloom_vacuum: 1.04x (3.9% faster); 76% fewer reads
wal_logging: 0.98x (-2.5%, neutral/slightly slower); no change in reads

io_method=io_uring

bloom_scan: 3.12x (68.0% faster); 93% fewer reads
pgstattuple: 1.50x (33.2% faster); 94% fewer reads
hash_vacuum: 1.03x (3.3% faster); 80% fewer reads
gin_vacuum: 1.02x (2.1% faster); 15% fewer reads
bloom_vacuum: 1.03x (3.4% faster); 76% fewer reads
wal_logging: 1.00x (-0.5%, neutral); no change in reads

io_method=sync (baseline comparison)

bloom_scan: 1.20x (16.4% faster); 93% fewer reads
pgstattuple: 1.10x (9.0% faster); 94% fewer reads
hash_vacuum: 1.01x (0.8% faster); 80% fewer reads
gin_vacuum: 1.02x (1.7% faster); 15% fewer reads
bloom_vacuum: 1.03x (2.8% faster); 76% fewer reads
wal_logging: 0.99x (-0.7%, neutral); no change in reads

--- Observations ---

Async I/O amplifies streaming benefits: the bloom_scan case improves
3-4x with worker/io_uring versus 1.2x with sync.

I/O operation reduction is consistent: All modes show the same ~93-94%
reduction in I/O operations for bloom_scan and pgstattuple.

VACUUM operations show modest gains: despite large reductions in reads
(76-80% for bloom and hash, 15% for GIN), wall-clock improvements are
smaller (roughly 1-6%), since VACUUM carries more CPU overhead (tuple
processing, index maintenance, WAL logging).

log_newpage_range shows no benefit: The patch provides no improvement (~0.97x).

--
Best,
Xuneng

Attachments

Re: Streamify more code paths

From: Xuneng Zhou
Date:
Hi,

On Thu, Feb 5, 2026 at 12:01 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> [...]
>
> log_newpage_range shows no benefit: The patch provides no improvement (~0.97x).

There was an issue in the wal_log test of the original script.

--- The original benchmark used:
ALTER TABLE ... SET LOGGED

This path performs a full table rewrite via ATRewriteTable()
(tablecmds.c). It creates a new relfilenode and copies tuples into it.
It does not call log_newpage_range() on rewritten pages.

log_newpage_range() is reached only indirectly, through the
pending-sync logic in storage.c, and only when:

- wal_level = minimal, and
- relation size < wal_skip_threshold (default 2MB).

Our test tables (1M–20M rows) are far larger than 2MB. In that case,
PostgreSQL fsyncs the file instead of WAL-logging it. Therefore, the
previous benchmark measured table rewrite I/O, not the
log_newpage_range() path.

--- Current design: GIN index build

The benchmark now uses:
CREATE INDEX ... USING gin (doc_tsv)

This reliably exercises log_newpage_range() because:
- ginbuild() constructs the index and WAL-logs all new index pages
using log_newpage_range() (sketched below).
- This is part of the normal GIN build path, independent of wal_skip_threshold.
- The streaming-read patch modifies the WAL logging path inside
log_newpage_range(), which this test directly targets.
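
For reference, the call at the end of ginbuild() is roughly:

    /*
     * The index was built without WAL-logging individual pages, so emit
     * every page of the new relation in one pass.
     */
    if (RelationNeedsWAL(index))
        log_newpage_range(index, MAIN_FORKNUM,
                          0, RelationGetNumberOfBlocks(index),
                          true);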

--- Results (wal_logging_large)
worker: 1.00x (+0.5%); no meaningful change in reads
io_uring: 1.01x (+1.3%); no meaningful change in reads
sync: 1.01x (+1.1%); no meaningful change in reads

--
Best,
Xuneng

Attachments