Thread: Streamify more code paths
Hi Hackers,

I noticed several additional paths in contrib modules, beyond [1],
that are potentially suitable for streamification:

1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()

The following patches streamify those code paths. No benchmarks have
been run yet.

[1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com

Feedback welcome.

--
Best,
Xuneng
Attachments
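For readers unfamiliar with the read stream API, these conversions generally follow the same shape (an illustrative sketch, not the patch text itself; `rel`, `nblocks`, and `bstrategy` are placeholders for the caller's relation, block count, and buffer access strategy):

```c
/*
 * Typical streamification of a sequential ReadBufferExtended() loop,
 * using the stock block_range_read_stream_cb callback from
 * src/backend/storage/aio/read_stream.c.  Illustrative only.
 */
BlockRangeReadStreamPrivate p;
ReadStream *stream;
Buffer		buf;

p.current_blocknum = 0;
p.last_exclusive = nblocks;		/* e.g. RelationGetNumberOfBlocks(rel) */

stream = read_stream_begin_relation(READ_STREAM_FULL,
									bstrategy,
									rel,
									MAIN_FORKNUM,
									block_range_read_stream_cb,
									&p,
									0);

while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
{
	/* same per-page work as the old loop body */
	ReleaseBuffer(buf);
}

read_stream_end(stream);
```

The callback hands block numbers to the stream, which can issue reads ahead of consumption; the consumer loop is otherwise unchanged from the synchronous version.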
Hi,

On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi Hackers,
>
> I noticed several additional paths in contrib modules, beyond [1],
> that are potentially suitable for streamification:
>
> 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
>
> The following patches streamify those code paths. No benchmarks have
> been run yet.
>
> [1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
>
> Feedbacks welcome.
>

One more in ginvacuumcleanup().

--
Best,
Xuneng
Attachments
Hi,

Thank you for working on this!

On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Hackers,
> >
> > I noticed several additional paths in contrib modules, beyond [1],
> > that are potentially suitable for streamification:
> >
> > 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> > 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
> >
> > The following patches streamify those code paths. No benchmarks have
> > been run yet.
> >
> > [1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
> >
> > Feedbacks welcome.
> >
>
> One more in ginvacuumcleanup().

0001, 0002 and 0004 LGTM.

0003:

+ buf = read_stream_next_buffer(stream, NULL);
+ if (buf == InvalidBuffer)
+ break;

I think we are loosening the check here. We were sure that there were
no InvalidBuffers up to nblocks. The streamified version does not have
this check; it exits the loop the first time it sees an InvalidBuffer,
which may be wrong. You might want to add
'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
have a similar check.

--
Regards,
Nazir Bilal Yavuz
Microsoft
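To make the suggested check concrete, the loop tail would look roughly like this (an illustrative sketch using the names from the quoted hunk; `p` is the stream's private state and `nblocks` the pre-computed relation size):

```c
while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
{
	/* ... existing per-page processing ... */
	ReleaseBuffer(buf);
}

/*
 * The callback hands out every block in [0, nblocks) before returning
 * InvalidBlockNumber, so a well-behaved stream can only end once
 * current_blocknum has reached nblocks.
 */
Assert(p.current_blocknum == nblocks);
read_stream_end(stream);
```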
Hi Bilal,

Thanks for your review!

On Fri, Dec 26, 2025 at 6:59 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Hi,
>
> Thank you for working on this!
>
> On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi Hackers,
> > >
> > > I noticed several additional paths in contrib modules, beyond [1],
> > > that are potentially suitable for streamification:
> > >
> > > 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> > > 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
> > >
> > > The following patches streamify those code paths. No benchmarks have
> > > been run yet.
> > >
> > > [1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
> > >
> > > Feedbacks welcome.
> > >
> >
> > One more in ginvacuumcleanup().
>
> 0001, 0002 and 0004 LGTM.
>
> 0003:
>
> + buf = read_stream_next_buffer(stream, NULL);
> + if (buf == InvalidBuffer)
> + break;
>
> I think we are loosening the check here. We were sure that there were
> no InvalidBuffers until the nblocks. Streamified version does not have
> this check, it exits from the loop the first time it sees an
> InvalidBuffer, which may be wrong. You might want to add
> 'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
> have a similar check.
>

Agreed. The check has been added in v2 per your suggestion.

--
Best,
Xuneng
Attachments
Hi,

On Sat, Dec 27, 2025 at 12:41 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi Bilal,
>
> Thanks for your review!
>
> On Fri, Dec 26, 2025 at 6:59 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > Hi,
> >
> > Thank you for working on this!
> >
> > On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi Hackers,
> > > >
> > > > I noticed several additional paths in contrib modules, beyond [1],
> > > > that are potentially suitable for streamification:
> > > >
> > > > 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> > > > 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
> > > >
> > > > The following patches streamify those code paths. No benchmarks have
> > > > been run yet.
> > > >
> > > > [1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
> > > >
> > > > Feedbacks welcome.
> > > >
> > >
> > > One more in ginvacuumcleanup().
> >
> > 0001, 0002 and 0004 LGTM.
> >
> > 0003:
> >
> > + buf = read_stream_next_buffer(stream, NULL);
> > + if (buf == InvalidBuffer)
> > + break;
> >
> > I think we are loosening the check here. We were sure that there were
> > no InvalidBuffers until the nblocks. Streamified version does not have
> > this check, it exits from the loop the first time it sees an
> > InvalidBuffer, which may be wrong. You might want to add
> > 'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
> > have a similar check.
> >
>
> Agree. The check has been added in v2 per your suggestion.
>

Two more to go:
patch 5: Streamify log_newpage_range() WAL logging path
patch 6: Streamify hash index VACUUM primary bucket page reads

Benchmarks will be conducted soon.

--
Best,
Xuneng
Attachments
- v2-0002-Streamify-Bloom-VACUUM-paths-Use-streaming-re.patch
- v2-0004-Replace-synchronous-ReadBufferExtended-loop-with.patch
- v2-0001-Switch-Bloom-scan-paths-to-streaming-read.patch
- v2-0003-Streamify-heap-bloat-estimation-scan-Introduc.patch
- v2-0005-Streamify-log_newpage_range-WAL-logging-path.patch
- v2-0006-Streamify-hash-index-VACUUM-primary-bucket-page-r.patch
Hi,

On Sun, Dec 28, 2025 at 7:41 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Sat, Dec 27, 2025 at 12:41 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Bilal,
> >
> > Thanks for your review!
> >
> > On Fri, Dec 26, 2025 at 6:59 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Thank you for working on this!
> > >
> > > On Thu, 25 Dec 2025 at 09:34, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Thu, Dec 25, 2025 at 1:51 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi Hackers,
> > > > >
> > > > > I noticed several additional paths in contrib modules, beyond [1],
> > > > > that are potentially suitable for streamification:
> > > > >
> > > > > 1) pgstattuple — pgstatapprox.c and parts of pgstattuple_approx_internal
> > > > > 2) Bloom — scan paths in blgetbitmap() and maintenance paths in blbulkdelete()
> > > > >
> > > > > The following patches streamify those code paths. No benchmarks have
> > > > > been run yet.
> > > > >
> > > > > [1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com
> > > > >
> > > > > Feedbacks welcome.
> > > > >
> > > >
> > > > One more in ginvacuumcleanup().
> > >
> > > 0001, 0002 and 0004 LGTM.
> > >
> > > 0003:
> > >
> > > + buf = read_stream_next_buffer(stream, NULL);
> > > + if (buf == InvalidBuffer)
> > > + break;
> > >
> > > I think we are loosening the check here. We were sure that there were
> > > no InvalidBuffers until the nblocks. Streamified version does not have
> > > this check, it exits from the loop the first time it sees an
> > > InvalidBuffer, which may be wrong. You might want to add
> > > 'Assert(p.current_blocknum == nblocks);' before read_stream_end() to
> > > have a similar check.
> > >
> >
> > Agree. The check has been added in v2 per your suggestion.
> >
>
> Two more to go:
> patch 5: Streamify log_newpage_range() WAL logging path
> patch 6: Streamify hash index VACUUM primary bucket page reads
>
> Benchmarks will be conducted soon.
>

v6 in the last message has a problem and was not updated. Attaching
the right one again. Sorry for the noise.

--
Best,
Xuneng
Attachments
- v2-0002-Streamify-Bloom-VACUUM-paths-Use-streaming-re.patch
- v2-0001-Switch-Bloom-scan-paths-to-streaming-read.patch
- v2-0004-Replace-synchronous-ReadBufferExtended-loop-with.patch
- v2-0003-Streamify-heap-bloat-estimation-scan-Introduc.patch
- v2-0005-Streamify-log_newpage_range-WAL-logging-path.patch
- v2-0006-Streamify-hash-index-VACUUM-primary-bucket-page-r.patch
Hi,
On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
> >
> > Two more to go:
> > patch 5: Streamify log_newpage_range() WAL logging path
> > patch 6: Streamify hash index VACUUM primary bucket page reads
> >
> > Benchmarks will be conducted soon.
> >
>
> v6 in the last message has a problem and has not been updated. Attach
> the right one again. Sorry for the noise.
0003 and 0006:
You need to add 'StatApproxReadStreamPrivate' and
'HashBulkDeleteStreamPrivate' to the typedefs.list.
0005:
@@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
nbufs = 0;
while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
{
- Buffer buf = ReadBufferExtended(rel, forknum, blkno,
- RBM_NORMAL, NULL);
+ Buffer buf = read_stream_next_buffer(stream, NULL);
+
+ if (!BufferIsValid(buf))
+ break;
We are loosening a check here; there should not be an invalid buffer in
the stream before endblk. I think you can remove this
BufferIsValid() check; then we can learn if something goes wrong.
0006:
You can use read_stream_reset() instead of read_stream_end(); then you
can reuse the same stream with different callback state. I believe this
is the preferred way.
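As a rough illustration of that reset pattern (a sketch only; the range variables and private state `p` are placeholders for whatever the hash VACUUM loop tracks per bucket):

```c
/* First range: stream the pages of one bucket. */
p.current_blocknum = bucket_start;
p.last_exclusive = bucket_end;

while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
{
	/* ... process page ... */
	ReleaseBuffer(buf);
}

/*
 * Instead of read_stream_end() plus building a fresh stream, rewind
 * the stream's internal state and point the callback's private data
 * at the next range, reusing the same stream object.
 */
read_stream_reset(stream);
p.current_blocknum = next_start;
p.last_exclusive = next_end;

/* ... consume again; call read_stream_end() once, after the last range ... */
```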
Rest LGTM!
--
Regards,
Nazir Bilal Yavuz
Microsoft
Hi,
Thanks for looking into this.
On Mon, Dec 29, 2025 at 6:58 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Hi,
>
> On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> > >
> > > Two more to go:
> > > patch 5: Streamify log_newpage_range() WAL logging path
> > > patch 6: Streamify hash index VACUUM primary bucket page reads
> > >
> > > Benchmarks will be conducted soon.
> > >
> >
> > v6 in the last message has a problem and has not been updated. Attach
> > the right one again. Sorry for the noise.
>
> 0003 and 0006:
>
> You need to add 'StatApproxReadStreamPrivate' and
> 'HashBulkDeleteStreamPrivate' to the typedefs.list.
Done.
> 0005:
>
> @@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
> nbufs = 0;
> while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
> {
> - Buffer buf = ReadBufferExtended(rel, forknum, blkno,
> - RBM_NORMAL, NULL);
> + Buffer buf = read_stream_next_buffer(stream, NULL);
> +
> + if (!BufferIsValid(buf))
> + break;
>
> We are loosening a check here, there should not be a invalid buffer in
> the stream until the endblk. I think you can remove this
> BufferIsValid() check, then we can learn if something goes wrong.
My earlier concern about not adding an assert at the end of the stream
was the potential early break here:
/* Nothing more to do if all remaining blocks were empty. */
if (nbufs == 0)
break;
After looking more closely, that turned out to be a misunderstanding of
the logic on my part.
> 0006:
>
> You can use read_stream_reset() instead of read_stream_end(), then you
> can use the same stream with different variables, I believe this is
> the preferred way.
>
> Rest LGTM!
>
Yeah, reset seems the more appropriate approach here.
--
Best,
Xuneng
Attachments
- v3-0003-Streamify-heap-bloat-estimation-scan.-Introduce-a.patch
- v3-0005-Streamify-log_newpage_range-WAL-logging-path.patch
- v3-0002-Streamify-Bloom-VACUUM-paths.-n-nUse-streaming-re.patch
- v3-0001-Switch-Bloom-scan-paths-to-streaming-read.-n-nRep.patch
- v3-0004-Replace-synchronous-ReadBufferExtended-loop-with-.patch
- v3-0006-Streamify-hash-index-VACUUM-primary-bucket-page-r.patch
Hi,
On Tue, Dec 30, 2025 at 9:51 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> Thanks for looking into this.
>
> On Mon, Dec 29, 2025 at 6:58 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > Hi,
> >
> > On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > > >
> > > > Two more to go:
> > > > patch 5: Streamify log_newpage_range() WAL logging path
> > > > patch 6: Streamify hash index VACUUM primary bucket page reads
> > > >
> > > > Benchmarks will be conducted soon.
> > > >
> > >
> > > v6 in the last message has a problem and has not been updated. Attach
> > > the right one again. Sorry for the noise.
> >
> > 0003 and 0006:
> >
> > You need to add 'StatApproxReadStreamPrivate' and
> > 'HashBulkDeleteStreamPrivate' to the typedefs.list.
>
> Done.
>
> > 0005:
> >
> > @@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
> > nbufs = 0;
> > while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
> > {
> > - Buffer buf = ReadBufferExtended(rel, forknum, blkno,
> > - RBM_NORMAL, NULL);
> > + Buffer buf = read_stream_next_buffer(stream, NULL);
> > +
> > + if (!BufferIsValid(buf))
> > + break;
> >
> > We are loosening a check here, there should not be a invalid buffer in
> > the stream until the endblk. I think you can remove this
> > BufferIsValid() check, then we can learn if something goes wrong.
>
> My concern before for not adding assert at the end of streaming is the
> potential early break in here:
>
> /* Nothing more to do if all remaining blocks were empty. */
> if (nbufs == 0)
> break;
>
> After looking more closely, it turns out to be a misunderstanding of the logic.
>
> > 0006:
> >
> > You can use read_stream_reset() instead of read_stream_end(), then you
> > can use the same stream with different variables, I believe this is
> > the preferred way.
> >
> > Rest LGTM!
> >
>
> Yeah, reset seems a more proper way here.
>
Ran pgindent using the updated typedefs.list.
--
Best,
Xuneng
Attachments
- v4-0001-Switch-Bloom-scan-paths-to-streaming-read.-n-nRep.patch
- v4-0005-Streamify-log_newpage_range-WAL-logging-path.patch
- v4-0004-Replace-synchronous-ReadBufferExtended-loop-with-.patch
- v4-0002-Streamify-Bloom-VACUUM-paths.-n-nUse-streaming-re.patch
- v4-0006-Streamify-hash-index-VACUUM-primary-bucket-page-r.patch
- v4-0003-Streamify-heap-bloat-estimation-scan.-Introduce-a.patch
Hi,
On Tue, Dec 30, 2025 at 10:43 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Tue, Dec 30, 2025 at 9:51 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > Thanks for looking into this.
> >
> > On Mon, Dec 29, 2025 at 6:58 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > >
> > > > Hi,
> > > > >
> > > > > Two more to go:
> > > > > patch 5: Streamify log_newpage_range() WAL logging path
> > > > > patch 6: Streamify hash index VACUUM primary bucket page reads
> > > > >
> > > > > Benchmarks will be conducted soon.
> > > > >
> > > >
> > > > v6 in the last message has a problem and has not been updated. Attach
> > > > the right one again. Sorry for the noise.
> > >
> > > 0003 and 0006:
> > >
> > > You need to add 'StatApproxReadStreamPrivate' and
> > > 'HashBulkDeleteStreamPrivate' to the typedefs.list.
> >
> > Done.
> >
> > > 0005:
> > >
> > > @@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
> > > nbufs = 0;
> > > while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
> > > {
> > > - Buffer buf = ReadBufferExtended(rel, forknum, blkno,
> > > - RBM_NORMAL, NULL);
> > > + Buffer buf = read_stream_next_buffer(stream, NULL);
> > > +
> > > + if (!BufferIsValid(buf))
> > > + break;
> > >
> > > We are loosening a check here, there should not be a invalid buffer in
> > > the stream until the endblk. I think you can remove this
> > > BufferIsValid() check, then we can learn if something goes wrong.
> >
> > My concern before for not adding assert at the end of streaming is the
> > potential early break in here:
> >
> > /* Nothing more to do if all remaining blocks were empty. */
> > if (nbufs == 0)
> > break;
> >
> > After looking more closely, it turns out to be a misunderstanding of the logic.
> >
> > > 0006:
> > >
> > > You can use read_stream_reset() instead of read_stream_end(), then you
> > > can use the same stream with different variables, I believe this is
> > > the preferred way.
> > >
> > > Rest LGTM!
> > >
> >
> > Yeah, reset seems a more proper way here.
> >
>
> Run pgindent using the updated typedefs.list.
>
I've completed benchmarking of the v4 streaming read patches across
three I/O methods (io_uring, sync, worker). Tests were run with cold
cache on large datasets.
--- Settings ---
shared_buffers = '8GB'
effective_io_concurrency = 200
io_method = $IO_METHOD
io_workers = $IO_WORKERS
io_max_concurrency = $IO_MAX_CONCURRENCY
track_io_timing = on
autovacuum = off
checkpoint_timeout = 1h
max_wal_size = 10GB
max_parallel_workers_per_gather = 0
--- Machine ---
CPU: 48-core
RAM: 256 GB DDR5
Disk: 2 x 1.92 TB NVMe SSD
--- Executive Summary ---
The patches provide significant benefits for I/O-bound sequential
operations, with the greatest improvements seen when using
asynchronous I/O methods (io_uring and worker). The synchronous I/O
mode shows reduced but still meaningful gains.
--- Results by I/O Method ---
Best Results: io_method=worker
bloom_scan: 4.14x (75.9% faster); 93% fewer reads
pgstattuple: 1.59x (37.1% faster); 94% fewer reads
hash_vacuum: 1.05x (4.4% faster); 80% fewer reads
gin_vacuum: 1.06x (5.6% faster); 15% fewer reads
bloom_vacuum: 1.04x (3.9% faster); 76% fewer reads
wal_logging: 0.98x (-2.5%, neutral/slightly slower); no change in reads
io_method=io_uring
bloom_scan: 3.12x (68.0% faster); 93% fewer reads
pgstattuple: 1.50x (33.2% faster); 94% fewer reads
hash_vacuum: 1.03x (3.3% faster); 80% fewer reads
gin_vacuum: 1.02x (2.1% faster); 15% fewer reads
bloom_vacuum: 1.03x (3.4% faster); 76% fewer reads
wal_logging: 1.00x (-0.5%, neutral); no change in reads
io_method=sync (baseline comparison)
bloom_scan: 1.20x (16.4% faster); 93% fewer reads
pgstattuple: 1.10x (9.0% faster); 94% fewer reads
hash_vacuum: 1.01x (0.8% faster); 80% fewer reads
gin_vacuum: 1.02x (1.7% faster); 15% fewer reads
bloom_vacuum: 1.03x (2.8% faster); 76% fewer reads
wal_logging: 0.99x (-0.7%, neutral); no change in reads
--- Observations ---
Async I/O amplifies streaming benefits: The same patches show 3-4x
improvement with worker/io_uring vs 1.2x with sync.
I/O operation reduction is consistent: All modes show the same ~93-94%
reduction in I/O operations for bloom_scan and pgstattuple.
VACUUM operations show modest gains: Despite large I/O reductions
(76-80%), wall-clock improvements are smaller (3-15%) since VACUUM has
larger CPU overhead (tuple processing, index maintenance, WAL
logging).
log_newpage_range shows no benefit: The patch provides no improvement (~0.97x).
--
Best,
Xuneng
Attachments
Hi,
On Thu, Feb 5, 2026 at 12:01 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Tue, Dec 30, 2025 at 10:43 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Dec 30, 2025 at 9:51 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Thanks for looking into this.
> > >
> > > On Mon, Dec 29, 2025 at 6:58 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Sun, 28 Dec 2025 at 14:46, Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > > >
> > > > > > Two more to go:
> > > > > > patch 5: Streamify log_newpage_range() WAL logging path
> > > > > > patch 6: Streamify hash index VACUUM primary bucket page reads
> > > > > >
> > > > > > Benchmarks will be conducted soon.
> > > > > >
> > > > >
> > > > > v6 in the last message has a problem and has not been updated. Attach
> > > > > the right one again. Sorry for the noise.
> > > >
> > > > 0003 and 0006:
> > > >
> > > > You need to add 'StatApproxReadStreamPrivate' and
> > > > 'HashBulkDeleteStreamPrivate' to the typedefs.list.
> > >
> > > Done.
> > >
> > > > 0005:
> > > >
> > > > @@ -1321,8 +1341,10 @@ log_newpage_range(Relation rel, ForkNumber forknum,
> > > > nbufs = 0;
> > > > while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
> > > > {
> > > > - Buffer buf = ReadBufferExtended(rel, forknum, blkno,
> > > > - RBM_NORMAL, NULL);
> > > > + Buffer buf = read_stream_next_buffer(stream, NULL);
> > > > +
> > > > + if (!BufferIsValid(buf))
> > > > + break;
> > > >
> > > > We are loosening a check here, there should not be a invalid buffer in
> > > > the stream until the endblk. I think you can remove this
> > > > BufferIsValid() check, then we can learn if something goes wrong.
> > >
> > > My concern before for not adding assert at the end of streaming is the
> > > potential early break in here:
> > >
> > > /* Nothing more to do if all remaining blocks were empty. */
> > > if (nbufs == 0)
> > > break;
> > >
> > > After looking more closely, it turns out to be a misunderstanding of the logic.
> > >
> > > > 0006:
> > > >
> > > > You can use read_stream_reset() instead of read_stream_end(), then you
> > > > can use the same stream with different variables, I believe this is
> > > > the preferred way.
> > > >
> > > > Rest LGTM!
> > > >
> > >
> > > Yeah, reset seems a more proper way here.
> > >
> >
> > Run pgindent using the updated typedefs.list.
> >
>
> I've completed benchmarking of the v4 streaming read patches across
> three I/O methods (io_uring, sync, worker). Tests were run with cold
> cache on large datasets.
>
> --- Settings ---
>
> shared_buffers = '8GB'
> effective_io_concurrency = 200
> io_method = $IO_METHOD
> io_workers = $IO_WORKERS
> io_max_concurrency = $IO_MAX_CONCURRENCY
> track_io_timing = on
> autovacuum = off
> checkpoint_timeout = 1h
> max_wal_size = 10GB
> max_parallel_workers_per_gather = 0
>
> --- Machine ---
> CPU: 48-core
> RAM: 256 GB DDR5
> Disk: 2 x 1.92 TB NVMe SSD
>
> --- Executive Summary ---
>
> The patches provide significant benefits for I/O-bound sequential
> operations, with the greatest improvements seen when using
> asynchronous I/O methods (io_uring and worker). The synchronous I/O
> mode shows reduced but still meaningful gains.
>
> --- Results by I/O Method
>
> Best Results: io_method=worker
>
> bloom_scan: 4.14x (75.9% faster); 93% fewer reads
> pgstattuple: 1.59x (37.1% faster); 94% fewer reads
> hash_vacuum: 1.05x (4.4% faster); 80% fewer reads
> gin_vacuum: 1.06x (5.6% faster); 15% fewer reads
> bloom_vacuum: 1.04x (3.9% faster); 76% fewer reads
> wal_logging: 0.98x (-2.5%, neutral/slightly slower); no change in reads
>
> io_method=io_uring
>
> bloom_scan: 3.12x (68.0% faster); 93% fewer reads
> pgstattuple: 1.50x (33.2% faster); 94% fewer reads
> hash_vacuum: 1.03x (3.3% faster); 80% fewer reads
> gin_vacuum: 1.02x (2.1% faster); 15% fewer reads
> bloom_vacuum: 1.03x (3.4% faster); 76% fewer reads
> wal_logging: 1.00x (-0.5%, neutral); no change in reads
>
> io_method=sync (baseline comparison)
>
> bloom_scan: 1.20x (16.4% faster); 93% fewer reads
> pgstattuple: 1.10x (9.0% faster); 94% fewer reads
> hash_vacuum: 1.01x (0.8% faster); 80% fewer reads
> gin_vacuum: 1.02x (1.7% faster); 15% fewer reads
> bloom_vacuum: 1.03x (2.8% faster); 76% fewer reads
> wal_logging: 0.99x (-0.7%, neutral); no change in reads
>
> --- Observations ---
>
> Async I/O amplifies streaming benefits: The same patches show 3-4x
> improvement with worker/io_uring vs 1.2x with sync.
>
> I/O operation reduction is consistent: All modes show the same ~93-94%
> reduction in I/O operations for bloom_scan and pgstattuple.
>
> VACUUM operations show modest gains: Despite large I/O reductions
> (76-80%), wall-clock improvements are smaller (3-15%) since VACUUM has
> larger CPU overhead (tuple processing, index maintenance, WAL
> logging).
>
> log_newpage_range shows no benefit: The patch provides no improvement (~0.97x).
>
> --
> Best,
> Xuneng
There was an issue in the wal_log test of the original script.
--- The original benchmark used:
ALTER TABLE ... SET LOGGED
This path performs a full table rewrite via ATRewriteTable()
(tablecmds.c). It creates a new relfilenode and copies tuples into it.
It does not call log_newpage_range() on rewritten pages.
log_newpage_range() may only appear indirectly through the
pending-sync logic in storage.c, and only when:
wal_level = minimal, and
relation size < wal_skip_threshold (default 2MB).
Our test tables (1M–20M rows) are far larger than 2MB. In that case,
PostgreSQL fsyncs the file instead of WAL-logging it. Therefore, the
previous benchmark measured table rewrite I/O, not the
log_newpage_range() path.
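For context, the decision in storage.c boils down to something like the following (a heavily simplified sketch of the smgrDoPendingSyncs() logic under wal_level = minimal; consult the real code for the exact conditions and fork handling):

```c
/*
 * Simplified: at commit, a relation created in this transaction is
 * either fsync'ed or WAL-logged page by page, whichever is cheaper.
 */
if (total_blocks * BLCKSZ / 1024 >= wal_skip_threshold)
{
	/* Large relation: fsync the file instead of WAL-logging it. */
	smgrimmedsync(srel, MAIN_FORKNUM);
}
else
{
	/* Small relation: WAL-log every page via log_newpage_range(). */
	log_newpage_range(rel, MAIN_FORKNUM, 0, nblocks, false);
}
```

This is why the rewrite test never reached log_newpage_range(): the test tables were far above the threshold, so the fsync branch was taken.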
--- Current design: GIN index build
The benchmark now uses:
CREATE INDEX ... USING gin (doc_tsv)
This reliably exercises log_newpage_range() because:
- ginbuild() constructs the index and WAL-logs all new index pages
using log_newpage_range().
- This is part of the normal GIN build path, independent of wal_skip_threshold.
- The streaming-read patch modifies the WAL logging path inside
log_newpage_range(), which this test directly targets.
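The revised benchmark can be reduced to a psql session along these lines (illustrative; the table shape, row count, and index name are placeholders, with only the doc_tsv column name taken from the test above):

```sql
-- Populate a tsvector column, then time the GIN build; ginbuild()
-- WAL-logs the finished index pages via log_newpage_range().
CREATE TABLE docs AS
SELECT g AS id,
       to_tsvector('english', md5(g::text)) AS doc_tsv
FROM generate_series(1, 1000000) AS g;

CHECKPOINT;   -- isolate the index build's WAL from prior activity

\timing on
CREATE INDEX docs_gin ON docs USING gin (doc_tsv);
```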
--- Results (wal_logging_large)
worker: 1.00x (+0.5%); no meaningful change in reads
io_uring: 1.01x (+1.3%); no meaningful change in reads
sync: 1.01x (+1.1%); no meaningful change in reads
--
Best,
Xuneng