Thread: astreamer_lz4: fix bug of output pointer advancement in decompressor
Hi,
There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().
Looking at the code snippet (omitting irrelevant code):
```
ret = LZ4F_decompress(mystreamer->dctx,
                      next_out, &out_size,
                      next_in, &read_size, NULL);

mystreamer->bytes_written += out_size; // <== bumped bytes_written already

/*
 * If output buffer is full then forward the content to next streamer
 * and update the output buffer.
 */
if (mystreamer->bytes_written >= mystreamer->base.bbs_buffer.maxlen)
{
    astreamer_content(mystreamer->base.bbs_next, member,
                      mystreamer->base.bbs_buffer.data,
                      mystreamer->base.bbs_buffer.maxlen,
                      context);

    avail_out = mystreamer->base.bbs_buffer.maxlen;
    mystreamer->bytes_written = 0;
    next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
}
else
{
    avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;
    next_out += mystreamer->bytes_written; // <== The bug is here
}
```
To advance next_out, the code uses mystreamer->bytes_written. However, bytes_written has already been increased by out_size in the current iteration. As a result, next_out is advanced by the cumulative number of bytes written so far, instead of just the number of bytes produced in this iteration. Effectively, the pointer movement double-counts the previous progress.
When I tried to design a test case to trigger this bug, I found it is actually not easy to hit in normal execution. Tracing into the function, I found that the default output buffer size is 1024 bytes, and in practice LZ4F_decompress() tends to fill the output buffer in one or two iterations. As a result, the problematic else branch is either not reached, or reached in a case where bytes_written == out_size, so the incorrect pointer increment does not manifest.
To reliably trigger the bug, I used a small hack: instead of letting LZ4F_decompress() use the full available out_size, I artificially limited out_size before the call, forcing LZ4F_decompress() to require one more iteration to fill the buffer. See the attached nocfbot_hack.diff for the hack.
With that hack in place, the bug can be reproduced using the following procedure:
1. initdb
2. Set "wal_level = replica" in postgresql.conf
3. Restart the instance
4. Create a database
5. Generate some WAL with psql
```
CREATE TABLE t AS SELECT generate_series(1, 100000) AS id;
CHECKPOINT;
```
6. Create a backup
```
% rm -rf /tmp/bkup_lz4
% pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
```
7. Verify the backup
```
% pg_verifybackup -F t -n /tmp/bkup_lz4
pg_verifybackup: error: zsh: trace trap pg_verifybackup -F t -n /tmp/bkup_lz4
```
With the fix applied (plus the hack), step 7 succeeds:
```
% pg_verifybackup -F t -n /tmp/bkup_lz4
backup successfully verified
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments
> On Mar 2, 2026, at 17:17, Chao Li <li.evan.chao@gmail.com> wrote:
>
> [...]
>
> <nocfbot_hack.diff><v1-0001-astreamer_lz4-fix-output-pointer-advancement-in-d.patch>
Added to CF for tracking https://commitfest.postgresql.org/patch/6561/
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Hi,
On Tue, Mar 3, 2026 at 11:27 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> > On Mar 2, 2026, at 17:17, Chao Li <li.evan.chao@gmail.com> wrote:
> >
> > [...]
>
> Added to CF for tracking https://commitfest.postgresql.org/patch/6561/
>
I agree this is a logical issue. We should increment next_out by the delta value (out_size) rather than the cumulative value (mystreamer->bytes_written); otherwise, it will leave holes in the output buffer. The proposed fix LGTM.
--
Best,
Xuneng
Chao Li <li.evan.chao@gmail.com> writes:
> There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().
Yup, that's clearly wrong. I failed to reproduce a crash with the
test hack you suggested, but no matter. Pushed with some cosmetic
editorialization.
The track record of all this client-side-compression logic is
really quite awful :-(. Another thing that I'm looking askance
at is the error handling, or rather lack of it:
    ret = LZ4F_decompress(mystreamer->dctx,
                          next_out, &out_size,
                          next_in, &read_size, NULL);
    if (LZ4F_isError(ret))
        pg_log_error("could not decompress data: %s",
                     LZ4F_getErrorName(ret));
    ... continue on our merry way ...
I suspect whoever wrote this thought pg_log_error is equivalent
to elog(ERROR), but it's not; it just prints a message. It seems
highly unlikely to me that continuing onwards will result in a
good outcome. I'm a bit inclined to s/pg_log_error/pg_fatal/
throughout these files, at least in places where there's no
visible effort to handle the error.
regards, tom lane
I wrote:
> I suspect whoever wrote this thought pg_log_error is equivalent
> to elog(ERROR), but it's not; it just prints a message. It seems
> highly unlikely to me that continuing onwards will result in a
> good outcome. I'm a bit inclined to s/pg_log_error/pg_fatal/
> throughout these files, at least in places where there's no
> visible effort to handle the error.
After looking through fe_utils, pg_dump, pg_basebackup, and
pg_verifybackup, I found the attached places that seem to
need cleanup. There are a couple other places where we
are not treating failures as fatal, but those seem intentional,
eg not fatal'ing on close() failure for an input file.
regards, tom lane
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index b72bad130ad..0aa519fbb67 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -629,6 +629,7 @@ LZ4Stream_close(CompressFileHandle *CFH)
size_t required;
size_t status;
int ret;
+ bool success = true;
fp = state->fp;
if (state->inited)
@@ -644,6 +645,7 @@ LZ4Stream_close(CompressFileHandle *CFH)
{
errno = (errno) ? errno : ENOSPC;
pg_log_error("could not write to output file: %m");
+ success = false;
}
state->bufdata = 0;
}
@@ -656,6 +658,7 @@ LZ4Stream_close(CompressFileHandle *CFH)
{
pg_log_error("could not end compression: %s",
LZ4F_getErrorName(status));
+ success = false;
}
else
state->bufdata += status;
@@ -665,19 +668,26 @@ LZ4Stream_close(CompressFileHandle *CFH)
{
errno = (errno) ? errno : ENOSPC;
pg_log_error("could not write to output file: %m");
+ success = false;
}
status = LZ4F_freeCompressionContext(state->ctx);
if (LZ4F_isError(status))
+ {
pg_log_error("could not end compression: %s",
LZ4F_getErrorName(status));
+ success = false;
+ }
}
else
{
status = LZ4F_freeDecompressionContext(state->dtx);
if (LZ4F_isError(status))
+ {
pg_log_error("could not end decompression: %s",
LZ4F_getErrorName(status));
+ success = false;
+ }
pg_free(state->outbuf);
}
@@ -692,10 +702,10 @@ LZ4Stream_close(CompressFileHandle *CFH)
if (ret != 0)
{
pg_log_error("could not close file: %m");
- return false;
+ success = false;
}
- return true;
+ return success;
}
static bool
diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index e8d62f754ca..2e080c37a58 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -317,7 +317,7 @@ astreamer_gzip_decompressor_content(astreamer *streamer,
res = inflate(zs, Z_NO_FLUSH);
if (res == Z_STREAM_ERROR)
- pg_log_error("could not decompress data: %s", zs->msg);
+ pg_fatal("could not decompress data: %s", zs->msg);
mystreamer->bytes_written =
mystreamer->base.bbs_buffer.maxlen - zs->avail_out;
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index e196fcc81e5..2bc32b42879 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -94,8 +94,8 @@ astreamer_lz4_compressor_new(astreamer *next, pg_compress_specification *compres
ctxError = LZ4F_createCompressionContext(&streamer->cctx, LZ4F_VERSION);
if (LZ4F_isError(ctxError))
- pg_log_error("could not create lz4 compression context: %s",
- LZ4F_getErrorName(ctxError));
+ pg_fatal("could not create lz4 compression context: %s",
+ LZ4F_getErrorName(ctxError));
return &streamer->base;
#else
@@ -139,8 +139,8 @@ astreamer_lz4_compressor_content(astreamer *streamer,
&mystreamer->prefs);
if (LZ4F_isError(compressed_size))
- pg_log_error("could not write lz4 header: %s",
- LZ4F_getErrorName(compressed_size));
+ pg_fatal("could not write lz4 header: %s",
+ LZ4F_getErrorName(compressed_size));
mystreamer->bytes_written += compressed_size;
mystreamer->header_written = true;
@@ -188,8 +188,8 @@ astreamer_lz4_compressor_content(astreamer *streamer,
next_in, len, NULL);
if (LZ4F_isError(compressed_size))
- pg_log_error("could not compress data: %s",
- LZ4F_getErrorName(compressed_size));
+ pg_fatal("could not compress data: %s",
+ LZ4F_getErrorName(compressed_size));
mystreamer->bytes_written += compressed_size;
}
@@ -240,8 +240,8 @@ astreamer_lz4_compressor_finalize(astreamer *streamer)
next_out, avail_out, NULL);
if (LZ4F_isError(compressed_size))
- pg_log_error("could not end lz4 compression: %s",
- LZ4F_getErrorName(compressed_size));
+ pg_fatal("could not end lz4 compression: %s",
+ LZ4F_getErrorName(compressed_size));
mystreamer->bytes_written += compressed_size;
@@ -353,8 +353,8 @@ astreamer_lz4_decompressor_content(astreamer *streamer,
next_in, &read_size, NULL);
if (LZ4F_isError(ret))
- pg_log_error("could not decompress data: %s",
- LZ4F_getErrorName(ret));
+ pg_fatal("could not decompress data: %s",
+ LZ4F_getErrorName(ret));
/* Update input buffer based on number of bytes consumed */
avail_in -= read_size;
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index 2bf5c57b902..f26abcfd0fa 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -116,11 +116,8 @@ astreamer_zstd_compressor_new(astreamer *next, pg_compress_specification *compre
ZSTD_c_enableLongDistanceMatching,
compress->long_distance);
if (ZSTD_isError(ret))
- {
- pg_log_error("could not enable long-distance mode: %s",
- ZSTD_getErrorName(ret));
- exit(1);
- }
+ pg_fatal("could not enable long-distance mode: %s",
+ ZSTD_getErrorName(ret));
}
/* Initialize the ZSTD output buffer. */
@@ -182,8 +179,8 @@ astreamer_zstd_compressor_content(astreamer *streamer,
&inBuf, ZSTD_e_continue);
if (ZSTD_isError(yet_to_flush))
- pg_log_error("could not compress data: %s",
- ZSTD_getErrorName(yet_to_flush));
+ pg_fatal("could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
}
}
@@ -224,8 +221,8 @@ astreamer_zstd_compressor_finalize(astreamer *streamer)
&in, ZSTD_e_end);
if (ZSTD_isError(yet_to_flush))
- pg_log_error("could not compress data: %s",
- ZSTD_getErrorName(yet_to_flush));
+ pg_fatal("could not compress data: %s",
+ ZSTD_getErrorName(yet_to_flush));
} while (yet_to_flush > 0);
@@ -330,8 +327,8 @@ astreamer_zstd_decompressor_content(astreamer *streamer,
&mystreamer->zstd_outBuf, &inBuf);
if (ZSTD_isError(ret))
- pg_log_error("could not decompress data: %s",
- ZSTD_getErrorName(ret));
+ pg_fatal("could not decompress data: %s",
+ ZSTD_getErrorName(ret));
}
}
> On Mar 5, 2026, at 01:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Chao Li <li.evan.chao@gmail.com> writes:
>> There have been a couple of LZ4-related patches recently, so I spent some time playing with the LZ4 path and found a bug in astreamer_lz4_decompressor_content().
>
> Yup, that's clearly wrong. I failed to reproduce a crash with the
> test hack you suggested, but no matter. Pushed with some cosmetic
> editorialization.
Hmm... I just tried again. After applying nocfbot_hack.diff to an old branch, I can easily reproduce the bug:
```
chaol@ChaodeMacBook-Air cndb % pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
2026-03-05 09:01:53.461 CST [72896] LOG: checkpoint starting: fast force wait time
2026-03-05 09:01:53.466 CST [72896] LOG: checkpoint complete: fast force wait time: wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.006 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=29655 kB; lsn=0/14000080, redo lsn=0/14000028
chaol@ChaodeMacBook-Air cndb % pg_verifybackup -F t -n /tmp/bkup_lz4
pg_verifybackup: error: zsh: trace trap pg_verifybackup -F t -n /tmp/bkup_lz4
```
Then I switched to the latest master, again with nocfbot_hack.diff applied:
```
chaol@ChaodeMacBook-Air postgresql % git diff
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index e196fcc81e5..35fd564df9a 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -331,10 +331,15 @@ astreamer_lz4_decompressor_content(astreamer *streamer,
 	size_t		ret,
 				read_size,
 				out_size;
+	size_t		hack_out_size;
 
 	read_size = avail_in;
 	out_size = avail_out;
 
+	if (out_size > 5)
+		hack_out_size = out_size - 5;
+	else
+		hack_out_size = out_size;
 	/*
 	 * This call decompresses the data starting at next_in and generates
 	 * the output data starting at next_out. It expects the caller to
@@ -349,13 +354,15 @@ astreamer_lz4_decompressor_content(astreamer *streamer,
 	 * to out_size respectively.
 	 */
 	ret = LZ4F_decompress(mystreamer->dctx,
-						  next_out, &out_size,
+						  next_out, &hack_out_size,
 						  next_in, &read_size, NULL);
 	if (LZ4F_isError(ret))
 		pg_log_error("could not decompress data: %s",
 					 LZ4F_getErrorName(ret));
 
+	out_size = hack_out_size;
+
 	/* Update input buffer based on number of bytes consumed */
 	avail_in -= read_size;
 	next_in += read_size;
```
Now, the bug goes away:
```
chaol@ChaodeMacBook-Air cndb % rm -rf /tmp/bkup_lz4
chaol@ChaodeMacBook-Air cndb % pg_basebackup -D /tmp/bkup_lz4 -F t -Z lz4 -X stream -c fast
2026-03-05 09:05:57.632 CST [72896] LOG: checkpoint starting: fast force wait
2026-03-05 09:05:57.634 CST [72896] LOG: checkpoint complete: fast force wait: wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB; lsn=0/16000080, redo lsn=0/16000028
chaol@ChaodeMacBook-Air cndb % pg_verifybackup -F t -n /tmp/bkup_lz4
backup successfully verified
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Thu, Mar 5, 2026 at 2:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > I suspect whoever wrote this thought pg_log_error is equivalent
> > to elog(ERROR), but it's not; it just prints a message.
> [...]
>
> After looking through fe_utils, pg_dump, pg_basebackup, and
> pg_verifybackup, I found the attached places that seem to
> need cleanup. There are a couple other places where we
> are not treating failures as fatal, but those seem intentional,
> eg not fatal'ing on close() failure for an input file.
>
> regards, tom lane
I also thought pg_log_error behaved the same way as elog(ERROR). Noted now.
The cleanup looks good to me. I also did a broader search and didn't find any other similar places to clean up.