Обсуждение: pg_basebackup: removed an unnecessary use of memset in FindStreamingStart
Hi Hackers,
When I read the FindStreamingStart function in pg_receivewal.c, I discovered an unnecessary use of memset.So I removed it, optimizing the performance without affecting its functionality.
The following is the detailed analysis of the reasons:
1.LZ4F_decompress will fully overwrite the output buffer:
When out_size is passed as an input parameter, it denotes the size of the output buffer (outbuf). The decompression operation writes the decompressed data to outbuf. Upon function return, out_size is updated to reflect the actual number of bytes written. Notably, even in cases of partial decompression, data is written starting from the initial position of outbuf.
2.Performance Overhead
In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Since these memory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessary consumption of CPU resources.
When out_size is passed as an input parameter, it denotes the size of the output buffer (outbuf). The decompression operation writes the decompressed data to outbuf. Upon function return, out_size is updated to reflect the actual number of bytes written. Notably, even in cases of partial decompression, data is written starting from the initial position of outbuf.
2.Performance Overhead
In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Since these memory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessary consumption of CPU resources.
Regards,
Yang Yuanzhuo
Вложения
> On Feb 25, 2026, at 14:31, yangyz <1197620467@qq.com> wrote:
>
> Hi Hackers,
>
> When I read the FindStreamingStart function in pg_receivewal.c, I discovered an unnecessary use of memset.So I
removedit, optimizing the performance without affecting its functionality.
>
> The following is the detailed analysis of the reasons:
> 1.LZ4F_decompress will fully overwrite the output buffer:
> When out_size is passed as an input parameter, it denotes the size of the output buffer (outbuf). The decompression
operationwrites the decompressed data to outbuf. Upon function return, out_size is updated to reflect the actual number
ofbytes written. Notably, even in cases of partial decompression, data is written starting from the initial position of
outbuf.
> 2.Performance Overhead
> In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Since
thesememory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessary
consumptionof CPU resources.
>
> Regards,
> Yang Yuanzhuo
>
>
>
> <v1-0001-Removed-an-unnecessary-use-of-memset-in-FindStrea.patch>
Looking at the code snippet:
```
while (readp < readend)
{
size_t out_size = LZ4_CHUNK_SZ;
size_t read_size = readend - readp;
memset(outbuf, 0, LZ4_CHUNK_SZ);
status = LZ4F_decompress(ctx, outbuf, &out_size,
readp, &read_size, &dec_opt);
if (LZ4F_isError(status))
pg_fatal("could not decompress file \"%s\": %s",
fullpath,
LZ4F_getErrorName(status));
readp += read_size;
uncompressed_size += out_size;
}
```
It’s trying to locate the start position, and the decoded bytes are not consumed (they’re effectively discarded). Given
thatLZ4F_decompress() reports the produced size via out_size, zeroing the whole output buffer beforehand doesn’t seem
necessaryhere. Since this happens inside the loop, the extra memset() just amplifies the overhead.
Also, ReadDataFromArchiveLZ4() has a very similar loop that doesn’t zero the output buffer at all:
```
while (readp < readend)
{
size_t out_size = DEFAULT_IO_BUFFER_SIZE;
size_t read_size = readend - readp;
status = LZ4F_decompress(ctx, outbuf, &out_size,
readp, &read_size, &dec_opt);
if (LZ4F_isError(status))
pg_fatal("could not decompress: %s",
LZ4F_getErrorName(status));
ahwrite(outbuf, 1, out_size, AH);
readp += read_size;
}
```
So +1 for removing the memset.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Re: pg_basebackup: removed an unnecessary use of memset in FindStreamingStart
От
Daniel Gustafsson
Дата:
> On 25 Feb 2026, at 07:31, yangyz <1197620467@qq.com> wrote: > 2.Performance Overhead > In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Since thesememory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessary consumptionof CPU resources. When proposing a performance improvement it's important to provide some level of benchmarks to show the improvement. Is removing this memset noticeable? -- Daniel Gustafsson
> On Feb 25, 2026, at 18:21, Daniel Gustafsson <daniel@yesql.se> wrote: > >> On 25 Feb 2026, at 07:31, yangyz <1197620467@qq.com> wrote: > >> 2.Performance Overhead >> In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Since thesememory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessary consumptionof CPU resources. > > When proposing a performance improvement it's important to provide some level > of benchmarks to show the improvement. Is removing this memset noticeable? > > -- > Daniel Gustafsson > I don’t think this patch is about performance. Although removing the memset might save a few CPU cycles, the real benefitseems to be cleanup and consistency. The memset appears unnecessary, and similar functions don’t use it, so I thinkthis change mainly improves maintainability. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Re: pg_basebackup: removed an unnecessary use of memset in FindStreamingStart
От
Daniel Gustafsson
Дата:
> On 25 Feb 2026, at 13:41, Chao Li <li.evan.chao@gmail.com> wrote: > >> On Feb 25, 2026, at 18:21, Daniel Gustafsson <daniel@yesql.se> wrote: >> >>> On 25 Feb 2026, at 07:31, yangyz <1197620467@qq.com> wrote: >> >>> 2.Performance Overhead >>> In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized. Sincethese memory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an unnecessaryconsumption of CPU resources. >> >> When proposing a performance improvement it's important to provide some level >> of benchmarks to show the improvement. Is removing this memset noticeable? > > I don’t think this patch is about performance. Although removing the memset might save a few CPU cycles, the real benefitseems to be cleanup and consistency. The memset appears unnecessary, and similar functions don’t use it, so I thinkthis change mainly improves maintainability. I would argue the opposite, clearing a buffer before passing it to an external library function writing to it seems the right thing to do unless it can be proven to regress performance too much. Also, "appears unnecessary" doesn't instill enough confidence to perform a change IMO. -- Daniel Gustafsson
> On Feb 25, 2026, at 21:10, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 25 Feb 2026, at 13:41, Chao Li <li.evan.chao@gmail.com> wrote:
>>
>>> On Feb 25, 2026, at 18:21, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>
>>>> On 25 Feb 2026, at 07:31, yangyz <1197620467@qq.com> wrote:
>>>
>>>> 2.Performance Overhead
>>>> In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized.
Sincethese memory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an
unnecessaryconsumption of CPU resources.
>>>
>>> When proposing a performance improvement it's important to provide some level
>>> of benchmarks to show the improvement. Is removing this memset noticeable?
>>
>> I don’t think this patch is about performance. Although removing the memset might save a few CPU cycles, the real
benefitseems to be cleanup and consistency. The memset appears unnecessary, and similar functions don’t use it, so I
thinkthis change mainly improves maintainability.
>
> I would argue the opposite, clearing a buffer before passing it to an external
> library function writing to it seems the right thing to do unless it can be
> proven to regress performance too much. Also, "appears unnecessary" doesn't
> instill enough confidence to perform a change IMO.
>
> --
> Daniel Gustafsson
As I pointed out earlier, ReadDataFromArchiveLZ4() has a very similar loop that doesn’t zero out the output buffer:
```
while (readp < readend)
{
size_t out_size = DEFAULT_IO_BUFFER_SIZE;
size_t read_size = readend - readp;
status = LZ4F_decompress(ctx, outbuf, &out_size,
readp, &read_size, &dec_opt);
if (LZ4F_isError(status))
pg_fatal("could not decompress: %s",
LZ4F_getErrorName(status));
ahwrite(outbuf, 1, out_size, AH);
readp += read_size;
}
```
Do you think we should add a memset there? There are a couple of more callers of LZ4F_decompress that don’t zero out
theoutput buffer.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
> On Feb 25, 2026, at 21:54, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On Feb 25, 2026, at 21:10, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 25 Feb 2026, at 13:41, Chao Li <li.evan.chao@gmail.com> wrote:
>>>
>>>> On Feb 25, 2026, at 18:21, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>>
>>>>> On 25 Feb 2026, at 07:31, yangyz <1197620467@qq.com> wrote:
>>>>
>>>>> 2.Performance Overhead
>>>>> In each iteration, the entire buffer of size LZ4_CHUNK_SZ (potentially several megabytes) is zero-initialized.
Sincethese memory blocks are immediately overwritten by decompressed data, this zeroing operation constitutes an
unnecessaryconsumption of CPU resources.
>>>>
>>>> When proposing a performance improvement it's important to provide some level
>>>> of benchmarks to show the improvement. Is removing this memset noticeable?
>>>
>>> I don’t think this patch is about performance. Although removing the memset might save a few CPU cycles, the real
benefitseems to be cleanup and consistency. The memset appears unnecessary, and similar functions don’t use it, so I
thinkthis change mainly improves maintainability.
>>
>> I would argue the opposite, clearing a buffer before passing it to an external
>> library function writing to it seems the right thing to do unless it can be
>> proven to regress performance too much. Also, "appears unnecessary" doesn't
>> instill enough confidence to perform a change IMO.
>>
>> --
>> Daniel Gustafsson
>
>
> As I pointed out earlier, ReadDataFromArchiveLZ4() has a very similar loop that doesn’t zero out the output buffer:
> ```
> while (readp < readend)
> {
> size_t out_size = DEFAULT_IO_BUFFER_SIZE;
> size_t read_size = readend - readp;
>
> status = LZ4F_decompress(ctx, outbuf, &out_size,
> readp, &read_size, &dec_opt);
> if (LZ4F_isError(status))
> pg_fatal("could not decompress: %s",
> LZ4F_getErrorName(status));
>
> ahwrite(outbuf, 1, out_size, AH);
> readp += read_size;
> }
> ```
>
> Do you think we should add a memset there? There are a couple of more callers of LZ4F_decompress that don’t zero out
theoutput buffer.
>
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
>
Adding the original author to see if he still remember what was the intention of the memset.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/