Discussion: basebackup: add missing deflateEnd() in gzip compression sink
Hi hackers,
While reviewing the backup compression backends, I noticed that the gzip
sink (basebackup_gzip.c) never calls deflateEnd(), unlike the lz4 and
zstd sinks which properly free their compression contexts.
Specifically:
- lz4 calls LZ4F_freeCompressionContext() in both end_archive and a
dedicated bbsink_lz4_cleanup() callback.
- zstd calls ZSTD_freeCCtx() in both end_archive and a dedicated
bbsink_zstd_cleanup() callback.
- gzip calls deflateInit2() in begin_archive but never calls
deflateEnd(). Its cleanup callback is bbsink_forward_cleanup,
which just forwards to the next sink without releasing the zlib
context.
This means each archive (one per tablespace) leaks ~256KB of zlib
internal state until the backup's memory context is destroyed. On the
ERROR path, the zlib context is not released at all since there is no
gzip-specific cleanup callback.
The attached patch adds deflateEnd() at the end of
bbsink_gzip_end_archive() for the normal path, and introduces a new
bbsink_gzip_cleanup() for the error path. The new cleanup callback
follows the exact same call chain as bbsink_lz4_cleanup and
bbsink_zstd_cleanup:
PG_FINALLY (basebackup.c:1063)
-> bbsink_cleanup(sink) (basebackup.c:1065)
-> sink->bbs_ops->cleanup(sink) (basebackup_sink.h:269)
-> bbsink_gzip_cleanup() -- now calls deflateEnd()
Previously the gzip slot in this chain pointed to
bbsink_forward_cleanup, which just forwarded to the next sink
without doing any gzip-specific resource release.
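The dispatch through the ops table can be sketched stand-alone. This is a hypothetical reduction: the real bbsink_ops in basebackup_sink.h carries many more callbacks, and the names below (sink_ops, forward_only_ops, gzip_ops, released) are invented for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

struct sink;

/* Minimal stand-in for bbsink_ops: only the cleanup slot is modeled. */
typedef struct sink_ops
{
	void		(*cleanup) (struct sink *sink);
} sink_ops;

typedef struct sink
{
	const sink_ops *ops;
	struct sink *next;			/* next sink in the chain, or NULL */
	bool		released;		/* stands in for "deflateEnd() was called" */
} sink;

/* What the gzip slot pointed to before the patch: just recurse. */
static void
forward_cleanup(sink *s)
{
	if (s->next != NULL)
		s->next->ops->cleanup(s->next);
}

/* What the patch installs: release own state first, then forward. */
static void
gzip_cleanup(sink *s)
{
	s->released = true;			/* deflateEnd() would go here */
	forward_cleanup(s);
}

static const sink_ops forward_only_ops = {forward_cleanup};
static const sink_ops gzip_ops = {gzip_cleanup};
```

With forward_only_ops installed on the gzip sink, a cleanup pass walks the rest of the chain but never touches the gzip sink's own state; with gzip_ops it does.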
Since z_stream is an embedded struct (not a pointer like LZ4F_cctx or
ZSTD_CCtx), a bool flag "zstream_initialized" is used to track whether
deflateEnd() needs to be called.
Tested with pg_basebackup using gzip compression at levels 1, 5
(default), and 9, including server-side compression. All produced
valid .tar.gz files that pass gzip -t integrity checks and restore
correctly.
Regards,
Jianghua Yang
On Fri, Mar 20, 2026 at 10:02:44AM -0700, Jianghua Yang wrote:
> While reviewing the backup compression backends, I noticed that the gzip
> sink (basebackup_gzip.c) never calls deflateEnd(), unlike the lz4 and
> zstd sinks which properly free their compression contexts.
>
> This means each archive (one per tablespace) leaks ~256KB of zlib
> internal state until the backup's memory context is destroyed. On the
> ERROR path, the zlib context is not released at all since there is no
> gzip-specific cleanup callback.

Yes, you are right on this one. This code could be better at cleaning
up the resources it works with (and note that all the other
gzip-related code paths call the end() part when doing an init() or an
init2()). Leaking that once per archive may not look like a big deal,
but this is backend-side code we are talking about, and on a
persistent replication connection it is not really cool to have
garbage from zlib piling up, so based on the argument of the permanent
connection my take is that this cleanup warrants a backpatch.
--
Michael
Thanks for reviewing, Michael.
> based on the argument of the permanent connection my take is that
> this cleanup warrants a backpatch.
Agreed. The current code leaks zlib internal state on every archive,
and on a long-lived replication connection those allocations just
pile up with no cleanup until the connection ends.
Hi Michael,
Thanks for the review and backpatch suggestion.
Here is v1 as a two-patch series. The first patch is unchanged from the
original submission. I've added a second patch that fixes another resource
leak I noticed nearby in pg_basebackup:
v1-0001: basebackup: add missing deflateEnd() calls in gzip compression sink
v1-0002: pg_basebackup: add missing close() for incremental manifest file
In the incremental backup code path, the file descriptor opened for reading
the manifest is never closed after the upload completes. Since the backup
can run for a long time after this point, the leaked descriptor remains
open for the entire duration.
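The fix follows the usual pattern: close the descriptor as soon as the manifest has been streamed, instead of leaving it open for the remainder of the backup. A stand-alone sketch under assumed names (upload_manifest is hypothetical; the real code lives in pg_basebackup's incremental backup path):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read the manifest file in full, then close the descriptor before
 * returning. Returns the number of bytes read, or -1 on error. */
static long
upload_manifest(const char *path)
{
	char		buf[8192];
	long		total = 0;
	ssize_t		nread;
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	while ((nread = read(fd, buf, sizeof(buf))) > 0)
	{
		/* real code would hand buf off to the server here */
		total += nread;
	}

	close(fd);					/* the missing call the patch adds */
	return (nread < 0) ? -1 : total;
}
```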
Regards,
Jianghua Yang
On Sat, Mar 21, 2026 at 02:22:25PM -0700, Jianghua Yang wrote:
> v1-0001: basebackup: add missing deflateEnd() calls in gzip compression
> sink

After double-checking the whole code, I agree that this is a good
practice to have in the tree. However, the issue is not worth
bothering with in back-branches: the server-side base backup gzip code
relies on allocation and free callbacks, and as far as I have read its
code, the zlib internals do nothing with fds or other persistent
state. For the current use, we'd bloat this data once per tablespace
in a single base backup, which is safe even if the connection is
persistent (I missed that in my first message).

What I am more worried about are future callers of this code, though,
and we want an end() call for each matching init[2]() call in all the
places in the tree that rely on gzip internals. So this is good
practice on consistency grounds, at least. For these reasons, applied
on HEAD.

> v1-0002: pg_basebackup: add missing close() for incremental manifest
> file

This one does not matter. This resource is for a backup manifest, and
we are talking about a single one for a single invocation of the
binary.
--
Michael
Hi Michael,
Thank you for reviewing and committing the patch!
Noted on v1-0002: understood that the incremental manifest file close isn't an issue there.
Thanks again.
Regards,
Jianghua Yang