Обсуждение: basebackup: add missing deflateEnd() in gzip compression sink

Поиск
Список
Период
Сортировка

basebackup: add missing deflateEnd() in gzip compression sink

От
Jianghua Yang
Дата:
Hi hackers,

  While reviewing the backup compression backends, I noticed that the gzip
  sink (basebackup_gzip.c) never calls deflateEnd(), unlike the lz4 and
  zstd sinks which properly free their compression contexts.

  Specifically:

  - lz4 calls LZ4F_freeCompressionContext() in both end_archive and a
    dedicated bbsink_lz4_cleanup() callback.
  - zstd calls ZSTD_freeCCtx() in both end_archive and a dedicated
    bbsink_zstd_cleanup() callback.
  - gzip calls deflateInit2() in begin_archive but never calls
    deflateEnd(). Its cleanup callback is bbsink_forward_cleanup,
    which just forwards to the next sink without releasing the zlib
    context.

  This means each archive (one per tablespace) leaks ~256KB of zlib
  internal state until the backup's memory context is destroyed. On the
  ERROR path, the zlib context is not released at all since there is no
  gzip-specific cleanup callback.

  The attached patch adds deflateEnd() at the end of
  bbsink_gzip_end_archive() for the normal path, and introduces a new
  bbsink_gzip_cleanup() for the error path. The new cleanup callback
  follows the exact same call chain as bbsink_lz4_cleanup and
  bbsink_zstd_cleanup:

    PG_FINALLY  (basebackup.c:1063)
      -> bbsink_cleanup(sink)  (basebackup.c:1065)
        -> sink->bbs_ops->cleanup(sink)  (basebackup_sink.h:269)
          -> bbsink_gzip_cleanup()  -- now calls deflateEnd()

  Previously the gzip slot in this chain pointed to
  bbsink_forward_cleanup, which just forwarded to the next sink
  without doing any gzip-specific resource release.

  Since z_stream is an embedded struct (not a pointer like LZ4F_cctx or
  ZSTD_CCtx), a bool flag "zstream_initialized" is used to track whether
  deflateEnd() needs to be called.

  Tested with pg_basebackup using gzip compression at levels 1, 5
  (default), and 9, including server-side compression. All produced
  valid .tar.gz files that pass gzip -t integrity checks and restore
  correctly.

  Regards,
  Jianghua Yang
Вложения

Re: basebackup: add missing deflateEnd() in gzip compression sink

От
Michael Paquier
Дата:
On Fri, Mar 20, 2026 at 10:02:44AM -0700, Jianghua Yang wrote:
> While reviewing the backup compression backends, I noticed that the gzip
> sink (basebackup_gzip.c) never calls deflateEnd(), unlike the lz4 and
> zstd sinks which properly free their compression contexts.
>
> This means each archive (one per tablespace) leaks ~256KB of zlib
> internal state until the backup's memory context is destroyed. On the
> ERROR path, the zlib context is not released at all since there is no
> gzip-specific cleanup callback.

Yes, you are right on this one.  This code could be better in cleaning
up the resources it is working on (and note that all the other
gzip-related code paths call the end() part if doing an init() or an
init2()).  Leaking that once per archive may not look like a big deal,
but this is backend-side code we are talking about, and on a
persistent replication connection it is not really cool to lose the
context of this data with garbage coming from zlib piling up, so based
on the argument of the permanent connection my take is that this
cleanup warrants a backpatch.
--
Michael

Вложения

Re: basebackup: add missing deflateEnd() in gzip compression sink

От
Jianghua Yang
Дата:
  Thanks for reviewing, Michael.

  > based on the argument of the permanent connection my take is that
  > this cleanup warrants a backpatch.

  Agreed. The current code leaks zlib internal state on every archive,
  and on a long-lived replication connection those allocations just
  pile up with no cleanup until the connection ends.

Michael Paquier <michael@paquier.xyz> 于2026年3月20日周五 23:09写道:
On Fri, Mar 20, 2026 at 10:02:44AM -0700, Jianghua Yang wrote:
> While reviewing the backup compression backends, I noticed that the gzip
> sink (basebackup_gzip.c) never calls deflateEnd(), unlike the lz4 and
> zstd sinks which properly free their compression contexts.
>
> This means each archive (one per tablespace) leaks ~256KB of zlib
> internal state until the backup's memory context is destroyed. On the
> ERROR path, the zlib context is not released at all since there is no
> gzip-specific cleanup callback.

Yes, you are right on this one.  This code could be better in cleaning
up the resources it is working on (and note that all the other
gzip-related code paths call the end() part if doing an init() or an
init2()).  Leaking that once per archive may not look like a big deal,
but this is backend-side code we are talking about, and on a
persistent replication connection it is not really cool to lose the
context of this data with garbage coming from zlib piling up, so based
on the argument of the permanent connection my take is that this
cleanup warrants a backpatch.
--
Michael

Re: basebackup: add missing deflateEnd() in gzip compression sink

От
Jianghua Yang
Дата:
 Hi Michael,                                                                                                                                                                        
                                                                                                                                                                                       
  Thanks for the review and backpatch suggestion.                                                                                                                                    
                                         
  Here is v1 as a two-patch series. The first patch is unchanged from the                                                                                                              
  original submission. I've added a second patch that fixes another resource
  leak I noticed nearby in pg_basebackup:                                                                                                                                              
                                                                                                                                                                                       
    v1-0001: basebackup: add missing deflateEnd() calls in gzip compression sink                                                                                                      
    v1-0002: pg_basebackup: add missing close() for incremental manifest file                                                                                                          
                                                                                                                                                                                       
  In the incremental backup code path, the file descriptor opened for reading                                                                                                          
  the manifest is never closed after the upload completes. Since the backup
  can run for a long time after this point, the leaked descriptor remains                                                                                                              
  open for the entire duration.   

   Regards,
  Jianghua Yang

Jianghua Yang <yjhjstz@gmail.com> 于2026年3月21日周六 06:09写道:
  Thanks for reviewing, Michael.

  > based on the argument of the permanent connection my take is that
  > this cleanup warrants a backpatch.

  Agreed. The current code leaks zlib internal state on every archive,
  and on a long-lived replication connection those allocations just
  pile up with no cleanup until the connection ends.

Michael Paquier <michael@paquier.xyz> 于2026年3月20日周五 23:09写道:
On Fri, Mar 20, 2026 at 10:02:44AM -0700, Jianghua Yang wrote:
> While reviewing the backup compression backends, I noticed that the gzip
> sink (basebackup_gzip.c) never calls deflateEnd(), unlike the lz4 and
> zstd sinks which properly free their compression contexts.
>
> This means each archive (one per tablespace) leaks ~256KB of zlib
> internal state until the backup's memory context is destroyed. On the
> ERROR path, the zlib context is not released at all since there is no
> gzip-specific cleanup callback.

Yes, you are right on this one.  This code could be better in cleaning
up the resources it is working on (and note that all the other
gzip-related code paths call the end() part if doing an init() or an
init2()).  Leaking that once per archive may not look like a big deal,
but this is backend-side code we are talking about, and on a
persistent replication connection it is not really cool to lose the
context of this data with garbage coming from zlib piling up, so based
on the argument of the permanent connection my take is that this
cleanup warrants a backpatch.
--
Michael
Вложения

Re: basebackup: add missing deflateEnd() in gzip compression sink

От
Michael Paquier
Дата:
On Sat, Mar 21, 2026 at 02:22:25PM -0700, Jianghua Yang wrote:
> v1-0001: basebackup: add missing deflateEnd() calls in gzip compression
> sink

After double-checking the whole code, I agree that this is a good
practice to have in the tree.  However, the issue is not worth
bothering in back-branches as the server-side base backup gzip code
relies on allocation and free callbacks, with zlib internals doing
nothing with fds or more persistent states as far as I have read its
code.  For the current use, we'd bloat this data once per tablespace
in a single base backup, safe even if the connection is persistent
(missed that in my first message).

What I am more worried about are future callers of this code, though,
and we care about having a end() call for each matching init[2]() call
in the tree in all the places that rely on gzip internals.  So that's
a good practice on consistency ground, at least.  For these reasons,
applied that on HEAD.

> v1-0002: pg_basebackup: add missing close() for incremental manifest
> file

This one does not matter.  This resource is for a backup manifest and
we are talking about a single one for a single invocation of the
binary.
--
Michael

Вложения

Re: basebackup: add missing deflateEnd() in gzip compression sink

От
Jianghua Yang
Дата:
 Hi Michael,                                                                                                                                                                          
                                                                                                                                                                                       
  Thank you for reviewing and committing the patch!                                                                                                                                  

  noted on v1-0002 — understood that the incremental manifest file close isn't an issue there.                                                                                    
   
  Thanks again.                                                                                                                                                                        
                                                                                                                                                                                     
  Regards,
  Jianghua Yang

Michael Paquier <michael@paquier.xyz> 于2026年3月22日周日 17:32写道:
On Sat, Mar 21, 2026 at 02:22:25PM -0700, Jianghua Yang wrote:
> v1-0001: basebackup: add missing deflateEnd() calls in gzip compression
> sink

After double-checking the whole code, I agree that this is a good
practice to have in the tree.  However, the issue is not worth
bothering in back-branches as the server-side base backup gzip code
relies on allocation and free callbacks, with zlib internals doing
nothing with fds or more persistent states as far as I have read its
code.  For the current use, we'd bloat this data once per tablespace
in a single base backup, safe even if the connection is persistent
(missed that in my first message).

What I am more worried about are future callers of this code, though,
and we care about having a end() call for each matching init[2]() call
in the tree in all the places that rely on gzip internals.  So that's
a good practice on consistency ground, at least.  For these reasons,
applied that on HEAD.

> v1-0002: pg_basebackup: add missing close() for incremental manifest
> file

This one does not matter.  This resource is for a backup manifest and
we are talking about a single one for a single invocation of the
binary.
--
Michael