Обсуждение: [PATCH] Fix fd leak in pg_dump compression backends when dup()+fdopen() fails

Поиск
Список
Период
Сортировка

[PATCH] Fix fd leak in pg_dump compression backends when dup()+fdopen() fails

От
Jianghua Yang
Дата:
Hi,

   I found a file descriptor leak in the pg_dump compression backends
   (gzip, lz4, zstd, and the uncompressed "none" backend), introduced
   in commit e9960732a9 ("Introduce a generic pg_dump compression API",
   PostgreSQL 16).

   == The Bug ==

   All four compression open functions use this pattern when an existing
   file descriptor is passed in:

       if (fd >= 0)
           fp = fdopen(dup(fd), mode);   /* or gzdopen() */

       if (fp == NULL)
           return false;                 /* dup'd fd is leaked here */

   The problem is that dup(fd) and fdopen()/gzdopen() are two separate
   steps, and their failure modes must be handled independently:

   1. If dup() fails (returns -1), no fd is created -- this case was
      not checked at all in the original code.

   2. If dup() succeeds but fdopen()/gzdopen() fails (e.g., due to a
      failed malloc(3) for the FILE structure), POSIX explicitly states:

        "If the fdopen() function fails, the file descriptor is
         not closed."
        -- POSIX.1-2017, fdopen() specification

      The duplicated fd therefore remains open with no owner, leaking
      until the process exits.

   == When Can dup() Fail? ==

   The most realistic trigger for dup() returning EMFILE is parallel
   pg_dump (pg_dump -j N) against a large database.  Each worker opens
   multiple file descriptors for tables, indexes, TOC files, and
   compression streams simultaneously.  On systems with a low per-process
   fd limit (e.g., ulimit -n 1024), or when dumping a schema with a very
   large number of objects, the process fd count can approach the limit.
   At that point, dup() fails with EMFILE and the subsequent NULL-check
   on fp returns false without any cleanup.

   While fdopen() failure (EMFILE/OOM) is less common, it is equally
   incorrect to ignore -- and it is precisely the case that POSIX calls
   out as the caller's responsibility to close.

   == The Fix ==

   Save the result of dup() in a local variable.  Check it immediately.
   If fdopen()/gzdopen() subsequently fails, explicitly close the
   duplicated fd before returning false.

       if (fd >= 0)
       {
           int dup_fd = dup(fd);

           if (dup_fd < 0)
               return false;
           fp = fdopen(dup_fd, mode);
           if (fp == NULL)
           {
               close(dup_fd);   /* POSIX: fdopen does not close on 
failure */
               return false;
           }
       }
       else
       {
           fp = fopen(path, mode);
           if (fp == NULL)
               return false;
       }

   The zstd fix additionally ensures that the previously allocated
   zstdcs structure is freed on all new failure paths.

   == Affected Versions ==

   PostgreSQL 16 and 17, and current master.
   The compress_* files were introduced in commit e9960732a9 (Feb 2023).

   == Patch ==

   Patch attached.  It applies cleanly to current master (as of
   commit 18bcdb75d15).

   Regards,
   Jianghua Yang
Вложения
Jianghua Yang <yjhjstz@gmail.com> writes:
>    == The Bug ==

>    All four compression open functions use this pattern when an existing
>    file descriptor is passed in:

>        if (fd >= 0)
>            fp = fdopen(dup(fd), mode);   /* or gzdopen() */

>        if (fp == NULL)
>            return false;                 /* dup'd fd is leaked here */

>    The problem is that dup(fd) and fdopen()/gzdopen() are two separate
>    steps, and their failure modes must be handled independently:

Hmm.  You're right that we could leak the dup'd FD, but would it matter?
I'm pretty sure all these programs will just exit immediately on
failure.

I'm not averse to improving the code, but I'm not sure there is
a live bug worth back-patching.

            regards, tom lane



Re: [PATCH] Fix fd leak in pg_dump compression backends when dup()+fdopen() fails

От
Jianghua Yang
Дата:
You're correct.  All callers invoke pg_fatal() on failure, so the
  process exits immediately and the OS reclaims the fd.  There is no
  live bug worth back-patching on those grounds.

  That said, the patch does fix a real diagnostic problem.  In the
  original code, when dup() fails with EMFILE, the -1 return value is
  passed directly to fdopen(), which fails with EBADF.  The user sees:

    pg_dump: error: could not open output file: Bad file descriptor

  which is misleading -- the actual cause is fd exhaustion, not a bad
  descriptor.  With the patch, errno is preserved correctly, so the
  message becomes:

    pg_dump: error: could not open output file: Too many open files

  which gives the user actionable information.

  I'm happy to limit this to HEAD only if back-patching is not
  warranted.

  Regards,
  Jianghua Yang

Tom Lane <tgl@sss.pgh.pa.us> 于2026年3月19日周四 10:08写道:
Jianghua Yang <yjhjstz@gmail.com> writes:
>    == The Bug ==

>    All four compression open functions use this pattern when an existing
>    file descriptor is passed in:

>        if (fd >= 0)
>            fp = fdopen(dup(fd), mode);   /* or gzdopen() */

>        if (fp == NULL)
>            return false;                 /* dup'd fd is leaked here */

>    The problem is that dup(fd) and fdopen()/gzdopen() are two separate
>    steps, and their failure modes must be handled independently:

Hmm.  You're right that we could leak the dup'd FD, but would it matter?
I'm pretty sure all these programs will just exit immediately on
failure.

I'm not averse to improving the code, but I'm not sure there is
a live bug worth back-patching.

                        regards, tom lane
Jianghua Yang <yjhjstz@gmail.com> writes:
>   That said, the patch does fix a real diagnostic problem.  In the
>   original code, when dup() fails with EMFILE, the -1 return value is
>   passed directly to fdopen(), which fails with EBADF.  The user sees:
>     pg_dump: error: could not open output file: Bad file descriptor
>   which is misleading -- the actual cause is fd exhaustion, not a bad
>   descriptor.  With the patch, errno is preserved correctly, so the
>   message becomes:
>     pg_dump: error: could not open output file: Too many open files
>   which gives the user actionable information.

Fair point.  Still, this is such an unlikely edge-case that
I don't think it's worth a back-patch.  Let's just do HEAD.

            regards, tom lane



I wrote:
> Fair point.  Still, this is such an unlikely edge-case that
> I don't think it's worth a back-patch.  Let's just do HEAD.

Pushed.  I also looked around for other instances of the same
pattern, and couldn't find any.

            regards, tom lane



Re: [PATCH] Fix fd leak in pg_dump compression backends when dup()+fdopen() fails

От
Jianghua Yang
Дата:
Thanks for pushing and for the extra scan.

  Regards,
  Jianghua Yang

Tom Lane <tgl@sss.pgh.pa.us> 于2026年3月19日周四 11:26写道:
I wrote:
> Fair point.  Still, this is such an unlikely edge-case that
> I don't think it's worth a back-patch.  Let's just do HEAD.

Pushed.  I also looked around for other instances of the same
pattern, and couldn't find any.

                        regards, tom lane