Обсуждение: Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times

Поиск
Список
Период
Сортировка

Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times

От
Jacob Champion
Дата:
[moving to -hackers]

On Fri, Apr 17, 2026 at 12:14 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
>
> On Fri, Apr 17, 2026 at 7:33 AM Daniel Schreiber
> <daniel.schreiber@hrz.tu-chemnitz.de> wrote:
> > my colleagues and I probably found a bug in libpq when libpq is dlopened
> > and closed multiple times during the lifetime of a process. In our setup
> > we use a PAM module which links to libpq. The process using PAM is
> > linked against openssl, so openssl is loaded during the complete
> > lifetime of the process whereas libpq is loaded only during PAM
> > authentication (and unloaded when PAM has finished).
> >
> > [snip]
> >
> > According to our findings every time a connection is established after
> > dlopening libpq one of the 127 available BIO_METHOD structures in
> > OpenSSL is consumed:
> > https://github.com/postgres/postgres/blob/REL_17_9/src/interfaces/libpq/fe-secure-openssl.c#L1987
>
> Right. I think in this *particular* case, we should simply skip the
> call to BIO_get_new_index(). We don't need it, IIUC.

Attached is a proposal to do that.

> But I think we may also need to set expectations on whether or not
> infinite dlopen/dlclose loops are supported in general. If we ever
> come across a situation in which a call to BIO_get_new_index() is
> necessary, that leak just fundamentally can't be plugged. The same is
> true for any third-party libraries (or their dependencies, or
> theirs...) that require "one-time", irreversible calls which can't be
> tracked after we're unloaded. And we can't push these concerns up to
> the top level application developer, because they don't know we exist.
>
> (I'd be surprised if this were the only such resource leak across all
> supported versions and combinations of Kerberos, OpenSSL, OpenLDAP,
> Curl, etc. etc. From a quick search, you're the first to report this
> in the ten years since the leak was introduced, so there may be more
> dragons where you're headed.)

If anyone has thoughts on that, I'd love to hear them. I don't mind
removing this unnecessary code in HEAD, or even backpatching as a
courtesy -- but if it were up to me, I would not guarantee zero global
resource leaks across libpq and its entire dependency graph. (Even if
we magically had control over all those dependencies, I think it'd
still be reasonable for libpq devs to use "allocate once and move on"
patterns... and I want to continue using those in my new code.)

Thanks,
--Jacob

Вложения

Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times

От
Nico Williams
Дата:
On Wed, Apr 22, 2026 at 11:29:04AM -0700, Jacob Champion wrote:
> > (I'd be surprised if this were the only such resource leak across all
> > supported versions and combinations of Kerberos, OpenSSL, OpenLDAP,
> > Curl, etc. etc. From a quick search, you're the first to report this
> > in the ten years since the leak was introduced, so there may be more
> > dragons where you're headed.)
> 
> If anyone has thoughts on that, I'd love to hear them. I don't mind
> removing this unnecessary code in HEAD, or even backpatching as a
> courtesy -- but if it were up to me, I would not guarantee zero global
> resource leaks across libpq and its entire dependency graph. (Even if
> we magically had control over all those dependencies, I think it'd
> still be reasonable for libpq devs to use "allocate once and move on"
> patterns... and I want to continue using those in my new code.)

Leaking a dl handle is a way to prevent unloading.  Not saying that's a
great answer, just that it's a workaround.



Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times

От
Tom Lane
Дата:
Jacob Champion <jacob.champion@enterprisedb.com> writes:
> If anyone has thoughts on that, I'd love to hear them. I don't mind
> removing this unnecessary code in HEAD, or even backpatching as a
> courtesy -- but if it were up to me, I would not guarantee zero global
> resource leaks across libpq and its entire dependency graph.

I agree that we have no real ability to guarantee that.
Still, as far as the presented patch goes, it seems like a clear
win so I'd vote for fix-and-backpatch.

Should we write the arguments as BIO_TYPE_NONE | BIO_TYPE_SOURCE_SINK
rather than just BIO_TYPE_SOURCE_SINK?

            regards, tom lane



Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times

От
Jacob Champion
Дата:
On Wed, Apr 22, 2026 at 12:22 PM Nico Williams <nico@cryptonector.com> wrote:
> Leaking a dl handle is a way to prevent unloading.  Not saying that's a
> great answer, just that it's a workaround.

Hmm, I did that for our handle to libpq-oauth, but I imagine that
leaking a handle to _ourselves_ may make someone very unhappy with us
at some point? Plus, it might kick off the tiniest, most pointless
arms race:

    // why does libpq do this
    dlclose(libpq_handle);
    dlclose(libpq_handle);

I guess we could play around with RTLD_NODELETE... Something to keep
in the back pocket, maybe?

--Jacob



Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times

От
Jacob Champion
Дата:
On Wed, Apr 22, 2026 at 12:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that we have no real ability to guarantee that.
> Still, as far as the presented patch goes, it seems like a clear
> win so I'd vote for fix-and-backpatch.

Sounds good to me.

> Should we write the arguments as BIO_TYPE_NONE | BIO_TYPE_SOURCE_SINK
> rather than just BIO_TYPE_SOURCE_SINK?

Good question... Popularity-wise, the shorter spelling shows up across
quite a few projects on GitHub, but the only spelling of
`BIO_meth_new(BIO_TYPE_NONE | ...)` that I can find is a single place
inside OpenSSL's own test suite -- which also uses the shorter
alternative, in two places. So my vote is BIO_TYPE_SOURCE_SINK; we'll
be in good company.

Thanks,
--Jacob



Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times

От
Tom Lane
Дата:
Jacob Champion <jacob.champion@enterprisedb.com> writes:
> On Wed, Apr 22, 2026 at 12:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Should we write the arguments as BIO_TYPE_NONE | BIO_TYPE_SOURCE_SINK
>> rather than just BIO_TYPE_SOURCE_SINK?

> Good question... Popularity-wise, the shorter spelling shows up across
> quite a few projects on GitHub, but the only spelling of
> `BIO_meth_new(BIO_TYPE_NONE | ...)` that I can find is a single place
> inside OpenSSL's own test suite -- which also uses the shorter
> alternative, in two places. So my vote is BIO_TYPE_SOURCE_SINK; we'll
> be in good company.

Fair enough.

            regards, tom lane