On Wed, 2021-06-16 at 15:31 +0200, Daniel Gustafsson wrote:
> > On 16 Jun 2021, at 01:50, Jacob Champion <pchampion@vmware.com> wrote:
> > I've been tracking down reference leaks in the client. These open
> > references prevent NSS from shutting down cleanly, which then makes it
> > impossible to open a new context in the future. This probably affects
> > other libpq clients more than it affects psql.
>
> Ah, nice catch, that's indeed a bug in the frontend implementation. The
> problem is that the NSS trustdomain cache *must* be empty before shutting down
> the context, else this very issue happens. Note this in be_tls_destroy():
>
> /*
> * It reads a bit odd to clear a session cache when we are destroying the
> * context altogether, but if the session cache isn't cleared before
> * shutting down the context it will fail with SEC_ERROR_BUSY.
> */
> SSL_ClearSessionCache();
>
> Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.
That's unfortunate. The session cache is global, right? So I'm guessing
we'll need to refcount and lock that call, to avoid cleaning up out
from under a thread that's actively using the the cache?
> There is another resource leak left (visible in one test after the above is
> added), the SECMOD module needs to be unloaded in case it's been loaded.
> Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
> have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).
>
> [...]
>
> With your patches I'm seeing a couple of these:
>
> SSL error: The one-time function was previously called and failed. Its error code is no longer available
Hmm. Adding SSL_ClearSessionCache() (without thread-safety at the
moment) fixes all of the SSL tests for me, and I don't see either the
SECMOD leak or the "one-time function" error that you've mentioned.
What version of NSS are you running? I'm on 3.63.
I've attached my current patchset (based on v37) for comparison.
> > I am currently stuck on one last failing test. This leak seems to only
> > show up when using TLSv1.2 or below.
>
> AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting
> renegotiation.
Nice, at least that mystery is solved. :D
Thanks,
--Jacob