Обсуждение: improve ssl error code, 2147483650

Поиск
Список
Период
Сортировка

improve ssl error code, 2147483650

От
David Zhang
Дата:
Hi Hackers,

When configuring SSL on the Postgres server side with the following 
information:

ssl = on
ssl_ca_file = 'root_ca.crt'
ssl_cert_file = 'server-cn-only.crt'
ssl_key_file = 'server-cn-only.key'

If a user makes a mistake, for example, accidentally using 'root_ca.crl' 
instead of 'root_ca.crt', Postgres will report an error like the one below:
FATAL:  could not load root certificate file "root_ca.crl": SSL error 
code 2147483650

Here, the error code 2147483650 is not very helpful for the user. The 
reason is that Postgres is missing the initial SSL file check before 
passing it to the OpenSSL API. For example:

     if (ssl_ca_file[0])
     {
         STACK_OF(X509_NAME) * root_cert_list;

         if (SSL_CTX_load_verify_locations(context, ssl_ca_file, NULL) 
!= 1 ||
             (root_cert_list = SSL_load_client_CA_file(ssl_ca_file)) == 
NULL)
         {
             ereport(isServerStart ? FATAL : LOG,
                     (errcode(ERRCODE_CONFIG_FILE_ERROR),
                      errmsg("could not load root certificate file 
\"%s\": %s",
                             ssl_ca_file, SSLerrmessage(ERR_get_error()))));
             goto error;
         }

The SSL_CTX_load_verify_locations function in OpenSSL will return NULL 
if there is a system error, such as "No such file or directory" in this 
case:

const char *ERR_reason_error_string(unsigned long e)
{
     ERR_STRING_DATA d, *p = NULL;
     unsigned long l, r;

     if (!RUN_ONCE(&err_string_init, do_err_strings_init)) {
         return NULL;
     }

     /*
      * ERR_reason_error_string() can't safely return system error strings,
      * since openssl_strerror_r() needs a buffer for thread safety, and we
      * haven't got one that would serve any sensible purpose.
      */
     if (ERR_SYSTEM_ERROR(e))
         return NULL;

It would be better to perform a simple SSL file check before passing the 
SSL file to OpenSSL APIs so that the system error can be captured and a 
meaningful message provided to the end user.

Attached is a simple patch to help address this issue for ssl_ca_file, 
ssl_cert_file, and ssl_crl_file. With this patch, a similar test will 
return something like the message below:
FATAL:  could not access certificate file "root_ca.crl": No such file or 
directory

I believe this can help end users quickly realize the mistake.


Thank you,
David

Вложения

Re: improve ssl error code, 2147483650

От
Heikki Linnakangas
Дата:
On 07/03/2024 02:12, David Zhang wrote:
> The SSL_CTX_load_verify_locations function in OpenSSL will return NULL
> if there is a system error, such as "No such file or directory" in this
> case:
> 
> const char *ERR_reason_error_string(unsigned long e)
> {
>       ERR_STRING_DATA d, *p = NULL;
>       unsigned long l, r;
> 
>       if (!RUN_ONCE(&err_string_init, do_err_strings_init)) {
>           return NULL;
>       }
> 
>       /*
>        * ERR_reason_error_string() can't safely return system error strings,
>        * since openssl_strerror_r() needs a buffer for thread safety, and we
>        * haven't got one that would serve any sensible purpose.
>        */
>       if (ERR_SYSTEM_ERROR(e))
>           return NULL;

That's pretty unfortunate. As typical with OpenSSL, this stuff is not 
very well documented, but I think we could do something like this in 
SSLerrmessage():

if (ERR_SYSTEM_ERROR(e))
     errreason = strerror(ERR_GET_REASON(e));

ERR_SYSTEM_ERROR only exists in OpenSSL 3.0 and above, and the only 
documentation I could find was in this one obscure place in the man 
pages: 
https://www.openssl.org/docs/man3.2/man3/BIO_dgram_get_local_addr_enable.html. 
But as a best-effort thing, it would still be better than "SSL error 
code  2147483650".

> It would be better to perform a simple SSL file check before passing the
> SSL file to OpenSSL APIs so that the system error can be captured and a
> meaningful message provided to the end user.

That feels pretty ugly. I agree it would catch most of the common 
mistakes in practice, so maybe we should just hold our noses and do it 
anyway, if the above ERR_SYSTEM_ERROR() method doesn't work.

It's sad that we cannot pass a file descriptor or in-memory copy of the 
file contents to those functions.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: improve ssl error code, 2147483650

От
Stephen Frost
Дата:
Greetings,

* Heikki Linnakangas (hlinnaka@iki.fi) wrote:
> On 07/03/2024 02:12, David Zhang wrote:
> > The SSL_CTX_load_verify_locations function in OpenSSL will return NULL
> > if there is a system error, such as "No such file or directory" in this
> > case:
> >
> > const char *ERR_reason_error_string(unsigned long e)
> > {
> >       ERR_STRING_DATA d, *p = NULL;
> >       unsigned long l, r;
> >
> >       if (!RUN_ONCE(&err_string_init, do_err_strings_init)) {
> >           return NULL;
> >       }
> >
> >       /*
> >        * ERR_reason_error_string() can't safely return system error strings,
> >        * since openssl_strerror_r() needs a buffer for thread safety, and we
> >        * haven't got one that would serve any sensible purpose.
> >        */
> >       if (ERR_SYSTEM_ERROR(e))
> >           return NULL;
>
> That's pretty unfortunate. As typical with OpenSSL, this stuff is not very
> well documented, but I think we could do something like this in
> SSLerrmessage():
>
> if (ERR_SYSTEM_ERROR(e))
>     errreason = strerror(ERR_GET_REASON(e));
>
> ERR_SYSTEM_ERROR only exists in OpenSSL 3.0 and above, and the only
> documentation I could find was in this one obscure place in the man pages:
https://www.openssl.org/docs/man3.2/man3/BIO_dgram_get_local_addr_enable.html.
> But as a best-effort thing, it would still be better than "SSL error code
> 2147483650".

Agreed that it doesn't seem well documented.  I was trying to figure out
what the 'right' answer here was myself and not having much success.  If
the above works, then +1 to that.

> > It would be better to perform a simple SSL file check before passing the
> > SSL file to OpenSSL APIs so that the system error can be captured and a
> > meaningful message provided to the end user.
>
> That feels pretty ugly. I agree it would catch most of the common mistakes
> in practice, so maybe we should just hold our noses and do it anyway, if the
> above ERR_SYSTEM_ERROR() method doesn't work.

Yeah, seems better to try and handle this the OpenSSL way ... if that's
possible to do.

> It's sad that we cannot pass a file descriptor or in-memory copy of the file
> contents to those functions.

Agreed.

Thanks!

Stephen

Вложения

Re: improve ssl error code, 2147483650

От
Tom Lane
Дата:
Stephen Frost <sfrost@snowman.net> writes:
> * Heikki Linnakangas (hlinnaka@iki.fi) wrote:
>> That's pretty unfortunate. As typical with OpenSSL, this stuff is not very
>> well documented, but I think we could do something like this in
>> SSLerrmessage():
>>
>> if (ERR_SYSTEM_ERROR(e))
>> errreason = strerror(ERR_GET_REASON(e));
>>
>> ERR_SYSTEM_ERROR only exists in OpenSSL 3.0 and above, and the only
>> documentation I could find was in this one obscure place in the man pages:
https://www.openssl.org/docs/man3.2/man3/BIO_dgram_get_local_addr_enable.html.
>> But as a best-effort thing, it would still be better than "SSL error code
>> 2147483650".

> Agreed that it doesn't seem well documented.  I was trying to figure out
> what the 'right' answer here was myself and not having much success.  If
> the above works, then +1 to that.

My reaction as well --- I was just gearing up to test this idea,
unless one of you are already on it?

            regards, tom lane



Re: improve ssl error code, 2147483650

От
Tom Lane
Дата:
David Zhang <david.zhang@highgo.ca> writes:
> When configuring SSL on the Postgres server side with the following
> information:

> ssl = on
> ssl_ca_file = 'root_ca.crt'
> ssl_cert_file = 'server-cn-only.crt'
> ssl_key_file = 'server-cn-only.key'

> If a user makes a mistake, for example, accidentally using 'root_ca.crl'
> instead of 'root_ca.crt', Postgres will report an error like the one below:
> FATAL:  could not load root certificate file "root_ca.crl": SSL error
> code 2147483650

Interestingly, this works fine for me on RHEL8 (with openssl-1.1.1k):

2024-03-07 12:57:53.432 EST [547522] FATAL:  F0000: could not load root certificate file "foo.bar": No such file or
directory
2024-03-07 12:57:53.432 EST [547522] LOCATION:  be_tls_init, be-secure-openssl.c:306

I do reproduce your problem on Fedora 39 with openssl-3.1.1.
So this seems to be a regression on OpenSSL's part.  Maybe
they'll figure out how to fix it sometime; that seems to be
another good argument for not pre-empting their error handling.

            regards, tom lane



Re: improve ssl error code, 2147483650

От
Tom Lane
Дата:
I wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> Agreed that it doesn't seem well documented.  I was trying to figure out
>> what the 'right' answer here was myself and not having much success.  If
>> the above works, then +1 to that.

> My reaction as well --- I was just gearing up to test this idea,
> unless one of you are already on it?

I've confirmed that this:

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index e12b1cc9e3..47eee4b59d 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1363,6 +1363,10 @@ SSLerrmessage(unsigned long ecode)
     errreason = ERR_reason_error_string(ecode);
     if (errreason != NULL)
         return errreason;
+#ifdef ERR_SYSTEM_ERROR
+    if (ERR_SYSTEM_ERROR(ecode))
+        return strerror(ERR_GET_REASON(ecode));
+#endif
     snprintf(errbuf, sizeof(errbuf), _("SSL error code %lu"), ecode);
     return errbuf;
 }

seems to be enough to fix the problem on OpenSSL 3.1.1.  The #ifdef
is needed to avoid compile failure against OpenSSL 1.1.1 --- but that
version doesn't have the problem, so we don't need to sweat.

This could probably do with a comment, and we need to propagate
the fix into libpq's copy of the function too.  Barring objections,
I'll take care of that and push it later today.

            regards, tom lane



Re: improve ssl error code, 2147483650

От
Daniel Gustafsson
Дата:
> On 7 Mar 2024, at 20:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> Agreed that it doesn't seem well documented.  I was trying to figure out
>>> what the 'right' answer here was myself and not having much success.  If
>>> the above works, then +1 to that.
>
>> My reaction as well --- I was just gearing up to test this idea,
>> unless one of you are already on it?
>
> I've confirmed that this:
>
> diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
> index e12b1cc9e3..47eee4b59d 100644
> --- a/src/backend/libpq/be-secure-openssl.c
> +++ b/src/backend/libpq/be-secure-openssl.c
> @@ -1363,6 +1363,10 @@ SSLerrmessage(unsigned long ecode)
>     errreason = ERR_reason_error_string(ecode);
>     if (errreason != NULL)
>         return errreason;
> +#ifdef ERR_SYSTEM_ERROR
> +    if (ERR_SYSTEM_ERROR(ecode))
> +        return strerror(ERR_GET_REASON(ecode));
> +#endif
>     snprintf(errbuf, sizeof(errbuf), _("SSL error code %lu"), ecode);
>     return errbuf;
> }
>
> seems to be enough to fix the problem on OpenSSL 3.1.1.  The #ifdef
> is needed to avoid compile failure against OpenSSL 1.1.1 --- but that
> version doesn't have the problem, so we don't need to sweat.

This was introduced in OpenSSL 3.0.0 so that makes sense.  Pre-3.0.0 versions
truncates system errorcodes that was outside of the range 1..127 reserving the
rest for OpenSSL specific errors.  To capture the full range possible of system
errors the code is no longer truncated and the ERR_SYSTEM_FLAG flag is set,
which can be tested for with the macro used here.

> This could probably do with a comment, and we need to propagate
> the fix into libpq's copy of the function too.  Barring objections,
> I'll take care of that and push it later today.

LGTM.

--
Daniel Gustafsson




Re: improve ssl error code, 2147483650

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 7 Mar 2024, at 20:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This could probably do with a comment, and we need to propagate
>> the fix into libpq's copy of the function too.  Barring objections,
>> I'll take care of that and push it later today.

> LGTM.

Done so far as be-secure-openssl.c and fe-secure-openssl.c are
concerned.  But I noticed that src/common/cryptohash_openssl.c and
src/common/hmac_openssl.c have their own, rather half-baked versions
of SSLerrmessage.  I didn't do anything about that in the initial
patch, because it's not clear to me that those routines would ever
see system-errno-based errors, plus their comments claim that
returning NULL isn't terribly bad.  But if we want to do something
about it, I don't think that maintaining 3 copies of the code is the
way to go.  Maybe push be-secure-openssl.c's version into src/common?

            regards, tom lane