Обсуждение: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

Поиск
Список
Период
Сортировка

BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
william.welter@4linux.com.br
Дата:
The following bug has been logged on the website:

Bug reference:      12799
Logged by:          William Felipe Welter
Email address:      william.welter@4linux.com.br
PostgreSQL version: 9.4.1
Operating system:   Ubuntu Linux
Description:

According to OpenSSL manual
(https://www.openssl.org/docs/ssl/SSL_get_error.html#DESCRIPTION) "The
current thread's error queue must be empty before the TLS/SSL I/O operation
is attempted, or SSL_get_error() will not work reliably"

But libpq in pgsecure_read()/pqsecure_write() on branch REL9_4_STABLE or in
pgtls_read()/pgtls_write() on branch MASTER there no calls to
ERR_clear_error() to clear error queue
(https://www.openssl.org/docs/crypto/ERR_clear_error.html) and avoid
unpredictable conditions.

We can reproduce problems with currently implementation on PHP scripts that
use pgsql extension (use libpq) and openssl extension as reported on PHP bug
track on "https://bugs.php.net/bug.php?id=68276" (firstly reported as memory
corruption error) when errors from previous php callings to openssl affect
libpq calls leading to fatal errors.


The solution is simple, see following patches:

Branch master
diff --git a/src/interfaces/libpq/fe-secure-openssl.c
b/src/interfaces/libpq/fe-secure-openssl.c
index 1b9f3a4..8cf0335 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -212,6 +212,7 @@ pgtls_read(PGconn *conn, void *ptr, size_t len)

 rloop:
        SOCK_ERRNO_SET(0);
+       ERR_clear_error();
        n = SSL_read(conn->ssl, ptr, len);
        err = SSL_get_error(conn->ssl, n);
        switch (err)
@@ -320,6 +321,7 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
        int                     err;

        SOCK_ERRNO_SET(0);
+       ERR_clear_error();
        n = SSL_write(conn->ssl, ptr, len);
        err = SSL_get_error(conn->ssl, n);
        switch (err)


Branch REL9_4_STABLE
diff --git a/src/interfaces/libpq/fe-secure.c
b/src/interfaces/libpq/fe-secure.c
index 2752d16..54001c0 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -350,6 +350,7 @@ pqsecure_read(PGconn *conn, void *ptr, size_t len)

 rloop:
                SOCK_ERRNO_SET(0);
+               ERR_clear_error();
                n = SSL_read(conn->ssl, ptr, len);
                err = SSL_get_error(conn->ssl, n);
                switch (err)
@@ -511,6 +512,7 @@ pqsecure_write(PGconn *conn, const void *ptr, size_t
len)
                DISABLE_SIGPIPE(conn, spinfo, return -1);

                SOCK_ERRNO_SET(0);
+               ERR_clear_error();
                n = SSL_write(conn->ssl, ptr, len);
                err = SSL_get_error(conn->ssl, n);
                switch (err)


Similar situation discussed on

stackoverflow:http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
Heikki Linnakangas
Дата:
On 02/24/2015 05:09 AM, william.welter@4linux.com.br wrote:
> According to OpenSSL manual
> (https://www.openssl.org/docs/ssl/SSL_get_error.html#DESCRIPTION) "The
> current thread's error queue must be empty before the TLS/SSL I/O operation
> is attempted, or SSL_get_error() will not work reliably"
>
> But libpq in pgsecure_read()/pqsecure_write() on branch REL9_4_STABLE or in
> pgtls_read()/pgtls_write() on branch MASTER there no calls to
> ERR_clear_error() to clear error queue
> (https://www.openssl.org/docs/crypto/ERR_clear_error.html) and avoid
> unpredictable conditions.

<pedantic>
The OpenSSL manual doesn't directly require you to call
ERR_clear_error() before every SSL_* call. It just requires that you
ensure that the error queue is empty. Libpq ensures that by always
clearing the queue *after* an error happens, in SSLerrmessage().
</pedantic>

I'm actually not 100% sure we clear the error queue after every error.
Looking at the OpenSSL sources on SSL_get_error(), it seems possible
that it returns SSL_ERROR_SYSCALL, with an error in the error queue. The
libpq code does not call SSLerrmessag() when SSL_get_error() returns
SSL_ERROR_SYSCALL, so that would leave the error dangling in the queue.
Also, if there are multiple errors in the queue, we only return and
remove from the queue the first one.

So should we clear the error queue before each call, just to be sure?
The risk is that the application has performed some other OpenSSL
operations, outside libpq, but not checked the error queue yet, and when
we clear the queue, the original error is gone and isn't reported. An
application probably shouldn't do that, you should check the error after
each SSL call, and not leave the error dangling in the error queue as a
landmine for the next SSL call. OTOH, if the application is built so
that the error queue is cleared before each SSL call, not after, that's
incompatible with the way libpq currently works. And that's certainly a
much more common practice than the hypothetical application that would
miss the error if it's cleared unexpectedly.

Overall, I agree we should clear the error queue before each SSL call.
It makes the application as whole more robust.

> The solution is simple, see following patches:

Yeah. Should also do the same in the backend, and also before
SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used.

- Heikki

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
william.welter@4linux.com.br
Дата:
> <pedantic>
> The OpenSSL manual doesn't directly require you to call
> ERR_clear_error() before every SSL_* call. It just requires that you
> ensure that the error queue is empty. Libpq ensures that by always
> clearing the queue *after* an error happens, in SSLerrmessage().
> </pedantic>
>
Ok, i understand.

> I'm actually not 100% sure we clear the error queue after every error.
> Looking at the OpenSSL sources on SSL_get_error(), it seems possible
> that it returns SSL_ERROR_SYSCALL, with an error in the error queue. The
> libpq code does not call SSLerrmessag() when SSL_get_error() returns
> SSL_ERROR_SYSCALL, so that would leave the error dangling in the queue.
> Also, if there are multiple errors in the queue, we only return and
> remove from the queue the first one.
>
> So should we clear the error queue before each call, just to be sure?
> The risk is that the application has performed some other OpenSSL
> operations, outside libpq, but not checked the error queue yet, and when
> we clear the queue, the original error is gone and isn't reported. An
> application probably shouldn't do that, you should check the error after
> each SSL call, and not leave the error dangling in the error queue as a
> landmine for the next SSL call. OTOH, if the application is built so
> that the error queue is cleared before each SSL call, not after, that's
> incompatible with the way libpq currently works. And that's certainly a
> much more common practice than the hypothetical application that would
> miss the error if it's cleared unexpectedly.

If i understand correctly:
- If libpq clear the error queue before each call this can break current applications that not check error queue
immedialtyafter calls.  
- But if libpq not clear the queue, application can leave errors on the queue which lead libpq throws unrelated errors.

On PHP case, the problem is that the queue is only cleared if the developer explicit call "openssl_error_string()" for
eachopenssl function they called before. I think this behavior can be changed on PHP to prevent this kind of problem
withlibpq or other libs that use openssl.  


>
> Overall, I agree we should clear the error queue before each SSL call.
> It makes the application as whole more robust.
>
> > The solution is simple, see following patches:
>
> Yeah. Should also do the same in the backend, and also before
> SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used.

Question: Is possible to fix this to next major release ? (im volunteering)

>
> - Heikki
>
> --
> Message  protected by MailGuard: e-mail anti-virus, anti-spam and content
> filtering.http://www.mailguard.com.au/mg
> Click here to report this message as spam:
> https://console.mailguard.com.au/ras/1LubMvNYuN/6iGxWUBDMTyqo5LXNO4F30/0
>
>

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
Heikki Linnakangas
Дата:
On 02/27/2015 04:00 AM, william.welter@4linux.com.br wrote:
>>> The solution is simple, see following patches:
>>
>> Yeah. Should also do the same in the backend, and also before
>> SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used.
>
> Question: Is possible to fix this to next major release ? (im volunteering)

Yeah. If you could write a complete patch, per above, I can commit it.

- Heikki

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
Dave Vitek
Дата:
On 3/1/2015 3:32 PM, Heikki Linnakangas wrote:
> On 02/27/2015 04:00 AM, william.welter@4linux.com.br wrote:
>>>> The solution is simple, see following patches:
>>>
>>> Yeah. Should also do the same in the backend, and also before
>>> SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used.
>>
>> Question: Is possible to fix this to next major release ? (im
>> volunteering)
>
> Yeah. If you could write a complete patch, per above, I can commit it.
>
> - Heikki
>
>
We independently ran into this, diagnosed it, and fixed it.  Here is the
complete patch covering every use of SSL_get_error.

diff -ur src/backend/libpq/be-secure-openssl.c
src/backend/libpq/be-secure-openssl.c
--- src/backend/libpq/be-secure-openssl.c    2016-02-18
16:40:38.540898100 -0500
+++ src/backend/libpq/be-secure-openssl.c    2016-02-18
16:59:07.331898100 -0500
@@ -353,6 +353,10 @@
      port->ssl_in_use = true;

  aloop:
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_accept below.
+     */
+    ERR_clear_error();
      r = SSL_accept(port->ssl);
      if (r <= 0)
      {
@@ -501,6 +505,9 @@
      int            err;

      errno = 0;
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_read below.
+     */
      n = SSL_read(port->ssl, ptr, len);
      err = SSL_get_error(port->ssl, n);
      switch (err)
@@ -558,6 +565,9 @@
      int            err;

      errno = 0;
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_write below.
+     */
      n = SSL_write(port->ssl, ptr, len);
      err = SSL_get_error(port->ssl, n);
      switch (err)
diff -ur src/interfaces/libpq/fe-secure-openssl.c
src/interfaces/libpq/fe-secure-openssl.c
--- src/interfaces/libpq/fe-secure-openssl.c    2016-02-18
16:40:03.575898100 -0500
+++ src/interfaces/libpq/fe-secure-openssl.c    2016-02-18
16:58:01.711898100 -0500
@@ -212,6 +212,10 @@

  rloop:
      SOCK_ERRNO_SET(0);
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_read below.
+     */
+    ERR_clear_error();
      n = SSL_read(conn->ssl, ptr, len);
      err = SSL_get_error(conn->ssl, n);
      switch (err)
@@ -320,6 +324,10 @@
      int            err;

      SOCK_ERRNO_SET(0);
+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_write below.
+     */
+    ERR_clear_error();
      n = SSL_write(conn->ssl, ptr, len);
      err = SSL_get_error(conn->ssl, n);
      switch (err)
@@ -1327,6 +1335,10 @@
  {
      int            r;

+    /* make sure SSL_get_error_doesn't fetch an
+     * old error that predates the SSL_connect below.
+     */
+    ERR_clear_error();
      r = SSL_connect(conn->ssl);
      if (r <= 0)
      {


- Dave

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
Peter Geoghegan
Дата:
On Thu, Feb 18, 2016 at 2:11 PM, Dave Vitek <dvitek@grammatech.com> wrote:
> We independently ran into this, diagnosed it, and fixed it.  Here is the
> complete patch covering every use of SSL_get_error.

I posted a patch for this issue independently, and quite recently:

https://commitfest.postgresql.org/9/520/

Do you happen to have any idea why there has been an uptick in problem
reports about this recently, despite the fact that it's been an issue
for a while (that's been the case within Heroku, at least)? Are you
aware that there is some specific trend behind that?

--
Peter Geoghegan

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
Dave Vitek
Дата:
On 2/18/2016 5:16 PM, Peter Geoghegan wrote:
> On Thu, Feb 18, 2016 at 2:11 PM, Dave Vitek <dvitek@grammatech.com> wrote:
>> We independently ran into this, diagnosed it, and fixed it.  Here is the
>> complete patch covering every use of SSL_get_error.
> I posted a patch for this issue independently, and quite recently:
>
> https://commitfest.postgresql.org/9/520/
>
> Do you happen to have any idea why there has been an uptick in problem
> reports about this recently, despite the fact that it's been an issue
> for a while (that's been the case within Heroku, at least)? Are you
> aware that there is some specific trend behind that?
>
I can only speak for my case.  postgres is part of our CodeSonar
product, and we are adding an option to have it use TLS sockets to talk
to postgres.

Maintainers: Peter's patch is better than mine, at least for the front
end side.  I also adjusted be-secure-openssl.c, which perhaps is not
necessary, but then again it's hard to be sure.  It might be worth doing
a patch for be-secure-openssl.c in the spirit of what Peter did for the
frontend (sorry, not volunteering :).

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
Tom Lane
Дата:
Dave Vitek <dvitek@grammatech.com> writes:
> Maintainers: Peter's patch is better than mine, at least for the front
> end side.  I also adjusted be-secure-openssl.c, which perhaps is not
> necessary, but then again it's hard to be sure.  It might be worth doing
> a patch for be-secure-openssl.c in the spirit of what Peter did for the
> frontend (sorry, not volunteering :).

+1 for changing both sides.  I'm fairly sure that you could provoke
problems of this ilk in the backend too, for example if client connection
is using SSL and we also establish an outgoing SSL connection using
postgres_fdw or dblink.

BTW, do we have a reproducible test case?

            regards, tom lane

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
Dave Vitek
Дата:
On 2/18/2016 5:38 PM, Tom Lane wrote:
> Dave Vitek <dvitek@grammatech.com> writes:
>> Maintainers: Peter's patch is better than mine, at least for the front
>> end side.  I also adjusted be-secure-openssl.c, which perhaps is not
>> necessary, but then again it's hard to be sure.  It might be worth doing
>> a patch for be-secure-openssl.c in the spirit of what Peter did for the
>> frontend (sorry, not volunteering :).
> +1 for changing both sides.  I'm fairly sure that you could provoke
> problems of this ilk in the backend too, for example if client connection
> is using SSL and we also establish an outgoing SSL connection using
> postgres_fdw or dblink.
>
> BTW, do we have a reproducible test case?
>
>             regards, tom lane
I can reproduce it, but I don't have a self contained unit test.

Such a test case might look like: call ERR_put_error with SSL_ERROR_SSL
and then cause libpq to invoke SSL_read. SSL_get_error after the read
will probably return SSL_ERROR_SSL even if the read goes fine, causing
postgres to conclude things have failed.

Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading

От
Peter Geoghegan
Дата:
On Thu, Feb 18, 2016 at 2:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for changing both sides.  I'm fairly sure that you could provoke
> problems of this ilk in the backend too, for example if client connection
> is using SSL and we also establish an outgoing SSL connection using
> postgres_fdw or dblink.

I didn't consider that, although I did note that Heikki was also in
favor of covering both sides. It should be fairly straightforward.

> BTW, do we have a reproducible test case?

Yes, but that seems almost unnecessary. Basically, we trust that the
per-thread error queue doesn't have anything in it, even though it
clearly can. That assumption could be violated because malloc()
returns NULL in SSLerrmessage(), for example. That's a case that does
not even involve any non-libpq use of OpenSSL. Clearly we need to be
more careful about clearing the queue generally, but especially
because everyone else will get this wrong.

The SSL_get_error() man page says:

"""
In addition to ssl and ret, SSL_get_error() inspects the current thread's
OpenSSL error queue.  Thus, SSL_get_error() must be used in the same thread
that performed the TLS/SSL I/O operation, and no other OpenSSL function calls
should appear in between.  The current thread's error queue must be empty
before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work
reliably.

"""

It is more or less that simple. We can't let this happen without
severe consequences. It's on us to coordinate how to prevent this
outcome, with no direction provided on how that should work out in the
real world.

There seem to be some really thin wrappers for OpenSSL for scripting
languages like Ruby and PHP around, that pass the buck on to users of
those languages. Naturally, they often get it wrong, because the
interface is so impractical.

--
Peter Geoghegan