Обсуждение: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading
BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading
От
william.welter@4linux.com.br
Дата:
The following bug has been logged on the website: Bug reference: 12799 Logged by: William Felipe Welter Email address: william.welter@4linux.com.br PostgreSQL version: 9.4.1 Operating system: Ubuntu Linux Description: According to OpenSSL manual (https://www.openssl.org/docs/ssl/SSL_get_error.html#DESCRIPTION) "The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably" But libpq in pgsecure_read()/pqsecure_write() on branch REL9_4_STABLE or in pgtls_read()/pgtls_write() on branch MASTER there no calls to ERR_clear_error() to clear error queue (https://www.openssl.org/docs/crypto/ERR_clear_error.html) and avoid unpredictable conditions. We can reproduce problems with currently implementation on PHP scripts that use pgsql extension (use libpq) and openssl extension as reported on PHP bug track on "https://bugs.php.net/bug.php?id=68276" (firstly reported as memory corruption error) when errors from previous php callings to openssl affect libpq calls leading to fatal errors. The solution is simple, see following patches: Branch master diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c index 1b9f3a4..8cf0335 100644 --- a/src/interfaces/libpq/fe-secure-openssl.c +++ b/src/interfaces/libpq/fe-secure-openssl.c @@ -212,6 +212,7 @@ pgtls_read(PGconn *conn, void *ptr, size_t len) rloop: SOCK_ERRNO_SET(0); + ERR_clear_error(); n = SSL_read(conn->ssl, ptr, len); err = SSL_get_error(conn->ssl, n); switch (err) @@ -320,6 +321,7 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len) int err; SOCK_ERRNO_SET(0); + ERR_clear_error(); n = SSL_write(conn->ssl, ptr, len); err = SSL_get_error(conn->ssl, n); switch (err) Branch REL9_4_STABLE diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c index 2752d16..54001c0 100644 --- a/src/interfaces/libpq/fe-secure.c +++ b/src/interfaces/libpq/fe-secure.c @@ -350,6 +350,7 @@ pqsecure_read(PGconn *conn, void *ptr, size_t len) rloop: SOCK_ERRNO_SET(0); + ERR_clear_error(); n = SSL_read(conn->ssl, ptr, len); err = SSL_get_error(conn->ssl, n); switch (err) @@ -511,6 +512,7 @@ pqsecure_write(PGconn *conn, const void *ptr, size_t len) DISABLE_SIGPIPE(conn, spinfo, return -1); SOCK_ERRNO_SET(0); + ERR_clear_error(); n = SSL_write(conn->ssl, ptr, len); err = SSL_get_error(conn->ssl, n); switch (err) Similar situation discussed on stackoverflow:http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading
От
Heikki Linnakangas
Дата:
On 02/24/2015 05:09 AM, william.welter@4linux.com.br wrote: > According to OpenSSL manual > (https://www.openssl.org/docs/ssl/SSL_get_error.html#DESCRIPTION) "The > current thread's error queue must be empty before the TLS/SSL I/O operation > is attempted, or SSL_get_error() will not work reliably" > > But libpq in pgsecure_read()/pqsecure_write() on branch REL9_4_STABLE or in > pgtls_read()/pgtls_write() on branch MASTER there no calls to > ERR_clear_error() to clear error queue > (https://www.openssl.org/docs/crypto/ERR_clear_error.html) and avoid > unpredictable conditions. <pedantic> The OpenSSL manual doesn't directly require you to call ERR_clear_error() before every SSL_* call. It just requires that you ensure that the error queue is empty. Libpq ensures that by always clearing the queue *after* an error happens, in SSLerrmessage(). </pedantic> I'm actually not 100% sure we clear the error queue after every error. Looking at the OpenSSL sources on SSL_get_error(), it seems possible that it returns SSL_ERROR_SYSCALL, with an error in the error queue. The libpq code does not call SSLerrmessag() when SSL_get_error() returns SSL_ERROR_SYSCALL, so that would leave the error dangling in the queue. Also, if there are multiple errors in the queue, we only return and remove from the queue the first one. So should we clear the error queue before each call, just to be sure? The risk is that the application has performed some other OpenSSL operations, outside libpq, but not checked the error queue yet, and when we clear the queue, the original error is gone and isn't reported. An application probably shouldn't do that, you should check the error after each SSL call, and not leave the error dangling in the error queue as a landmine for the next SSL call. OTOH, if the application is built so that the error queue is cleared before each SSL call, not after, that's incompatible with the way libpq currently works. And that's certainly a much more common practice than the hypothetical application that would miss the error if it's cleared unexpectedly. Overall, I agree we should clear the error queue before each SSL call. It makes the application as whole more robust. > The solution is simple, see following patches: Yeah. Should also do the same in the backend, and also before SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used. - Heikki
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading
От
william.welter@4linux.com.br
Дата:
> <pedantic> > The OpenSSL manual doesn't directly require you to call > ERR_clear_error() before every SSL_* call. It just requires that you > ensure that the error queue is empty. Libpq ensures that by always > clearing the queue *after* an error happens, in SSLerrmessage(). > </pedantic> > Ok, i understand. > I'm actually not 100% sure we clear the error queue after every error. > Looking at the OpenSSL sources on SSL_get_error(), it seems possible > that it returns SSL_ERROR_SYSCALL, with an error in the error queue. The > libpq code does not call SSLerrmessag() when SSL_get_error() returns > SSL_ERROR_SYSCALL, so that would leave the error dangling in the queue. > Also, if there are multiple errors in the queue, we only return and > remove from the queue the first one. > > So should we clear the error queue before each call, just to be sure? > The risk is that the application has performed some other OpenSSL > operations, outside libpq, but not checked the error queue yet, and when > we clear the queue, the original error is gone and isn't reported. An > application probably shouldn't do that, you should check the error after > each SSL call, and not leave the error dangling in the error queue as a > landmine for the next SSL call. OTOH, if the application is built so > that the error queue is cleared before each SSL call, not after, that's > incompatible with the way libpq currently works. And that's certainly a > much more common practice than the hypothetical application that would > miss the error if it's cleared unexpectedly. If i understand correctly: - If libpq clear the error queue before each call this can break current applications that not check error queue immedialtyafter calls. - But if libpq not clear the queue, application can leave errors on the queue which lead libpq throws unrelated errors. On PHP case, the problem is that the queue is only cleared if the developer explicit call "openssl_error_string()" for eachopenssl function they called before. I think this behavior can be changed on PHP to prevent this kind of problem withlibpq or other libs that use openssl. > > Overall, I agree we should clear the error queue before each SSL call. > It makes the application as whole more robust. > > > The solution is simple, see following patches: > > Yeah. Should also do the same in the backend, and also before > SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used. Question: Is possible to fix this to next major release ? (im volunteering) > > - Heikki > > -- > Message protected by MailGuard: e-mail anti-virus, anti-spam and content > filtering.http://www.mailguard.com.au/mg > Click here to report this message as spam: > https://console.mailguard.com.au/ras/1LubMvNYuN/6iGxWUBDMTyqo5LXNO4F30/0 > >
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading
От
Heikki Linnakangas
Дата:
On 02/27/2015 04:00 AM, william.welter@4linux.com.br wrote: >>> The solution is simple, see following patches: >> >> Yeah. Should also do the same in the backend, and also before >> SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used. > > Question: Is possible to fix this to next major release ? (im volunteering) Yeah. If you could write a complete patch, per above, I can commit it. - Heikki
On 3/1/2015 3:32 PM, Heikki Linnakangas wrote: > On 02/27/2015 04:00 AM, william.welter@4linux.com.br wrote: >>>> The solution is simple, see following patches: >>> >>> Yeah. Should also do the same in the backend, and also before >>> SSL_connect/SSL_accept() calls. I.e. any time SSL_get_error is used. >> >> Question: Is possible to fix this to next major release ? (im >> volunteering) > > Yeah. If you could write a complete patch, per above, I can commit it. > > - Heikki > > We independently ran into this, diagnosed it, and fixed it. Here is the complete patch covering every use of SSL_get_error. diff -ur src/backend/libpq/be-secure-openssl.c src/backend/libpq/be-secure-openssl.c --- src/backend/libpq/be-secure-openssl.c 2016-02-18 16:40:38.540898100 -0500 +++ src/backend/libpq/be-secure-openssl.c 2016-02-18 16:59:07.331898100 -0500 @@ -353,6 +353,10 @@ port->ssl_in_use = true; aloop: + /* make sure SSL_get_error_doesn't fetch an + * old error that predates the SSL_accept below. + */ + ERR_clear_error(); r = SSL_accept(port->ssl); if (r <= 0) { @@ -501,6 +505,9 @@ int err; errno = 0; + /* make sure SSL_get_error_doesn't fetch an + * old error that predates the SSL_read below. + */ n = SSL_read(port->ssl, ptr, len); err = SSL_get_error(port->ssl, n); switch (err) @@ -558,6 +565,9 @@ int err; errno = 0; + /* make sure SSL_get_error_doesn't fetch an + * old error that predates the SSL_write below. + */ n = SSL_write(port->ssl, ptr, len); err = SSL_get_error(port->ssl, n); switch (err) diff -ur src/interfaces/libpq/fe-secure-openssl.c src/interfaces/libpq/fe-secure-openssl.c --- src/interfaces/libpq/fe-secure-openssl.c 2016-02-18 16:40:03.575898100 -0500 +++ src/interfaces/libpq/fe-secure-openssl.c 2016-02-18 16:58:01.711898100 -0500 @@ -212,6 +212,10 @@ rloop: SOCK_ERRNO_SET(0); + /* make sure SSL_get_error_doesn't fetch an + * old error that predates the SSL_read below. + */ + ERR_clear_error(); n = SSL_read(conn->ssl, ptr, len); err = SSL_get_error(conn->ssl, n); switch (err) @@ -320,6 +324,10 @@ int err; SOCK_ERRNO_SET(0); + /* make sure SSL_get_error_doesn't fetch an + * old error that predates the SSL_write below. + */ + ERR_clear_error(); n = SSL_write(conn->ssl, ptr, len); err = SSL_get_error(conn->ssl, n); switch (err) @@ -1327,6 +1335,10 @@ { int r; + /* make sure SSL_get_error_doesn't fetch an + * old error that predates the SSL_connect below. + */ + ERR_clear_error(); r = SSL_connect(conn->ssl); if (r <= 0) { - Dave
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading
От
Peter Geoghegan
Дата:
On Thu, Feb 18, 2016 at 2:11 PM, Dave Vitek <dvitek@grammatech.com> wrote: > We independently ran into this, diagnosed it, and fixed it. Here is the > complete patch covering every use of SSL_get_error. I posted a patch for this issue independently, and quite recently: https://commitfest.postgresql.org/9/520/ Do you happen to have any idea why there has been an uptick in problem reports about this recently, despite the fact that it's been an issue for a while (that's been the case within Heroku, at least)? Are you aware that there is some specific trend behind that? -- Peter Geoghegan
On 2/18/2016 5:16 PM, Peter Geoghegan wrote: > On Thu, Feb 18, 2016 at 2:11 PM, Dave Vitek <dvitek@grammatech.com> wrote: >> We independently ran into this, diagnosed it, and fixed it. Here is the >> complete patch covering every use of SSL_get_error. > I posted a patch for this issue independently, and quite recently: > > https://commitfest.postgresql.org/9/520/ > > Do you happen to have any idea why there has been an uptick in problem > reports about this recently, despite the fact that it's been an issue > for a while (that's been the case within Heroku, at least)? Are you > aware that there is some specific trend behind that? > I can only speak for my case. postgres is part of our CodeSonar product, and we are adding an option to have it use TLS sockets to talk to postgres. Maintainers: Peter's patch is better than mine, at least for the front end side. I also adjusted be-secure-openssl.c, which perhaps is not necessary, but then again it's hard to be sure. It might be worth doing a patch for be-secure-openssl.c in the spirit of what Peter did for the frontend (sorry, not volunteering :).
Dave Vitek <dvitek@grammatech.com> writes: > Maintainers: Peter's patch is better than mine, at least for the front > end side. I also adjusted be-secure-openssl.c, which perhaps is not > necessary, but then again it's hard to be sure. It might be worth doing > a patch for be-secure-openssl.c in the spirit of what Peter did for the > frontend (sorry, not volunteering :). +1 for changing both sides. I'm fairly sure that you could provoke problems of this ilk in the backend too, for example if client connection is using SSL and we also establish an outgoing SSL connection using postgres_fdw or dblink. BTW, do we have a reproducible test case? regards, tom lane
On 2/18/2016 5:38 PM, Tom Lane wrote: > Dave Vitek <dvitek@grammatech.com> writes: >> Maintainers: Peter's patch is better than mine, at least for the front >> end side. I also adjusted be-secure-openssl.c, which perhaps is not >> necessary, but then again it's hard to be sure. It might be worth doing >> a patch for be-secure-openssl.c in the spirit of what Peter did for the >> frontend (sorry, not volunteering :). > +1 for changing both sides. I'm fairly sure that you could provoke > problems of this ilk in the backend too, for example if client connection > is using SSL and we also establish an outgoing SSL connection using > postgres_fdw or dblink. > > BTW, do we have a reproducible test case? > > regards, tom lane I can reproduce it, but I don't have a self contained unit test. Such a test case might look like: call ERR_put_error with SSL_ERROR_SSL and then cause libpq to invoke SSL_read. SSL_get_error after the read will probably return SSL_ERROR_SSL even if the read goes fine, causing postgres to conclude things have failed.
Re: BUG #12799: libpq - SSL pqsecure_read() doesn't clean openssl error queue before reading
От
Peter Geoghegan
Дата:
On Thu, Feb 18, 2016 at 2:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > +1 for changing both sides. I'm fairly sure that you could provoke > problems of this ilk in the backend too, for example if client connection > is using SSL and we also establish an outgoing SSL connection using > postgres_fdw or dblink. I didn't consider that, although I did note that Heikki was also in favor of covering both sides. It should be fairly straightforward. > BTW, do we have a reproducible test case? Yes, but that seems almost unnecessary. Basically, we trust that the per-thread error queue doesn't have anything in it, even though it clearly can. That assumption could be violated because malloc() returns NULL in SSLerrmessage(), for example. That's a case that does not even involve any non-libpq use of OpenSSL. Clearly we need to be more careful about clearing the queue generally, but especially because everyone else will get this wrong. The SSL_get_error() man page says: """ In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably. """ It is more or less that simple. We can't let this happen without severe consequences. It's on us to coordinate how to prevent this outcome, with no direction provided on how that should work out in the real world. There seem to be some really thin wrappers for OpenSSL for scripting languages like Ruby and PHP around, that pass the buck on to users of those languages. Naturally, they often get it wrong, because the interface is so impractical. -- Peter Geoghegan