Обсуждение: SSL renegotiation and other related woes

Поиск

Список

Период

Сортировка

SSL renegotiation and other related woes

От

Andres Freund

Дата:

26 января 2015 г., 13:14:23

Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.

Turns out there's two major things we do wrong:

1) We ignore the rule that once called and returning SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
again with the same parameters. Unfortunately that rule doesn't mean just that the same parameters have to be passed
in,but also that we can't just constantly switch between _read()/write(). Especially nonblocking backend code (i.e.
walsender)and the whole frontend code violate this rule.

2) We start renegotiations in be_tls_write() while in nonblocking mode, but don't properly retry to handle socket
readyness.There's a loop that retries handshakes twenty times (???), but what actually is needed is to call
SSL_get_error()and ensure that there's actually data available.

2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: It reliably causes connection aborts
within a couple renegotiations. The higher the latency the higher the
likelihood of aborts. Even locally it doesn't take very long to
abort. Errors usually are something like "SSL connection has been closed
unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
of other similar messages. There's a couple reports of those in the
archives and I've seen many more in client logs.

As far as I can see the only realistic way to fix 1) is to change both
frontend and backend code to:
a) Always check for socket read/writeability before calling SSL_read/write() when in nonblocking mode. That's a bit
annoying because it nearly doubles the amount of syscalls we do or client communication, but I can't really se an
alternative.That allows us to avoid waiting inside after a WANT_READ/WRITE, or havin to setup a larger state machine
thatkeeps track what we tried last.

b) When SSL_read/write nonetheless returns WANT_READ/WRITE, even though we tested for read/writeability, we're very
likelydoing renegotiation. In that case we'll just have to block. There's already code that busy loops (and thus
waits)in the frontend (c.f. pgtls_read's WANT_WRITE case, triggered during reneg). We can't just return immediately
tothe upper layers as we'd otherwise likely violate the rule about calling ssl with the same parameters again.

c) Add a somewhat hacky optimization whereas we allow to break out of a WANT_READ condition in a nonblocking socket
whenssl->state == SSL_ST_OK. That's the cases where it actually, at least by my reading of the unreadable ssl code,
safeto not wait. That case is somewhat important because we otherwise can end up waiting on both sides due to b),
evenwhen nonblocking calls where actually made. That condition essentially means that we'll only block if
renegotiationor partial reads are in progress. Afaics at least.

d) Remove the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER hack - we don't actually need it anymore.

These errors are much less frequent when using a plain frontend
(e.g. psql/pgbench) because they don't use copy both stuff - the way
these clients use the FE/BE protocol there's essentially natural
synchronization points where nothing but renegotiation happens. With
walsender (or pipelined queries!) both sides can write at the same time.

My testcase for this is just to setup a server with a low
ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and
receive data via pg_receivexlog -n. Usually it'll error out quickly.

I've done a preliminary implementation of the above steps and it
survives transferring 25GB of WAL via the replication protocol with a
ssl_renegotiation_limit=100kB - previously it failed much earlier.

Does anybody have a neater way to tackle this? I'm not happy about this
solution, but I really can't think of anything better (save ditching
openssl maybe). I'm willing to clean up my hacked up fix for this, but
not if we can't find agreement on the approach.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: SSL renegotiation and other related woes

От

Andres Freund

Дата:

26 января 2015 г., 13:20:19

On 2015-01-26 11:14:05 +0100, Andres Freund wrote:
> My testcase for this is just to setup a server with a low
> ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and
> receive data via pg_receivexlog -n. Usually it'll error out quickly.

...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

wal.sql

Re: SSL renegotiation and other related woes

От

Heikki Linnakangas

Дата:

06 февраля 2015 г., 00:04:08

On 01/26/2015 12:14 PM, Andres Freund wrote:
> Hi,
>
> When working on getting rid of ImmediateInterruptOK I wanted to verify
> that ssl still works correctly. Turned out it didn't. But neither did it
> in master.
>
> Turns out there's two major things we do wrong:
>
> 1) We ignore the rule that once called and returning
>     SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
>     again with the same parameters. Unfortunately that rule doesn't mean
>     just that the same parameters have to be passed in, but also that we
>     can't just constantly switch between _read()/write(). Especially
>     nonblocking backend code (i.e. walsender) and the whole frontend code
>     violate this rule.

I don't buy that. Googling around, and looking at the man pages, I just
don't see anything to support that claim. I believe it is supported to
switch between reads and writes.

> 2) We start renegotiations in be_tls_write() while in nonblocking mode,
>     but don't properly retry to handle socket readyness. There's a loop
>     that retries handshakes twenty times (???), but what actually is
>     needed is to call SSL_get_error() and ensure that there's actually
>     data available.

Yeah, that's just crazy.

Actually, I don't think we need to call SSL_do_handshake() at all. At
least on modern OpenSSL versions. OpenSSL internally uses a state
machine, and SSL_read() and SSL_write() calls will complete the
handshake just as well.

> 2) is easy enough to fix, but 1) is pretty hard. Before anybody says
> that 1) isn't an important rule: It reliably causes connection aborts
> within a couple renegotiations. The higher the latency the higher the
> likelihood of aborts. Even locally it doesn't take very long to
> abort. Errors usually are something like "SSL connection has been closed
> unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
> of other similar messages. There's a couple reports of those in the
> archives and I've seen many more in client logs.

Yeah, I can reproduce that. There's clearly something wrong.

I believe this is an OpenSSL bug. I traced the "unexpected message"
error to this code in OpenSSL's s3_read_bytes() function:

>     case SSL3_RT_APPLICATION_DATA:
>         /*
>          * At this point, we were expecting handshake data, but have
>          * application data.  If the library was running inside ssl3_read()
>          * (i.e. in_read_app_data is set) and it makes sense to read
>          * application data at this point (session renegotiation not yet
>          * started), we will indulge it.
>          */
>         if (s->s3->in_read_app_data &&
>             (s->s3->total_renegotiations != 0) &&
>             (((s->state & SSL_ST_CONNECT) &&
>               (s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
>               (s->state <= SSL3_ST_CR_SRVR_HELLO_A)
>              ) || ((s->state & SSL_ST_ACCEPT) &&
>                    (s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
>                    (s->state >= SSL3_ST_SR_CLNT_HELLO_A)
>              )
>             )) {
>             s->s3->in_read_app_data = 2;
>             return (-1);
>         } else {
>             al = SSL_AD_UNEXPECTED_MESSAGE;
>         SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
>             goto f_err;
>         }

So that quite clearly throws an error, *unless* the original application
call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
SSL_write() is called when it's in renegotiation. I think that's
OpenSSL's fault and it should "indulge" the application data message
even in SSL_write().

Can we work-around that easily? I believe we can. The crucial part is
that we mustn't let SSL_write() to process any incoming application
data. We can achieve that if we always call SSL_read() to drain such
data, before calling SSL_write(). But that's not quite enough; if we're
in renegotiation, SSL_write() might still try to read more data from the
socket that has arrived after the SSL_read() call. Fortunately, we can
easily prevent that by hacking pqsecure_raw_read() to always return
EWOULDBLOCK, when it's called through SSL_write().

The attached patch does that. I've been running your pg_receivexlog test
case with this for about 15 minutes without any errors now, with
ssl_renegotiation_limit=50kB, while before it errored out within seconds.

In theory, I guess we should do similar hacks in the server, and always
call SSL_read() before SSL_write(), but it seems to work without it. Or
maybe not; OpenSSL server and client code is not symmetric, so perhaps
it works in server mode without that.

PS. The server code started renegotiation when it's 1kB from reaching
ssl_renegotiation_limit. I increased that to 32kB, because in testing, I
got some "SSL failed to renegotiate connection before limit expired"
errors in testing before I did that.

PPS. I did all testing of this patch with my sleeping logic
simplification patch applied
(http://www.postgresql.org/message-id/54D3821E.9060004@vmware.com). It
applies without it too, and I don't think it matters, but I thought I'd
mention it.

- Heikki

Вложения

0001-SSL-renegotiation-fixes.patch

Re: SSL renegotiation and other related woes

От

Heikki Linnakangas

Дата:

11 февраля 2015 г., 15:54:37

On 02/05/2015 11:03 PM, Heikki Linnakangas wrote:
> On 01/26/2015 12:14 PM, Andres Freund wrote:
> Can we work-around that easily? I believe we can. The crucial part is
> that we mustn't let SSL_write() to process any incoming application
> data. We can achieve that if we always call SSL_read() to drain such
> data, before calling SSL_write(). But that's not quite enough; if we're
> in renegotiation, SSL_write() might still try to read more data from the
> socket that has arrived after the SSL_read() call. Fortunately, we can
> easily prevent that by hacking pqsecure_raw_read() to always return
> EWOULDBLOCK, when it's called through SSL_write().
>
> The attached patch does that. I've been running your pg_receivexlog test
> case with this for about 15 minutes without any errors now, with
> ssl_renegotiation_limit=50kB, while before it errored out within seconds.

Here is a simplified version of this patch. It seems to be enough to not
let SSL_write() to read any data from the socket. No need to call
SSL_read() first, and the server-side changes I made were only needed
because of the other patch I had applied.

Thoughts? Can you reproduce any errors with this?

> In theory, I guess we should do similar hacks in the server, and always
> call SSL_read() before SSL_write(), but it seems to work without it. Or
> maybe not; OpenSSL server and client code is not symmetric, so perhaps
> it works in server mode without that.

Not included in this patch, but I believe we apply a similar patch to
the server-side too.

- Heikki

Вложения

0001-Fix-sslv3-alert-unexpected-message-errors-in-SSL-ren.patch

Re: SSL renegotiation and other related woes

От

Albe Laurenz

Дата:

11 февраля 2015 г., 17:16:56

Heikki Linnakangas wrote:
>> Can we work-around that easily? I believe we can. The crucial part is
>> that we mustn't let SSL_write() to process any incoming application
>> data. We can achieve that if we always call SSL_read() to drain such
>> data, before calling SSL_write(). But that's not quite enough; if we're
>> in renegotiation, SSL_write() might still try to read more data from the
>> socket that has arrived after the SSL_read() call. Fortunately, we can
>> easily prevent that by hacking pqsecure_raw_read() to always return
>> EWOULDBLOCK, when it's called through SSL_write().
>>
>> The attached patch does that. I've been running your pg_receivexlog test
>> case with this for about 15 minutes without any errors now, with
>> ssl_renegotiation_limit=50kB, while before it errored out within seconds.
> 
> Here is a simplified version of this patch. It seems to be enough to not
> let SSL_write() to read any data from the socket. No need to call
> SSL_read() first, and the server-side changes I made were only needed
> because of the other patch I had applied.
> 
> Thoughts? Can you reproduce any errors with this?
> 
>> In theory, I guess we should do similar hacks in the server, and always
>> call SSL_read() before SSL_write(), but it seems to work without it. Or
>> maybe not; OpenSSL server and client code is not symmetric, so perhaps
>> it works in server mode without that.
> 
> Not included in this patch, but I believe we apply a similar patch to
> the server-side too.

It seems to me that there is a bug in the server part of your first patch
(not included in your second patch):

--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -576,11 +576,11 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)   /*    * If SSL renegotiations are
enabledand we're getting close to the    * limit, start one now; but avoid it if there's one already in
 
-    * progress.  Request the renegotiation 1kB before the limit has
+    * progress.  Request the renegotiation 32kB before the limit has    * actually expired.    */   if
(ssl_renegotiation_limit&& !in_ssl_renegotiation &&
 
-       port->count > (ssl_renegotiation_limit - 1) * 1024L)
+       port->count > (ssl_renegotiation_limit - 32) * 1024L)   {       in_ssl_renegotiation = true;

Experiment shows that for values of ssl_renegotiation_limit less than 32,
no renegotiation takes place at all with this change applied.
port->count is an unsigned long.

Yours,
Laurenz Albe

Re: SSL renegotiation and other related woes

От

Andres Freund

Дата:

12 февраля 2015 г., 02:33:53

On 2015-02-11 14:54:03 +0200, Heikki Linnakangas wrote:
> Thoughts? Can you reproduce any errors with this?

I'm on battery right now, so I can't really test much.

Did you test having an actual standby instead of pg_receivexlog? I saw
some slightly different errors there.

Does this patch alone work for you or did you test it together with the
earlier one to fix the renegotiation handling server side when
nonblocking? Because that frequently seemed to error out for me, at
least over actual network. A latch loop + checking for the states seemed
to fix that for me.

I'm pretty darn happy that you seem to have found a much less ugly
solution than mine. Although it's only pretty by comparison ;)

> +    /*
> +     * To work-around an issue with OpenSSL and renegotiation, we don't
> +     * let SSL_write() to read any incoming data.

superflous "to".

> +     * NB: This relies on the calling code to call pqsecure_read(), completing
> +     * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
> +     * we'll never make progress.
> +     */

I think this should a) refer to the fact that pqSendSome does that in
blocking scenarios b) expand the comment around the pqReadData to
reference that fact.

How are we going to deal with callers using libpq in nonblocking mode. A
client doing a large COPY FROM STDIN to the server using a nonblocking
connection (not a stupid thing to do) could hit this IIUC.

Could we perhaps call SSL_peek() when SSL_write() hits WANTS_READ? In
combination with your fix I think that should fix the possibility of
such a deadlock? Especially on the serverside where the socket most of
the time is is in, although emulated, blocking mode that seems easier -
we can just retry afterwards.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: SSL renegotiation and other related woes

От

Andres Freund

Дата:

12 февраля 2015 г., 02:42:07

Hi,

On 2015-02-05 23:03:02 +0200, Heikki Linnakangas wrote:
> On 01/26/2015 12:14 PM, Andres Freund wrote:
> >Hi,
> >
> >When working on getting rid of ImmediateInterruptOK I wanted to verify
> >that ssl still works correctly. Turned out it didn't. But neither did it
> >in master.
> >
> >Turns out there's two major things we do wrong:
> >
> >1) We ignore the rule that once called and returning
> >    SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
> >    again with the same parameters. Unfortunately that rule doesn't mean
> >    just that the same parameters have to be passed in, but also that we
> >    can't just constantly switch between _read()/write(). Especially
> >    nonblocking backend code (i.e. walsender) and the whole frontend code
> >    violate this rule.
> 
> I don't buy that. Googling around, and looking at the man pages, I just
> don't see anything to support that claim. I believe it is supported to
> switch between reads and writes.

There's at least two implementations that seem to have workaround to
achieve something like that. Apache's mod_ssl and the ssl layer for
libevent.

> >2) We start renegotiations in be_tls_write() while in nonblocking mode,
> >    but don't properly retry to handle socket readyness. There's a loop
> >    that retries handshakes twenty times (???), but what actually is
> >    needed is to call SSL_get_error() and ensure that there's actually
> >    data available.
> 
> Yeah, that's just crazy.
> 
> Actually, I don't think we need to call SSL_do_handshake() at all. At least
> on modern OpenSSL versions. OpenSSL internally uses a state machine, and
> SSL_read() and SSL_write() calls will complete the handshake just as well.

Some blaming seems to show that that's been the case for a long time.

> >2) is easy enough to fix, but 1) is pretty hard. Before anybody says
> >that 1) isn't an important rule: It reliably causes connection aborts
> >within a couple renegotiations. The higher the latency the higher the
> >likelihood of aborts. Even locally it doesn't take very long to
> >abort. Errors usually are something like "SSL connection has been closed
> >unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
> >of other similar messages. There's a couple reports of those in the
> >archives and I've seen many more in client logs.
> 
> Yeah, I can reproduce that. There's clearly something wrong.

> I believe this is an OpenSSL bug. I traced the "unexpected message" error to
> this code in OpenSSL's s3_read_bytes() function:

Yep, I got to that as well.

> >    case SSL3_RT_APPLICATION_DATA:
> >        /*
> >         * At this point, we were expecting handshake data, but have
> >         * application data.  If the library was running inside ssl3_read()
> >         * (i.e. in_read_app_data is set) and it makes sense to read
> >         * application data at this point (session renegotiation not yet
> >         * started), we will indulge it.
> >         */
> >        if (s->s3->in_read_app_data &&
> >            (s->s3->total_renegotiations != 0) &&
> >            (((s->state & SSL_ST_CONNECT) &&
> >              (s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
> >              (s->state <= SSL3_ST_CR_SRVR_HELLO_A)
> >             ) || ((s->state & SSL_ST_ACCEPT) &&
> >                   (s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
> >                   (s->state >= SSL3_ST_SR_CLNT_HELLO_A)
> >             )
> >            )) {
> >            s->s3->in_read_app_data = 2;
> >            return (-1);
> >        } else {
> >            al = SSL_AD_UNEXPECTED_MESSAGE;
> >        SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
> >            goto f_err;
> >        }
> 
> So that quite clearly throws an error, *unless* the original application
> call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
> SSL_write() is called when it's in renegotiation. I think that's OpenSSL's
> fault and it should "indulge" the application data message even in
> SSL_write().

That'd be nice, but I think there were multiple reports to them about
this and there wasn't much of a response. Maybe it's time to try again.

> In theory, I guess we should do similar hacks in the server, and always call
> SSL_read() before SSL_write(), but it seems to work without it. Or maybe
> not; OpenSSL server and client code is not symmetric, so perhaps it works in
> server mode without that.

I think we should do it on the server side - got some server side errors
when I cranked up the pg_receivexlog's status interval and set the wal
sender timeout very low.

Greetings,

Andres Freund

Re: SSL renegotiation and other related woes

От

Heikki Linnakangas

Дата:

12 февраля 2015 г., 16:35:43

On 02/12/2015 01:33 AM, Andres Freund wrote:
> On 2015-02-11 14:54:03 +0200, Heikki Linnakangas wrote:
>> Thoughts? Can you reproduce any errors with this?
>
> I'm on battery right now, so I can't really test much.
>
> Did you test having an actual standby instead of pg_receivexlog? I saw
> some slightly different errors there.
>
> Does this patch alone work for you or did you test it together with the
> earlier one to fix the renegotiation handling server side when
> nonblocking? Because that frequently seemed to error out for me, at
> least over actual network. A latch loop + checking for the states seemed
> to fix that for me.

This patch alone worked for me.

>> +     * NB: This relies on the calling code to call pqsecure_read(), completing
>> +     * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
>> +     * we'll never make progress.
>> +     */
>
> I think this should a) refer to the fact that pqSendSome does that in
> blocking scenarios b) expand the comment around the pqReadData to
> reference that fact.
>
> How are we going to deal with callers using libpq in nonblocking mode. A
> client doing a large COPY FROM STDIN to the server using a nonblocking
> connection (not a stupid thing to do) could hit this IIUC.

Hmm, that's problematic even without SSL. If you do a large COPY FROM 
STDIN, and the server sends a lot of NOTICEs, the NOTICEs will be 
accumulated in the TCP send and receive buffers until they fill up. 
After that, the server will block on the send, and will stop reading the 
tuple data the client sends, and you get a deadlock. In blocking mode, 
pqSendSome calls pqReadData to avoid that.

I think pqSendSome should call pqReadData() after getting EWOULDBLOCK, 
even in non-blocking mode. It won't completely prevent the problem: it's 
possible that the receive buffer fills up, but the application doesn't 
call pqSendSome() until the socket becomes writeable, which won't happen 
until the client drains the incoming data is drained and unblocks the 
server. But it greatly reduces the risk. In practice that will solve the 
problem for SSL renegotiations.

We should also document that the application should always wait for the 
socket readable condition, in addition to writeable, and call 
pqConsumeInput(). That fixes the issue for good.

> Could we perhaps call SSL_peek() when SSL_write() hits WANTS_READ? In
> combination with your fix I think that should fix the possibility of
> such a deadlock? Especially on the serverside where the socket most of
> the time is is in, although emulated, blocking mode that seems easier -
> we can just retry afterwards.

Hmm. Not sure what the semantics of SSL_peek() are. At least we'd need 
to call it with a large enough buffer that it would read all the 
incoming data from the OS buffer. I think it would be safer to just call 
SSL_read(). Both the server and libpq have an input buffer that you can 
easily read into at any time.

- Heikki

Re: SSL renegotiation and other related woes

От

Heikki Linnakangas

Дата:

14 февраля 2015 г., 00:43:00

On 02/12/2015 01:41 AM, Andres Freund wrote:
> Hi,
>
> On 2015-02-05 23:03:02 +0200, Heikki Linnakangas wrote:
>> On 01/26/2015 12:14 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> When working on getting rid of ImmediateInterruptOK I wanted to verify
>>> that ssl still works correctly. Turned out it didn't. But neither did it
>>> in master.
>>>
>>> Turns out there's two major things we do wrong:
>>>
>>> 1) We ignore the rule that once called and returning
>>>     SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
>>>     again with the same parameters. Unfortunately that rule doesn't mean
>>>     just that the same parameters have to be passed in, but also that we
>>>     can't just constantly switch between _read()/write(). Especially
>>>     nonblocking backend code (i.e. walsender) and the whole frontend code
>>>     violate this rule.
>>
>> I don't buy that. Googling around, and looking at the man pages, I just
>> don't see anything to support that claim. I believe it is supported to
>> switch between reads and writes.
>
> There's at least two implementations that seem to have workaround to
> achieve something like that. Apache's mod_ssl and the ssl layer for
> libevent.
>
>>> 2) We start renegotiations in be_tls_write() while in nonblocking mode,
>>>     but don't properly retry to handle socket readyness. There's a loop
>>>     that retries handshakes twenty times (???), but what actually is
>>>     needed is to call SSL_get_error() and ensure that there's actually
>>>     data available.
>>
>> Yeah, that's just crazy.
>>
>> Actually, I don't think we need to call SSL_do_handshake() at all. At least
>> on modern OpenSSL versions. OpenSSL internally uses a state machine, and
>> SSL_read() and SSL_write() calls will complete the handshake just as well.
>
> Some blaming seems to show that that's been the case for a long time.
>
>>> 2) is easy enough to fix, but 1) is pretty hard. Before anybody says
>>> that 1) isn't an important rule: It reliably causes connection aborts
>>> within a couple renegotiations. The higher the latency the higher the
>>> likelihood of aborts. Even locally it doesn't take very long to
>>> abort. Errors usually are something like "SSL connection has been closed
>>> unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
>>> of other similar messages. There's a couple reports of those in the
>>> archives and I've seen many more in client logs.
>>
>> Yeah, I can reproduce that. There's clearly something wrong.
>
>> I believe this is an OpenSSL bug. I traced the "unexpected message" error to
>> this code in OpenSSL's s3_read_bytes() function:
>
> Yep, I got to that as well.
>
>>>     case SSL3_RT_APPLICATION_DATA:
>>>         /*
>>>          * At this point, we were expecting handshake data, but have
>>>          * application data.  If the library was running inside ssl3_read()
>>>          * (i.e. in_read_app_data is set) and it makes sense to read
>>>          * application data at this point (session renegotiation not yet
>>>          * started), we will indulge it.
>>>          */
>>>         if (s->s3->in_read_app_data &&
>>>             (s->s3->total_renegotiations != 0) &&
>>>             (((s->state & SSL_ST_CONNECT) &&
>>>               (s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
>>>               (s->state <= SSL3_ST_CR_SRVR_HELLO_A)
>>>              ) || ((s->state & SSL_ST_ACCEPT) &&
>>>                    (s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
>>>                    (s->state >= SSL3_ST_SR_CLNT_HELLO_A)
>>>              )
>>>             )) {
>>>             s->s3->in_read_app_data = 2;
>>>             return (-1);
>>>         } else {
>>>             al = SSL_AD_UNEXPECTED_MESSAGE;
>>>         SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
>>>             goto f_err;
>>>         }
>>
>> So that quite clearly throws an error, *unless* the original application
>> call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
>> SSL_write() is called when it's in renegotiation. I think that's OpenSSL's
>> fault and it should "indulge" the application data message even in
>> SSL_write().
>
> That'd be nice, but I think there were multiple reports to them about
> this and there wasn't much of a response. Maybe it's time to try again.
>
>> In theory, I guess we should do similar hacks in the server, and always call
>> SSL_read() before SSL_write(), but it seems to work without it. Or maybe
>> not; OpenSSL server and client code is not symmetric, so perhaps it works in
>> server mode without that.
>
> I think we should do it on the server side - got some server side errors
> when I cranked up the pg_receivexlog's status interval and set the wal
> sender timeout very low.

I've not been able to reproduce any errors on the server side, with my
latest client-only patch. Not sure why. I would assume that the server
has the same problem, but perhaps there are some mitigating factors that
make it impossible or at least less likely there.

I'm planning to commit and backpatch the first two of the attached
patches. The first is essentially the same as what I've posted before,
with some minor cleanup. I realized that the hack can be isolated
completely in fe-secure-openssl.c by putting the "kill-switch" in the
custom BIO_read routine, instead of pqsecure_raw_read(), so I did that.
The second patch makes pqSendSome() call pqReadData() also in
non-blocking mode. Per discussion, that's necessary to avoid getting
stuck, if an application that uses non-blocking mode doesn't call
pqConsumeInput() between pqFlush() calls.

For the sake of completeness, the third one applies the same kill-switch
hack to the server, preventing SSL_write() from reading anything.
However, as it doesn't seem to actually be necessary, and it happens to
be a bit more messy to implement in the backend than in libpq, I'm not
going to commit that one.

- Heikki

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: SSL renegotiation and other related woes

Вложения

Вложения

Вложения

Вложения