SSL renegotiation and other related woes

From: Andres Freund
Subject: SSL renegotiation and other related woes
Msg-id 20150126101405.GA31719@awork2.anarazel.de
Replies: Re: SSL renegotiation and other related woes  (Andres Freund <andres@2ndquadrant.com>)
Re: SSL renegotiation and other related woes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers
Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.

Turns out there's two major things we do wrong:

1) We ignore the rule that once SSL_read()/SSL_write() has been called and
   failed with SSL_ERROR_WANT_(READ|WRITE), it has to be called again with
   the same parameters. Unfortunately that rule doesn't just mean that the
   same parameters have to be passed in, but also that we can't just
   constantly switch between SSL_read() and SSL_write(). Especially the
   nonblocking backend code (i.e. walsender) and the whole frontend code
   violate this rule.
 

2) We start renegotiations in be_tls_write() while in nonblocking mode,
   but don't properly retry to handle socket readiness. There's a loop
   that retries handshakes twenty times (???), but what's actually needed
   is to call SSL_get_error() and ensure that there's actually data
   available.
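
To illustrate what 2) asks for, here is a rough sketch of a handshake
loop that waits for the socket instead of retrying a fixed number of
times. It is not the actual be_tls_write() code. The names hs_result,
hs_step_fn, scripted_step and drive_handshake are made up for this
example; in real code the step function would wrap SSL_do_handshake()
plus SSL_get_error(), and the scripted stub below only stands in for
that so the sketch compiles without OpenSSL.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes; real code would map SSL_get_error() to these. */
typedef enum { HS_DONE, HS_WANT_READ, HS_WANT_WRITE, HS_FAILED } hs_result;

/* One handshake attempt (assumed to wrap SSL_do_handshake()). */
typedef hs_result (*hs_step_fn)(void *ctx);

/* Illustration-only stand-in for OpenSSL: replays a fixed list of results. */
typedef struct { const hs_result *script; int pos; } scripted_step;

static hs_result
scripted_step_fn(void *ctx)
{
    scripted_step *s = ctx;

    return s->script[s->pos++];
}

/*
 * Repeat the handshake step until it completes. After each WANT_READ or
 * WANT_WRITE, sleep in poll() until the socket is ready in the requested
 * direction. Returns 0 on success, -1 on failure or timeout.
 */
static int
drive_handshake(int fd, hs_step_fn step, void *ctx, int timeout_ms)
{
    assert(step != NULL);

    for (;;)
    {
        hs_result r = step(ctx);
        struct pollfd pfd;
        int rc;

        if (r == HS_DONE)
            return 0;
        if (r == HS_FAILED)
            return -1;

        pfd.fd = fd;
        pfd.events = (r == HS_WANT_READ) ? POLLIN : POLLOUT;
        pfd.revents = 0;

        do
        {
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc < 0 && errno == EINTR);

        if (rc <= 0)
            return -1;          /* poll() error or timeout */
    }
}
```

The point is only that the wait direction comes from the error code of
the last attempt, not from a retry counter.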
 

2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: It reliably causes connection aborts
within a couple renegotiations. The higher the latency the higher the
likelihood of aborts. Even locally it doesn't take very long to
abort. Errors usually are something like "SSL connection has been closed
unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
of other similar messages. There's a couple reports of those in the
archives and I've seen many more in client logs.
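
To make the rule in 1) concrete, here is a small sketch of a debugging
guard that remembers the call that last asked for a retry and flags any
different call made before that retry happens. Everything in it
(pending_op, call_allowed, note_result) is hypothetical and exists
neither in PostgreSQL nor in OpenSSL; it just encodes the rule as
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_NONE, OP_READ, OP_WRITE } pending_kind;

/* The call that last returned WANT_READ/WANT_WRITE, if any. */
typedef struct
{
    pending_kind kind;
    const void  *buf;
    size_t       len;
} pending_op;

/*
 * True if issuing (kind, buf, len) now respects the retry rule: either
 * nothing is pending, or this is exactly the pending call again.
 */
static bool
call_allowed(const pending_op *p, pending_kind kind,
             const void *buf, size_t len)
{
    if (p->kind == OP_NONE)
        return true;
    return p->kind == kind && p->buf == buf && p->len == len;
}

/* Record the outcome of a call: remember it if it has to be retried. */
static void
note_result(pending_op *p, pending_kind kind,
            const void *buf, size_t len, bool wants_retry)
{
    if (wants_retry)
    {
        p->kind = kind;
        p->buf = buf;
        p->len = len;
    }
    else
    {
        p->kind = OP_NONE;
        p->buf = NULL;
        p->len = 0;
    }
}
```

A debug build could assert(call_allowed(...)) in front of every
SSL_read()/SSL_write() call to find the places that switch operations
while a retry is still outstanding.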

As far as I can see the only realistic way to fix 1) is to change both
frontend and backend code to:
a) Always check for socket read/writeability before calling
   SSL_read()/SSL_write() when in nonblocking mode. That's a bit annoying
   because it nearly doubles the number of syscalls we do for client
   communication, but I can't really see an alternative. That allows us
   to avoid waiting inside after a WANT_READ/WRITE, or having to set up a
   larger state machine that keeps track of what we tried last.
 

b) When SSL_read/write nonetheless returns WANT_READ/WRITE, even though
   we tested for read/writeability, we're very likely doing
   renegotiation. In that case we'll just have to block. There's already
   code that busy loops (and thus waits) in the frontend (cf. pgtls_read's
   WANT_WRITE case, triggered during reneg). We can't just return
   immediately to the upper layers as we'd otherwise likely violate the
   rule about calling ssl with the same parameters again.
 

c) Add a somewhat hacky optimization where we allow breaking out of a
   WANT_READ condition on a nonblocking socket when ssl->state ==
   SSL_ST_OK. That's the case where it's actually, at least by my reading
   of the unreadable ssl code, safe not to wait. That case is somewhat
   important because we can otherwise end up waiting on both sides due
   to b), even when nonblocking calls were actually made. That condition
   essentially means that we'll only block if renegotiation or partial
   reads are in progress. Afaics at least.
 

d) Remove the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER hack - we don't
   actually need it anymore.
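
Steps a) and b) could look roughly like the sketch below. It is not
the preliminary patch, and it leaves out the ssl->state optimization
from c). The names io_result, io_attempt_fn, scripted_io,
wait_for_socket and try_ssl_io are invented for the example; the
attempt callback is assumed to wrap one SSL_read() or SSL_write() call
with fixed arguments plus SSL_get_error(), and the scripted stub only
stands in for OpenSSL so the sketch compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes; real code would map SSL_get_error() to these. */
typedef enum { IO_OK, IO_WANT_READ, IO_WANT_WRITE, IO_ERROR } io_result;

/* One SSL_read()/SSL_write() attempt; the arguments live in ctx. */
typedef io_result (*io_attempt_fn)(void *ctx);

/* Illustration-only stand-in for OpenSSL: replays a fixed list of results. */
typedef struct { const io_result *script; int pos; } scripted_io;

static io_result
scripted_attempt(void *ctx)
{
    scripted_io *s = ctx;

    return s->script[s->pos++];
}

/*
 * Wait until fd is ready for events (POLLIN or POLLOUT). timeout_ms == 0
 * is a pure probe, -1 blocks. Returns 1 if ready, 0 on timeout, -1 on error.
 */
static int
wait_for_socket(int fd, short events, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;

    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    return (rc > 0) ? 1 : rc;
}

/*
 * Returns 1 when the operation completed, 0 when the socket wasn't ready
 * and nothing was started (the caller may do something else), -1 on error.
 */
static int
try_ssl_io(int fd, short events, io_attempt_fn attempt, void *ctx)
{
    int ready;

    assert(attempt != NULL);

    /* a) probe first; returning here is safe, no SSL call is pending */
    ready = wait_for_socket(fd, events, 0);
    if (ready <= 0)
        return ready;

    for (;;)
    {
        /* the same ctx on every pass means the same parameters */
        io_result r = attempt(ctx);

        if (r == IO_OK)
            return 1;
        if (r == IO_ERROR)
            return -1;

        /* b) probably renegotiating: block, then retry the very same call */
        if (wait_for_socket(fd, r == IO_WANT_READ ? POLLIN : POLLOUT, -1) < 0)
            return -1;
    }
}
```

The only place this returns to the caller without finishing is before
any SSL call was made, which is what keeps the retry rule from 1)
intact.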

These errors are much less frequent when using a plain frontend
(e.g. psql/pgbench) because they don't use copy both stuff - the way
these clients use the FE/BE protocol there's essentially natural
synchronization points where nothing but renegotiation happens. With
walsender (or pipelined queries!) both sides can write at the same time.


My testcase for this is just to set up a server with a low
ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and
receive data via pg_receivexlog -n. Usually it'll error out quickly.


I've done a preliminary implementation of the above steps and it
survives transferring 25GB of WAL via the replication protocol with a
ssl_renegotiation_limit=100kB - previously it failed much earlier.


Does anybody have a neater way to tackle this? I'm not happy about this
solution, but I really can't think of anything better (save ditching
openssl maybe).  I'm willing to clean up my hacked up fix for this, but
not if we can't find agreement on the approach.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


