libpq SSL with non-blocking sockets
От | Martin Pihlak |
---|---|
Тема | libpq SSL with non-blocking sockets |
Дата | |
Msg-id | 4DF0DB25.9050105@gmail.com обсуждение исходный текст |
Ответы |
Re: libpq SSL with non-blocking sockets
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
I believe I have found a bug in libpq COPY handling with non-blocking SSL connections. The bug manifests itself by dropping the connection in PGputCopyData() with "server closed the connection unexpectedly" message. The connection drop only occurs with nonblocking connections - blocking connections work as expected. I'll skip a lot of debugging details, but what I found out was that the connection drop was caused by a SSL "bad write retry" from SSL_write(). Which in turn appears to be caused by bad data passed to SSL_write() retry. So, what happened was that at some point SSL_write() returned SSL_ERROR_WANT_WRITE meaning that the write needs to be retried. Now, according to OpenSSL manual the retry SSL_write needs to be passed the same arguments as the previous call: http://www.openssl.org/docs/ssl/SSL_write.html#WARNING However the next SSL_write was passed another buffer with a different length and it failed with SSL_ERROR_SSL "bad write retry". Which in turn caused the pqSendSome to drop the connection and failed the PQputCopyData. Actually, now that I think of it, this is probably not only related to COPY handling ... The connection drop can be reproduced by introducing some "network problems" to the connection, so that the SSL_write needs to be retried. This can be most easily accomplished by passing a huge buffer to PQputCopyData. For instance on my laptop I can do about 3 PQputCopyData with a 100MB buffer before the connection is dropped. Others results may vary. Also, it doesn't seem to matter what the libpq version is used - I initially started on 8.3, but later confirmed that the same problem exists on 8.4, 9.0 and 9.1. Tested with OpenSSL 0.9.8g and 0.9.8o, no idea about other versions. The following is a log excerpt from my test program (attached) with some extra logging added to libpq: PQputCopyData #0 buf:100000000 total:0 pqFlush() pqFlush() = 0 pqPutMsgEnd() pqSendSome(99999744) SSL_write(0x7e59d60, 0x7ff5a7eac010, 99999744) = 99999744: SSL_error: 0, errno: 0 pqPutMsgEnd() = 0 PQputCopyData #1 buf:100000000 total:100000000 pqPutMsgEnd() pqSendSome(99999744) SSL_write(0x7e59d60, 0x7ff5a7eac010, 99999744) = -1: SSL_error: 3, errno: 11 pqPutMsgEnd() = 0 PQputCopyData #2 buf:100000000 total:200000000 pqFlush() pqSendSome(100000266) SSL_write(0x7e59d60, 0x7ff5a7eac010, 100000266) = -1: SSL_error: 3, errno: 11 pqFlush() = 1 pqPutMsgEnd() pqSendSome(199999488) SSL_write(0x7e59d60, 0x7ff597eab010, 199999488) = -1: SSL_error: 1, errno: 0 SSL error: bad write retry pqPutMsgEnd() = -1 PQputCopyData() = -1: server closed the connection unexpectedly In PQputCopyData #2 it is visible that the first SSL_write called from pqFlush failed with SSL_ERROR_WANT_WRITE. The next SSL_write should have been a retry with the same parameters, but instead was passed a buffer with a different address and length. Hence the "bad write retry". Some googling turned out similar issues for other projects using SSL with non-blocking sockets. The possible workarounds are to disable SSL or to disable non-blocking libpq connections. Both are not always possible - security reasons, 3rd party applications, drivers, etc. So I think this should be fixed in libpq. Not sure exactly how though. It would seem that for the PQputCopyData the best would be to return 0 to indicate that the operation should be retried. No idea for the other possible cases of SSL_write() retry though. What do you think? regards, Martin
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS