Inexplicable OpenSSL errors (SSL_read/SSL_write)


Lately I ran into a strange problem with the OpenSSL library. During normal operation of one of my services, SSL_read() started to fail randomly. SSL_get_error() always returned SSL_ERROR_SSL. Further investigation with ERR_get_error() didn't help: sometimes the error was related to the handshake, sometimes to something else, and I still couldn't pin down the exact cause. Stranger still, the handshake error showed up on connections that had been established long ago and had already carried data successfully many times. The error appeared at random, without any explanation.

To dig deeper I tried to capture traffic on the client and server side. That was a big challenge in itself, because the use of ECDHE ciphers required client/server modifications. Finally I caught the error, started looking into the traffic and... I found nothing. Wireshark decoded all the data without problems. So my next assumption was an invalid memory read or write.

I built a debug version of my service and ran it under valgrind. To rule out errors on the client side, I wrote a small test app in Python and then tried to reproduce the issue. After some time I saw the issue again, and valgrind reported nothing about memory access (access from multiple threads wasn't even checked, because the app is single-threaded).

I googled for similar problems and found a few threads, but none with a solution. Then I decided to rewrite the code that works with the OpenSSL API. That didn't help me find a solution either.

At the same time I tried googling once more. Luckily, I was directed to a 15-year-old thread. It says that I should call ERR_clear_error() before any SSL-related I/O operation... But neither the SSL_read man page says so, nor the SSL_get_error or ERR_get_error man pages. So I added ERR_clear_error() before each SSL_read/SSL_write call, and finally my issue was gone. This simple fact cost me ~2 weeks of troubleshooting...

