From be5422db22a496b1a463d141c1d093a46f056f9f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?I=C3=B1aki=20Baz=20Castillo?= Date: Thu, 22 Feb 2024 12:31:07 +0100 Subject: [PATCH] Fix DTLS fragment exceeds MTU size if the certificate is large Fixes #1100 ### Details I cannot believe that what I've done works. So let me first explain what the problem is (which is perfectly explained in issue #1100), I hope I'm right: As commented in https://github.com/versatica/mediasoup/issues/1100#issuecomment-1959002237, the problem we had is: - When it comes to send DTLS data to the remote client, `DtlsTransport::SendPendingOutgoingDtlsData()` in mediasoup calls `read = BIO_get_mem_data(...)`, and obtained `read` maybe higher than our `DtlsMtu = 1350` (for example in DTLS Server Hello message if our certificate is very large). Let's say that `read` is 3000 bytes which is higher than 1350 so problems here. - But we enable all the MTU related stuff in OpenSSL DTLS in `DtlsTransport` so we **know** (and this is proven, see https://github.com/versatica/mediasoup/issues/1100#issuecomment-1959002237) that those 3000 bytes contain **N fragments** of a DTLS packet (in this case Server Hello). - But we don't know how long each fragment is so we cannot read and send them separately. And we don't know it because `read = BIO_get_mem_data(...)` returns the **total** size as explained above. What does this PR? - In `DtlsTransport::SendPendingOutgoingDtlsData()` we call `read = BIO_get_mem_data(...)` as usual. - If `read <= DtlsMtu` then nothing changes, we notify the listener with the full data. - If `read > DtlsMtu` then we split the total `data` into fragments of max `DtlsMtu` bytes and notify the parent separately with each of them. - So the parent (`WebRtcTransport`) will send each fragment into a separate UDP datagram. - And for whatever reason it **WORKS!** ``` DtlsTransport::SendPendingOutgoingDtlsData() | data to be sent is longer than DTLS MTU, fragmenting it [len:6987, DtlsMtu:1350] +248ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:0, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:1350, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:2700, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:4050, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:5400, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:6750, len:237] +0ms ``` An interesting fact is that, if I remove `SSL_OP_NO_QUERY_MTU` flag in the call to `SSL_CTX_set_options()`, then things don't work and the client/browser keeps sending DTLS Client Hello forever, meaning that the DTLS data it's receiving from mediasoup is invalid. So this is magic. How is possible that this works? AFAIU OpenSSL is generating valid DTLS **fragments** but it doesn't expose them to the application separately (see the problem description above). So does this PR work by accident??? Maybe not! Probably OpenSSL is generating DTLS fragments of **exactly** our given `DtlsMtu` size (which BTW makes sense since we call `SSL_set_mtu(this->ssl, DtlsMtu)` and `DTLS_set_link_mtu(this->ssl, DtlsMtu)`), so if we split the total DTLS data to be sent into fragments of `DtlsMtu` **exact** sizes, then we are **indeed** taking **valid** DTLS fragments, so we can send them into separate UDP datagrams to the client. **IS IT???** **DOES IT MAKE ANY SENSE???** IMHO it does, but I did **NOT** expect this to work. --- worker/src/RTC/DtlsTransport.cpp | 37 ++++++++++++++++++++++++++------ 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/worker/src/RTC/DtlsTransport.cpp b/worker/src/RTC/DtlsTransport.cpp index 3bb8a9657c..7af0d6933f 100644 --- a/worker/src/RTC/DtlsTransport.cpp +++ b/worker/src/RTC/DtlsTransport.cpp @@ -1189,10 +1189,8 @@ namespace RTC return; } - int64_t read; char* data{ nullptr }; - - read = BIO_get_mem_data(this->sslBioToNetwork, &data); // NOLINT + const int64_t read = BIO_get_mem_data(this->sslBioToNetwork, &data); // NOLINT if (read <= 0) { @@ -1201,9 +1199,36 @@ namespace RTC MS_DEBUG_DEV("%" PRIu64 " bytes of DTLS data ready to sent to the peer", read); - // Notify the listener. - this->listener->OnDtlsTransportSendData( - this, reinterpret_cast(data), static_cast(read)); + if (read <= DtlsMtu) + { + // Notify the listener. + this->listener->OnDtlsTransportSendData( + this, reinterpret_cast(data), static_cast(read)); + } + else + { + MS_WARN_DEV( + "data to be sent is longer than DTLS MTU, fragmenting it [len:%" PRIi64 ", DtlsMtu:%i]", + read, + DtlsMtu); + + size_t offset{ 0u }; + + while (offset < read) + { + auto* fragmentData = reinterpret_cast(data + offset); + auto fragmentLen = static_cast(read) - offset >= DtlsMtu + ? DtlsMtu + : static_cast(read) - offset; + + MS_DEBUG_DEV("sending fragment [offset:%zu, len:%zu]", offset, fragmentLen); + + // Notify the listener. + this->listener->OnDtlsTransportSendData(this, fragmentData, fragmentLen); + + offset += fragmentLen; + } + } // Clear the BIO buffer. // NOTE: the (void) avoids the -Wunused-value warning.