diff --git a/docs/dev/Network-Protocol.md b/docs/dev/Network-Protocol.md deleted file mode 100644 index e25d5994fe0..00000000000 --- a/docs/dev/Network-Protocol.md +++ /dev/null @@ -1,5 +0,0 @@ -# Mumble Network Protocol Documentation - -The network protocol between client and server is documented on the website [mumble-protocol.readthedocs.io](https://mumble-protocol.readthedocs.io/). - -*That documentation is sourced from the [mumble-protocol repository](https://github.com/mumble-voip/mumble-protocol).* diff --git a/docs/dev/network-protocol/README.rst b/docs/dev/network-protocol/README.rst new file mode 100644 index 00000000000..9f09bd5c387 --- /dev/null +++ b/docs/dev/network-protocol/README.rst @@ -0,0 +1,12 @@ +Mumble Network Protocol Documentation +===================================== + +The Mumble Network Protocol documentation is meant to be a reference for the +Mumble VoIP 1.2.X server-client communication protocol. It reflects the state of +the protocol implemented in the Mumble 1.2.8 client and might be outdated by the +time you are reading this. + +* `Overview `_ +* `Protocol Stack (TCP) `_ +* `Establishing a Connection `_ +* `Voice Data `_ diff --git a/docs/dev/network-protocol/establishing_connection.rst b/docs/dev/network-protocol/establishing_connection.rst new file mode 100644 index 00000000000..71c3b0323cf --- /dev/null +++ b/docs/dev/network-protocol/establishing_connection.rst @@ -0,0 +1,241 @@ +Establishing a connection +========================= + +This section describes the communication between the server and the client +while a connection is being established. Note that only the TCP connection needs +to be established for the client to be connected. After this the client +will be visible to the other clients on the server and able to send other +types of messages. + +Connect +------- + +As the basis for the synchronization procedure the client has to first +establish the TCP connection to the server and perform a standard TLSv1 handshake. 
+To be able to use the complete feature set of the Mumble protocol it is +recommended that the client provides a strong certificate to the server. +This is not mandatory, however: a client can connect to the server without +providing a certificate. The server, on the other hand, must provide the client with +its certificate, and it is recommended that the client verifies it. + +.. figure:: resources/mumble_connection_setup.png + :alt: Mumble connection setup + :align: center + + Mumble connection setup + +Version exchange +---------------- + +Once the TLS handshake is completed both sides should transmit their version +information using the Version message. The message structure is described below. + +.. table:: Version message + + +--------------------------------------+ + | Version | + +===========================+==========+ + | version | uint32 | + +---------------------------+----------+ + | release | string | + +---------------------------+----------+ + | os | string | + +---------------------------+----------+ + | os_version | string | + +---------------------------+----------+ + +The version field is a combination of major, minor and patch version numbers (e.g. 1.2.0) +so that the major number takes two bytes and the minor and patch numbers take one byte each. +The structure is shown in the table below. The release, os and os_version +fields are ordinary strings containing additional information. + +.. table:: Version field encoding (uint32) + + +---------------------------+----------+----------+ + | Major | Minor | Patch | + +===========================+==========+==========+ + | 2 bytes | 1 byte | 1 byte | + +---------------------------+----------+----------+ + +The version information may be used as part of the *SuggestConfig* checks, which usually +refer to the standard client versions. The major changes between these versions are listed +in the table below. The *release*, *os* and *os_version* information is not interpreted in +any way at the moment. + +.. 
table:: Mumble version differences + + +---------------+-------------------------------------------+ + | Version | Major changes | + +===============+===========================================+ + | 1.2.0 | CELT 0.7.0 codec support | + +---------------+-------------------------------------------+ + | 1.2.2 | CELT 0.7.1 codec support | + +---------------+-------------------------------------------+ + | 1.2.3 | CELT 0.11.0 codec | + +---------------+-------------------------------------------+ + | 1.2.4 | Opus codec support, SuggestConfig message | + +---------------+-------------------------------------------+ + +Authenticate +------------ + +Once the client has sent the version it should follow this with the Authenticate message. +The message structure is described in the figure below. This message may be sent immediately +after sending the version message. The client does not need to wait for the server version +message. + +.. table:: Authenticate message + + +-----------------------------------------------+ + | Authenticate | + +===========================+===================+ + | username   | string | + +---------------------------+-------------------+ + | password | string | + +---------------------------+-------------------+ + | tokens | string | + +---------------------------+-------------------+ + +The username and password are UTF-8 encoded strings. While the client is free to accept any +username from the user the server is allowed to impose further restrictions. Furthermore +if the client certificate has been registered with the server the client is primarily +known with the username they had when the certificate was registered. For more +information see the server documentation. + +The password must only be provided if the server is passworded, the client provided no +certificate but wants to authenticate to an account which has a password set, or to +access the SuperUser account. 
+ +The third field contains a list of zero or more token strings which act as passwords +that may give the client access to certain ACL groups without actually being a +registered member in them, again see the server documentation for more information. + +Crypto setup +------------ + +Once the Version packets are exchanged the server will send a CryptSetup packet to +the client. It contains the necessary cryptographic information for the OCB-AES128 +encryption used in the UDP Voice channel. The packet is described in figure +below. The encryption itself is described in a later section. + +.. table:: CryptSetup message + + +-----------------------------------------------+ + | CryptSetup | + +===========================+===================+ + | key   | bytes | + +---------------------------+-------------------+ + | client_nonce | bytes | + +---------------------------+-------------------+ + | server_nonce | bytes | + +---------------------------+-------------------+ + +Channel states +-------------- + +After the client has successfully authenticated the server starts listing the channels +by transmitting partial ChannelState message for every channel on this server. These +messages lack the channel link information as the client does not yet have full +picture of all the channels. Once the initial ChannelState has been transmitted +for all channels the server updates the linked channels by sending new packets for +these. The full structure of these ChannelState messages is shown below: + +.. 
table:: ChannelState message + + +-----------------------------------------------+ + | ChannelState | + +===========================+===================+ + | channel_id | uint32 | + +---------------------------+-------------------+ + | parent | uint32 | + +---------------------------+-------------------+ + | name | string | + +---------------------------+-------------------+ + | links | repeated uint32 | + +---------------------------+-------------------+ + | description | string | + +---------------------------+-------------------+ + | links_add | repeated uint32 | + +---------------------------+-------------------+ + | links_remove | repeated uint32 | + +---------------------------+-------------------+ + | temporary | optional bool | + +---------------------------+-------------------+ + | position | optional int32 | + +---------------------------+-------------------+ + + +*The server must send a ChannelState for the root channel identified with ID 0.* + +User states +----------- + +After the channels have been synchronized the server continues by listing the +connected users. This is done by sending a UserState message for each user +currently on the server, including the user that is currently connecting. + +.. 
table:: UserState message + + +-----------------------------------------------+ + | UserState | + +===========================+===================+ + | session | uint32 | + +---------------------------+-------------------+ + | actor | uint32 | + +---------------------------+-------------------+ + | name | string | + +---------------------------+-------------------+ + | user_id | uint32 | + +---------------------------+-------------------+ + | channel_id | uint32 | + +---------------------------+-------------------+ + | mute | bool | + +---------------------------+-------------------+ + | deaf | bool | + +---------------------------+-------------------+ + | suppress | bool | + +---------------------------+-------------------+ + | self_mute | bool | + +---------------------------+-------------------+ + | self_deaf | bool | + +---------------------------+-------------------+ + | texture | bytes | + +---------------------------+-------------------+ + | plugin_context | bytes | + +---------------------------+-------------------+ + | plugin_identity | string | + +---------------------------+-------------------+ + | comment | string | + +---------------------------+-------------------+ + | hash | string | + +---------------------------+-------------------+ + | comment_hash | bytes | + +---------------------------+-------------------+ + | texture_hash | bytes | + +---------------------------+-------------------+ + | priority_speaker | bool | + +---------------------------+-------------------+ + | recording | bool | + +---------------------------+-------------------+ + +Server sync +----------- + +The client has now received a copy of the parts of the server state it +needs to know about. To complete the synchronization the server transmits +a ServerSync message containing the session ID of the client's session, +the maximum bandwidth allowed on this server, the server's welcome text +as well as the permissions the client has in the channel it ended up in. 
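As a rough illustration of the version field layout from the Version exchange section, the following Python sketch packs and unpacks the ``uint32`` value. The function names are hypothetical, and placing the major number in the most significant two bytes is an assumption based on the column order of the encoding table.

```python
from typing import Tuple


def pack_version(major: int, minor: int, patch: int) -> int:
    """Combine major (2 bytes), minor (1 byte) and patch (1 byte) into one uint32.

    Assumption: major occupies the most significant bytes, following the
    Major | Minor | Patch column order of the encoding table.
    """
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFF and 0 <= patch <= 0xFF):
        raise ValueError("version component out of range")
    return (major << 16) | (minor << 8) | patch


def unpack_version(value: int) -> Tuple[int, int, int]:
    """Split a uint32 version value back into (major, minor, patch)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("version value does not fit in a uint32")
    return (value >> 16) & 0xFFFF, (value >> 8) & 0xFF, value & 0xFF
```

Under that assumption, version 1.2.4 would be carried as ``0x00010204``.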
+ +For more information please refer to the Mumble.proto file [#f1]_. + +Ping +---- + +If the client wishes to maintain the connection to the server it is required +to ping the server. If the server does not receive a ping for 30 seconds it +will disconnect the client. + +.. rubric:: Footnotes + +.. [#f1] https://raw.github.com/mumble-voip/mumble/master/src/Mumble.proto \ No newline at end of file diff --git a/docs/dev/network-protocol/overview.rst b/docs/dev/network-protocol/overview.rst new file mode 100644 index 00000000000..5d870fe6f6b --- /dev/null +++ b/docs/dev/network-protocol/overview.rst @@ -0,0 +1,29 @@ +Overview +======== + +Mumble is based on a standard server-client communication model. It +utilizes two channels of communication. The first one is a TCP connection +which is used to reliably transfer control data between the client and the +server. The second one is a UDP connection which is used for unreliable, +low latency transfer of voice data. + +.. figure:: resources/mumble_system_overview.png + :alt: Mumble system overview + :align: center + + Mumble system overview + +Both are protected by strong cryptography; this encryption is mandatory and cannot be disabled. The TCP control channel uses TLSv1 AES256-SHA [#f1]_ while the voice channel is encrypted with OCB-AES128 [#f2]_. + +.. figure:: resources/mumble_crypt_types.png + :alt: Mumble crypt types + :align: center + + Mumble crypto types + +While the TCP connection is mandatory, the UDP connection can be replaced by tunnelling the UDP packets through the TCP connection, as described later in the protocol description. + +.. rubric:: Footnotes + +.. [#f1] http://en.wikipedia.org/wiki/Transport_Layer_Security +.. 
[#f2] http://www.cs.ucdavis.edu/~rogaway/ocb/ocb-back.htm \ No newline at end of file diff --git a/docs/dev/network-protocol/protocol_stack_tcp.rst b/docs/dev/network-protocol/protocol_stack_tcp.rst new file mode 100644 index 00000000000..de4973379c6 --- /dev/null +++ b/docs/dev/network-protocol/protocol_stack_tcp.rst @@ -0,0 +1,88 @@ +Protocol stack (TCP) +==================== + +Mumble has a shallow and easy-to-understand stack. Basically it +uses Google's Protocol Buffers [#f1]_ with simple prefixing to +distinguish the different kinds of packets sent through a TLSv1 +encrypted connection. This makes the protocol easy to extend. + +.. _mumble-packet: + +.. figure:: resources/mumble_packet.png + :alt: Mumble packet + :align: center + + Mumble packet + +Each packet consists of a two-byte type field identifying the payload, +a four-byte field stating the length of the payload in bytes, +and the payload itself. The following packet types are available +in the current protocol and all but UDPTunnel are simple protobuf messages. +If not mentioned otherwise all fields outside the protobuf encoding are big-endian. + + +.. 
table:: Packet types + + +---------+------------------------+ + | Type | Payload | + +=========+========================+ + | 0 | Version | + +---------+------------------------+ + | 1 | UDPTunnel | + +---------+------------------------+ + | 2 | Authenticate | + +---------+------------------------+ + | 3 | Ping | + +---------+------------------------+ + | 4 | Reject | + +---------+------------------------+ + | 5 | ServerSync | + +---------+------------------------+ + | 6 | ChannelRemove | + +---------+------------------------+ + | 7 | ChannelState | + +---------+------------------------+ + | 8 | UserRemove | + +---------+------------------------+ + | 9 | UserState | + +---------+------------------------+ + | 10 | BanList | + +---------+------------------------+ + | 11 | TextMessage | + +---------+------------------------+ + | 12 | PermissionDenied | + +---------+------------------------+ + | 13 | ACL | + +---------+------------------------+ + | 14 | QueryUsers | + +---------+------------------------+ + | 15 | CryptSetup | + +---------+------------------------+ + | 16 | ContextActionModify | + +---------+------------------------+ + | 17 | ContextAction | + +---------+------------------------+ + | 18 | UserList | + +---------+------------------------+ + | 19 | VoiceTarget | + +---------+------------------------+ + | 20 | PermissionQuery | + +---------+------------------------+ + | 21 | CodecVersion | + +---------+------------------------+ + | 22 | UserStats | + +---------+------------------------+ + | 23 | RequestBlob | + +---------+------------------------+ + | 24 | ServerConfig | + +---------+------------------------+ + | 25 | SuggestConfig | + +---------+------------------------+ + +For raw representation of each packet type see the attached Mumble.proto [#f2]_ file. + + +.. rubric:: Footnotes + +.. [#f1] https://github.com/google/protobuf +.. 
[#f2] https://raw.github.com/mumble-voip/mumble/master/src/Mumble.proto diff --git a/docs/dev/network-protocol/resources/mumble_connection_setup.odg b/docs/dev/network-protocol/resources/mumble_connection_setup.odg new file mode 100644 index 00000000000..75b24eef27c Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_connection_setup.odg differ diff --git a/docs/dev/network-protocol/resources/mumble_connection_setup.pdf b/docs/dev/network-protocol/resources/mumble_connection_setup.pdf new file mode 100644 index 00000000000..a7a2dd7f9e8 Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_connection_setup.pdf differ diff --git a/docs/dev/network-protocol/resources/mumble_connection_setup.png b/docs/dev/network-protocol/resources/mumble_connection_setup.png new file mode 100644 index 00000000000..6b115bcf6c5 Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_connection_setup.png differ diff --git a/docs/dev/network-protocol/resources/mumble_crypt_types.odg b/docs/dev/network-protocol/resources/mumble_crypt_types.odg new file mode 100644 index 00000000000..84dbb982aa1 Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_crypt_types.odg differ diff --git a/docs/dev/network-protocol/resources/mumble_crypt_types.pdf b/docs/dev/network-protocol/resources/mumble_crypt_types.pdf new file mode 100644 index 00000000000..f30b011d1d6 Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_crypt_types.pdf differ diff --git a/docs/dev/network-protocol/resources/mumble_crypt_types.png b/docs/dev/network-protocol/resources/mumble_crypt_types.png new file mode 100644 index 00000000000..dfb545e9e4f Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_crypt_types.png differ diff --git a/docs/dev/network-protocol/resources/mumble_packet.odg b/docs/dev/network-protocol/resources/mumble_packet.odg new file mode 100644 index 00000000000..278ae81524e Binary files /dev/null and 
b/docs/dev/network-protocol/resources/mumble_packet.odg differ diff --git a/docs/dev/network-protocol/resources/mumble_packet.pdf b/docs/dev/network-protocol/resources/mumble_packet.pdf new file mode 100644 index 00000000000..5c45bc89d0a Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_packet.pdf differ diff --git a/docs/dev/network-protocol/resources/mumble_packet.png b/docs/dev/network-protocol/resources/mumble_packet.png new file mode 100644 index 00000000000..60fc0eb82f1 Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_packet.png differ diff --git a/docs/dev/network-protocol/resources/mumble_protocol_stack.odg b/docs/dev/network-protocol/resources/mumble_protocol_stack.odg new file mode 100644 index 00000000000..d405a8f8868 Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_protocol_stack.odg differ diff --git a/docs/dev/network-protocol/resources/mumble_protocol_stack.pdf b/docs/dev/network-protocol/resources/mumble_protocol_stack.pdf new file mode 100644 index 00000000000..350b29cf1ff Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_protocol_stack.pdf differ diff --git a/docs/dev/network-protocol/resources/mumble_protocol_stack.png b/docs/dev/network-protocol/resources/mumble_protocol_stack.png new file mode 100644 index 00000000000..cdbb41de72b Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_protocol_stack.png differ diff --git a/docs/dev/network-protocol/resources/mumble_system_overview.odg b/docs/dev/network-protocol/resources/mumble_system_overview.odg new file mode 100644 index 00000000000..02a4d54fbc0 Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_system_overview.odg differ diff --git a/docs/dev/network-protocol/resources/mumble_system_overview.pdf b/docs/dev/network-protocol/resources/mumble_system_overview.pdf new file mode 100644 index 00000000000..8ac7e3a8a57 Binary files /dev/null and 
b/docs/dev/network-protocol/resources/mumble_system_overview.pdf differ diff --git a/docs/dev/network-protocol/resources/mumble_system_overview.png b/docs/dev/network-protocol/resources/mumble_system_overview.png new file mode 100644 index 00000000000..bfb24d3de24 Binary files /dev/null and b/docs/dev/network-protocol/resources/mumble_system_overview.png differ diff --git a/docs/dev/network-protocol/voice_data.rst b/docs/dev/network-protocol/voice_data.rst new file mode 100644 index 00000000000..ee3dd1872e7 --- /dev/null +++ b/docs/dev/network-protocol/voice_data.rst @@ -0,0 +1,393 @@ +.. _voice-data: + +Voice data +========== + +Mumble audio channel is used to transmit the actual audio packets over the +network. Unlike the TCP control channel, the audio channel uses a custom +encoding for the audio packets. The audio channel is transport independent and +features such as encryption are implemented by the transport layer. Integers +above 8-bits are encoded using the `Variable length integer encoding`_. + +.. _packet-format: + +Packet format +------------- + +The mumble audio channel packets are variable length packets that begin with an +8-bit header field which describes the packet type and target. The most +significant 3 bits define the packet type while the remaining 5 bits define the +target. The header is followed by the packet payload. The maximum size for the +whole audio data packet is 1020 bytes. This allows applications to use 1024 +byte buffers for receiving UDP datagrams with the 4-byte encryption header +overhead. + +.. _Audio packet structure: +.. table:: Audio packet structure + :class: bits8 + + +-------------------------------+ + | Audio packet structure | + +===+===+===+===+===+===+===+===+ + | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | + +---+---+---+---+---+---+---+---+ + | ``type`` | ``target`` | + +-----------+-------------------+ + | Payload... | + +-------------------------------+ + +type + The audio packet type. 
The packets transmitted over the audio channel are + either ping packets used to diagnose the transport layer connectivity or + audio packets encoded with different codecs. Different types are listed in + `Audio packet types`_ table. + +.. _Audio packet types: +.. table:: Audio packet types + + +---------+---------------+--------------------------------------------+ + | Type | Bitfield | Description | + +=========+===============+============================================+ + | ``0`` | ``000xxxxx`` | CELT Alpha encoded voice data | + +---------+---------------+--------------------------------------------+ + | ``1`` | ``001xxxxx`` | Ping packet | + +---------+---------------+--------------------------------------------+ + | ``2`` | ``010xxxxx`` | Speex encoded voice data | + +---------+---------------+--------------------------------------------+ + | ``3`` | ``011xxxxx`` | CELT Beta encoded voice data | + +---------+---------------+--------------------------------------------+ + | ``4`` | ``100xxxxx`` | OPUS encoded voice data | + +---------+---------------+--------------------------------------------+ + | ``5-7`` | | Unused | + +---------+---------------+--------------------------------------------+ + +target + The target portion defines the recipient for the audio data. The two constant + targets are *Normal talking* (``0``) and *Server Loopback* (``31``). The + range 1-30 is reserved for whisper targets. These targets are specified + separately in the control channel using the ``VoiceTarget`` packets. The + targets are listed in `Audio targets`_ table. + + When a client registers a VoiceTarget on the server, it gives the target an + ID. This voice target ID can be used as a target in the voice packets to send + audio to specific users or channels. When receiving whisper-audio the server + uses target 1 to specify the audio results from a whisper to a channel and + target 2 to specify that the audio results from a direct whisper to the user. + +.. 
_Audio targets: +.. table:: Audio targets + + +-----------+-----------------------------------------------------+ + | Target | Description | + +===========+=====================================================+ + | ``0`` | Normal talking | + +-----------+-----------------------------------------------------+ + | ``1-30`` | Whisper target | + | | | + | | - VoiceTarget ID when sending whisper from client. | + | | - 1 when receiving whisper to channel. | + | | - 2 when receiving direct whisper to user. | + +-----------+-----------------------------------------------------+ + | ``31`` | Server loopback | + +-----------+-----------------------------------------------------+ + +Ping packet +~~~~~~~~~~~ + +Audio channel ping packets are used as part of the connectivity checks on the +audio transport layer. These packets contain only varint encoded timestamp as +data. See `UDP connectivity checks`_ section below for the logic involved in +the connectivity checks. + +.. _Audio transport ping packet: + +.. table:: Audio transport ping packet + + +------------+-------------+----------------------------------+ + | Field | Type | Description | + +============+=============+==================================+ + | Header | ``byte`` | ``00100000b`` (``0x20``) | + +------------+-------------+----------------------------------+ + | Data | ``varint`` | Timestamp | + +------------+-------------+----------------------------------+ + +Header + Common audio packet header. For ping packets this should have the value of + 0x20. + +Data + Timestamp. The packet should be echoed back so the timestamp format can be + decided by the original sender - the only limitation is that it must fit in a + 64-bit integer for the varint encoding. + +Encoded audio data packet +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Encoded audio packets contain the actual user audio data for the voice +communication. 
Incoming audio data packets contain the common header byte +followed by varint encoded session ID of the source user and varint encoded +sequence number of the packet. Outgoing audio data packets contain only the +header byte and the sequence number of the packet. The server matches these to +the correct session using the transport layer information. + +The remainder of the packet is made up of multiple encoded audio segments and +optional positional audio information. The audio segment format depends on the +codec of the whole audio packets. The audio segments contain codec +implementation specific information on where the audio segments end so the +possible positional audio data can be read from the end. + +.. _Incoming encoded audio packet: +.. table:: Incoming encoded audio packet + + +--------------------+--------------+-----------------------------------------------------------+ + | Field | Type | Description | + +====================+==============+===========================================================+ + | Header | ``byte`` | Codec type/Audio target | + +--------------------+--------------+-----------------------------------------------------------+ + | Session ID | ``varint`` | Session ID of the source user. | + +--------------------+--------------+-----------------------------------------------------------+ + | Sequence Number | ``varint`` | Sequence number of the first audio data **segment**. | + +--------------------+--------------+-----------------------------------------------------------+ + | Payload | ``byte[]`` | Audio payload | + +--------------------+--------------+-----------------------------------------------------------+ + | Position Info | ``float[3]`` | Positional audio information | + +--------------------+--------------+-----------------------------------------------------------+ + + +.. _Outgoing encoded audio packet: +.. 
table:: Outgoing encoded audio packet + + +--------------------+--------------+-----------------------------------------------------------+ + | Field | Type | Description | + +====================+==============+===========================================================+ + | Header | ``byte`` | Codec type/Audio target | + +--------------------+--------------+-----------------------------------------------------------+ + | Sequence Number | ``varint`` | Sequence number of the first audio data **segment**. | + +--------------------+--------------+-----------------------------------------------------------+ + | Payload | ``byte[]`` | Audio payload | + +--------------------+--------------+-----------------------------------------------------------+ + | Position Info | ``float[3]`` | Positional audio information | + +--------------------+--------------+-----------------------------------------------------------+ + +Header + The common audio packet header + +Session ID + Session ID of the user to whom the audio packet belongs. + +Sequence Number + Audio data sequence number. The sequence number is used to maintain the + packet order when the audio data is transported over unreliable transports + such as UDP. + + The sequence number might increase by more than one between subsequent audio + packets in case the audio packets contain multiple audio segments. This + allows the packet loss concealment algorithms to figure out how many audio + frames were lost between two received packets. + +Payload + Audio payload. Format depends on the audio codec defined in the Header. The + payload must be self-delimiting to determine whether the position info exists + at the end of the packet. + +Position Info + The XYZ coordinates of the audio source. In addition to sending the position + information, the user must be using a positional plugin defined in the + ``UserState`` message. 
The plugins might define different contexts which + prevent voice communication between users in other contexts. + +Speex and CELT audio frames +""""""""""""""""""""""""""" + +Encoded Speex and CELT audio is transported as individual encoded frames. Each +frame is prefixed with a single byte length and terminator header. + +.. _celt-encoded-audio-data: + +.. table:: CELT encoded audio data + + +---------+-------------+-----------------------------------------+ + | Field | Type | Description | + +=========+=============+=========================================+ + | Header | ``byte`` | length/continuation header | + +---------+-------------+-----------------------------------------+ + | Data | ``byte[]`` | Encoded voice frame | + +---------+-------------+-----------------------------------------+ + +Header + The length of the Data field. The most significant bit (``0x80``) acts as the + continuation bit and is set for all but the last frame in the payload. The + remaining 7 bits of the header contain the actual length of the Data frame. + + Note the length may be zero, which is used to signal the end of a voice + transmission. In this case the audio data is a single zero-byte which can be + interpreted normally as length of 0 with no continuation bit set. + +Data + Single encoded audio frame. The encoding depends on the codec ``type`` header + of the whole audio packet + +Opus audio frames +""""""""""""""""" + +Encoded Opus audio is transported as a single Opus audio frame. The frame is prefixed with a variable byte header. + +.. _opus-encoded-audio-data: + +.. 
table:: Opus encoded audio data + + +---------+-------------+-----------------------------------------+ + | Field | Type | Description | + +=========+=============+=========================================+ + | Header | ``varint`` | length/terminator header | + +---------+-------------+-----------------------------------------+ + | Data | ``byte[]`` | Encoded voice frame | + +---------+-------------+-----------------------------------------+ + +Header + The length of the Data field. 16-bit variable length integer encoded length + and terminator bit value. The varint encoding is the same as with 64-bit + values, but only 16-bit unencoded values are allowed. + + The maximum voice frame size is 8191 (``0x1FFF``) bytes requiring the 13 least + significant bits of the header. The 14th bit (mask: ``0x2000``) is the terminator + bit which signals whether the packet is the last one in the voice + transmission. + + Note: In CELT the "continuation bit" in the header defines whether there are + more audio frames in the current packet. Opus always contains only one frame + in the packet. In CELT the voice transmission end is signaled with a + zero-byte CELT packet while in Opus we have a dedicated termination bit in + the header. + +Data + The encoded Opus data. + +Codecs +------ + +Mumble supports three distinct codecs; Older Mumble versions use Speex for low +bitrate audio and CELT for higher quality audio while new Mumble versions +prefer Opus for all audio. When multiple clients with different capabilities +communicate together the server is responsible for resolving the codec to use. +The clients should respect the server resolution if they are capable. + +If the server resolves a codec a client doesn't support, that client is free to +use any codec it prefers. Usually this means the client will not be able to +decode incoming audio, but it can still send encoded audio out. 
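The bit layouts described in the Packet format section can be sketched in Python as follows. This is a minimal illustration rather than the reference implementation: all function names are hypothetical, and ``parse_opus_header`` assumes the varint has already been decoded into a plain integer.

```python
from typing import Tuple


def pack_audio_header(packet_type: int, target: int) -> int:
    """Build the 8-bit audio header: 3 bits of type, then 5 bits of target."""
    if not (0 <= packet_type <= 7 and 0 <= target <= 31):
        raise ValueError("type must fit in 3 bits and target in 5 bits")
    return (packet_type << 5) | target


def unpack_audio_header(header: int) -> Tuple[int, int]:
    """Split the header byte into (type, target)."""
    return (header >> 5) & 0x07, header & 0x1F


def parse_celt_frame_header(byte: int) -> Tuple[int, bool]:
    """Speex/CELT frame header: (frame length, more frames follow).

    The most significant bit (0x80) is the continuation bit and the
    remaining 7 bits hold the length of the frame data.
    """
    return byte & 0x7F, bool(byte & 0x80)


def parse_opus_header(value: int) -> Tuple[int, bool]:
    """Opus header, given the already decoded varint value.

    Returns (frame length, terminator flag). The 13 least significant bits
    hold the length; the 14th bit (0x2000) marks the last packet of the
    voice transmission.
    """
    return value & 0x1FFF, bool(value & 0x2000)
```

For example, a ping packet (type ``1``, target ``0``) yields the header byte ``0x20``, matching the value given in the ping packet table.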
+ +The CELT bitstream was never frozen, which makes most CELT versions incompatible +with each other. The two CELT bitstreams supported by Mumble are CELT 0.7.0 +(CELT Alpha) and CELT 0.11.0 (CELT Beta). While CELT 0.7.0 should technically +be supported by most Mumble implementations, some servers might be configured +to force the Opus codec for their users. Mumble has had Opus support since 1.2.4 +(June 2013) so it should be safe to assume most clients in use support this +now. + +Whispering +---------- + +Normal talking can be heard by the users of the current channel and all linked +channels as long as the speaker has Talk permission on these channels. If the +speaker wishes to broadcast the voice to specific users or channels, they may +use whispering. This is achieved by registering a voice target using the +VoiceTarget message and specifying the target ID as the target in the first +byte of the UDP packet. + +UDP connectivity checks +----------------------- + +Since UDP is a connectionless protocol, it is heavily affected by network +topology such as NAT configuration. It should not be used for audio +transmission before the connectivity has been determined. + +The client starts the connectivity checks by sending a `Ping packet`_ to the +server. When the server receives this packet it will respond by echoing it back +to the address it received it from. Once the client receives the response from +the server it can start using the UDP transport for audio data. When the server +receives incoming audio data over the UDP transport it can switch the outgoing +audio over to the UDP transport as well. + +If the client stops receiving replies to the UDP pings at some point, it should +start tunneling the voice communication through the TCP tunnel as described in +`Tunneling audio over TCP`_ below. When the server receives a tunneled +packet over the TCP connection it must also stop using UDP for +communication.
The client should still continue sending audio ping packets over +the UDP transport in case the UDP connection is restored and the communication +can be switched back to it. + +Tunneling audio over TCP +------------------------ + +If the UDP channel isn't available the voice packets can be transmitted through +the TCP transport used for the control channel. These messages use the normal +TCP prefixing, as shown in figure :ref:`mumble-packet`: 16-bit message type +followed by 32-bit message length. However, unlike other TCP messages, the audio +packets are not encoded as protocol buffer messages; instead, the raw audio +packet described in `Packet format`_ should be written to the TCP socket +verbatim. + +When the packets are received it is safe to parse the type and length fields +normally. If the type matches that of the audio tunnel the rest of the message +should be processed as a UDP packet without attempting protocol buffer +decoding. + +Implementation note +~~~~~~~~~~~~~~~~~~~ + +When implementing the protocol it is easier to ignore the UDP transfer layer at +first and just tunnel the UDP data through the TCP tunnel. The TCP layer must +be implemented for authentication in any case. Making sure that the voice +transmission works before implementing the UDP protocol simplifies debugging +greatly. + +Encryption +---------- + +All the packets are encrypted once during transfer. The actual encryption +depends on the transport layer used. If the packets are tunneled through TCP +they are encrypted using the TLS that encrypts the whole control channel +connection, and if they are sent directly using UDP they must be encrypted using +OCB-AES128 encryption. + +Variable length integer encoding +-------------------------------- + +The variable length integer encoding (``varint``) is used to encode long, +64-bit, integers so that short values do not need the full 8 bytes to be +transferred.
The basic idea behind the encoding is prefixing the value with a +length prefix and then removing the leading zeroes from the value. The positive +numbers are always right-justified. That is to say that the least significant +bit in the encoded representation matches the least significant bit in the +decoded representation. The *varint prefixes* table contains the definitions of +the different length prefixes. The encoded ``x`` bits are part of the decoded +number while the ``_`` signifies an unused bit. Encoding should be done by +finding the first decoded description that fits the number that should be +encoded, truncating it to the required bytes and combining it with the defined +encoding prefix. + +See the *quint64* shift operators in +https://github.com/mumble-voip/mumble/blob/master/src/PacketDataStream.h +for a reference implementation. + +.. table:: Varint prefixes + + +----------------------------------+--------------------------------------------------------+ + | Encoded | Decoded | + +==================================+========================================================+ + | ``0xxxxxxx`` | 7-bit positive number | + +----------------------------------+--------------------------------------------------------+ + | ``10xxxxxx`` + 1 byte | 14-bit positive number | + +----------------------------------+--------------------------------------------------------+ + | ``110xxxxx`` + 2 bytes | 21-bit positive number | + +----------------------------------+--------------------------------------------------------+ + | ``1110xxxx`` + 3 bytes | 28-bit positive number | + +----------------------------------+--------------------------------------------------------+ + | ``111100__`` + ``int`` (32-bit) | 32-bit positive number | + +----------------------------------+--------------------------------------------------------+ + | ``111101__`` + ``long`` (64-bit) | 64-bit number | + +----------------------------------+--------------------------------------------------------+
+ | ``111110__`` + ``varint`` | Negative recursive varint | + +----------------------------------+--------------------------------------------------------+ + | ``111111xx`` | Byte-inverted negative two bit number (``~xx``) | + +----------------------------------+--------------------------------------------------------+
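The prefix table can be read as a top-to-bottom search for the first row that fits the value. The following Python sketch illustrates that reading and is not the reference implementation. The function names ``encode_varint``, ``decode_varint`` and ``split_opus_header`` are hypothetical. Two points are assumptions rather than statements from the table: the recursive ``111110__`` form is taken to carry the bitwise-inverted value (``~x``), mirroring the two-bit short form, and the ``111101__`` form is decoded as an unsigned 64-bit number. Multi-byte values are assumed to be big-endian, and no bounds checking is done on truncated input.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an integer by picking the first table row that fits it."""
    if not -(1 << 63) <= value < (1 << 64):
        raise ValueError("value does not fit in 64 bits")
    if value < 0:
        inverted = ~value  # bitwise inversion yields a non-negative number
        if inverted <= 0x03:
            return bytes([0xFC | inverted])  # 111111xx
        # Assumption: the recursive varint carries the inverted value.
        return bytes([0xF8]) + encode_varint(inverted)  # 111110__ + varint
    if value < (1 << 7):
        return bytes([value])  # 0xxxxxxx
    if value < (1 << 14):
        return bytes([0x80 | (value >> 8), value & 0xFF])  # 10xxxxxx + 1 byte
    if value < (1 << 21):
        return bytes([0xC0 | (value >> 16)]) + (value & 0xFFFF).to_bytes(2, "big")
    if value < (1 << 28):
        return bytes([0xE0 | (value >> 24)]) + (value & 0xFFFFFF).to_bytes(3, "big")
    if value < (1 << 32):
        return bytes([0xF0]) + value.to_bytes(4, "big")  # 111100__ + int
    return bytes([0xF4]) + value.to_bytes(8, "big")  # 111101__ + long


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one varint at ``offset``; return (value, offset after it)."""
    first = data[offset]
    if (first & 0x80) == 0x00:
        return first & 0x7F, offset + 1
    if (first & 0xC0) == 0x80:
        return ((first & 0x3F) << 8) | data[offset + 1], offset + 2
    if (first & 0xE0) == 0xC0:
        tail = int.from_bytes(data[offset + 1:offset + 3], "big")
        return ((first & 0x1F) << 16) | tail, offset + 3
    if (first & 0xF0) == 0xE0:
        tail = int.from_bytes(data[offset + 1:offset + 4], "big")
        return ((first & 0x0F) << 24) | tail, offset + 4
    kind = first & 0xFC
    if kind == 0xF0:
        return int.from_bytes(data[offset + 1:offset + 5], "big"), offset + 5
    if kind == 0xF4:
        # Assumption: read as unsigned; the low two prefix bits are ignored.
        return int.from_bytes(data[offset + 1:offset + 9], "big"), offset + 9
    if kind == 0xF8:
        inner, end = decode_varint(data, offset + 1)
        return ~inner, end
    return ~(first & 0x03), offset + 1  # 111111xx


def split_opus_header(header: int) -> Tuple[int, bool]:
    """Split a decoded Opus header into (frame length, terminator flag)."""
    return header & 0x1FFF, bool(header & 0x2000)
```

As a usage example, the Opus frame header described earlier is a varint whose low 13 bits hold the frame length and whose ``0x2000`` bit is the terminator flag, so ``split_opus_header(decode_varint(payload)[0])`` would yield both parts for a hypothetical ``payload`` buffer that starts at the header.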