From 989329292fae73d4d079e7acd56adf86df3c8eea Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:06:57 +0100 Subject: [PATCH 01/14] DOCS(dev): Rename rst files to md This is done as a separate commit to retain Git change file history association. (Git does not store renames. It assumes renames according to content similarity.) --- docs/dev/network-protocol/{README.rst => README.md} | 0 .../{establishing_connection.rst => establishing_connection.md} | 0 docs/dev/network-protocol/{overview.rst => overview.md} | 0 .../{protocol_stack_tcp.rst => protocol_stack_tcp.md} | 0 docs/dev/network-protocol/{voice_data.rst => voice_data.md} | 0 5 files changed, 0 insertions(+), 0 deletions(-) rename docs/dev/network-protocol/{README.rst => README.md} (100%) rename docs/dev/network-protocol/{establishing_connection.rst => establishing_connection.md} (100%) rename docs/dev/network-protocol/{overview.rst => overview.md} (100%) rename docs/dev/network-protocol/{protocol_stack_tcp.rst => protocol_stack_tcp.md} (100%) rename docs/dev/network-protocol/{voice_data.rst => voice_data.md} (100%) diff --git a/docs/dev/network-protocol/README.rst b/docs/dev/network-protocol/README.md similarity index 100% rename from docs/dev/network-protocol/README.rst rename to docs/dev/network-protocol/README.md diff --git a/docs/dev/network-protocol/establishing_connection.rst b/docs/dev/network-protocol/establishing_connection.md similarity index 100% rename from docs/dev/network-protocol/establishing_connection.rst rename to docs/dev/network-protocol/establishing_connection.md diff --git a/docs/dev/network-protocol/overview.rst b/docs/dev/network-protocol/overview.md similarity index 100% rename from docs/dev/network-protocol/overview.rst rename to docs/dev/network-protocol/overview.md diff --git a/docs/dev/network-protocol/protocol_stack_tcp.rst b/docs/dev/network-protocol/protocol_stack_tcp.md similarity index 100% rename from docs/dev/network-protocol/protocol_stack_tcp.rst rename to 
docs/dev/network-protocol/protocol_stack_tcp.md diff --git a/docs/dev/network-protocol/voice_data.rst b/docs/dev/network-protocol/voice_data.md similarity index 100% rename from docs/dev/network-protocol/voice_data.rst rename to docs/dev/network-protocol/voice_data.md From 24d71a3cc6bdfa1953fdf2fe0a3c0d6ec7ca44ba Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:13:58 +0100 Subject: [PATCH 02/14] DOCS(dev): Convert headlines --- docs/dev/network-protocol/README.md | 3 +- .../establishing_connection.md | 27 +++++-------- docs/dev/network-protocol/overview.md | 3 +- .../network-protocol/protocol_stack_tcp.md | 3 +- docs/dev/network-protocol/voice_data.md | 39 +++++++------------ 5 files changed, 25 insertions(+), 50 deletions(-) diff --git a/docs/dev/network-protocol/README.md b/docs/dev/network-protocol/README.md index 9f09bd5c387..d1cc83c4197 100644 --- a/docs/dev/network-protocol/README.md +++ b/docs/dev/network-protocol/README.md @@ -1,5 +1,4 @@ -Mumble Network Protocol Documentation -============================= +# Mumble Network Protocol Documentation The Mumble Network Protocol documentation is meant to be a reference for the Mumble VoIP 1.2.X server-client communication protocol. It reflects the state of diff --git a/docs/dev/network-protocol/establishing_connection.md b/docs/dev/network-protocol/establishing_connection.md index ba6c2ebfef0..aa8bffadd87 100644 --- a/docs/dev/network-protocol/establishing_connection.md +++ b/docs/dev/network-protocol/establishing_connection.md @@ -1,5 +1,4 @@ -Establishing a connection -========================= +# Establishing a connection This section describes the communication between the server and the client during connection establishing, note that only the TCP connection needs @@ -7,8 +6,7 @@ to be established for the client to be connected. After this the client will be visible to the other clients on the server and able to send other types of messages. 
-Connect -------- +## Connect As the basis for the synchronization procedure the client has to first establish the TCP connection to the server and do a common TLSv1 handshake. @@ -24,8 +22,7 @@ its certificate and it is recommended that the client checks this. Mumble connection setup -Version exchange ----------------- +## Version exchange Once the TLS handshake is completed both sides should transmit their version information using the Version message. The message structure is described below. @@ -76,8 +73,7 @@ any way at the moment. | 1.2.4 | Opus codec support, SuggestConfig message | +---------------+-------------------------------------------+ -Authenticate ------------- +## Authenticate Once the client has sent the version it should follow this with the Authenticate message. The message structure is described in the figure below. This message may be sent immediately @@ -110,8 +106,7 @@ The third field contains a list of zero or more token strings which act as passw that may give the client access to certain ACL groups without actually being a registered member in them, again see the server documentation for more information. -Crypto setup ------------- +## Crypto setup Once the Version packets are exchanged the server will send a CryptSetup packet to the client. It contains the necessary cryptographic information for the OCB-AES128 @@ -130,8 +125,7 @@ below. The encryption itself is described in a later section. | server_nonce | bytes | +---------------------------+-------------------+ -Channel states --------------- +## Channel states After the client has successfully authenticated the server starts listing the channels by transmitting partial ChannelState message for every channel on this server. These @@ -167,8 +161,7 @@ these. 
The full structure of these ChannelState messages is shown below: *The server must send a ChannelState for the root channel identified with ID 0.* -User states ------------ +## User states After the channels have been synchronized the server continues by listing the connected users. This is done by sending a UserState message for each user @@ -218,8 +211,7 @@ currently on the server, including the user that is currently connecting. | recording | bool | +---------------------------+-------------------+ -Server sync ------------ +## Server sync The client has now received a copy of the parts of the server state it needs to know about. To complete the synchronization the server transmits @@ -229,8 +221,7 @@ as well as the permissions the client has in the channel it ended up in. For more information pease refer to the Mumble.proto file [#f1]_. -Ping ----- +## Ping If the client wishes to maintain the connection to the server it is required to ping the server. If the server does not receive a ping for 30 seconds it diff --git a/docs/dev/network-protocol/overview.md b/docs/dev/network-protocol/overview.md index 5d870fe6f6b..edd5bb4bfc1 100644 --- a/docs/dev/network-protocol/overview.md +++ b/docs/dev/network-protocol/overview.md @@ -1,5 +1,4 @@ -Overview -======== +# Overview Mumble is based on a standard server-client communication model. It utilizes two channels of communication, the first one is a TCP connection diff --git a/docs/dev/network-protocol/protocol_stack_tcp.md b/docs/dev/network-protocol/protocol_stack_tcp.md index de4973379c6..8e2eef90f3b 100644 --- a/docs/dev/network-protocol/protocol_stack_tcp.md +++ b/docs/dev/network-protocol/protocol_stack_tcp.md @@ -1,5 +1,4 @@ -Protocol stack (TCP) -==================== +# Protocol stack (TCP) Mumble has a shallow and easy to understand stack. 
Basically it uses Google's Protocol Buffers [#f1]_ with simple prefixing to diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index 57623677b2b..fd0e38f1361 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -1,7 +1,6 @@ .. _voice-data: -Voice data -========== +# Voice data Mumble audio channel is used to transmit the actual audio packets over the network. Unlike the TCP control channel, the audio channel uses a custom @@ -11,8 +10,7 @@ above 8-bits are encoded using the `Variable length integer encoding`_. .. _packet-format: -Packet format -------------- +## Packet format The mumble audio channel packets are variable length packets that begin with an 8-bit header field which describes the packet type and target. The most @@ -91,8 +89,7 @@ target | ``31`` | Server loopback | +-----------+-----------------------------------------------------+ -Ping packet -~~~~~~~~~~~ +### Ping packet Audio channel ping packets are used as part of the connectivity checks on the audio transport layer. These packets contain only varint encoded timestamp as @@ -120,8 +117,7 @@ Data decided by the original sender - the only limitation is that it must fit in a 64-bit integer for the varint encoding. -Encoded audio data packet -~~~~~~~~~~~~~~~~~~~~~~~~~ +### Encoded audio data packet Encoded audio packets contain the actual user audio data for the voice communication. Incoming audio data packets contain the common header byte @@ -196,8 +192,7 @@ Position Info ``UserState`` message. The plugins might define different contexts which prevent voice communication between users in other contexts. -Speex and CELT audio frames -""""""""""""""""""""""""""" +#### Speex and CELT audio frames Encoded Speex and CELT audio is transported as individual encoded frames. Each frame is prefixed with a single byte length and terminator header. @@ -227,8 +222,7 @@ Data Single encoded audio frame. 
The encoding depends on the codec ``type`` header of the whole audio packet -Opus audio frames -""""""""""""""""" +#### Opus audio frames Encoded Opus audio is transported as a single Opus audio frame. The frame is prefixed with a variable byte header. @@ -263,8 +257,7 @@ Header Data The encoded Opus data. -Codecs ------- +## Codecs Mumble supports three distinct codecs; Older Mumble versions use Speex for low bitrate audio and CELT for higher quality audio while new Mumble versions @@ -284,8 +277,7 @@ to force Opus codec for the users. Mumble has had Opus support since 1.2.4 (June 2013) so it should be safe to assume most clients in use support this now. -Whispering ----------- +## Whispering Normal talking can be heard by the users of the current channel and all linked channels as long as the speaker has Talk permission on these channels. If the @@ -294,8 +286,7 @@ use whispering. This is achieved by registering a voice target using the VoiceTarget message and specifying the target ID as the target in the first byte of the UDP packet. -UDP connectivity checks ------------------------ +## UDP connectivity checks Since UDP is a connectionless protocol, it is heavily affected by network topology such as NAT configuration. It should not be used for audio @@ -316,8 +307,7 @@ communication. The client should still continue sending audio ping packets over the UDP transport in case the UDP connection is restored and the communication can be switched back to it. -Tunneling audio over TCP ------------------------- +## Tunneling audio over TCP If the UDP channel isn't available the voice packets can be transmitted through the TCP transport used for the control channel. These messages use the normal @@ -332,8 +322,7 @@ normally. If the type matches that of the audio tunnel the rest of the message should be processed as an UDP packet without attempting a protocol buffer decoding. 
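As an illustration only (this is not part of the patch), the dispatch described in the paragraph above can be sketched in Python. The sketch assumes the control-channel framing stated in protocol_stack_tcp.md: a 2-byte packet type followed by a 4-byte payload length, both big-endian, with type 1 being UDPTunnel. The names `split_frame` and `dispatch` are hypothetical and do not come from the Mumble sources.

```python
import struct

UDP_TUNNEL = 1  # UDPTunnel entry in the packet type table
HEADER = struct.Struct(">HI")  # 2-byte type, 4-byte length, big-endian


def split_frame(buffer: bytes):
    """Return (type, payload, rest), or None if the frame is incomplete."""
    if len(buffer) < HEADER.size:
        return None
    packet_type, length = HEADER.unpack_from(buffer)
    end = HEADER.size + length
    if len(buffer) < end:
        return None
    return packet_type, buffer[HEADER.size:end], buffer[end:]


def dispatch(packet_type: int, payload: bytes) -> str:
    """Route a frame: tunneled audio skips protocol buffer decoding."""
    if packet_type == UDP_TUNNEL:
        return "audio"     # treat the payload like a packet received over UDP
    return "protobuf"      # decode with the message class for packet_type
```

A receiver would call `split_frame` repeatedly on its TCP read buffer and pass each complete frame to `dispatch`. The returned labels stand in for the real handlers.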
-Implementation note -~~~~~~~~~~~~~~~~~~~ +### Implementation note When implementing the protocol it is easier to ignore the UDP transfer layer at first and just tunnel the UDP data through the TCP tunnel. The TCP layer must @@ -341,8 +330,7 @@ be implemented for authentication in any case. Making sure that the voice transmission works before implementing the UDP protocol simplifies debugging greatly. -Encryption ----------- +## Encryption All the packets are encrypted once during transfer. The actual encryption depends on the used transport layer. If the packets are tunneled through TCP @@ -350,8 +338,7 @@ they are encrypted using the TLS that encrypts the whole control channel connection and if they are sent directly using UDP they must be encrypted using the OCB-AES128 encryption. -Variable length integer encoding --------------------------------- +## Variable length integer encoding The variable length integer encoding (``varint``) is used to encode long, 64-bit, integers so that short values do not need the full 8 bytes to be From 69bec3274d8afdb4917e44f0eeebcf1af3a2f199 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:17:19 +0100 Subject: [PATCH 03/14] DOCS(dev): Convert image refs --- docs/dev/network-protocol/establishing_connection.md | 6 +----- docs/dev/network-protocol/overview.md | 12 ++---------- docs/dev/network-protocol/protocol_stack_tcp.md | 8 +------- 3 files changed, 4 insertions(+), 22 deletions(-) diff --git a/docs/dev/network-protocol/establishing_connection.md b/docs/dev/network-protocol/establishing_connection.md index aa8bffadd87..763feb4e0b8 100644 --- a/docs/dev/network-protocol/establishing_connection.md +++ b/docs/dev/network-protocol/establishing_connection.md @@ -16,11 +16,7 @@ This however is not mandatory as you can connect to the server without providing a certificate. However the server must provide the client with its certificate and it is recommended that the client checks this. -.. 
figure:: resources/mumble_connection_setup.png - :alt: Mumble connection setup - :align: center - - Mumble connection setup +![Mumble connection setup](resources/mumble_connection_setup.png) ## Version exchange diff --git a/docs/dev/network-protocol/overview.md b/docs/dev/network-protocol/overview.md index edd5bb4bfc1..fa37313a774 100644 --- a/docs/dev/network-protocol/overview.md +++ b/docs/dev/network-protocol/overview.md @@ -6,19 +6,11 @@ which is used to reliably transfer control data between the client and the server. The second one is a UDP connection which is used for unreliable, low latency transfer of voice data. -.. figure:: resources/mumble_system_overview.png - :alt: Mumble system overview - :align: center - - Mumble system overview +![Mumble system overview](resources/mumble_system_overview.png) Both are protected by strong cryptography, this encryption is mandatory and cannot be disabled. The TCP control channel uses TLSv1 AES256-SHA [#f1]_ while the voice channel is encrypted with OCB-AES128 [#f2]_. -.. figure:: resources/mumble_crypt_types.png - :alt: Mumble crypt types - :align: center - - Mumble crypto types +![Mumble crypt types](resources/mumble_crypt_types.png) While the TCP connection is mandatory the UDP connection can be compensated by tunnelling the UDP packets through the TCP connection as described in the protocol description later. diff --git a/docs/dev/network-protocol/protocol_stack_tcp.md b/docs/dev/network-protocol/protocol_stack_tcp.md index 8e2eef90f3b..f2ad61f69c0 100644 --- a/docs/dev/network-protocol/protocol_stack_tcp.md +++ b/docs/dev/network-protocol/protocol_stack_tcp.md @@ -5,13 +5,7 @@ uses Google's Protocol Buffers [#f1]_ with simple prefixing to distinguish the different kinds of packets sent through an TLSv1 encrypted connection. This makes the protocol very easily expandable. -.. _mumble-packet: - -.. 
figure:: resources/mumble_packet.png - :alt: Mumble packet - :align: center - - Mumble packet +![resources/mumble_packet.png](Mumble packet) The prefix consists out of the two bytes defining the type of the packet in the payload and 4 bytes stating the length of the payload in bytes From 809b7af0ca05a2897eda46d01178654617fe6357 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:20:27 +0100 Subject: [PATCH 04/14] DOCS(dev): Convert footnotes --- .../dev/network-protocol/establishing_connection.md | 8 +++----- docs/dev/network-protocol/overview.md | 12 +++++------- docs/dev/network-protocol/protocol_stack_tcp.md | 13 +++++-------- 3 files changed, 13 insertions(+), 20 deletions(-) diff --git a/docs/dev/network-protocol/establishing_connection.md b/docs/dev/network-protocol/establishing_connection.md index 763feb4e0b8..0c08c59bdbe 100644 --- a/docs/dev/network-protocol/establishing_connection.md +++ b/docs/dev/network-protocol/establishing_connection.md @@ -16,7 +16,7 @@ This however is not mandatory as you can connect to the server without providing a certificate. However the server must provide the client with its certificate and it is recommended that the client checks this. -![Mumble connection setup](resources/mumble_connection_setup.png) +![Mumble connection setup](resources/mumble_connection_setup.png) ## Version exchange @@ -215,7 +215,7 @@ a ServerSync message containing the session id of the clients session, the maximum bandwidth allowed on this server, the servers welcome text as well as the permissions the client has in the channel it ended up in. -For more information pease refer to the Mumble.proto file [#f1]_. +For more information pease refer to the Mumble.proto file[^1]. ## Ping @@ -223,6 +223,4 @@ If the client wishes to maintain the connection to the server it is required to ping the server. If the server does not receive a ping for 30 seconds it will disconnect the client. -.. rubric:: Footnotes - -.. 
[#f1] https://raw.github.com/mumble-voip/mumble/master/src/Mumble.proto \ No newline at end of file +[^1]: diff --git a/docs/dev/network-protocol/overview.md b/docs/dev/network-protocol/overview.md index fa37313a774..2454d197e6b 100644 --- a/docs/dev/network-protocol/overview.md +++ b/docs/dev/network-protocol/overview.md @@ -6,15 +6,13 @@ which is used to reliably transfer control data between the client and the server. The second one is a UDP connection which is used for unreliable, low latency transfer of voice data. -![Mumble system overview](resources/mumble_system_overview.png) +![Mumble system overview](resources/mumble_system_overview.png) -Both are protected by strong cryptography, this encryption is mandatory and cannot be disabled. The TCP control channel uses TLSv1 AES256-SHA [#f1]_ while the voice channel is encrypted with OCB-AES128 [#f2]_. +Both are protected by strong cryptography, this encryption is mandatory and cannot be disabled. The TCP control channel uses TLSv1 AES256-SHA[^1] while the voice channel is encrypted with OCB-AES128[^2]. -![Mumble crypt types](resources/mumble_crypt_types.png) +![Mumble crypt types](resources/mumble_crypt_types.png) While the TCP connection is mandatory the UDP connection can be compensated by tunnelling the UDP packets through the TCP connection as described in the protocol description later. -.. rubric:: Footnotes - -.. [#f1] http://en.wikipedia.org/wiki/Transport_Layer_Security -.. [#f2] http://www.cs.ucdavis.edu/~rogaway/ocb/ocb-back.htm \ No newline at end of file +[^1]: +[^2]: diff --git a/docs/dev/network-protocol/protocol_stack_tcp.md b/docs/dev/network-protocol/protocol_stack_tcp.md index f2ad61f69c0..d1be2c36ceb 100644 --- a/docs/dev/network-protocol/protocol_stack_tcp.md +++ b/docs/dev/network-protocol/protocol_stack_tcp.md @@ -1,11 +1,11 @@ # Protocol stack (TCP) Mumble has a shallow and easy to understand stack. 
Basically it -uses Google's Protocol Buffers [#f1]_ with simple prefixing to +uses Google's Protocol Buffers[^1] with simple prefixing to distinguish the different kinds of packets sent through an TLSv1 encrypted connection. This makes the protocol very easily expandable. -![resources/mumble_packet.png](Mumble packet) +![resources/mumble_packet.png](Mumble packet) The prefix consists out of the two bytes defining the type of the packet in the payload and 4 bytes stating the length of the payload in bytes @@ -72,10 +72,7 @@ If not mentioned otherwise all fields outside the protobuf encoding are big-endi | 25 | SuggestConfig | +---------+------------------------+ -For raw representation of each packet type see the attached Mumble.proto [#f2]_ file. +For raw representation of each packet type see the attached Mumble.proto [^2] file. - -.. rubric:: Footnotes - -.. [#f1] https://github.com/google/protobuf -.. [#f2] https://raw.github.com/mumble-voip/mumble/master/src/Mumble.proto +[^1]: +[^2]: From 94246a1345a1b58734fb32b9b73cacced2ceb6d0 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:29:08 +0100 Subject: [PATCH 05/14] DOCS(dev): Convert tables to code blocks --- .../establishing_connection.md | 243 +++++++++-------- .../network-protocol/protocol_stack_tcp.md | 115 ++++---- docs/dev/network-protocol/voice_data.md | 257 +++++++++--------- 3 files changed, 302 insertions(+), 313 deletions(-) diff --git a/docs/dev/network-protocol/establishing_connection.md b/docs/dev/network-protocol/establishing_connection.md index 0c08c59bdbe..b5b671c2d0b 100644 --- a/docs/dev/network-protocol/establishing_connection.md +++ b/docs/dev/network-protocol/establishing_connection.md @@ -23,51 +23,51 @@ its certificate and it is recommended that the client checks this. Once the TLS handshake is completed both sides should transmit their version information using the Version message. The message structure is described below. -.. 
table:: Version message - - +--------------------------------------+ - | Version | - +===========================+==========+ - | version | uint32 | - +---------------------------+----------+ - | release | string | - +---------------------------+----------+ - | os | string | - +---------------------------+----------+ - | os_version | string | - +---------------------------+----------+ +```text ++--------------------------------------+ +| Version | ++===========================+==========+ +| version | uint32 | ++---------------------------+----------+ +| release | string | ++---------------------------+----------+ +| os | string | ++---------------------------+----------+ +| os_version | string | ++---------------------------+----------+ +``` The version field is a combination of major, minor and patch version numbers (e.g. 1.2.0) so that major number takes two bytes and minor and patch numbers take one byte each. The release, os and os\_version fields are common strings containing additional information. -.. table:: Version field encoding (uint32) - - +---------------------------+----------+----------+ - | Major | Minor | Patch | - +===========================+==========+==========+ - | 2 bytes | 1 byte | 1 byte | - +---------------------------+----------+----------+ +```text ++---------------------------+----------+----------+ +| Major | Minor | Patch | ++===========================+==========+==========+ +| 2 bytes | 1 byte | 1 byte | ++---------------------------+----------+----------+ +``` The version information may be used as part of the *SuggestConfig* checks, which usually refer to the standard client versions. The major changes between these versions are listed in table below. The *release*, *os* and *os_version* information is not interpreted in any way at the moment. -.. 
table:: Mumble version differences - - +---------------+-------------------------------------------+ - | Version | Major changes | - +===============+===========================================+ - | 1.2.0 | CELT 0.7.0 codec support | - +---------------+-------------------------------------------+ - | 1.2.2 | CELT 0.7.1 codec support | - +---------------+-------------------------------------------+ - | 1.2.3 | CELT 0.11.0 codec | - +---------------+-------------------------------------------+ - | 1.2.4 | Opus codec support, SuggestConfig message | - +---------------+-------------------------------------------+ +```text ++---------------+-------------------------------------------+ +| Version | Major changes | ++===============+===========================================+ +| 1.2.0 | CELT 0.7.0 codec support | ++---------------+-------------------------------------------+ +| 1.2.2 | CELT 0.7.1 codec support | ++---------------+-------------------------------------------+ +| 1.2.3 | CELT 0.11.0 codec | ++---------------+-------------------------------------------+ +| 1.2.4 | Opus codec support, SuggestConfig message | ++---------------+-------------------------------------------+ +``` ## Authenticate @@ -76,17 +76,17 @@ The message structure is described in the figure below. This message may be sent after sending the version message. The client does not need to wait for the server version message. -.. 
table:: Authenticate message - - +-----------------------------------------------+ - | Authenticate | - +===========================+===================+ - | username | string | - +---------------------------+-------------------+ - | password | string | - +---------------------------+-------------------+ - | tokens | string | - +---------------------------+-------------------+ +```text ++-----------------------------------------------+ +| Authenticate | ++===========================+===================+ +| username | string | ++---------------------------+-------------------+ +| password | string | ++---------------------------+-------------------+ +| tokens | string | ++---------------------------+-------------------+ +``` The username and password are UTF-8 encoded strings. While the client is free to accept any username from the user the server is allowed to impose further restrictions. Furthermore @@ -109,17 +109,17 @@ the client. It contains the necessary cryptographic information for the OCB-AES1 encryption used in the UDP Voice channel. The packet is described in figure below. The encryption itself is described in a later section. -.. table:: CryptSetup message - - +-----------------------------------------------+ - | CryptSetup | - +===========================+===================+ - | key | bytes | - +---------------------------+-------------------+ - | client_nonce | bytes | - +---------------------------+-------------------+ - | server_nonce | bytes | - +---------------------------+-------------------+ +```text ++-----------------------------------------------+ +| CryptSetup | ++===========================+===================+ +| key | bytes | ++---------------------------+-------------------+ +| client_nonce | bytes | ++---------------------------+-------------------+ +| server_nonce | bytes | ++---------------------------+-------------------+ +``` ## Channel states @@ -130,30 +130,29 @@ picture of all the channels. 
Once the initial ChannelState has been transmitted for all channels the server updates the linked channels by sending new packets for these. The full structure of these ChannelState messages is shown below: -.. table:: ChannelState message - - +-----------------------------------------------+ - | ChannelState | - +===========================+===================+ - | channel_id | uint32 | - +---------------------------+-------------------+ - | parent | uint32 | - +---------------------------+-------------------+ - | name | string | - +---------------------------+-------------------+ - | links | repeated uint32 | - +---------------------------+-------------------+ - | description | string | - +---------------------------+-------------------+ - | links_add | repeated uint32 | - +---------------------------+-------------------+ - | links_remove | repeated uint32 | - +---------------------------+-------------------+ - | temporary | optional bool | - +---------------------------+-------------------+ - | position | optional int32 | - +---------------------------+-------------------+ - +```text ++-----------------------------------------------+ +| ChannelState | ++===========================+===================+ +| channel_id | uint32 | ++---------------------------+-------------------+ +| parent | uint32 | ++---------------------------+-------------------+ +| name | string | ++---------------------------+-------------------+ +| links | repeated uint32 | ++---------------------------+-------------------+ +| description | string | ++---------------------------+-------------------+ +| links_add | repeated uint32 | ++---------------------------+-------------------+ +| links_remove | repeated uint32 | ++---------------------------+-------------------+ +| temporary | optional bool | ++---------------------------+-------------------+ +| position | optional int32 | ++---------------------------+-------------------+ +``` *The server must send a ChannelState for the root channel 
identified with ID 0.* @@ -163,49 +162,49 @@ After the channels have been synchronized the server continues by listing the connected users. This is done by sending a UserState message for each user currently on the server, including the user that is currently connecting. -.. table:: UserState message - - +-----------------------------------------------+ - | UserState | - +===========================+===================+ - | session | uint32 | - +---------------------------+-------------------+ - | actor | uint32 | - +---------------------------+-------------------+ - | name | string | - +---------------------------+-------------------+ - | user_id | uint32 | - +---------------------------+-------------------+ - | channel_id | uint32 | - +---------------------------+-------------------+ - | mute | bool | - +---------------------------+-------------------+ - | deaf | bool | - +---------------------------+-------------------+ - | suppress | bool | - +---------------------------+-------------------+ - | self_mute | bool | - +---------------------------+-------------------+ - | self_deaf | bool | - +---------------------------+-------------------+ - | texture | bytes | - +---------------------------+-------------------+ - | plugin_context | bytes | - +---------------------------+-------------------+ - | plugin_identity | string | - +---------------------------+-------------------+ - | comment | string | - +---------------------------+-------------------+ - | hash | string | - +---------------------------+-------------------+ - | comment_hash | bytes | - +---------------------------+-------------------+ - | texture_hash | bytes | - +---------------------------+-------------------+ - | priority_speaker | bool | - +---------------------------+-------------------+ - | recording | bool | - +---------------------------+-------------------+ +```text ++-----------------------------------------------+ +| UserState | ++===========================+===================+ +| session | 
uint32 | ++---------------------------+-------------------+ +| actor | uint32 | ++---------------------------+-------------------+ +| name | string | ++---------------------------+-------------------+ +| user_id | uint32 | ++---------------------------+-------------------+ +| channel_id | uint32 | ++---------------------------+-------------------+ +| mute | bool | ++---------------------------+-------------------+ +| deaf | bool | ++---------------------------+-------------------+ +| suppress | bool | ++---------------------------+-------------------+ +| self_mute | bool | ++---------------------------+-------------------+ +| self_deaf | bool | ++---------------------------+-------------------+ +| texture | bytes | ++---------------------------+-------------------+ +| plugin_context | bytes | ++---------------------------+-------------------+ +| plugin_identity | string | ++---------------------------+-------------------+ +| comment | string | ++---------------------------+-------------------+ +| hash | string | ++---------------------------+-------------------+ +| comment_hash | bytes | ++---------------------------+-------------------+ +| texture_hash | bytes | ++---------------------------+-------------------+ +| priority_speaker | bool | ++---------------------------+-------------------+ +| recording | bool | ++---------------------------+-------------------+ +``` ## Server sync diff --git a/docs/dev/network-protocol/protocol_stack_tcp.md b/docs/dev/network-protocol/protocol_stack_tcp.md index d1be2c36ceb..ed5543e7552 100644 --- a/docs/dev/network-protocol/protocol_stack_tcp.md +++ b/docs/dev/network-protocol/protocol_stack_tcp.md @@ -13,64 +13,63 @@ followed by the payload itself. The following packet types are available in the current protocol and all but UDPTunnel are simple protobuf messages. If not mentioned otherwise all fields outside the protobuf encoding are big-endian. - -.. 
table:: Packet types - - +---------+------------------------+ - | Type | Payload | - +=========+========================+ - | 0 | Version | - +---------+------------------------+ - | 1 | UDPTunnel | - +---------+------------------------+ - | 2 | Authenticate | - +---------+------------------------+ - | 3 | Ping | - +---------+------------------------+ - | 4 | Reject | - +---------+------------------------+ - | 5 | ServerSync | - +---------+------------------------+ - | 6 | ChannelRemove | - +---------+------------------------+ - | 7 | ChannelState | - +---------+------------------------+ - | 8 | UserRemove | - +---------+------------------------+ - | 9 | UserState | - +---------+------------------------+ - | 10 | BanList | - +---------+------------------------+ - | 11 | TextMessage | - +---------+------------------------+ - | 12 | PermissionDenied | - +---------+------------------------+ - | 13 | ACL | - +---------+------------------------+ - | 14 | QueryUsers | - +---------+------------------------+ - | 15 | CryptSetup | - +---------+------------------------+ - | 16 | ContextActionModify | - +---------+------------------------+ - | 17 | ContextAction | - +---------+------------------------+ - | 18 | UserList | - +---------+------------------------+ - | 19 | VoiceTarget | - +---------+------------------------+ - | 20 | PermissionQuery | - +---------+------------------------+ - | 21 | CodecVersion | - +---------+------------------------+ - | 22 | UserStats | - +---------+------------------------+ - | 23 | RequestBlob | - +---------+------------------------+ - | 24 | ServerConfig | - +---------+------------------------+ - | 25 | SuggestConfig | - +---------+------------------------+ +```text ++---------+------------------------+ +| Type | Payload | ++=========+========================+ +| 0 | Version | ++---------+------------------------+ +| 1 | UDPTunnel | ++---------+------------------------+ +| 2 | Authenticate | ++---------+------------------------+ +| 3 | Ping 
| ++---------+------------------------+ +| 4 | Reject | ++---------+------------------------+ +| 5 | ServerSync | ++---------+------------------------+ +| 6 | ChannelRemove | ++---------+------------------------+ +| 7 | ChannelState | ++---------+------------------------+ +| 8 | UserRemove | ++---------+------------------------+ +| 9 | UserState | ++---------+------------------------+ +| 10 | BanList | ++---------+------------------------+ +| 11 | TextMessage | ++---------+------------------------+ +| 12 | PermissionDenied | ++---------+------------------------+ +| 13 | ACL | ++---------+------------------------+ +| 14 | QueryUsers | ++---------+------------------------+ +| 15 | CryptSetup | ++---------+------------------------+ +| 16 | ContextActionModify | ++---------+------------------------+ +| 17 | ContextAction | ++---------+------------------------+ +| 18 | UserList | ++---------+------------------------+ +| 19 | VoiceTarget | ++---------+------------------------+ +| 20 | PermissionQuery | ++---------+------------------------+ +| 21 | CodecVersion | ++---------+------------------------+ +| 22 | UserStats | ++---------+------------------------+ +| 23 | RequestBlob | ++---------+------------------------+ +| 24 | ServerConfig | ++---------+------------------------+ +| 25 | SuggestConfig | ++---------+------------------------+ +``` For raw representation of each packet type see the attached Mumble.proto [^2] file. diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index fd0e38f1361..89f98f501a4 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -20,19 +20,17 @@ whole audio data packet is 1020 bytes. This allows applications to use 1024 byte buffers for receiving UDP datagrams with the 4-byte encryption header overhead. -.. _Audio packet structure: -.. 
table:: Audio packet structure - :class: bits8 - - +-------------------------------+ - | Audio packet structure | - +===+===+===+===+===+===+===+===+ - | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | - +---+---+---+---+---+---+---+---+ - | ``type`` | ``target`` | - +-----------+-------------------+ - | Payload... | - +-------------------------------+ +```text ++-------------------------------+ +| Audio packet structure | ++===+===+===+===+===+===+===+===+ +| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | ++---+---+---+---+---+---+---+---+ +| ``type`` | ``target`` | ++-----------+-------------------+ +| Payload... | ++-------------------------------+ +``` type The audio packet type. The packets transmitted over the audio channel are @@ -40,24 +38,23 @@ type audio packets encoded with different codecs. Different types are listed in `Audio packet types`_ table. -.. _Audio packet types: -.. table:: Audio packet types - - +---------+---------------+--------------------------------------------+ - | Type | Bitfield | Description | - +=========+===============+============================================+ - | ``0`` | ``000xxxxx`` | CELT Alpha encoded voice data | - +---------+---------------+--------------------------------------------+ - | ``1`` | ``001xxxxx`` | Ping packet | - +---------+---------------+--------------------------------------------+ - | ``2`` | ``010xxxxx`` | Speex encoded voice data | - +---------+---------------+--------------------------------------------+ - | ``3`` | ``011xxxxx`` | CELT Beta encoded voice data | - +---------+---------------+--------------------------------------------+ - | ``4`` | ``100xxxxx`` | OPUS encoded voice data | - +---------+---------------+--------------------------------------------+ - | ``5-7`` | | Unused | - +---------+---------------+--------------------------------------------+ +```text ++---------+---------------+--------------------------------------------+ +| Type | Bitfield | Description | 
++=========+===============+============================================+ +| ``0`` | ``000xxxxx`` | CELT Alpha encoded voice data | ++---------+---------------+--------------------------------------------+ +| ``1`` | ``001xxxxx`` | Ping packet | ++---------+---------------+--------------------------------------------+ +| ``2`` | ``010xxxxx`` | Speex encoded voice data | ++---------+---------------+--------------------------------------------+ +| ``3`` | ``011xxxxx`` | CELT Beta encoded voice data | ++---------+---------------+--------------------------------------------+ +| ``4`` | ``100xxxxx`` | OPUS encoded voice data | ++---------+---------------+--------------------------------------------+ +| ``5-7`` | | Unused | ++---------+---------------+--------------------------------------------+ +``` target The target portion defines the recipient for the audio data. The two constant @@ -72,22 +69,21 @@ target uses target 1 to specify the audio results from a whisper to a channel and target 2 to specify that the audio results from a direct whisper to the user. -.. _Audio targets: -.. table:: Audio targets - - +-----------+-----------------------------------------------------+ - | Target | Description | - +===========+=====================================================+ - | ``0`` | Normal talking | - +-----------+-----------------------------------------------------+ - | ``1-30`` | Whisper target | - | | | - | | - VoiceTarget ID when sending whisper from client. | - | | - 1 when receiving whisper to channel. | - | | - 2 when receiving direct whisper to user. 
| - +-----------+-----------------------------------------------------+ - | ``31`` | Server loopback | - +-----------+-----------------------------------------------------+ +```text ++-----------+-----------------------------------------------------+ +| Target | Description | ++===========+=====================================================+ +| ``0`` | Normal talking | ++-----------+-----------------------------------------------------+ +| ``1-30`` | Whisper target | +| | | +| | - VoiceTarget ID when sending whisper from client. | +| | - 1 when receiving whisper to channel. | +| | - 2 when receiving direct whisper to user. | ++-----------+-----------------------------------------------------+ +| ``31`` | Server loopback | ++-----------+-----------------------------------------------------+ +``` ### Ping packet @@ -96,17 +92,15 @@ audio transport layer. These packets contain only varint encoded timestamp as data. See `UDP connectivity checks`_ section below for the logic involved in the connectivity checks. -.. _Audio transport ping packet: - -.. table:: Audio transport ping packet - - +------------+-------------+----------------------------------+ - | Field | Type | Description | - +============+=============+==================================+ - | Header | ``byte`` | ``00100000b`` (``0x20``) | - +------------+-------------+----------------------------------+ - | Data | ``varint`` | Timestamp | - +------------+-------------+----------------------------------+ +```text ++------------+-------------+----------------------------------+ +| Field | Type | Description | ++============+=============+==================================+ +| Header | ``byte`` | ``00100000b`` (``0x20``) | ++------------+-------------+----------------------------------+ +| Data | ``varint`` | Timestamp | ++------------+-------------+----------------------------------+ +``` Header Common audio packet header. 
For ping packets this should have the value of @@ -132,38 +126,39 @@ codec of the whole audio packets. The audio segments contain codec implementation specific information on where the audio segments end so the possible positional audio data can be read from the end. -.. _Incoming encoded audio packet: -.. table:: Incoming encoded audio packet - - +--------------------+--------------+-----------------------------------------------------------+ - | Field | Type | Description | - +====================+==============+===========================================================+ - | Header | ``byte`` | Codec type/Audio target | - +--------------------+--------------+-----------------------------------------------------------+ - | Session ID | ``varint`` | Session ID of the source user. | - +--------------------+--------------+-----------------------------------------------------------+ - | Sequence Number | ``varint`` | Sequence number of the first audio data **segment**. | - +--------------------+--------------+-----------------------------------------------------------+ - | Payload | ``byte[]`` | Audio payload | - +--------------------+--------------+-----------------------------------------------------------+ - | Position Info | ``float[3]`` | Positional audio information | - +--------------------+--------------+-----------------------------------------------------------+ - - -.. _Outgoing encoded audio packet: -.. table:: Outgoing encoded audio packet - - +--------------------+--------------+-----------------------------------------------------------+ - | Field | Type | Description | - +====================+==============+===========================================================+ - | Header | ``byte`` | Codec type/Audio target | - +--------------------+--------------+-----------------------------------------------------------+ - | Sequence Number | ``varint`` | Sequence number of the first audio data **segment**. 
| - +--------------------+--------------+-----------------------------------------------------------+ - | Payload | ``byte[]`` | Audio payload | - +--------------------+--------------+-----------------------------------------------------------+ - | Position Info | ``float[3]`` | Positional audio information | - +--------------------+--------------+-----------------------------------------------------------+ +Incoming encoded audio packet: + +```text ++--------------------+--------------+-----------------------------------------------------------+ +| Field | Type | Description | ++====================+==============+===========================================================+ +| Header | ``byte`` | Codec type/Audio target | ++--------------------+--------------+-----------------------------------------------------------+ +| Session ID | ``varint`` | Session ID of the source user. | ++--------------------+--------------+-----------------------------------------------------------+ +| Sequence Number | ``varint`` | Sequence number of the first audio data **segment**. 
| ++--------------------+--------------+-----------------------------------------------------------+ +| Payload | ``byte[]`` | Audio payload | ++--------------------+--------------+-----------------------------------------------------------+ +| Position Info | ``float[3]`` | Positional audio information | ++--------------------+--------------+-----------------------------------------------------------+ +``` + +Outgoing encoded audio packet: + +```text ++--------------------+--------------+-----------------------------------------------------------+ +| Field | Type | Description | ++====================+==============+===========================================================+ +| Header | ``byte`` | Codec type/Audio target | ++--------------------+--------------+-----------------------------------------------------------+ +| Sequence Number | ``varint`` | Sequence number of the first audio data **segment**. | ++--------------------+--------------+-----------------------------------------------------------+ +| Payload | ``byte[]`` | Audio payload | ++--------------------+--------------+-----------------------------------------------------------+ +| Position Info | ``float[3]`` | Positional audio information | ++--------------------+--------------+-----------------------------------------------------------+ +``` Header The common audio packet header @@ -197,17 +192,15 @@ Position Info Encoded Speex and CELT audio is transported as individual encoded frames. Each frame is prefixed with a single byte length and terminator header. -.. _celt-encoded-audio-data: - -.. 
table:: CELT encoded audio data - - +---------+-------------+-----------------------------------------+ - | Field | Type | Description | - +=========+=============+=========================================+ - | Header | ``byte`` | length/continuation header | - +---------+-------------+-----------------------------------------+ - | Data | ``byte[]`` | Encoded voice frame | - +---------+-------------+-----------------------------------------+ +```text ++---------+-------------+-----------------------------------------+ +| Field | Type | Description | ++=========+=============+=========================================+ +| Header | ``byte`` | length/continuation header | ++---------+-------------+-----------------------------------------+ +| Data | ``byte[]`` | Encoded voice frame | ++---------+-------------+-----------------------------------------+ +``` Header The length of the Data field. The most significant bit (``0x80``) acts as the @@ -226,17 +219,15 @@ Data Encoded Opus audio is transported as a single Opus audio frame. The frame is prefixed with a variable byte header. -.. _opus-encoded-audio-data: - -.. 
table:: Opus encoded audio data - - +---------+-------------+-----------------------------------------+ - | Field | Type | Description | - +=========+=============+=========================================+ - | Header | ``varint`` | length/terminator header | - +---------+-------------+-----------------------------------------+ - | Data | ``byte[]`` | Encoded voice frame | - +---------+-------------+-----------------------------------------+ +```text ++---------+-------------+-----------------------------------------+ +| Field | Type | Description | ++=========+=============+=========================================+ +| Header | ``varint`` | length/terminator header | ++---------+-------------+-----------------------------------------+ +| Data | ``byte[]`` | Encoded voice frame | ++---------+-------------+-----------------------------------------+ +``` Header The length of the Data field. 16-bit variable length integer encoded length @@ -357,24 +348,24 @@ See the *quint64* shift operators in https://github.com/mumble-voip/mumble/blob/master/src/PacketDataStream.h for a reference implementation. -.. 
table:: Varint prefixes - - +----------------------------------+--------------------------------------------------------+ - | Encoded | Decoded | - +==================================+========================================================+ - | ``0xxxxxxx`` | 7-bit positive number | - +----------------------------------+--------------------------------------------------------+ - | ``10xxxxxx`` + 1 byte | 14-bit positive number | - +----------------------------------+--------------------------------------------------------+ - | ``110xxxxx`` + 2 bytes | 21-bit positive number | - +----------------------------------+--------------------------------------------------------+ - | ``1110xxxx`` + 3 bytes | 28-bit positive number | - +----------------------------------+--------------------------------------------------------+ - | ``111100__`` + ``int`` (32-bit) | 32-bit positive number | - +----------------------------------+--------------------------------------------------------+ - | ``111101__`` + ``long`` (64-bit) | 64-bit number | - +----------------------------------+--------------------------------------------------------+ - | ``111110__`` + ``varint`` | Negative recursive varint | - +----------------------------------+--------------------------------------------------------+ - | ``111111xx`` | Byte-inverted negative two bit number (``~xx``) | - +----------------------------------+--------------------------------------------------------+ +```text ++----------------------------------+--------------------------------------------------------+ +| Encoded | Decoded | ++==================================+========================================================+ +| ``0xxxxxxx`` | 7-bit positive number | ++----------------------------------+--------------------------------------------------------+ +| ``10xxxxxx`` + 1 byte | 14-bit positive number | ++----------------------------------+--------------------------------------------------------+ +| ``110xxxxx`` + 2 bytes | 
21-bit positive number | ++----------------------------------+--------------------------------------------------------+ +| ``1110xxxx`` + 3 bytes | 28-bit positive number | ++----------------------------------+--------------------------------------------------------+ +| ``111100__`` + ``int`` (32-bit) | 32-bit positive number | ++----------------------------------+--------------------------------------------------------+ +| ``111101__`` + ``long`` (64-bit) | 64-bit number | ++----------------------------------+--------------------------------------------------------+ +| ``111110__`` + ``varint`` | Negative recursive varint | ++----------------------------------+--------------------------------------------------------+ +| ``111111xx`` | Byte-inverted negative two bit number (``~xx``) | ++----------------------------------+--------------------------------------------------------+ +``` From e844cefec65dfb80474eee88f3e8a8ac848ec530 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:30:03 +0100 Subject: [PATCH 06/14] DOCS(dev): Drop underline escape --- docs/dev/network-protocol/establishing_connection.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/dev/network-protocol/establishing_connection.md b/docs/dev/network-protocol/establishing_connection.md index b5b671c2d0b..f1774acd9eb 100644 --- a/docs/dev/network-protocol/establishing_connection.md +++ b/docs/dev/network-protocol/establishing_connection.md @@ -39,7 +39,7 @@ information using the Version message. The message structure is described below. The version field is a combination of major, minor and patch version numbers (e.g. 1.2.0) so that major number takes two bytes and minor and patch numbers take one byte each. -The release, os and os\_version +The release, os and os_version fields are common strings containing additional information. 
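
The version packing described above (two bytes for the major number, one byte each for minor and patch) can be sketched as follows. This is a minimal illustration, not the project's code: the function names `pack_version` and `unpack_version` are hypothetical, and the sketch assumes the major number occupies the most significant two bytes of a 32-bit value.

```python
from typing import Tuple


def pack_version(major: int, minor: int, patch: int) -> int:
    """Pack a version into one 32-bit integer (assumed layout: major in the
    top two bytes, then minor, then patch)."""
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFF and 0 <= patch <= 0xFF):
        raise ValueError("version component out of range")
    return (major << 16) | (minor << 8) | patch


def unpack_version(value: int) -> Tuple[int, int, int]:
    """Split a packed 32-bit version back into (major, minor, patch)."""
    return (value >> 16) & 0xFFFF, (value >> 8) & 0xFF, value & 0xFF
```

Under that assumption, version 1.2.0 packs to `0x00010200`, and unpacking that value gives `(1, 2, 0)` again.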
```text From 4c16d55df677bc5e91247916f462ae03e7fbf8b0 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:32:41 +0100 Subject: [PATCH 07/14] DOCS(dev): Convert file links --- docs/dev/network-protocol/README.md | 8 ++++---- docs/dev/network-protocol/voice_data.md | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/dev/network-protocol/README.md b/docs/dev/network-protocol/README.md index d1cc83c4197..6918f988275 100644 --- a/docs/dev/network-protocol/README.md +++ b/docs/dev/network-protocol/README.md @@ -5,7 +5,7 @@ Mumble VoIP 1.2.X server-client communication protocol. It reflects the state of the protocol implemented in the Mumble 1.2.8 client and might be outdated by the time you are reading this. -* `Overview `_ -* `Protocol Stack (TCP) `_ -* `Establishing a Connection `_ -* `Voice Data `_ +* [Overview](overview.md) +* [Protocol Stack (TCP)](protocol_stack_tcp.md) +* [Establishing a Connection](establishing_connection.md) +* [Voice Data](voice_data.md) diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index 89f98f501a4..ecf3e9001ca 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -302,7 +302,7 @@ can be switched back to it. If the UDP channel isn't available the voice packets can be transmitted through the TCP transport used for the control channel. These messages use the normal -TCP prefixing, as shown in `Protocol Stack TCP `_: 16-bit message type +TCP prefixing, as shown in [Protocol Stack TCP](protocol_stack_tcp.md): 16-bit message type followed by 32-bit message length. 
However unlike other TCP messages, the audio packets are not encoded as protocol buffer messages but instead the raw audio packet described in `Packet format`_ should be written to the TCP socket From 79a82f0a6857b56bf06d8d5dcefeaaafb8574749 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:34:58 +0100 Subject: [PATCH 08/14] DOCS(dev): Replace references in text --- docs/dev/network-protocol/voice_data.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index ecf3e9001ca..736ec83acdb 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -6,7 +6,7 @@ Mumble audio channel is used to transmit the actual audio packets over the network. Unlike the TCP control channel, the audio channel uses a custom encoding for the audio packets. The audio channel is transport independent and features such as encryption are implemented by the transport layer. Integers -above 8-bits are encoded using the `Variable length integer encoding`_. +above 8-bits are encoded using the *Variable length integer encoding*. .. _packet-format: @@ -36,7 +36,7 @@ type The audio packet type. The packets transmitted over the audio channel are either ping packets used to diagnose the transport layer connectivity or audio packets encoded with different codecs. Different types are listed in - `Audio packet types`_ table. + *Audio packet types* table. ```text +---------+---------------+--------------------------------------------+ @@ -61,7 +61,7 @@ target targets are *Normal talking* (``0``) and *Server Loopback* (``31``). The range 1-30 is reserved for whisper targets. These targets are specified separately in the control channel using the ``VoiceTarget`` packets. The - targets are listed in `Audio targets`_ table. + targets are listed in *Audio targets* table. 
When a client registers a VoiceTarget on the server, it gives the target an ID. This voice target ID can be used as a target in the voice packets to send @@ -89,7 +89,7 @@ target Audio channel ping packets are used as part of the connectivity checks on the audio transport layer. These packets contain only varint encoded timestamp as -data. See `UDP connectivity checks`_ section below for the logic involved in +data. See *UDP connectivity checks* section below for the logic involved in the connectivity checks. ```text @@ -283,7 +283,7 @@ Since UDP is a connectionless protocol, it is heavily affected by network topology such as NAT configuration. It should not be used for audio transmission before the connectivity has been determined. -The client starts the connectivity checks by sending a `Ping packet`_ to the +The client starts the connectivity checks by sending a *Ping packet* to the server. When the server receives this packet it will respond by echoing it back to the address it received it from. Once the client receives the response from the server it can start using the UDP transport for audio data. When the server @@ -292,7 +292,7 @@ audio over to UDP transport as well. If the client stops receiving replies to the UDP pings at some point, it should start tunneling the voice communication through the TCP tunnel as described in -the `Tunneling audio over TCP`_ below. When the server receives a tunneled +the *Tunneling audio over TCP* below. When the server receives a tunneled packet over the TCP connection it must also stop using the UDP for communication. The client should still continue sending audio ping packets over the UDP transport in case the UDP connection is restored and the communication @@ -305,7 +305,7 @@ the TCP transport used for the control channel. These messages use the normal TCP prefixing, as shown in [Protocol Stack TCP](protocol_stack_tcp.md): 16-bit message type followed by 32-bit message length. 
However unlike other TCP messages, the audio packets are not encoded as protocol buffer messages but instead the raw audio -packet described in `Packet format`_ should be written to the TCP socket +packet described in *Packet format* should be written to the TCP socket verbatim. When the packets are received it is safe to parse the type and length fields From 02b3d6bf15037f0701ec6889e629900710e70fad Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:35:45 +0100 Subject: [PATCH 09/14] DOCS(dev): Drop rst ref defs --- docs/dev/network-protocol/voice_data.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index 736ec83acdb..e417b5897f3 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -1,5 +1,3 @@ -.. _voice-data: - # Voice data Mumble audio channel is used to transmit the actual audio packets over the @@ -8,8 +6,6 @@ encoding for the audio packets. The audio channel is transport independent and features such as encryption are implemented by the transport layer. Integers above 8-bits are encoded using the *Variable length integer encoding*. -.. _packet-format: - ## Packet format The mumble audio channel packets are variable length packets that begin with an From bebb473d5f133f1b2dc62d9caa741cb0b0ca910d Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:44:22 +0100 Subject: [PATCH 10/14] DOCS(dev): Replace in-table code fencing --- docs/dev/network-protocol/voice_data.md | 176 ++++++++++++------------ 1 file changed, 88 insertions(+), 88 deletions(-) diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index e417b5897f3..9e198e81943 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -22,7 +22,7 @@ overhead. 
+===+===+===+===+===+===+===+===+ | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | +---+---+---+---+---+---+---+---+ -| ``type`` | ``target`` | +| type | target | +-----------+-------------------+ | Payload... | +-------------------------------+ @@ -35,21 +35,21 @@ type *Audio packet types* table. ```text -+---------+---------------+--------------------------------------------+ -| Type | Bitfield | Description | -+=========+===============+============================================+ -| ``0`` | ``000xxxxx`` | CELT Alpha encoded voice data | -+---------+---------------+--------------------------------------------+ -| ``1`` | ``001xxxxx`` | Ping packet | -+---------+---------------+--------------------------------------------+ -| ``2`` | ``010xxxxx`` | Speex encoded voice data | -+---------+---------------+--------------------------------------------+ -| ``3`` | ``011xxxxx`` | CELT Beta encoded voice data | -+---------+---------------+--------------------------------------------+ -| ``4`` | ``100xxxxx`` | OPUS encoded voice data | -+---------+---------------+--------------------------------------------+ -| ``5-7`` | | Unused | -+---------+---------------+--------------------------------------------+ ++------+----------+-------------------------------+ +| Type | Bitfield | Description | ++======+==========+===============================+ +| 0 | 000xxxxx | CELT Alpha encoded voice data | ++------+----------+-------------------------------+ +| 1 | 001xxxxx | Ping packet | ++------+----------+-------------------------------+ +| 2 | 010xxxxx | Speex encoded voice data | ++------+----------+-------------------------------+ +| 3 | 011xxxxx | CELT Beta encoded voice data | ++------+----------+-------------------------------+ +| 4 | 100xxxxx | OPUS encoded voice data | ++------+----------+-------------------------------+ +| 5-7 | | Unused | ++------+----------+-------------------------------+ ``` target @@ -66,19 +66,19 @@ target target 2 to specify that the audio results from a direct 
whisper to the user. ```text -+-----------+-----------------------------------------------------+ -| Target | Description | -+===========+=====================================================+ -| ``0`` | Normal talking | -+-----------+-----------------------------------------------------+ -| ``1-30`` | Whisper target | -| | | -| | - VoiceTarget ID when sending whisper from client. | -| | - 1 when receiving whisper to channel. | -| | - 2 when receiving direct whisper to user. | -+-----------+-----------------------------------------------------+ -| ``31`` | Server loopback | -+-----------+-----------------------------------------------------+ ++--------+-----------------------------------------------------+ +| Target | Description | ++========+=====================================================+ +| 0 | Normal talking | ++--------+-----------------------------------------------------+ +| 1-30 | Whisper target | +| | | +| | - VoiceTarget ID when sending whisper from client. | +| | - 1 when receiving whisper to channel. | +| | - 2 when receiving direct whisper to user. | ++--------+-----------------------------------------------------+ +| 31 | Server loopback | ++--------+-----------------------------------------------------+ ``` ### Ping packet @@ -89,13 +89,13 @@ data. See *UDP connectivity checks* section below for the logic involved in the connectivity checks. 
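
As a rough sketch of how a ping packet could be assembled from the pieces described in this document: the header byte carries the type in its top three bits and the target in its low five bits, and the timestamp follows as a varint. The helper names (`encode_varint`, `make_header`, `build_ping`) are hypothetical. The encoder covers only non-negative values and assumes the multi-byte forms are big-endian, following the prefixes in the varint table; the timestamp is treated as an opaque integer because its unit is not stated here.

```python
PING_TYPE = 1  # audio packet type 1 is the ping packet


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer below 2**64 using the varint prefixes.
    Negative values (the 111110__ and 111111xx forms) are not handled."""
    if value < 0 or value >= 1 << 64:
        raise ValueError("only non-negative 64-bit values are supported")
    if value < 0x80:                      # 0xxxxxxx
        return bytes([value])
    if value < 0x4000:                    # 10xxxxxx + 1 byte
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 0x200000:                  # 110xxxxx + 2 bytes
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    if value < 0x10000000:                # 1110xxxx + 3 bytes
        return bytes([0xE0 | (value >> 24)]) + (value & 0xFFFFFF).to_bytes(3, "big")
    if value < 0x100000000:               # 111100__ + 32-bit integer
        return b"\xF0" + value.to_bytes(4, "big")
    return b"\xF4" + value.to_bytes(8, "big")  # 111101__ + 64-bit integer


def make_header(packet_type: int, target: int) -> int:
    """Combine a 3-bit packet type and a 5-bit target into the header byte."""
    return ((packet_type & 0x07) << 5) | (target & 0x1F)


def build_ping(timestamp: int) -> bytes:
    """Build an audio-channel ping: header 0x20 followed by a varint timestamp."""
    return bytes([make_header(PING_TYPE, 0)]) + encode_varint(timestamp)
```

With these helpers, `build_ping(5)` yields the two bytes `0x20 0x05`. A client would send the result over the UDP transport and treat an echoed copy as evidence that UDP is usable.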
```text -+------------+-------------+----------------------------------+ -| Field | Type | Description | -+============+=============+==================================+ -| Header | ``byte`` | ``00100000b`` (``0x20``) | -+------------+-------------+----------------------------------+ -| Data | ``varint`` | Timestamp | -+------------+-------------+----------------------------------+ ++--------+--------+------------------+ +| Field | Type | Description | ++========+========+==================+ +| Header | byte | 00100000b (0x20) | ++--------+--------+------------------+ +| Data | varint | Timestamp | ++--------+--------+------------------+ ``` Header @@ -125,35 +125,35 @@ possible positional audio data can be read from the end. Incoming encoded audio packet: ```text -+--------------------+--------------+-----------------------------------------------------------+ -| Field | Type | Description | -+====================+==============+===========================================================+ -| Header | ``byte`` | Codec type/Audio target | -+--------------------+--------------+-----------------------------------------------------------+ -| Session ID | ``varint`` | Session ID of the source user. | -+--------------------+--------------+-----------------------------------------------------------+ -| Sequence Number | ``varint`` | Sequence number of the first audio data **segment**. 
| -+--------------------+--------------+-----------------------------------------------------------+ -| Payload | ``byte[]`` | Audio payload | -+--------------------+--------------+-----------------------------------------------------------+ -| Position Info | ``float[3]`` | Positional audio information | -+--------------------+--------------+-----------------------------------------------------------+ ++--------------------+------------+-----------------------------------------------------------+ +| Field | Type | Description | ++====================+============+===========================================================+ +| Header | byte | Codec type/Audio target | ++--------------------+------------+-----------------------------------------------------------+ +| Session ID | varint | Session ID of the source user. | ++--------------------+------------+-----------------------------------------------------------+ +| Sequence Number | varint | Sequence number of the first audio data **segment**. | ++--------------------+------------+-----------------------------------------------------------+ +| Payload | byte[] | Audio payload | ++--------------------+------------+-----------------------------------------------------------+ +| Position Info | float[3] | Positional audio information | ++--------------------+------------+-----------------------------------------------------------+ ``` Outgoing encoded audio packet: ```text -+--------------------+--------------+-----------------------------------------------------------+ -| Field | Type | Description | -+====================+==============+===========================================================+ -| Header | ``byte`` | Codec type/Audio target | -+--------------------+--------------+-----------------------------------------------------------+ -| Sequence Number | ``varint`` | Sequence number of the first audio data **segment**. 
| -+--------------------+--------------+-----------------------------------------------------------+ -| Payload | ``byte[]`` | Audio payload | -+--------------------+--------------+-----------------------------------------------------------+ -| Position Info | ``float[3]`` | Positional audio information | -+--------------------+--------------+-----------------------------------------------------------+ ++--------------------+------------+-----------------------------------------------------------+ +| Field | Type | Description | ++====================+============+===========================================================+ +| Header | byte | Codec type/Audio target | ++--------------------+------------+-----------------------------------------------------------+ +| Sequence Number | varint | Sequence number of the first audio data **segment**. | ++--------------------+------------+-----------------------------------------------------------+ +| Payload | byte[] | Audio payload | ++--------------------+------------+-----------------------------------------------------------+ +| Position Info | float[3] | Positional audio information | ++--------------------+------------+-----------------------------------------------------------+ ``` Header @@ -189,13 +189,13 @@ Encoded Speex and CELT audio is transported as individual encoded frames. Each frame is prefixed with a single byte length and terminator header. 
```text -+---------+-------------+-----------------------------------------+ -| Field | Type | Description | -+=========+=============+=========================================+ -| Header | ``byte`` | length/continuation header | -+---------+-------------+-----------------------------------------+ -| Data | ``byte[]`` | Encoded voice frame | -+---------+-------------+-----------------------------------------+ ++---------+-----------+-----------------------------------------+ +| Field | Type | Description | ++=========+===========+=========================================+ +| Header | byte | length/continuation header | ++---------+-----------+-----------------------------------------+ +| Data | byte[] | Encoded voice frame | ++---------+-----------+-----------------------------------------+ ``` Header @@ -219,9 +219,9 @@ Encoded Opus audio is transported as a single Opus audio frame. The frame is pre +---------+-------------+-----------------------------------------+ | Field | Type | Description | +=========+=============+=========================================+ -| Header | ``varint`` | length/terminator header | +| Header | varint | length/terminator header | +---------+-------------+-----------------------------------------+ -| Data | ``byte[]`` | Encoded voice frame | +| Data | byte[] | Encoded voice frame | +---------+-------------+-----------------------------------------+ ``` @@ -345,23 +345,23 @@ https://github.com/mumble-voip/mumble/blob/master/src/PacketDataStream.h for a reference implementation. 
```text -+----------------------------------+--------------------------------------------------------+ -| Encoded | Decoded | -+==================================+========================================================+ -| ``0xxxxxxx`` | 7-bit positive number | -+----------------------------------+--------------------------------------------------------+ -| ``10xxxxxx`` + 1 byte | 14-bit positive number | -+----------------------------------+--------------------------------------------------------+ -| ``110xxxxx`` + 2 bytes | 21-bit positive number | -+----------------------------------+--------------------------------------------------------+ -| ``1110xxxx`` + 3 bytes | 28-bit positive number | -+----------------------------------+--------------------------------------------------------+ -| ``111100__`` + ``int`` (32-bit) | 32-bit positive number | -+----------------------------------+--------------------------------------------------------+ -| ``111101__`` + ``long`` (64-bit) | 64-bit number | -+----------------------------------+--------------------------------------------------------+ -| ``111110__`` + ``varint`` | Negative recursive varint | -+----------------------------------+--------------------------------------------------------+ -| ``111111xx`` | Byte-inverted negative two bit number (``~xx``) | -+----------------------------------+--------------------------------------------------------+ ++--------------------------+---------------------------------------------+ +| Encoded | Decoded | ++==========================+=============================================+ +| 0xxxxxxx | 7-bit positive number | ++--------------------------+---------------------------------------------+ +| 10xxxxxx + 1 byte | 14-bit positive number | ++--------------------------+---------------------------------------------+ +| 110xxxxx + 2 bytes | 21-bit positive number | ++--------------------------+---------------------------------------------+ +| 1110xxxx + 3 bytes | 28-bit 
positive number | ++--------------------------+---------------------------------------------+ +| 111100__ + int (32-bit) | 32-bit positive number | ++--------------------------+---------------------------------------------+ +| 111101__ + long (64-bit) | 64-bit number | ++--------------------------+---------------------------------------------+ +| 111110__ + varint | Negative recursive varint | ++--------------------------+---------------------------------------------+ +| 111111xx | Byte-inverted negative two bit number (~xx) | ++--------------------------+---------------------------------------------+ ``` From 2096d5602283cf6d70331aff38790bc5f1bf0649 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:45:32 +0100 Subject: [PATCH 11/14] DOCS(dev): Convert text code fencing --- docs/dev/network-protocol/voice_data.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index 9e198e81943..66d7de27646 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -54,9 +54,9 @@ type target The target portion defines the recipient for the audio data. The two constant - targets are *Normal talking* (``0``) and *Server Loopback* (``31``). The + targets are *Normal talking* (`0`) and *Server Loopback* (`31`). The range 1-30 is reserved for whisper targets. These targets are specified - separately in the control channel using the ``VoiceTarget`` packets. The + separately in the control channel using the `VoiceTarget` packets. The targets are listed in *Audio targets* table. When a client registers a VoiceTarget on the server, it gives the target an @@ -180,7 +180,7 @@ Payload Position Info The XYZ coordinates of the audio source. In addition to sending the position information, the user must be using a positional plugin defined in the - ``UserState`` message. 
The plugins might define different contexts which + `UserState` message. The plugins might define different contexts which prevent voice communication between users in other contexts. #### Speex and CELT audio frames @@ -199,7 +199,7 @@ frame is prefixed with a single byte length and terminator header. ``` Header - The length of the Data field. The most significant bit (``0x80``) acts as the + The length of the Data field. The most significant bit (`0x80`) acts as the continuation bit and is set for all but the last frame in the payload. The remaining 7 bits of the header contain the actual length of the Data frame. @@ -208,7 +208,7 @@ Header interpreted normally as length of 0 with no continuation bit set. Data - Single encoded audio frame. The encoding depends on the codec ``type`` header + Single encoded audio frame. The encoding depends on the codec `type` header of the whole audio packet #### Opus audio frames @@ -230,8 +230,8 @@ Header and terminator bit value. The varint encoding is the same as with 64-bit values, but only 16-bit unencoded values are allowed. - The maximum voice frame size is 8191 (``0x1FFF``) bytes requiring the 13 least - significant bits of the header. The 14th bit (mask: ``0x2000``) is the terminator + The maximum voice frame size is 8191 (`0x1FFF`) bytes requiring the 13 least + significant bits of the header. The 14th bit (mask: `0x2000`) is the terminator bit which signals whether the packet is the last one in the voice transmission. @@ -327,15 +327,15 @@ the OCB-AES128 encryption. ## Variable length integer encoding -The variable length integer encoding (``varint``) is used to encode long, +The variable length integer encoding (`varint`) is used to encode long, 64-bit, integers so that short values do not need the full 8 bytes to be transferred. The basic idea behind the encoding is prefixing the value with a length prefix and then removing the leading zeroes from the value. The positive numbers are always right justified. 
That is to say that the least significant bit in the encoded presentation matches the least significant bit in the decoded presentation. The *varint prefixes* table contains the definitions of -the different length prefixes. The encoded ``x`` bits are part of the decoded -number while the ``_`` signifies a unused bit. Encoding should be done by +the different length prefixes. The encoded `x` bits are part of the decoded +number while the `_` signifies a unused bit. Encoding should be done by searching the first decoded description that fits the number that should be decoded, truncating it to the required bytes and combining it with the defined encoding prefix. From 6c5615ba9974febbf4c835ab6faf67c0e094d901 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:46:01 +0100 Subject: [PATCH 12/14] DOCS(dev): Mark up URL --- docs/dev/network-protocol/voice_data.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index 66d7de27646..ad1e771408e 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -341,7 +341,7 @@ decoded, truncating it to the required bytes and combining it with the defined encoding prefix. See the *quint64* shift operators in -https://github.com/mumble-voip/mumble/blob/master/src/PacketDataStream.h +<https://github.com/mumble-voip/mumble/blob/master/src/PacketDataStream.h> for a reference implementation. ```text From 6247013cc2948caa1a7761cb7b411ba58a8be41a Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 13:50:24 +0100 Subject: [PATCH 13/14] DOCS(dev): Format field text definitions --- docs/dev/network-protocol/voice_data.md | 26 ++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index ad1e771408e..364bee7892d 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -28,7 +28,7 @@ overhead.
+-------------------------------+ ``` -type +`type`: The audio packet type. The packets transmitted over the audio channel are either ping packets used to diagnose the transport layer connectivity or audio packets encoded with different codecs. Different types are listed in @@ -52,7 +52,7 @@ type +------+----------+-------------------------------+ ``` -target +`target`: The target portion defines the recipient for the audio data. The two constant targets are *Normal talking* (`0`) and *Server Loopback* (`31`). The range 1-30 is reserved for whisper targets. These targets are specified @@ -98,11 +98,11 @@ the connectivity checks. +--------+--------+------------------+ ``` -Header +`Header`: Common audio packet header. For ping packets this should have the value of 0x20. -Data +`Data`: Timestamp. The packet should be echoed back so the timestamp format can be decided by the original sender - the only limitation is that it must fit in a 64-bit integer for the varint encoding. @@ -156,13 +156,13 @@ Outgoing encoded audio packet: +--------------------+------------+-----------------------------------------------------------+ ``` -Header +`Header`: The common audio packet header -Session ID +`Session ID`: Session ID of the user to whom the audio packet belongs. -Sequence Number +`Sequence Number`: Audio data sequence number. The sequence number is used to maintain the packet order when the audio data is transported over unreliable transports such as UDP. @@ -172,12 +172,12 @@ Sequence Number allows the packet loss concealment algorithms to figure out how many audio frames were lost between two received packets. -Payload +`Payload`: Audio payload. Format depends on the audio codec defined in the Header. The payload must be self-delimiting to determine whether the position info exists at the end of the packet. -Position Info +`Position Info`: The XYZ coordinates of the audio source. 
In addition to sending the position information, the user must be using a positional plugin defined in the `UserState` message. The plugins might define different contexts which @@ -198,7 +198,7 @@ frame is prefixed with a single byte length and terminator header. +---------+-----------+-----------------------------------------+ ``` -Header +`Header`: The length of the Data field. The most significant bit (`0x80`) acts as the continuation bit and is set for all but the last frame in the payload. The remaining 7 bits of the header contain the actual length of the Data frame. @@ -207,7 +207,7 @@ Header transmission. In this case the audio data is a single zero-byte which can be interpreted normally as length of 0 with no continuation bit set. -Data +`Data`: Single encoded audio frame. The encoding depends on the codec `type` header of the whole audio packet @@ -225,7 +225,7 @@ Encoded Opus audio is transported as a single Opus audio frame. The frame is pre +---------+-------------+-----------------------------------------+ ``` -Header +`Header`: The length of the Data field. 16-bit variable length integer encoded length and terminator bit value. The varint encoding is the same as with 64-bit values, but only 16-bit unencoded values are allowed. @@ -241,7 +241,7 @@ Header zero-byte CELT packet while in Opus we have a dedicated termination bit in the header. -Data +`Data`: The encoded Opus data. ## Codecs From b50ead336edf471f05bb84f7d2c4c3b1e725a063 Mon Sep 17 00:00:00 2001 From: Jan Klass Date: Sun, 26 Jan 2025 14:07:15 +0100 Subject: [PATCH 14/14] DOCS(dev): Convert tables (where possible) Two tables are not converted because they are not flat tables but contain "complex" content/structure. 
--- .../establishing_connection.md | 178 ++++++------------ .../network-protocol/protocol_stack_tcp.md | 85 +++------ docs/dev/network-protocol/voice_data.md | 136 +++++-------- 3 files changed, 128 insertions(+), 271 deletions(-) diff --git a/docs/dev/network-protocol/establishing_connection.md b/docs/dev/network-protocol/establishing_connection.md index f1774acd9eb..b9eab7ca0f3 100644 --- a/docs/dev/network-protocol/establishing_connection.md +++ b/docs/dev/network-protocol/establishing_connection.md @@ -23,51 +23,33 @@ its certificate and it is recommended that the client checks this. Once the TLS handshake is completed both sides should transmit their version information using the Version message. The message structure is described below. -```text -+--------------------------------------+ -| Version | -+===========================+==========+ -| version | uint32 | -+---------------------------+----------+ -| release | string | -+---------------------------+----------+ -| os | string | -+---------------------------+----------+ -| os_version | string | -+---------------------------+----------+ -``` +| Field | Type | +| ------------ | -------- | +| `version` | `uint32` | +| `release` | `string` | +| `os` | `string` | +| `os_version` | `string` | The version field is a combination of major, minor and patch version numbers (e.g. 1.2.0) so that major number takes two bytes and minor and patch numbers take one byte each. The release, os and os_version fields are common strings containing additional information. -```text -+---------------------------+----------+----------+ -| Major | Minor | Patch | -+===========================+==========+==========+ -| 2 bytes | 1 byte | 1 byte | -+---------------------------+----------+----------+ -``` +| Major | Minor | Patch | +| ------- | ------ | ------ | +| 2 bytes | 1 byte | 1 byte | The version information may be used as part of the *SuggestConfig* checks, which usually refer to the standard client versions. 
The major changes between these versions are listed in table below. The *release*, *os* and *os_version* information is not interpreted in any way at the moment. -```text -+---------------+-------------------------------------------+ -| Version | Major changes | -+===============+===========================================+ -| 1.2.0 | CELT 0.7.0 codec support | -+---------------+-------------------------------------------+ -| 1.2.2 | CELT 0.7.1 codec support | -+---------------+-------------------------------------------+ -| 1.2.3 | CELT 0.11.0 codec | -+---------------+-------------------------------------------+ -| 1.2.4 | Opus codec support, SuggestConfig message | -+---------------+-------------------------------------------+ -``` +| Version | Major changes | +| ------- | ----------------------------------------- | +| 1.2.0 | CELT 0.7.0 codec support | +| 1.2.2 | CELT 0.7.1 codec support | +| 1.2.3 | CELT 0.11.0 codec | +| 1.2.4 | Opus codec support, SuggestConfig message | ## Authenticate @@ -76,17 +58,11 @@ The message structure is described in the figure below. This message may be sent after sending the version message. The client does not need to wait for the server version message. -```text -+-----------------------------------------------+ -| Authenticate | -+===========================+===================+ -| username | string | -+---------------------------+-------------------+ -| password | string | -+---------------------------+-------------------+ -| tokens | string | -+---------------------------+-------------------+ -``` +| Field | Type | +| ---------- | -------- | +| `username` | `string` | +| `password` | `string` | +| `tokens` | `string` | The username and password are UTF-8 encoded strings. While the client is free to accept any username from the user the server is allowed to impose further restrictions. Furthermore @@ -109,17 +85,11 @@ the client. 
It contains the necessary cryptographic information for the OCB-AES1 encryption used in the UDP Voice channel. The packet is described in figure below. The encryption itself is described in a later section. -```text -+-----------------------------------------------+ -| CryptSetup | -+===========================+===================+ -| key | bytes | -+---------------------------+-------------------+ -| client_nonce | bytes | -+---------------------------+-------------------+ -| server_nonce | bytes | -+---------------------------+-------------------+ -``` +| Field | Type | +| -------------- | ----- | +| `key` | bytes | +| `client_nonce` | bytes | +| `server_nonce` | bytes | ## Channel states @@ -130,29 +100,17 @@ picture of all the channels. Once the initial ChannelState has been transmitted for all channels the server updates the linked channels by sending new packets for these. The full structure of these ChannelState messages is shown below: -```text -+-----------------------------------------------+ -| ChannelState | -+===========================+===================+ -| channel_id | uint32 | -+---------------------------+-------------------+ -| parent | uint32 | -+---------------------------+-------------------+ -| name | string | -+---------------------------+-------------------+ -| links | repeated uint32 | -+---------------------------+-------------------+ -| description | string | -+---------------------------+-------------------+ -| links_add | repeated uint32 | -+---------------------------+-------------------+ -| links_remove | repeated uint32 | -+---------------------------+-------------------+ -| temporary | optional bool | -+---------------------------+-------------------+ -| position | optional int32 | -+---------------------------+-------------------+ -``` +| Field | Type | +| -------------- | ----------------- | +| `channel_id` | `uint32` | +| `parent` | `uint32` | +| `name` | `string` | +| `links` | repeated `uint32` | +| `description` | `string` | 
+| `links_add` | repeated `uint32` | +| `links_remove` | repeated `uint32` | +| `temporary` | optional `bool` | +| `position` | optional `int32` | *The server must send a ChannelState for the root channel identified with ID 0.* @@ -162,49 +120,27 @@ After the channels have been synchronized the server continues by listing the connected users. This is done by sending a UserState message for each user currently on the server, including the user that is currently connecting. -```text -+-----------------------------------------------+ -| UserState | -+===========================+===================+ -| session | uint32 | -+---------------------------+-------------------+ -| actor | uint32 | -+---------------------------+-------------------+ -| name | string | -+---------------------------+-------------------+ -| user_id | uint32 | -+---------------------------+-------------------+ -| channel_id | uint32 | -+---------------------------+-------------------+ -| mute | bool | -+---------------------------+-------------------+ -| deaf | bool | -+---------------------------+-------------------+ -| suppress | bool | -+---------------------------+-------------------+ -| self_mute | bool | -+---------------------------+-------------------+ -| self_deaf | bool | -+---------------------------+-------------------+ -| texture | bytes | -+---------------------------+-------------------+ -| plugin_context | bytes | -+---------------------------+-------------------+ -| plugin_identity | string | -+---------------------------+-------------------+ -| comment | string | -+---------------------------+-------------------+ -| hash | string | -+---------------------------+-------------------+ -| comment_hash | bytes | -+---------------------------+-------------------+ -| texture_hash | bytes | -+---------------------------+-------------------+ -| priority_speaker | bool | -+---------------------------+-------------------+ -| recording | bool | 
-+---------------------------+-------------------+ -``` +| Field | Type | +| ------------------ | -------- | +| `session` | `uint32` | +| `actor` | `uint32` | +| `name` | `string` | +| `user_id` | `uint32` | +| `channel_id` | `uint32` | +| `mute` | `bool` | +| `deaf` | `bool` | +| `suppress` | `bool` | +| `self_mute` | `bool` | +| `self_deaf` | `bool` | +| `texture` | `bytes` | +| `plugin_context` | `bytes` | +| `plugin_identity` | `string` | +| `comment` | `string` | +| `hash` | `string` | +| `comment_hash` | `bytes` | +| `texture_hash` | `bytes` | +| `priority_speaker` | `bool` | +| `recording` | `bool` | ## Server sync diff --git a/docs/dev/network-protocol/protocol_stack_tcp.md b/docs/dev/network-protocol/protocol_stack_tcp.md index ed5543e7552..f4694957d3c 100644 --- a/docs/dev/network-protocol/protocol_stack_tcp.md +++ b/docs/dev/network-protocol/protocol_stack_tcp.md @@ -13,63 +13,34 @@ followed by the payload itself. The following packet types are available in the current protocol and all but UDPTunnel are simple protobuf messages. If not mentioned otherwise all fields outside the protobuf encoding are big-endian. 
-```text -+---------+------------------------+ -| Type | Payload | -+=========+========================+ -| 0 | Version | -+---------+------------------------+ -| 1 | UDPTunnel | -+---------+------------------------+ -| 2 | Authenticate | -+---------+------------------------+ -| 3 | Ping | -+---------+------------------------+ -| 4 | Reject | -+---------+------------------------+ -| 5 | ServerSync | -+---------+------------------------+ -| 6 | ChannelRemove | -+---------+------------------------+ -| 7 | ChannelState | -+---------+------------------------+ -| 8 | UserRemove | -+---------+------------------------+ -| 9 | UserState | -+---------+------------------------+ -| 10 | BanList | -+---------+------------------------+ -| 11 | TextMessage | -+---------+------------------------+ -| 12 | PermissionDenied | -+---------+------------------------+ -| 13 | ACL | -+---------+------------------------+ -| 14 | QueryUsers | -+---------+------------------------+ -| 15 | CryptSetup | -+---------+------------------------+ -| 16 | ContextActionModify | -+---------+------------------------+ -| 17 | ContextAction | -+---------+------------------------+ -| 18 | UserList | -+---------+------------------------+ -| 19 | VoiceTarget | -+---------+------------------------+ -| 20 | PermissionQuery | -+---------+------------------------+ -| 21 | CodecVersion | -+---------+------------------------+ -| 22 | UserStats | -+---------+------------------------+ -| 23 | RequestBlob | -+---------+------------------------+ -| 24 | ServerConfig | -+---------+------------------------+ -| 25 | SuggestConfig | -+---------+------------------------+ -``` +| Type | Payload | +| ---- | ------------------- | +| `0` | Version | +| `1` | UDPTunnel | +| `2` | Authenticate | +| `3` | Ping | +| `4` | Reject | +| `5` | ServerSync | +| `6` | ChannelRemove | +| `7` | ChannelState | +| `8` | UserRemove | +| `9` | UserState | +| `10` | BanList | +| `11` | TextMessage | +| `12` | PermissionDenied | +| `13` | ACL | 
+| `14` | QueryUsers | +| `15` | CryptSetup | +| `16` | ContextActionModify | +| `17` | ContextAction | +| `18` | UserList | +| `19` | VoiceTarget | +| `20` | PermissionQuery | +| `21` | CodecVersion | +| `22` | UserStats | +| `23` | RequestBlob | +| `24` | ServerConfig | +| `25` | SuggestConfig | For raw representation of each packet type see the attached Mumble.proto [^2] file. diff --git a/docs/dev/network-protocol/voice_data.md b/docs/dev/network-protocol/voice_data.md index 364bee7892d..a967c13f1c1 100644 --- a/docs/dev/network-protocol/voice_data.md +++ b/docs/dev/network-protocol/voice_data.md @@ -34,23 +34,14 @@ overhead. audio packets encoded with different codecs. Different types are listed in *Audio packet types* table. -```text -+------+----------+-------------------------------+ -| Type | Bitfield | Description | -+======+==========+===============================+ -| 0 | 000xxxxx | CELT Alpha encoded voice data | -+------+----------+-------------------------------+ -| 1 | 001xxxxx | Ping packet | -+------+----------+-------------------------------+ -| 2 | 010xxxxx | Speex encoded voice data | -+------+----------+-------------------------------+ -| 3 | 011xxxxx | CELT Beta encoded voice data | -+------+----------+-------------------------------+ -| 4 | 100xxxxx | OPUS encoded voice data | -+------+----------+-------------------------------+ -| 5-7 | | Unused | -+------+----------+-------------------------------+ -``` +| Type | Bitfield | Description | +| ------- | ---------- | ----------------------------- | +| `0` | `000xxxxx` | CELT Alpha encoded voice data | +| `1` | `001xxxxx` | Ping packet | +| `2` | `010xxxxx` | Speex encoded voice data | +| `3` | `011xxxxx` | CELT Beta encoded voice data | +| `4` | `100xxxxx` | OPUS encoded voice data | +| `5`-`7` | | Unused | `target`: The target portion defines the recipient for the audio data. The two constant @@ -88,15 +79,10 @@ audio transport layer. 
 These packets contain only varint encoded timestamp as data. See *UDP connectivity checks* section below for the logic involved in the connectivity checks.

-```text
-+--------+--------+------------------+
-| Field  | Type   | Description      |
-+========+========+==================+
-| Header | byte   | 00100000b (0x20) |
-+--------+--------+------------------+
-| Data   | varint | Timestamp        |
-+--------+--------+------------------+
-```
+| Field    | Type     | Description          |
+| -------- | -------- | -------------------- |
+| `Header` | `byte`   | `00100000b` (`0x20`) |
+| `Data`   | `varint` | Timestamp            |

 `Header`: Common audio packet header. For ping packets this should have the value of
@@ -124,37 +110,22 @@ possible positional audio data can be read from the end.

 Incoming encoded audio packet:

-```text
-+--------------------+------------+-----------------------------------------------------------+
-| Field              | Type       | Description                                               |
-+====================+============+===========================================================+
-| Header             | byte       | Codec type/Audio target                                   |
-+--------------------+------------+-----------------------------------------------------------+
-| Session ID         | varint     | Session ID of the source user.                            |
-+--------------------+------------+-----------------------------------------------------------+
-| Sequence Number    | varint     | Sequence number of the first audio data **segment**.      |
-+--------------------+------------+-----------------------------------------------------------+
-| Payload            | byte[]     | Audio payload                                             |
-+--------------------+------------+-----------------------------------------------------------+
-| Position Info      | float[3]   | Positional audio information                              |
-+--------------------+------------+-----------------------------------------------------------+
-```
+| Field             | Type       | Description                                        |
+| ----------------- | ---------- | -------------------------------------------------- |
+| `Header`          | `byte`     | Codec type/Audio target                            |
+| `Session ID`      | `varint`   | Session ID of the source user.                     |
+| `Sequence Number` | `varint`   | Sequence number of the first audio data *segment*. |
+| `Payload`         | `byte[]`   | Audio payload                                      |
+| `Position Info`   | `float[3]` | Positional audio information                       |

 Outgoing encoded audio packet:

-```text
-+--------------------+------------+-----------------------------------------------------------+
-| Field              | Type       | Description                                               |
-+====================+============+===========================================================+
-| Header             | byte       | Codec type/Audio target                                   |
-+--------------------+------------+-----------------------------------------------------------+
-| Sequence Number    | varint     | Sequence number of the first audio data **segment**.      |
-+--------------------+------------+-----------------------------------------------------------+
-| Payload            | byte[]     | Audio payload                                             |
-+--------------------+------------+-----------------------------------------------------------+
-| Position Info      | float[3]   | Positional audio information                              |
-+--------------------+------------+-----------------------------------------------------------+
-```
+| Field             | Type       | Description                                        |
+| ----------------- | ---------- | -------------------------------------------------- |
+| `Header`          | `byte`     | Codec type/Audio target                            |
+| `Sequence Number` | `varint`   | Sequence number of the first audio data *segment*. |
+| `Payload`         | `byte[]`   | Audio payload                                      |
+| `Position Info`   | `float[3]` | Positional audio information                       |

 `Header`: The common audio packet header
@@ -188,15 +159,10 @@ Outgoing encoded audio packet:
 Encoded Speex and CELT audio is transported as individual encoded frames. Each
 frame is prefixed with a single byte length and terminator header.

-```text
-+---------+-----------+-----------------------------------------+
-| Field   | Type      | Description                             |
-+=========+===========+=========================================+
-| Header  | byte      | length/continuation header              |
-+---------+-----------+-----------------------------------------+
-| Data    | byte[]    | Encoded voice frame                     |
-+---------+-----------+-----------------------------------------+
-```
+| Field    | Type     | Description                |
+| -------- | -------- | -------------------------- |
+| `Header` | `byte`   | length/continuation header |
+| `Data`   | `byte[]` | Encoded voice frame        |

 `Header`: The length of the Data field. The most significant bit (`0x80`) acts as the
@@ -215,15 +181,10 @@ frame is prefixed with a single byte length and terminator header.
 Encoded Opus audio is transported as a single Opus audio frame. The frame is
 prefixed with a variable byte header.

-```text
-+---------+-------------+-----------------------------------------+
-| Field   | Type        | Description                             |
-+=========+=============+=========================================+
-| Header  | varint      | length/terminator header                |
-+---------+-------------+-----------------------------------------+
-| Data    | byte[]      | Encoded voice frame                     |
-+---------+-------------+-----------------------------------------+
-```
+| Field    | Type     | Description              |
+| -------- | -------- | ------------------------ |
+| `Header` | `varint` | length/terminator header |
+| `Data`   | `byte[]` | Encoded voice frame      |

 `Header`: The length of the Data field. 16-bit variable length integer encoded length
@@ -344,24 +305,13 @@ See the *quint64* shift operators in
 for a reference implementation.

-```text
-+--------------------------+---------------------------------------------+
-| Encoded                  | Decoded                                     |
-+==========================+=============================================+
-| 0xxxxxxx                 | 7-bit positive number                       |
-+--------------------------+---------------------------------------------+
-| 10xxxxxx + 1 byte        | 14-bit positive number                      |
-+--------------------------+---------------------------------------------+
-| 110xxxxx + 2 bytes       | 21-bit positive number                      |
-+--------------------------+---------------------------------------------+
-| 1110xxxx + 3 bytes       | 28-bit positive number                      |
-+--------------------------+---------------------------------------------+
-| 111100__ + int (32-bit)  | 32-bit positive number                      |
-+--------------------------+---------------------------------------------+
-| 111101__ + long (64-bit) | 64-bit number                               |
-+--------------------------+---------------------------------------------+
-| 111110__ + varint        | Negative recursive varint                   |
-+--------------------------+---------------------------------------------+
-| 111111xx                 | Byte-inverted negative two bit number (~xx) |
-+--------------------------+---------------------------------------------+
-```
+| Encoded                      | Decoded                                     |
+| ---------------------------- | ------------------------------------------- |
+| `0xxxxxxx`                   | 7-bit positive number                       |
+| `10xxxxxx` + 1 byte          | 14-bit positive number                      |
+| `110xxxxx` + 2 bytes         | 21-bit positive number                      |
+| `1110xxxx` + 3 bytes         | 28-bit positive number                      |
+| `111100__` + `int` (32-bit)  | 32-bit positive number                      |
+| `111101__` + `long` (64-bit) | 64-bit number                               |
+| `111110__` + `varint`        | Negative recursive varint                   |
+| `111111xx`                   | Byte-inverted negative two bit number (~xx) |
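
Reviewer note: the varint prefix table above can be read directly as a decoder, which may help when checking that the converted Markdown table still matches the old one row for row. The following is a minimal Python sketch, not the project's implementation. It rests on two assumptions that the table does not spell out: multi-byte values are stored most significant byte first, and "negative" for the `111110__` row means bitwise inversion of the nested varint, by analogy with the `~xx` shown in the last row. The names `decode_varint` and `_take` are hypothetical.

```python
from typing import Tuple


def _take(buf: bytes, pos: int, n: int) -> int:
    """Read n bytes starting at pos as an unsigned big-endian integer (assumed byte order)."""
    if pos + n > len(buf):
        raise ValueError("truncated varint")
    return int.from_bytes(buf[pos:pos + n], "big")


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint at buf[pos]; return (value, position after it)."""
    first = _take(buf, pos, 1)
    pos += 1
    if (first & 0x80) == 0x00:      # 0xxxxxxx: 7-bit positive number
        return first & 0x7F, pos
    if (first & 0xC0) == 0x80:      # 10xxxxxx + 1 byte: 14-bit positive number
        return ((first & 0x3F) << 8) | _take(buf, pos, 1), pos + 1
    if (first & 0xE0) == 0xC0:      # 110xxxxx + 2 bytes: 21-bit positive number
        return ((first & 0x1F) << 16) | _take(buf, pos, 2), pos + 2
    if (first & 0xF0) == 0xE0:      # 1110xxxx + 3 bytes: 28-bit positive number
        return ((first & 0x0F) << 24) | _take(buf, pos, 3), pos + 3
    prefix = first & 0xFC           # remaining rows: low two bits are unused or payload
    if prefix == 0xF0:              # 111100__ + int (32-bit)
        return _take(buf, pos, 4), pos + 4
    if prefix == 0xF4:              # 111101__ + long (64-bit)
        return _take(buf, pos, 8), pos + 8
    if prefix == 0xF8:              # 111110__ + varint: inverted nested varint (assumption)
        value, pos = decode_varint(buf, pos)
        return ~value, pos
    return ~(first & 0x03), pos     # 111111xx: inverted two-bit number (~xx)
```

Under these assumptions, `decode_varint(b"\x81\x00")` returns `(256, 2)`: the value and the offset of the next field, so a caller can keep reading the rest of the packet from that offset.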