Internet Engineering Task Force (IETF)                     M. Westerlund
Request for Comments: 7667                                      Ericsson
Obsoletes: 5117                                                S. Wenger
Category: Informational                                            Vidyo
ISSN: 2070-1721                                            November 2015

                             RTP Topologies
Abstract

This document discusses point-to-point and multi-endpoint topologies used in environments based on the Real-time Transport Protocol (RTP). In particular, centralized topologies commonly employed in the video conferencing industry are mapped to the RTP terminology. This document is updated with additional topologies and replaces RFC 5117.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7667.
Copyright Notice Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1.  Introduction
2.  Definitions
  2.1.  Glossary
  2.2.  Definitions Related to RTP Grouping Taxonomy
3.  Topologies
  3.1.  Point to Point
  3.2.  Point to Point via Middlebox
    3.2.1.  Translators
    3.2.2.  Back-to-Back RTP sessions
  3.3.  Point to Multipoint Using Multicast
    3.3.1.  Any-Source Multicast (ASM)
    3.3.2.  Source-Specific Multicast (SSM)
    3.3.3.  SSM with Local Unicast Resources
  3.4.  Point to Multipoint Using Mesh
  3.5.  Point to Multipoint Using the RFC 3550 Translator
    3.5.1.  Relay - Transport Translator
    3.5.2.  Media Translator
  3.6.  Point to Multipoint Using the RFC 3550 Mixer Model
    3.6.1.  Media-Mixing Mixer
    3.6.2.  Media-Switching Mixer
  3.7.  Selective Forwarding Middlebox
  3.8.  Point to Multipoint Using Video-Switching MCUs
  3.9.  Point to Multipoint Using RTCP-Terminating MCU
  3.10. Split Component Terminal
  3.11. Non-symmetric Mixer/Translators
  3.12. Combining Topologies
4.  Topology Properties
  4.1.  All-to-All Media Transmission
  4.2.  Transport or Media Interoperability
  4.3.  Per-Domain Bitrate Adaptation
  4.4.  Aggregation of Media
  4.5.  View of All Session Participants
  4.6.  Loop Detection
  4.7.  Consistency between Header Extensions and RTCP
5.  Comparison of Topologies
6.  Security Considerations
7.  References
  7.1.  Normative References
  7.2.  Informative References
Acknowledgements
Authors' Addresses
Real-time Transport Protocol (RTP) [RFC3550] topologies describe methods for interconnecting RTP entities and their processing behavior for RTP and the RTP Control Protocol (RTCP). This document tries to address past and existing confusion, especially with respect to terms not defined in RTP but in common use in the communication industry, such as the Multipoint Control Unit or MCU.

When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was developed, the main emphasis lay in the efficient support of point-to-point and small multipoint scenarios without centralized multipoint control. In practice, however, most multipoint conferences operate using centralized units referred to as MCUs. MCUs may implement mixer or translator functionality (in RTP [RFC3550] terminology) and signaling support. They may also contain additional application-layer functionality. This document focuses on the media transport aspects of the MCU that can be realized using RTP, as discussed below. Further considered are the properties of mixers and translators, and how some types of deployed MCUs deviate from these properties.

This document also codifies new multipoint architectures that have recently been introduced and that were not anticipated in RFC 5117; thus, this document replaces [RFC5117]. These architectures use scalable video coding and simulcasting, and their associated centralized units are referred to as Selective Forwarding Middleboxes (SFMs). This codification provides a common information basis for future discussion and specification work.

The new topologies are Point to Point via Middlebox (Section 3.2), Source-Specific Multicast (Section 3.3.2), SSM with Local Unicast Resources (Section 3.3.3), Point to Multipoint Using Mesh (Section 3.4), Selective Forwarding Middlebox (Section 3.7), and Split Component Terminal (Section 3.10). The Point to Multipoint Using the RFC 3550 Mixer Model (Section 3.6) has been significantly expanded to cover two different versions, namely Media-Mixing Mixer (Section 3.6.1) and Media-Switching Mixer (Section 3.6.2).

The document's attempt to clarify and explain sections of the RTP spec [RFC3550] is informal. It is not intended to update or change what is normatively specified within RFC 3550.
[RFC7656].

Communication Session: A Communication Session is an association among two or more Participants communicating with each other via one or more Multimedia Sessions.

Endpoint: A single addressable entity sending or receiving RTP packets. It may be decomposed into several functional blocks, but as long as it behaves as a single RTP stack entity, it is classified as a single "endpoint".

Media Source: A Media Source is the logical source of a time progressing digital media stream synchronized to a reference clock. This stream is called a Source Stream.
Multimedia Session: A Multimedia Session is an association among a group of participants engaged in communication via one or more RTP sessions.

[RFC4585] and [RFC5104].

The Point to Point topology (Figure 1) consists of two endpoints, communicating using unicast. Both RTP and RTCP traffic are conveyed endpoint to endpoint, using unicast traffic only (even if, in exotic cases, this unicast traffic happens to be conveyed over an IP multicast address).

   +---+         +---+
   | A |<------->| B |
   +---+         +---+

   Figure 1: Point to Point

The main property of this topology is that A sends to B, and only B, while B sends to A, and only A. This avoids all complexities of handling multiple endpoints and combining the requirements stemming from them. Note that an endpoint can still use multiple RTP Synchronization Sources (SSRCs) in an RTP session. Any number of RTP sessions can also be in use between A and B, subject only to system-level limitations like the number range of ports.
RTCP feedback messages for the indicated SSRCs are communicated directly between the endpoints. Therefore, this topology poses minimal (if any) issues for any feedback messages. For RTP sessions that use multiple SSRCs per endpoint, it can be relevant to implement support for cross-reporting suppression as defined in "Sending Multiple Media Streams in a Single RTP Session" [MULTI-STREAM-OPT].

According to Section 7.1 of [RFC3550], the SSRC space is common for all participants in the RTP session, regardless of which side of the translator the session resides on. Therefore, it is the responsibility of the endpoints (as the RTP session participants) to run SSRC collision detection, and the SSRC is thus a field the translator cannot change. Any Source Description (SDES) information associated with an SSRC or CSRC also needs to be forwarded between the domains for any SSRC/CSRC used in the different domains.

A translator commonly does not use an SSRC of its own and is not visible as an active participant in the RTP session. One reason to have its own SSRC is when a translator acts as a quality monitor that sends RTCP reports and therefore is required to have an SSRC. Another example is the case when a translator is prepared to use RTCP feedback messages. This may, for example, occur in a translator
configured to detect packet loss of important video packets and to trigger repair by the media sending endpoint by sending feedback messages. While such feedback could use the SSRC of the target for the translator (the receiving endpoint), this in turn would require translation of the target RTCP reports to make them consistent. It may be simpler to expose an additional SSRC in the session. The only concern is that endpoints failing to support the full RTP specification may have issues with multiple SSRCs reporting on the RTP streams sent by that endpoint, as this use case may be viewed as exotic by implementers.

In general, a translator implementation should consider which RTCP feedback messages or codec-control messages it needs to understand in relation to the functionality of the translator itself. This is completely in line with the requirement to also translate RTCP messages between the domains.

[RFC3022] traversal by pinning the media path to a public address domain relay and network topologies where the RTP stream is required to pass a particular point for audit by employing relaying, or preserving privacy by hiding each peer's transport addresses from the other party. Other protocols or functionalities that provide this behavior are Traversal Using Relays around NAT (TURN) [RFC5766] servers, Session Border Gateways, and Media Processing Nodes with media anchoring functionalities.

   +---+        +---+         +---+
   | A |<------>| T |<------->| B |
   +---+        +---+         +---+

   Figure 2: Point to Point with Translator

A common element in these functions is that they are normally transparent at the RTP level, i.e., they perform no changes on any RTP or RTCP packet fields and only affect the lower layers. They may affect, however, the path since the RTP and RTCP packets are routed between the endpoints in the RTP session, and thereby they indirectly affect the RTP session.
For this reason, one could believe that Transport Translator-type middleboxes do not need to be included in this document. This topology, however, can raise additional
requirements in the RTP implementation and its interactions with the signaling solution. Both in signaling and in certain RTCP fields, network addresses other than those of the relay can occur since B has a different network address than the relay (T). Implementations that cannot support this will also not work correctly when endpoints are subject to NAT.

The Transport Relay implementations also have to take into account security considerations. In particular, source address filtering of incoming packets is usually important in relays, to prevent attackers from injecting traffic into a session. In the absence of adequate security in the relay, one peer may believe that such injected traffic comes from the other peer. Section 22.214.171.124.
codec. Media Translators are commonly used to connect endpoints without a common interoperability point in the media encoding.

Stand-alone Media Translators are rare. Most commonly, a combination of Transport and Media Translator is used to translate both the media and the transport aspects of the RTP stream carrying the media between two transport domains.

When media translation occurs, the translator's task regarding handling of RTCP traffic becomes substantially more complex. In this case, the translator needs to rewrite endpoint B's RTCP receiver reports before forwarding them to endpoint A. The rewriting is needed as the RTP stream received by B is not the same RTP stream as the other participants receive. For example, the number of packets transmitted to B may be lower than what A sends, due to the different media format and data rate. Therefore, if the receiver reports were forwarded without changes, the extended highest sequence number would indicate that B was substantially behind in reception, while it most likely would not be. Therefore, the translator must translate that number to a corresponding sequence number for the stream the translator received. Similar requirements exist for most other fields in the RTCP receiver reports.

A Media Translator may in some cases act on behalf of the "real" source (the endpoint originally sending the media to the translator) and respond to RTCP feedback messages. This may occur, for example, when a receiving endpoint requests a bandwidth reduction, and the Media Translator has not detected any congestion or other reasons for bandwidth reduction between the sending endpoint and itself. In that case, it is sensible that the Media Translator reacts to codec control messages itself, for example, by transcoding to a lower media rate.

A variant of translator behavior worth pointing out is the one depicted in Figure 3, in which an endpoint A sends an RTP stream containing media (only) to B.
On the path, there is a device T that manipulates the RTP streams on A's behalf. One common example is that T adds a second RTP stream containing Forward Error Correction (FEC) information in order to protect A's (non FEC-protected) RTP stream. In this case, T needs to semantically bind the new FEC RTP stream to A's media-carrying RTP stream, for example, by using the same CNAME as A.
   +------+        +------+         +------+
   |      |        |      |         |      |
   |  A   |------->|  T   |-------->|  B   |
   |      |        |      |---FEC-->|      |
   +------+        +------+         +------+

   Figure 3: Media Translator Adding FEC

There may also be cases where information is added into the original RTP stream, while leaving most or all of the original RTP packets intact (with the exception of certain RTP header fields, such as the sequence number). One example is the injection of metadata into the RTP stream, carried in their own RTP packets.

Similarly, a Media Translator can sometimes remove information from the RTP stream, while otherwise leaving the remaining RTP packets unchanged (again with the exception of certain RTP header fields).

Either type of functionality where T manipulates the RTP stream, or adds an accompanying RTP stream, on behalf of A is also covered under the Media Translator definition.

   |<--Session A-->|  |<--Session B-->|

   +------+        +------+         +------+
   |  A   |------->|  MB  |-------->|  B   |
   +------+        +------+         +------+

   Figure 4: Back-to-Back RTP Sessions through Middlebox

The middlebox acts as an application-level gateway and bridges the two RTP sessions. This bridging can be as basic as forwarding the RTP payloads between the sessions or more complex including media transcoding. The difference of this topology relative to the single RTP session context is the handling of the SSRCs and the other session-related identifiers, such as CNAMEs. With two different RTP sessions, these can be freely changed and it becomes the middlebox's responsibility to maintain the correct relations.
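The identifier bookkeeping that falls to a back-to-back middlebox can be illustrated with a minimal Python sketch. It assumes a hypothetical `SessionBridge` class (not defined by this document or by RFC 3550) that allocates a fresh SSRC in session B for every SSRC first seen in session A and keeps the relation in both directions, so that RTCP arriving from B can be attributed to the right source in A. A real middlebox would also have to track CNAMEs, handle SSRC collisions reported by endpoints, and expire stale entries.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class SessionBridge:
    """Hypothetical SSRC relation table for a back-to-back middlebox."""

    a_to_b: dict = field(default_factory=dict)  # SSRC in session A -> SSRC in session B
    b_to_a: dict = field(default_factory=dict)  # reverse relation, for RTCP coming from B

    def map_ssrc(self, ssrc_a: int) -> int:
        """Return the session-B SSRC for ssrc_a, allocating one on first use."""
        if ssrc_a not in self.a_to_b:
            ssrc_b = secrets.randbits(32)
            # Avoid reusing an SSRC the middlebox already sources in session B.
            while ssrc_b in self.b_to_a:
                ssrc_b = secrets.randbits(32)
            self.a_to_b[ssrc_a] = ssrc_b
            self.b_to_a[ssrc_b] = ssrc_a
        return self.a_to_b[ssrc_a]

    def reverse(self, ssrc_b: int):
        """Return the session-A SSRC behind ssrc_b, or None if unknown."""
        return self.b_to_a.get(ssrc_b)
```

Calling `map_ssrc` repeatedly with the same session-A SSRC returns the same session-B SSRC, which is the property that keeps reports and feedback in session B attributable to one source in session A.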
The signaling or other functionalities above the RTP level that reference RTP streams may be what is most impacted by using two RTP sessions and changing identifiers.

The structure with two RTP sessions also puts a congestion control requirement on the middlebox, because it becomes fully responsible for the media stream it sources into each of the sessions.

Adherence to congestion control can be solved locally on each of the two segments or by bridging statistics from the receiving endpoint through the middlebox to the sending endpoint. From an implementation point of view, however, the latter requires dealing with a number of inconsistencies. First, packet loss must be detected for an RTP stream sent from A to the middlebox, and that loss must be reported through a skipped sequence number in the RTP stream from the middlebox to B. This coupling and the resulting inconsistencies are conceptually easier to handle when considering the two RTP streams as belonging to a single RTP session.

[RFC1112] where any multicast group participant can send to the group address and expect the packet to reach all group participants and Source-Specific Multicast (SSM) [RFC3569], where only a particular IP host sends to the multicast group. Each of these models is discussed below in its respective section.

              +-----+
   +---+     /       \    +---+
   | A |----/         \---| B |
   +---+   /   Multi-  \  +---+
          +    cast     +
   +---+   \  Network  /  +---+
   | C |----\         /---| D |
   +---+     \       /    +---+
              +-----+

   Figure 5: Point to Multipoint Using Multicast
Point to Multipoint (PtM) is defined here as using a multicast topology as a transmission model, in which traffic from any multicast group participant reaches all the other multicast group participants, except for cases such as:

o  packet loss, or

o  when a multicast group participant does not wish to receive the traffic for a specific multicast group and, therefore, has not subscribed to the IP multicast group in question. This scenario can occur, for example, where a Multimedia Session is distributed using two or more multicast groups, and a multicast group participant is subscribed only to a subset of these sessions.

In the above context, "traffic" encompasses both RTP and RTCP traffic. The number of multicast group participants can vary between one and many, as RTP and RTCP scale to very large multicast groups (the theoretical limit of the number of participants in a single RTP session is in the range of billions). The above can be realized using ASM.

For feedback usage, it is useful to define a "small multicast group" as a group where the number of multicast group participants is so low (and other factors such as connectivity are so good) that it allows the participants to use early or immediate feedback, as defined in AVPF [RFC4585]. Even when the environment would allow for the use of a small multicast group, some applications may still want to use the more limited options for RTCP feedback available to large multicast groups, for example, when there is a likelihood that the threshold of the small multicast group (in terms of multicast group participants) may be exceeded during the lifetime of a session.

RTCP feedback messages in multicast reach, like media data, every subscriber (subject to packet losses and multicast group subscription). Therefore, the feedback suppression mechanism discussed in [RFC4585] is typically required.
Each individual endpoint that is a multicast group participant needs to process every feedback message it receives, not only to determine if it is affected or if the feedback message applies only to some other endpoint but also to derive timing restrictions for the sending of its own feedback messages, if any.
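The processing described above, in which every received feedback message is checked against an endpoint's own pending feedback, can be sketched in Python as follows. This is a deliberately simplified illustration and not the timing algorithm of [RFC4585]: the class name `FeedbackScheduler`, the uniform random delay, and the `max_delay` parameter are all assumptions made for the example.

```python
import random


class FeedbackScheduler:
    """Simplified feedback suppression: delay own feedback, drop it if a peer reports first."""

    def __init__(self, max_delay, rng=None):
        self.max_delay = max_delay          # assumed upper bound for the random delay (seconds)
        self.rng = rng or random.Random()
        self.pending = {}                   # event key -> earliest send time

    def on_local_event(self, key, now):
        """Schedule feedback for a locally observed event, e.g. a lost packet."""
        if key not in self.pending:
            self.pending[key] = now + self.rng.uniform(0, self.max_delay)

    def on_received_feedback(self, key):
        """Another group member already reported this event, so suppress our own report."""
        self.pending.pop(key, None)

    def due(self, now):
        """Return and clear the events whose feedback should be sent at time now."""
        ready = [k for k, t in self.pending.items() if t <= now]
        for k in ready:
            del self.pending[k]
        return ready
```

The random delay gives other group members a chance to report the same event first, which is the basic reason feedback received from others also shapes when an endpoint sends its own.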
Source-Specific Multicast (SSM) [RFC3569][RFC4607] refers to scenarios where only a single source (Distribution Source) can send to the multicast group, creating a topology that looks like the one below:

   +--------+       +-----+
   |Media   |       |     |       Source-Specific
   |Sender 1|<----->| D S |          Multicast
   +--------+       | I O |  +--+----------------> R(1)
                    | S U |  |  |                    |
   +--------+       | T R |  |  +-----------> R(2)   |
   |Media   |<----->| R C |->+  |           :   |    |
   |Sender 2|       | I E |  |  +------> R(n-1) |    |
   +--------+       | B   |  |  |          |    |    |
       :            | U   |  +--+--> R(n)  |    |    |
       :            | T +-|          |     |    |    |
       :            | I | |<---------+     |    |    |
   +--------+       | O |F|<---------------+    |    |
   |Media   |       | N |T|<--------------------+    |
   |Sender M|<----->|   | |<-------------------------+
   +--------+       +-----+       RTCP Unicast

   FT = Feedback Target
   Transport from the Feedback Target to the Distribution
   Source is via unicast or multicast RTCP if they are not
   co-located.

   Figure 6: Point to Multipoint Using Source-Specific Multicast

In the SSM topology (Figure 6), a number of RTP sending endpoints (RTP sources henceforth) (1 to M) are allowed to send media to the SSM group. These sources send media to a dedicated Distribution Source, which forwards the RTP streams to the multicast group on behalf of the original RTP sources. The RTP streams reach the receiving endpoints (receivers henceforth) (R(1) to R(n)). The receivers' RTCP messages cannot be sent to the multicast group, as the SSM multicast group by definition has only a single IP sender. To support RTCP, an RTP extension for SSM [RFC5760] was defined. It uses unicast transmission to send RTCP from each of the receivers to one or more Feedback Targets (FT). The Feedback Targets relay the RTCP unmodified, or provide a summary of the participants' RTCP reports towards the whole group by forwarding the RTCP traffic to the
Distribution Source. Figure 6 only shows a single Feedback Target integrated in the Distribution Source, but for scalability the FT can be distributed and each instance can have responsibility for subgroups of the receivers. For summary reports, however, there typically must be a single Feedback Target aggregating all the summaries to a common message to the whole receiver group.

The RTP extension for SSM specifies how feedback (both reception information and specific feedback events) is handled. The more general problems associated with the use of multicast, where everyone receives what the Distribution Source sends, need to be accounted for.

The aforementioned situation results in common behavior for RTP multicast:

1.  Multicast applications often use a group of RTP sessions, not one. Each endpoint needs to be a member of most or all of these RTP sessions in order to perform well.

2.  Within each RTP session, the number of media sinks is likely to be much larger than the number of RTP sources.

3.  Multicast applications need signaling functions to identify the relationships between RTP sessions.

4.  Multicast applications need signaling functions to identify the relationships between SSRCs in different RTP sessions.

All multicast configurations share a signaling requirement: all of the endpoints need to have the same RTP and payload type configuration. Otherwise, endpoint A could, for example, be using payload type 97 to identify the video codec H.264, while endpoint B would identify it as MPEG-2, with unpredictable but almost certainly not visually pleasing results.

Security solutions for this type of group communication are also challenging. First, the key management and the security protocol must support group communication. Source authentication becomes more difficult and requires specialized solutions. For more discussion on this, please review "Options for Securing RTP Sessions" [RFC7201].

[RFC6285] results in additional extensions to SSM topology.
 -----------                                       --------------
|           |------------------------------------>|              |
|           |.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.->|              |
|           |                                     |              |
| Multicast |          ----------------           |              |
|  Source   |         | Retransmission |          |              |
|           |-------->|  Server  (RS)  |          |              |
|           |.-.-.-.->|                |          |              |
|           |         |  ------------  |          |              |
 -----------          | |  Feedback  | |<.=.=.=.=.|              |
                      | | Target (FT)| |<~~~~~~~~~| RTP Receiver |
PRIMARY MULTICAST     |  ------------  |          |   (RTP_Rx)   |
RTP SESSION with      |                |          |              |
UNICAST FEEDBACK      |                |          |              |
                      |                |          |              |
- - - - - - - - - - - |- - - - - - - - |- - - - - |- - - - - - - |- -
                      |                |          |              |
UNICAST BURST         |  ------------  |          |              |
(or RETRANSMISSION)   | |   Burst/   | |<~~~~~~~~>|              |
RTP SESSION           | |  Retrans.  | |.........>|              |
                      | |Source (BRS)| |<.=.=.=.=>|              |
                      |  ------------  |          |              |
                      |                |          |              |
                       ----------------            --------------

-------> Multicast RTP Stream
.-.-.-.> Multicast RTCP Stream
.=.=.=.> Unicast RTCP Reports
~~~~~~~> Unicast RTCP Feedback Messages
.......> Unicast RTP Stream

Figure 7: SSM with Local Unicast Resources (RAMS)

The rapid acquisition extension allows an endpoint joining an SSM multicast session to request media starting with the last sync point (from where media can be decoded without requiring context established by the decoding of prior packets) to be sent at high speed until, after the decoding of these burst-delivered media packets, the correct media timing is established, i.e., media packets are received within adequate buffer intervals for this application. This is accomplished by first establishing a unicast PtP RTP session between the Burst/Retransmission Source (BRS) (Figure 7) and the RTP Receiver. The unicast session is used to transmit cached packets from the multicast group at higher than normal speed in order to synchronize the receiver to the ongoing multicast RTP stream. Once the RTP receiver and its decoder have caught up with the multicast session's current delivery, the receiver switches over to receiving directly from the multicast group. In
many deployed applications, the (still existing) PtP RTP session is used as a repair channel, i.e., for RTP Retransmission traffic of those packets that were not received from the multicast group.