RFC 8108

Sending Multiple RTP Streams in a Single RTP Session

Pages: 29
Proposed Standard
Updates: 3550 4585

Internet Engineering Task Force (IETF)                         J. Lennox
Request for Comments: 8108                                         Vidyo
Updates: 3550, 4585                                        M. Westerlund
Category: Standards Track                                       Ericsson
ISSN: 2070-1721                                                    Q. Wu
                                                                  Huawei
                                                              C. Perkins
                                                   University of Glasgow
                                                              March 2017


          Sending Multiple RTP Streams in a Single RTP Session

Abstract

   This memo expands and clarifies the behavior of Real-time Transport
   Protocol (RTP) endpoints that use multiple synchronization sources
   (SSRCs).  This occurs, for example, when an endpoint sends multiple
   RTP streams in a single RTP session.  This memo updates RFC 3550 with
   regard to handling multiple SSRCs per endpoint in RTP sessions, with
   a particular focus on RTP Control Protocol (RTCP) behavior.  It also
   updates RFC 4585 to change and clarify the calculation of the timeout
   of SSRCs and the inclusion of feedback messages.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc8108.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Use Cases for Multi-Stream Endpoints
      3.1. Endpoints with Multiple Capture Devices
      3.2. Multiple Media Types in a Single RTP Session
      3.3. Multiple Stream Mixers
      3.4. Multiple SSRCs for a Single Media Source
   4. Use of RTP by Endpoints That Send Multiple Media Streams
   5. Use of RTCP by Endpoints That Send Multiple Media Streams
      5.1. RTCP Reporting Requirement
      5.2. Initial Reporting Interval
      5.3. Aggregation of Reports into Compound RTCP Packets
           5.3.1. Maintaining AVG_RTCP_SIZE
           5.3.2. Scheduling RTCP when Aggregating Multiple SSRCs
      5.4. Use of RTP/AVPF or RTP/SAVPF Feedback
           5.4.1. Choice of SSRC for Feedback Packets
           5.4.2. Scheduling an RTCP Feedback Packet
   6. Adding and Removing SSRCs
      6.1. Adding RTP Streams
      6.2. Removing RTP Streams
   7. RTCP Considerations for Streams with Disparate Rates
      7.1. Timing Out SSRCs
           7.1.1. Problems with the RTP/AVPF T_rr_interval Parameter
           7.1.2. Avoiding Premature Timeout
           7.1.3. Interoperability between RTP/AVP and RTP/AVPF
           7.1.4. Updated SSRC Timeout Rules
      7.2. Tuning RTCP Transmissions
           7.2.1. RTP/AVP and RTP/SAVP
           7.2.2. RTP/AVPF and RTP/SAVPF
   8. Security Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   At the time the Real-Time Transport Protocol (RTP) [RFC3550] was
   originally designed, and for quite some time after, endpoints in RTP
   sessions typically only transmitted a single media source and, thus,
   used a single RTP stream and synchronization source (SSRC) per RTP
   session, where separate RTP sessions were typically used for each
   distinct media type.  Recently, however, a number of scenarios have
   emerged in which endpoints wish to send multiple RTP streams,
   distinguished by distinct RTP synchronization source (SSRC)
   identifiers, in a single RTP session.  These are outlined in
   Section 3.  Although the initial design of RTP did consider such
   scenarios, the specification was not consistently written with such
   use cases in mind; thus, the specification is somewhat unclear in
   places.

   This memo updates [RFC3550] to clarify behavior in use cases where
   endpoints use multiple SSRCs.  It also updates [RFC4585] to resolve
   problems with regard to timeout of inactive SSRCs and to clarify
   behavior around inclusion of feedback messages.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [RFC2119] and indicate requirement levels for compliant
   implementations.

3.  Use Cases for Multi-Stream Endpoints

   This section discusses several use cases that have motivated the
   development of endpoints that send RTP data using multiple SSRCs in
   a single RTP session.

3.1.  Endpoints with Multiple Capture Devices

   The most straightforward motivation for an endpoint to send multiple
   simultaneous RTP streams in a single RTP session is when an endpoint
   has multiple capture devices and, hence, can generate multiple media
   sources, of the same media type and characteristics.  For example,
   telepresence systems of the type described by the CLUE Telepresence
   Framework [CLUE-FRAME] often have multiple cameras or microphones
   covering various areas of a room and, hence, send several RTP streams
   of each type within a single RTP session.

3.2.  Multiple Media Types in a Single RTP Session

   Recent work has updated RTP [MULTI-RTP] and Session Description
   Protocol (SDP) [SDP-BUNDLE] to remove the historical assumption in
   RTP that media sources of different media types would always be sent
   on different RTP sessions.  In this work, a single endpoint's audio
   and video RTP streams (for example) are instead sent in a single RTP
   session to reduce the number of transport-layer flows used.

3.3.  Multiple Stream Mixers

   There are several RTP topologies that can involve a central device
   that itself generates multiple RTP streams in a session.  An example
   is a mixer providing centralized compositing for a multi-capture
   scenario like that described in Section 3.1.  In this case, the
   centralized node is behaving much like a multi-capturer endpoint,
   generating several similar and related sources.

   A more complex example is the selective forwarding middlebox,
   described in Section 3.7 of [RFC7667].  This is a middlebox that
   receives RTP streams from several endpoints and then selectively
   forwards modified versions of some RTP streams toward the other
   endpoints to which it is connected.  For each connected endpoint, a
   separate media source appears in the session for every other source
   connected to the middlebox, "projected" from the original streams,
   but at any given time many of them can appear to be inactive (and
   thus are receivers, not senders, in RTP).  This sort of device is
   closer to being an RTP mixer than an RTP translator: it terminates
   RTCP reporting about the mixed streams; it can rewrite SSRCs,
   timestamps, and sequence numbers, as well as the contents of the RTP
   payloads; and it can turn sources on and off at will without
   appearing to generate packet loss.  Each projected stream will
   typically preserve its original RTCP source description (SDES)
   information.

3.4.  Multiple SSRCs for a Single Media Source

   There are also several cases where multiple SSRCs can be used to send
   data from a single media source within a single RTP session.  These
   include, but are not limited to, transport robustness tools, such as
   the RTP retransmission payload format [RFC4588], that require one
   SSRC to be used for the media data and another SSRC for the repair
   data.  Similarly, some layered media encoding schemes, for example,
   H.264 Scalable Video Coding (SVC) [RFC6190], can be used in a
   configuration where each layer is sent using a different SSRC within
   a single RTP session.

4.  Use of RTP by Endpoints That Send Multiple Media Streams

   RTP is inherently a group communication protocol.  Each endpoint in
   an RTP session will use one or more SSRCs, as will some types of RTP-
   level middlebox.  Accordingly, unless restrictions on the number of
   SSRCs have been signaled, RTP endpoints can expect to receive RTP
   data packets sent using a number of different SSRCs, within a single
   RTP session.  This can occur irrespective of whether the RTP session
   is running over a point-to-point connection or a multicast group,
   since middleboxes can be used to connect multiple transport
   connections together into a single RTP session (the RTP session is
   defined by the shared SSRC space, not by the transport connections).
   Furthermore, if RTP mixers are used, some SSRCs might only be visible
   in the contributing source (CSRC) list of an RTP packet and in RTCP,
   and might not appear directly as the SSRC of an RTP data packet.

   Every RTP endpoint will have an allocated share of the available
   session bandwidth, as determined by signaling and congestion control.
   The endpoint needs to keep its total media sending rate within this
   share.  However, endpoints that send multiple RTP streams do not
   necessarily need to subdivide their share of the available bandwidth
   independently or uniformly to each RTP stream and its SSRCs.  In
   particular, an endpoint can vary the bandwidth allocation to
   different streams depending on their needs, and it can dynamically
   change the bandwidth allocated to different SSRCs (for example, by
   using a variable-rate codec), provided the total sending rate does
   not exceed its allocated share.  This includes enabling or disabling
   RTP streams, or their redundancy streams, as more or less bandwidth
   becomes available.

5.  Use of RTCP by Endpoints That Send Multiple Media Streams

   RTCP is defined in Section 6 of [RFC3550].  The description of the
   protocol is phrased in terms of the behavior of "participants" in an
   RTP session, under the assumption that each endpoint is a participant
   with a single SSRC.  However, for correct operation in cases where
   endpoints have multiple SSRC values, implementations MUST treat each
   SSRC as a separate participant in the RTP session, so that an
   endpoint that has multiple SSRCs counts as multiple participants.

5.1.  RTCP Reporting Requirement

   An RTP endpoint that has multiple SSRCs MUST treat each SSRC as a
   separate participant in the RTP session.  Each SSRC will maintain its
   own RTCP-related state information and, hence, will have its own RTCP
   reporting interval that determines when it sends RTCP reports.  If
   the mechanism in [MULTI-STREAM-OPT] is not used, then each SSRC will
   send RTCP reports for all other SSRCs, including those co-located at
   the same endpoint.

   If the endpoint has some SSRCs that are sending data and some that
   are only receivers, then they will receive different shares of the
   RTCP bandwidth and calculate different base RTCP reporting intervals.
   Otherwise, all SSRCs at an endpoint will calculate the same base RTCP
   reporting interval.  The actual reporting intervals for each SSRC are
   randomized in the usual way, but reports can be aggregated as
   described in Section 5.3.
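
   As a rough illustration of why sending and receive-only SSRCs at one
   endpoint can calculate different base intervals, the following Python
   sketch applies the sender/receiver bandwidth split from Section 6.3.1
   of [RFC3550].  The function name, the argument names, and the sample
   session are assumptions made for this example; randomization, timer
   reconsideration, and the reduced initial minimum are left out.

```python
def base_rtcp_interval(members, senders, rtcp_bw, we_sent, avg_rtcp_size,
                       t_min=5.0):
    """Deterministic interval Td in seconds (RFC 3550, Section 6.3.1).

    rtcp_bw is in octets per second, avg_rtcp_size in octets.
    """
    n = members
    if senders <= members * 0.25:
        if we_sent:
            # Senders share a quarter of the RTCP bandwidth.
            rtcp_bw *= 0.25
            n = senders
        else:
            # Receivers share the remaining three quarters.
            rtcp_bw *= 0.75
            n = members - senders
    return max(t_min, avg_rtcp_size * n / rtcp_bw)


# Hypothetical session: 40 SSRCs in total, 4 of them senders,
# 200 octets/s of RTCP bandwidth, 120-octet average RTCP packets.
td_sender = base_rtcp_interval(40, 4, 200.0, True, 120.0)     # about 9.6 s
td_receiver = base_rtcp_interval(40, 4, 200.0, False, 120.0)  # about 28.8 s
```

   In this made-up session, a sending SSRC and a receive-only SSRC at
   the same endpoint end up with base intervals that differ by a factor
   of three.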

5.2.  Initial Reporting Interval

   When a participant joins a unicast session, the following text from
   Section 6.2 of [RFC3550] is relevant: "For unicast sessions... the
   delay before sending the initial compound RTCP packet MAY be zero."
   The basic assumption is that this also ought to apply in the case of
   multiple SSRCs.  Caution has to be exercised, however, when an
   endpoint (or middlebox) with a large number of SSRCs joins a unicast
   session, since immediate transmission of many RTCP reports can create
   a significant burst of traffic, leading to transient congestion and
   packet loss due to queue overflows.

   To ensure that the initial burst of traffic generated by an RTP
   endpoint is no larger than would be generated by a TCP connection, an
   RTP endpoint MUST NOT send more than four compound RTCP packets with
   zero initial delay when it joins an RTP session, independent of the
   number of SSRCs used by the endpoint.  Each of those initial compound
   RTCP packets MAY include aggregated reports from multiple SSRCs,
   provided the total compound RTCP packet size does not exceed the MTU,
   and the avg_rtcp_size is maintained as in Section 5.3.1.  Aggregating
   reports from several SSRCs in the initial compound RTCP packets
   allows a substantial number of SSRCs to report immediately.
   Endpoints SHOULD prioritize reports on SSRCs that are likely to be
   most immediately useful, e.g., for SSRCs that are initially senders.

   An endpoint that needs to report on more SSRCs than will fit into the
   four compound RTCP reports that can be sent immediately MUST send the
   other reports later, following the usual RTCP timing rules including
   timer reconsideration.  Those reports MAY be aggregated as described
   in Section 5.3.
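
   One possible way to plan the initial burst is sketched below in
   Python.  It is an illustration only: the function name, the tuple
   layout, and the first-fit packing strategy are assumptions of this
   example, and "mtu" is taken to mean the octet budget available for
   RTCP in one packet, ignoring lower-layer headers.

```python
MAX_INITIAL_PACKETS = 4  # limit on zero-delay compound packets


def plan_initial_reports(reports, mtu):
    """Split reports into an immediate burst and a deferred remainder.

    reports: list of (ssrc, size_in_octets, is_sender) tuples.
    Returns (packets, deferred): packets is a list of at most four lists
    of SSRCs, each fitting within mtu; deferred lists the SSRCs whose
    reports have to follow the usual RTCP timing rules.
    """
    # Prioritize SSRCs that are initially senders (the sort is stable).
    ordered = sorted(reports, key=lambda r: not r[2])
    packets, sizes, deferred = [], [], []
    for ssrc, size, _is_sender in ordered:
        for i in range(len(packets)):
            if sizes[i] + size <= mtu:
                # First existing packet with enough room takes the report.
                packets[i].append(ssrc)
                sizes[i] += size
                break
        else:
            if len(packets) < MAX_INITIAL_PACKETS and size <= mtu:
                packets.append([ssrc])
                sizes.append(size)
            else:
                deferred.append(ssrc)
    return packets, deferred


# Ten hypothetical SSRCs with 500-octet reports; even SSRCs are senders.
example = [(i, 500, i % 2 == 0) for i in range(1, 11)]
burst, later = plan_initial_reports(example, mtu=1200)
```

   With these sample values, two reports fit in each packet, so eight
   SSRCs report immediately (all five senders among them) and two
   receive-only SSRCs are deferred.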

      Note: The above is chosen to match the TCP maximum initial window
      of four packets [RFC3390], not the larger TCP initial windows for
      which there is an ongoing experiment [RFC6928].  The reason for
      this is a desire to be conservative, since an RTP endpoint will
      also in many cases start sending RTP data packets at the same time
      as these initial RTCP packets are sent.

5.3.  Aggregation of Reports into Compound RTCP Packets

   As outlined in Section 5.1, an endpoint with multiple SSRCs has to
   treat each SSRC as a separate participant when it comes to sending
   RTCP reports.  This will lead to each SSRC sending a compound RTCP
   packet in each reporting interval.  Since these packets are coming
   from the same endpoint, it might reasonably be expected that they can
   be aggregated to reduce overheads.  Indeed, Section 6.1 of [RFC3550]
   allows RTP translators and mixers to aggregate packets in similar
   circumstances:

      It is RECOMMENDED that translators and mixers combine individual
      RTCP packets from the multiple sources they are forwarding into
      one compound packet whenever feasible in order to amortize the
      packet overhead (see Section 7).  An example RTCP compound packet
      as might be produced by a mixer is shown in Fig. 1.  If the
      overall length of a compound packet would exceed the MTU of the
      network path, it SHOULD be segmented into multiple shorter
      compound packets to be transmitted in separate packets of the
      underlying protocol.  This does not impair the RTCP bandwidth
      estimation because each compound packet represents at least one
      distinct participant.  Note that each of the compound packets MUST
      begin with an SR or RR packet.

   This allows RTP translators and mixers to generate compound RTCP
   packets that contain multiple Sender Report (SR) or Receiver Report
   (RR) packets from different SSRCs, as well as any of the other packet
   types.  There are no restrictions on the order in which the RTCP
   packets can occur within the compound packet, except the regular rule
   that the compound RTCP packet starts with an SR or RR packet.  Due to
   this rule, correctly implemented RTP endpoints will be able to handle
   compound RTCP packets that contain RTCP packets relating to multiple
   SSRCs.

   Accordingly, endpoints that use multiple SSRCs can aggregate the RTCP
   packets sent by their different SSRCs into compound RTCP packets,
   provided 1) the resulting compound RTCP packets begin with an SR or
   RR packet, 2) they maintain the average RTCP packet size as described
   in Section 5.3.1, and 3) they schedule packet transmission and manage
   aggregation as described in Section 5.3.2.

5.3.1.  Maintaining AVG_RTCP_SIZE

   The RTCP scheduling algorithm in [RFC3550] works on a per-SSRC basis.
   Each SSRC sends a single compound RTCP packet in each RTCP reporting
   interval.  When an endpoint uses multiple SSRCs, it is desirable to
   aggregate the compound RTCP packets sent by its SSRCs, reducing the
   overhead by forming a larger compound RTCP packet.  This aggregation
   can be done as described in Section 5.3.2, provided the average RTCP
   packet size calculation is updated as follows.

   Participants in an RTP session update their estimate of the average
   RTCP packet size (avg_rtcp_size) each time they send or receive an
   RTCP packet (see Section 6.3.3 of [RFC3550]).  When a compound RTCP
   packet that contains RTCP packets from several SSRCs is sent or
   received, the avg_rtcp_size estimate for each SSRC that is reported
   upon is updated using div_packet_size rather than the actual packet
   size:

      avg_rtcp_size = (1/16) * div_packet_size + (15/16) * avg_rtcp_size

   where div_packet_size is packet_size divided by the number of SSRCs
   reporting in that compound packet.  The number of SSRCs reporting in
   a compound packet is determined by counting the number of different
   SSRCs that are the source of SR or RR RTCP packets within the
   compound RTCP packet.  Non-compound RTCP packets (i.e., RTCP packets
   that do not contain an SR or RR packet [RFC5506]) are considered to
   report on a single SSRC.
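
   The update rule above can be written as a small Python function.
   This is a minimal sketch: the function and parameter names are
   chosen for this example, and the caller is assumed to have already
   parsed the packet and collected the SSRCs that sent an SR or RR in
   it.

```python
def update_avg_rtcp_size(avg_rtcp_size, packet_size, reporting_ssrcs):
    """Return the new avg_rtcp_size after one sent or received packet.

    reporting_ssrcs: the SSRCs that are the source of an SR or RR packet
    inside the compound packet.  Pass an empty collection for a
    non-compound packet, which is treated as reporting on one SSRC.
    """
    n = max(1, len(set(reporting_ssrcs)))
    div_packet_size = packet_size / n
    return (1 / 16) * div_packet_size + (15 / 16) * avg_rtcp_size


# A 640-octet compound packet carrying reports from four SSRCs counts
# as 160 octets per SSRC, so an estimate of 160 octets is unchanged.
unchanged = update_avg_rtcp_size(160.0, 640, {0x11, 0x22, 0x33, 0x44})
```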

   A participant that doesn't follow the above rule, and instead uses
   the full RTCP compound packet size to calculate avg_rtcp_size, will
   derive an RTCP reporting interval that is overly large by a factor
   that is proportional to the number of SSRCs aggregated into compound
   RTCP packets and the size of the set of SSRCs being aggregated
   relative to the total number of participants.  This increased RTCP
   reporting interval can cause premature timeouts if it is more than
   five times the interval chosen by the SSRCs that understand compound
   RTCP packets that aggregate reports from many SSRCs.  A 1500-octet
   MTU can fit five typical-size reports into a compound RTCP packet, so
   this is a real concern if endpoints aggregate RTCP reports from
   multiple SSRCs.

   The issue raised in the previous paragraph is mitigated by the
   modification in timeout behavior specified in Section 7.1.2 of this
   memo.  This mitigation is in place in those cases where the RTCP
   bandwidth is sufficiently high that an endpoint, using avg_rtcp_size
   calculated without taking into account the number of reporting SSRCs,
   can transmit more frequently than approximately every 5 seconds.
   Note, however, that the non-updated endpoint's RTCP reporting is
   still negatively impacted even if the premature timeouts of its SSRCs
   are avoided.  If compatibility with non-updated endpoints is a
   concern, the number of reports from different SSRCs aggregated into a
   single compound RTCP packet SHOULD either be limited to two reports
   or aggregation ought not be used at all.  This will limit the
   non-updated endpoint's RTCP reporting interval to be no larger than
   twice the RTCP reporting interval that would be chosen by an endpoint
   following this specification.
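
   The factor of two can be checked with a short numerical experiment.
   The Python sketch below is illustrative only: the report size and
   the helper name are hypothetical, every RTCP packet in the session
   is assumed to aggregate exactly two reports, and the reporting
   interval is assumed to be proportional to avg_rtcp_size (that is,
   above the minimum interval).

```python
def converge(avg, sample, rounds=200):
    """Feed the same sample into the 1/16 moving average repeatedly."""
    for _ in range(rounds):
        avg = (1 / 16) * sample + (15 / 16) * avg
    return avg


REPORT_SIZE = 120   # hypothetical size of one SSRC's report, in octets
N_AGGREGATED = 2    # reports aggregated into each compound packet

packet_size = REPORT_SIZE * N_AGGREGATED
# An updated endpoint divides by the number of reporting SSRCs.
updated = converge(REPORT_SIZE, packet_size / N_AGGREGATED)
# A non-updated endpoint uses the full compound packet size.
legacy = converge(REPORT_SIZE, packet_size)
ratio = legacy / updated  # approaches N_AGGREGATED from below
```

   Under these assumptions, the non-updated endpoint's estimate, and
   hence its interval, approaches but does not exceed twice that of an
   updated endpoint.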

5.3.2.  Scheduling RTCP when Aggregating Multiple SSRCs

   This section revises and extends the behavior defined in Section 6.3
   of [RFC3550], and in Section 3.5.3 of [RFC4585] if the RTP/AVPF
   profile or the RTP/SAVPF profile is used, regarding actions to take
   when scheduling and sending RTCP packets where multiple reporting
   SSRCs are aggregating their RTCP packets into the same compound RTCP
   packet.  These changes to the RTCP scheduling rules are needed to
   maintain important RTCP timing properties, including the inter-packet
   distribution, and the behavior during flash joins and other changes
   in session membership.

   The variables tn, tp, tc, T, and Td used in the following are defined
   in Section 6.3 of [RFC3550].  The variables T_rr_interval and
   T_rr_last are defined in [RFC4585].

   Each endpoint MUST schedule RTCP transmission independently for each
   of its SSRCs using the regular calculation of tn for the RTP profile
   being used.  Each time the timer tn expires for an SSRC, the endpoint
   MUST perform RTCP timer reconsideration and, if applicable,
   suppression based on T_rr_interval.  If the result indicates that a
   compound RTCP packet is to be sent by that SSRC, and the transmission
   is not an early RTCP packet [RFC4585], then the endpoint SHOULD try
   to aggregate RTCP packets of additional SSRCs that are scheduled in
   the future into the compound RTCP packet before it is sent.  The
   reasons to limit aggregation, or to avoid it altogether, for
   backwards compatibility are discussed in Section 5.3.1.

   Aggregation proceeds as follows.  The endpoint selects the SSRC that
   has the smallest tn value after the current time, tc, and prepares
   the RTCP packets that SSRC would send if its timer tn expired at tc.
   If those RTCP packets will fit into the compound RTCP packet that is
   being generated, taking into account the path MTU and the previously
   added RTCP packets, then they are added to the compound RTCP packet;
   otherwise, they are discarded.  This process is repeated for each
   SSRC, in order of increasing tn, until the compound RTCP packet is
   full or all SSRCs have been aggregated.  At that point, the compound
   RTCP packet is sent.
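
   The aggregation loop could look roughly like the following Python
   sketch.  The names are hypothetical: "build_report" stands in for
   whatever routine produces the RTCP packets of one SSRC, and "mtu" is
   the octet budget for the compound packet.  The sketch also assumes
   that the first report that does not fit marks the packet as full;
   that is one reading of the text above, not the only one.

```python
def aggregate_reports(trigger_ssrc, pending, tc, mtu, build_report):
    """Choose which SSRCs share the compound packet sent at time tc.

    trigger_ssrc: SSRC whose timer expired and passed reconsideration.
    pending: dict mapping each other local SSRC to its scheduled tn.
    build_report: callable(ssrc) -> bytes, the RTCP packets that SSRC
        would send if its timer tn expired at tc (hypothetical helper).
    Returns a list of (ssrc, report_bytes) in the order they were added.
    """
    first = build_report(trigger_ssrc)
    compound = [(trigger_ssrc, first)]
    used = len(first)
    # Visit the remaining SSRCs in order of increasing tn after tc.
    candidates = sorted((s for s in pending if pending[s] > tc),
                        key=pending.get)
    for ssrc in candidates:
        report = build_report(ssrc)
        if used + len(report) > mtu:
            # Discard this report; the SSRC keeps its own schedule.
            break
        compound.append((ssrc, report))
        used += len(report)
    return compound


# Hypothetical example: 100-octet reports and room for three of them.
timers = {2: 5.0, 3: 3.0, 4: 4.0}
chosen = aggregate_reports(1, timers, tc=1.0, mtu=300,
                           build_report=lambda ssrc: bytes(100))
included = [ssrc for ssrc, _ in chosen]
```

   Here SSRC 1 triggers the transmission, SSRCs 3 and 4 are added in
   order of increasing tn, and SSRC 2 no longer fits, so it waits for
   its own timer or a later aggregate.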

   When the compound RTCP packet is sent, the endpoint MUST update tp,
   tn, and T_rr_last (if applicable) for each SSRC that was included.
   These variables are updated as follows:

   a.  For the first SSRC that reported in the compound RTCP packet, set
       the effective transmission time, tt, of that SSRC to tc.

   b.  For each additional SSRC that reported in the compound RTCP
       packet, calculate the transmission time that SSRC would have had
       if it had not been aggregated into the compound RTCP packet.
       This is derived by taking tn for that SSRC, then performing
       reconsideration and updating tn until tp + T <= tn.  Once this is
       done, set the effective transmission time, tt, for that SSRC to
       the calculated value of tn.  If the RTP/AVPF profile or the RTP/
       SAVPF profile is being used, then suppression based on
       T_rr_interval MUST NOT be used in this calculation.

   c.  Calculate average effective transmission time, tt_avg, for the
       compound RTCP packet based on the tt values for all SSRCs sent in
       the compound RTCP packet.  Set tp for each of the SSRCs sent in
       the compound RTCP packet to tt_avg.  If the RTP/AVPF profile or
       the RTP/SAVPF profile is being used, set T_rr_last for each SSRC
       sent in the compound RTCP packet to tt_avg.

   d.  For each of the SSRCs sent in the compound RTCP packet, calculate
       new tn values based on the updated parameters and the usual RTCP
       timing rules and reschedule the timers.
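
   Steps a and c can be illustrated with a few lines of Python.  This
   is a simplified sketch: the function name and return shape are
   invented for the example, "average" is taken to be the arithmetic
   mean, the tt values of the additional SSRCs are assumed to have been
   derived already as in step b, and step d (rescheduling tn) is not
   shown.

```python
def apply_aggregate_timing(tc, tt_others, avpf=False):
    """Return the values shared by all SSRCs in an aggregated packet.

    tc: time at which the compound packet is sent; this is the
        effective transmission time tt of the first SSRC (step a).
    tt_others: effective transmission times of the additional SSRCs,
        already computed by reconsideration (step b).
    avpf: True if the RTP/AVPF or RTP/SAVPF profile is in use.
    """
    tt_values = [tc] + list(tt_others)
    tt_avg = sum(tt_values) / len(tt_values)  # step c
    update = {"tp": tt_avg}
    if avpf:
        update["T_rr_last"] = tt_avg
    return update


# Three SSRCs: the trigger at tc = 10.0 s, and two others that would
# have sent at 12.0 s and 14.0 s had they not been aggregated.
shared = apply_aggregate_timing(10.0, [12.0, 14.0], avpf=True)
```

   In this example every included SSRC gets tp = 12.0 s, which lies
   ahead of tc, as the discussion of the algorithm below expects.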

   When using the RTP/AVPF profile or the RTP/SAVPF profile, the above
   mechanism only attempts to aggregate RTCP packets when the compound
   RTCP packet to be sent is not an early RTCP packet, and hence the
   algorithm in Section 3.5.3 of [RFC4585] will control RTCP scheduling.
   If T_rr_interval == 0, or if T_rr_interval != 0 and option 1, 2a, or
   2b of the algorithm are chosen, then the above mechanism updates the
   necessary variables.  However, if the transmission is suppressed per
   option 2c of the algorithm, then tp is updated to tc as aggregation
   has not taken place.

   Reverse reconsideration MUST be performed following Section 6.3.4 of
   [RFC3550].  In some cases, this can lead to the value of tp after
   reverse reconsideration being larger than tc.  This is not a problem,
   and has the desired effect of proportionally pulling the tp value
   towards tc (as well as tn) as the reporting interval shrinks in
   direct proportion to the reduced group size.

   The above algorithm has been shown in simulations [Sim88] [Sim92] to
   maintain the inter-RTCP packet transmission time distribution for
   each SSRC and to consume the same amount of bandwidth as
   non-aggregated RTCP packets.  With this algorithm, the actual
   transmission interval for an SSRC triggering an RTCP compound packet
   transmission follows the regular transmission rules.  The value
   tp is set to somewhere in the interval [0, 1.5/1.21828*Td] ahead of
   tc.  The actual value is the average of one instance of tc and the
   randomized transmission times of the additional SSRCs; thus, the
   lower range of the interval is more probable.  This compensates for
   the bias that is otherwise introduced by picking the shortest tn
   value out of the N SSRCs included in the aggregate.

   The algorithm also handles the cases where the number of SSRCs that
   can be included in an aggregated packet varies.  An SSRC that
   previously was aggregated and fails to fit in a packet still has its
   own transmission scheduled according to normal rules.  Thus, it will
   trigger a transmission in due time, or the SSRC will be included in
   another aggregate.  The algorithm's behavior under SSRC group size
   changes is as follows:

   RTP sessions where the number of SSRCs is growing:  When the group
      size is growing, Td grows in proportion to the number of new SSRCs
      in the group.  When reconsideration is performed due to expiry of
      the tn timer, that SSRC will reconsider the transmission and with
      a certain probability reschedule the tn timer.  The above
      algorithm affects this part of reconsideration in only one way:
      tp can hold a value that was in the future when it was set,
      rather than the time of the actual last transmission.

   RTP sessions where the number of SSRCs is shrinking:  When the group
      shrinks, reverse reconsideration moves the tp and tn values
      towards tc proportionally to the number of SSRCs that leave the
      session compared to the total number of participants when they
      left.  Setting the tp value forward in time relative to tc might
      be expected to have a negative effect.  However, the reason
      for this setting is to compensate for bias caused by picking the
      shortest tn out of the N aggregated.  This bias remains over a
      reduction in the number of SSRCs.  The reverse reconsideration
      compensates for the reduction independently of whether or not
      aggregation is being used.  The negative effect that can occur on
      removing an SSRC is that the most favorable tn belonged to the
      removed SSRC.  The impact of this is limited to delaying the
      transmission by, in the worst case, one reporting interval.

   In conclusion, the investigations performed have found no significant
   negative impact on the scheduling algorithm.

5.4.  Use of RTP/AVPF or RTP/SAVPF Feedback

   This section discusses the transmission of RTP/AVPF feedback packets
   when the transmitting endpoint has multiple SSRCs.  The guidelines in
   this section also apply to endpoints using the RTP/SAVPF profile.

5.4.1.  Choice of SSRC for Feedback Packets

   When an RTP/AVPF endpoint has multiple SSRCs, it can choose what SSRC
   to use as the source for the RTCP feedback packets it sends.  Several
   factors can affect that choice:

   o  RTCP feedback packets relating to a particular media type SHOULD
      be sent by an SSRC that receives that media type.  For example,
      when audio and video are multiplexed onto a single RTP session,
      endpoints will use their audio SSRC to send feedback on the audio
      received from other participants.

   o  RTCP feedback packets and RTCP codec control messages that are
      notifications or indications regarding RTP data processed by an
      endpoint MUST be sent from the SSRC used for that RTP data.  This
      includes notifications that relate to a previously received
      request or command [RFC4585][RFC5104].

   o  If separate SSRCs are used to send and receive media, then the
      corresponding SSRC SHOULD be used for feedback, since they have
      differing RTCP bandwidth fractions.  This can also affect the
      consideration of whether or not the SSRC can be used in immediate
      mode.

   o  Some RTCP feedback packet types require consistency in the SSRC
      used.  For example, if a Temporary Maximum Media Stream Bit Rate
      Request (TMMBR) limitation [RFC5104] is set by an SSRC, the same
      SSRC needs to be used to remove the limitation.

   o  If several SSRCs are suitable for sending feedback, it might be
      desirable to use an SSRC that allows the sending of feedback as an
      early RTCP packet.

   When an RTCP feedback packet is sent as part of a compound RTCP
   packet that aggregates reports from multiple SSRCs, there is no
   requirement that the compound packet contain an SR or RR packet
   generated by the sender of the RTCP feedback packet.  For reduced-
   size RTCP packets, aggregation of RTCP feedback packets from multiple
   sources is not limited further than Section 4.2.2 of [RFC5506].

5.4.2.  Scheduling an RTCP Feedback Packet

   When an SSRC has a need to transmit a feedback packet in early mode,
   it MUST schedule that packet following the algorithm in Section 3.5
   of [RFC4585] modified as follows:

   o  To determine whether an RTP session is considered to be a point-
      to-point session or a multiparty session, an endpoint MUST count
      the number of distinct RTCP SDES CNAME values used by the SSRCs
      listed in the SSRC field of RTP data packets it receives and in
      the "SSRC of sender" field of RTCP SR, RR, RTPFB, or PSFB packets
      it receives.  An RTP session is considered to be a multiparty
      session if more than one CNAME is used by those SSRCs, unless
      signaling indicates that the session is to be handled as point to
      point or RTCP reporting groups [MULTI-STREAM-OPT] are used.  If
      RTCP reporting groups are used, an RTP session is considered to be
      a point-to-point session if the endpoint receives only a single
      reporting group and is considered to be a multiparty session if
      multiple reporting groups are received or a combination of
      reporting groups and SSRCs that are not part of a reporting group
      is received.  Endpoints MUST NOT determine whether an RTP session
      is multiparty or point to point based on the type of connection
      (unicast or multicast) used, or on the number of SSRCs received.

   o  When checking if there is already a scheduled compound RTCP packet
      containing feedback messages (Step 2 in Section 3.5.2 of
      [RFC4585]), that check MUST be done considering all local SSRCs.

   o  If an SSRC is not allowed to send an early RTCP packet, then the
      feedback message MAY be queued for transmission as part of any
      early or regular scheduled transmission that can occur within the
      maximum useful lifetime of the feedback message (T_max_fb_delay).
      This modifies the behavior in item 4a in Section 3.5.2 of
      [RFC4585].
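
   The CNAME-counting rule in the first bullet point can be sketched in
   Python as follows.  The function name, the dictionary layout, and
   the boolean flag are assumptions of this example; RTCP reporting
   groups [MULTI-STREAM-OPT] are not modeled, and SSRCs whose CNAME has
   not yet been learned are simply ignored.

```python
def is_multiparty(cname_by_ssrc, signaled_point_to_point=False):
    """Classify a session by counting distinct CNAMEs of remote SSRCs.

    cname_by_ssrc: dict mapping each remote SSRC seen in the SSRC field
        of RTP data packets, or in the "SSRC of sender" field of RTCP
        SR, RR, RTPFB, or PSFB packets, to its SDES CNAME (None if the
        CNAME is not yet known).
    signaled_point_to_point: True if signaling says to handle the
        session as point to point.
    """
    if signaled_point_to_point:
        return False
    cnames = {c for c in cname_by_ssrc.values() if c is not None}
    return len(cnames) > 1


# One peer with three SSRCs sharing a CNAME: still point to point.
one_peer = is_multiparty({0xA1: "peer@example", 0xA2: "peer@example",
                          0xA3: "peer@example"})
# Two different CNAMEs: treated as multiparty.
two_peers = is_multiparty({0xA1: "alice@example", 0xB1: "bob@example"})
```

   Note that the number of SSRCs plays no role in the result; only the
   number of distinct CNAMEs and the signaling override do.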

   The first bullet point above specifies a rule to determine if an RTP
   session is to be considered a point-to-point session or a multiparty
   session.  This rule is straightforward to implement, but is known to
   incorrectly classify some sessions as multiparty sessions.  The known
   problems are as follows:

   Endpoint with multiple synchronization contexts:  An endpoint that is
      part of a point-to-point session can have multiple synchronization
      contexts, for example, due to forwarding an external media source
      into an interactive real-time conversation.  In this case, the
      classification will consider the peer as two endpoints, while the
      actual RTP/RTCP transmission will be under the control of one
      endpoint.

   Selective Forwarding Middlebox:  The Selective Forwarding Middlebox
      (SFM) as defined in Section 3.7 of [RFC7667] has control over the
      transmission and configurations between itself and each peer
      endpoint individually.  It also fully controls the RTCP packets
      being forwarded between the individual legs.  Thus, this type of
      middlebox is comparable to an RTP mixer, which uses its own SSRCs
      to mix or select the media it forwards and which the above rule
      classifies as a point-to-point RTP session.

   In the above cases, it is very reasonable to use RTCP reporting
   groups [MULTI-STREAM-OPT].  If that extension is used, an endpoint
   can indicate that the multitude of CNAMEs are in fact under a single
   endpoint or middlebox control by using only a single reporting group.

   The above rules will also classify some sessions where the endpoint
   is connected to an RTP mixer as being point to point.  For example,
   the mixer could act as a gateway to an RTP session based on Any
   Source Multicast for the discussed endpoint.  However, this will, in
   most cases, be okay, as the RTP mixer provides separation between the
   two parts of the session.  The responsibility falls on the mixer to
   act accordingly in each domain.

   Finally, we note that signaling mechanisms could be defined to
   override the rules when they would result in the wrong
   classification.
