
RFC 5104

Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)

Pages: 64
Proposed Standard
Updated by: 7728, 8082
Part 3 of 4 – Pages 32 to 52

Top   ToC   RFC5104 - Page 32   prevText

4. RTCP Receiver Report Extensions

This memo specifies six new feedback messages. The Full Intra Request (FIR), Temporal-Spatial Trade-off Request (TSTR), Temporal-Spatial Trade-off Notification (TSTN), and Video Back Channel Message (VBCM) are "Payload Specific Feedback Messages" as defined in section 6.3 of AVPF [RFC4585]. The Temporary Maximum Media Stream Bit Rate Request (TMMBR) and Temporary Maximum Media Stream Bit Rate Notification (TMMBN) are "Transport Layer Feedback Messages" as defined in section 6.2 of AVPF. The new feedback messages are defined in the following subsections, following a similar structure to that in sections 6.2 and 6.3 of the AVPF specification [RFC4585].

4.1. Design Principles of the Extension Mechanism

   RTCP was originally introduced as a channel to convey presence,
   reception quality statistics and hints on the desired media coding.
   A limited set of media control mechanisms was introduced in early RTP
   payload formats for video formats, for example, in RFC 2032 [RFC2032]
   (which was obsoleted by RFC 4587 [RFC4587]).  However, this
   specification, for the first time, suggests a two-way handshake for
   some of its messages.  There is a danger that this introduction could
   be misunderstood as a precedent for the use of RTCP as an RTP session
   control protocol.  To prevent such a misunderstanding, this
   subsection attempts to clarify the scope of the extensions specified
   in this memo, and it strongly suggests that future extensions follow
   the rationale spelled out here, or compellingly explain why they
   deviate from the rationale.

   In this memo, and in AVPF [RFC4585], only such messages have been
   included as:

   a) have comparatively strict real-time constraints, which prevent the
      use of mechanisms such as a SIP re-invite in most application
      scenarios (the real-time constraints are explained separately for
      each message where necessary);

   b) are multicast-safe in that the reaction to potentially
      contradicting feedback messages is specified, as necessary for
      each message; and

   c) are directly related to activities of a certain media codec, class
      of media codecs (e.g., video codecs), or a given RTP packet
      stream.

   In this memo, a two-way handshake is introduced only for messages for
   which:

   a) a notification or acknowledgement is required due to their nature.
      An analysis to determine whether this requirement exists has been
      performed separately for each message.

   b) the notification or acknowledgement cannot be easily derived from
      the media bit stream.

   All messages in AVPF [RFC4585] and in this memo present their
   contents in a simple, fixed binary format.  This accommodates media
   receivers that have not implemented higher control protocol
   functionalities (SDP, XML parsers, and such) in their media path.

   Messages that do not conform to the design principles just described
   are not an appropriate use of RTCP or of the Codec Control Framework
   defined in this document.

4.2. Transport Layer Feedback Messages

   As specified in section 6.1 of RFC 4585 [RFC4585], transport layer
   feedback messages are identified by the RTCP packet type value RTPFB
   (205).

   In AVPF, one message of this category was defined.  This memo
   specifies two more such messages.  They are identified by means of
   the feedback message type (FMT) parameter as follows:

   Assigned in AVPF [RFC4585]:

      1:    Generic NACK
      31:   reserved for future expansion of the identifier number space

   Assigned in this memo:

      2:    reserved (see note below)
      3:    Temporary Maximum Media Stream Bit Rate Request (TMMBR)
      4:    Temporary Maximum Media Stream Bit Rate Notification (TMMBN)

      Note: early versions of AVPF [RFC4585] reserved FMT=2 for a code
      point that was later removed.  It has been pointed out that there
      may be implementations in the field using this value in
      accordance with the expired document.  As there is sufficient
      numbering space available, we mark FMT=2 as reserved so as to
      avoid possible interoperability problems with any such early
      implementations.

   Available for assignment:

      0:    unassigned
      5-30: unassigned

   The following subsection defines the formats of the Feedback Control
   Information (FCI) entries for the TMMBR and TMMBN messages,
   respectively, and specifies the associated behaviour at the media
   sender and receiver.

4.2.1. Temporary Maximum Media Stream Bit Rate Request (TMMBR)

The Temporary Maximum Media Stream Bit Rate Request is identified by RTCP packet type value PT=RTPFB and FMT=3. The FCI field of a Temporary Maximum Media Stream Bit Rate Request (TMMBR) message SHALL contain one or more FCI entries.
4.2.1.1. Message Format
   The Feedback Control Information (FCI) consists of one or more TMMBR
   FCI entries with the following syntax:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MxTBR Exp |  MxTBR Mantissa                 |Measured Overhead|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 2 - Syntax of an FCI Entry in the TMMBR Message

     SSRC (32 bits): The SSRC value of the media sender that is
              requested to obey the new maximum bit rate.

     MxTBR Exp (6 bits): The exponential scaling of the mantissa for the
              maximum total media bit rate value.  The value is an
              unsigned integer [0..63].

     MxTBR Mantissa (17 bits): The mantissa of the maximum total media
              bit rate value as an unsigned integer.

     Measured Overhead (9 bits): The measured average packet overhead
              value in bytes.  The measurement SHALL be done according
              to the description in section 4.2.1.2.  The value is an
              unsigned integer [0..511].

   The maximum total media bit rate (MxTBR) value in bits per second is
   calculated from the MxTBR exponent (exp) and mantissa in the
   following way:

      MxTBR = mantissa * 2^exp

   This allows for 17 bits of resolution in the range 0 to 131072*2^63
   (approximately 1.2*10^24).

   The length of the TMMBR feedback message SHALL be set to 2+2*N where
   N is the number of TMMBR FCI entries.
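
   As an illustration of the entry layout and of the relation
   MxTBR = mantissa * 2^exp, the following Python sketch packs and
   unpacks one TMMBR FCI entry.  It is not part of the specification:
   the function names are hypothetical, and rounding the bit rate down
   to the nearest representable value is an assumption made here so
   that the encoded limit never exceeds the intended one.

```python
import struct


def encode_mxtbr(bitrate_bps):
    """Split a bit rate into (exp, mantissa) with a 17-bit mantissa.

    Assumption of this sketch: the value is rounded down, so that
    mantissa * 2**exp <= bitrate_bps.
    """
    if bitrate_bps < 0:
        raise ValueError("bit rate must be non-negative")
    exp, mantissa = 0, bitrate_bps
    while mantissa >= (1 << 17):
        mantissa >>= 1
        exp += 1
    if exp > 63:
        raise ValueError("bit rate does not fit the 6-bit exponent")
    return exp, mantissa


def pack_tmmbr_entry(ssrc, bitrate_bps, overhead):
    """Build one 8-byte FCI entry: SSRC, then Exp | Mantissa | Overhead."""
    if not 0 <= overhead <= 511:
        raise ValueError("overhead must fit in 9 bits")
    exp, mantissa = encode_mxtbr(bitrate_bps)
    word = (exp << 26) | (mantissa << 9) | overhead
    return struct.pack("!II", ssrc, word)


def unpack_tmmbr_entry(data):
    """Return (ssrc, mxtbr_bps, overhead) from an 8-byte FCI entry."""
    ssrc, word = struct.unpack("!II", data[:8])
    exp = word >> 26
    mantissa = (word >> 9) & 0x1FFFF
    overhead = word & 0x1FF
    return ssrc, mantissa << exp, overhead
```

   For example, a limit of 1,000,000 bits per second is carried as
   mantissa 125000 with exponent 3, and unpacking the entry recovers the
   same value.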

4.2.1.2. Semantics
   Behaviour at the Media Receiver (Sender of the TMMBR)

   TMMBR is used to indicate a transport-related limitation at the
   reporting entity acting as a media receiver.  TMMBR has the form of a
   tuple containing two components.  The first value is the highest bit
   rate per sender of a media stream, available at a receiver-chosen
   protocol layer, which the receiver currently supports in this RTP
   session.  The second value is the measured header overhead in bytes
   as defined in section 2.2 and measured at the chosen protocol layer
   in the packets received for the stream.  The measurement of the
   overhead is a running average that is updated for each packet
   received for this particular media source (SSRC), using the following
   formula:

      avg_OH (new) = 15/16*avg_OH (old) + 1/16*pckt_OH,

   where avg_OH is the running (exponentially smoothed) average and
   pckt_OH is the overhead observed in the latest packet.

   If a maximum bit rate has been negotiated through signaling, the
   maximum total media bit rate that the receiver reports in a TMMBR
   message MUST NOT exceed the negotiated value converted to a common
   basis (i.e., with overheads adjusted to bring it to the same
   reference protocol layer).

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the request, and the "SSRC of media source"
   is not used and SHALL be set to 0.  Within a particular TMMBR FCI
   entry, the "SSRC of media source" in the FCI field denotes the media
   sender that the tuple applies to.  This is useful in the multicast or
   translator topologies where the reporting entity may address all of
   the media senders in a single TMMBR message using multiple FCI
   entries.

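   The running average of the measured overhead described in this
   subsection can be sketched in Python as follows.  The function names
   are hypothetical.  Seeding the average with the first packet's
   overhead and rounding to the nearest integer for the 9-bit field are
   assumptions of this sketch; this memo does not specify either.

```python
def update_avg_overhead(avg_oh, pckt_oh):
    """Apply avg_OH(new) = 15/16*avg_OH(old) + 1/16*pckt_OH.

    avg_oh is None before the first packet of the SSRC has been seen;
    seeding with that packet's overhead is an assumption of this sketch.
    """
    if avg_oh is None:
        return float(pckt_oh)
    return 15.0 / 16.0 * avg_oh + 1.0 / 16.0 * pckt_oh


def overhead_field(avg_oh):
    """Clamp the average into the 9-bit Measured Overhead range [0..511]."""
    return max(0, min(511, int(round(avg_oh))))
```

   One average is kept per media source (SSRC) and updated on every
   packet received from it; overhead_field() would be called each time a
   TMMBR FCI entry for that source is sent.
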
   The media receiver SHALL save the contents of the latest TMMBN
   message received from each media sender.

   The media receiver MAY send a TMMBR FCI entry to a particular media
   sender under the following circumstances:

     o   before any TMMBN message has been received from that media
         sender;

     o   when the media receiver has been identified as the source of a
         bounding tuple within the latest TMMBN message received from
         that media sender, and the value of the maximum total media bit
         rate or the overhead relating to that media sender has changed;

     o   when the media receiver has not been identified as the source
         of a bounding tuple within the latest TMMBN message received
         from that media sender, and, after the media receiver applies
         the incremental algorithm from section 3.5.4.2 or a stricter
         equivalent, the media receiver's tuple relating to that media
         sender is determined to belong to the bounding set.

   A TMMBR FCI entry MAY be repeated in subsequent TMMBR messages if no
   Temporary Maximum Media Stream Bit Rate Notification (TMMBN) FCI has
   been received from the media sender at the time of transmission of
   the next RTCP packet.  The bit rate value of a TMMBR FCI entry MAY be
   changed from one TMMBR message to the next.  The overhead measurement
   SHALL be updated to the current value of avg_OH each time the entry
   is sent.

   If the value set by a TMMBR message is expected to be permanent, the
   TMMBR setting party SHOULD renegotiate the session parameters to
   reflect that using session setup signaling, e.g., a SIP re-invite.

   Behaviour at the Media Sender (Receiver of the TMMBR)

   When it receives a TMMBR message containing an FCI entry relating to
   it, the media sender SHALL use an initial or incremental algorithm as
   applicable to determine the bounding set of tuples based on the new
   information.  The algorithm used SHALL be at least as strict as the
   corresponding algorithm defined in section 3.5.4.2.  The media sender
   MAY accumulate TMMBRs over a small interval (relative to the RTCP
   sending interval) before making this calculation.

   Once it has determined the bounding set of tuples, the media sender
   MAY use any combination of packet rate and net media bit rate within
   the feasible region that these tuples describe to produce a lower
   total media stream bit rate, as it may need to address a congestion
   situation or other limiting factors.  See section 5 (congestion
   control) for more discussion.

   If the media sender concludes that it can increase the maximum total
   media bit rate value, it SHALL wait before actually doing so, for a
   period long enough to allow a media receiver to respond to the TMMBN
   if it determines that its tuple belongs in the bounding set.  This
   delay period is estimated by the formula:

      2 * RTT + T_Dither_Max,

   where RTT is the longest round trip time known to the media sender
   and T_Dither_Max is defined in section 3.4 of [RFC4585].  Even in
   point-to-point sessions, a media sender MUST obey the aforementioned
   rule, as it is not guaranteed that a participant is able to determine
   correctly whether all the sources are co-located in a single node,
   and are coordinated.
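
   A minimal Python sketch of the waiting period follows.  The function
   name is hypothetical, and T_Dither_Max is taken as an input because
   its computation is defined in [RFC4585], not here.  All times are
   assumed to be in seconds.

```python
def increase_hold_off(known_rtts, t_dither_max):
    """Return 2 * RTT + T_Dither_Max, using the longest known RTT."""
    if not known_rtts:
        raise ValueError("at least one RTT estimate is needed")
    return 2.0 * max(known_rtts) + t_dither_max
```

   With known round trip times of 0.125 s and 0.25 s and a T_Dither_Max
   of 0.5 s, the media sender would wait 1.0 s before raising the rate.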

   A TMMBN message SHALL be sent by the media sender at the earliest
   possible point in time, in response to any TMMBR messages received
   since the last sending of TMMBN.  The TMMBN message indicates the
   calculated set of bounding tuples and the owners of those tuples at
   the time of the transmission of the message.

   An SSRC may time out according to the default rules for RTP session
   participants, i.e., the media sender has not received any RTP or RTCP
   packets from the owner for the last five regular reporting intervals.
   An SSRC may also explicitly leave the session, with the participant
   indicating this through the transmission of an RTCP BYE packet or
   using an external signaling channel.  If the media sender determines
   that the owner of a tuple in the bounding set has left the session,
   the media sender SHALL transmit a new TMMBN containing the previously
   determined set of bounding tuples but with the tuple belonging to the
   departed owner removed.
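
   The removal of departed owners can be sketched in Python as below.
   The data model (a list of (owner SSRC, bit rate, overhead) tuples, a
   dictionary of last-heard timestamps, and a set of SSRCs that sent
   BYE) and the function names are assumptions made for illustration.

```python
def has_timed_out(last_heard, now, reporting_interval):
    """True if nothing was received for five regular reporting intervals."""
    return now - last_heard > 5 * reporting_interval


def drop_departed_owners(bounding_set, last_heard, said_bye, now,
                         reporting_interval):
    """Return (new_bounding_set, changed).

    bounding_set: list of (owner_ssrc, max_bitrate_bps, overhead).
    last_heard:   dict mapping owner_ssrc to the last reception time.
    said_bye:     set of SSRCs known to have left the session.
    When changed is True, a new TMMBN has to be transmitted.
    """
    kept = []
    for owner, rate, overhead in bounding_set:
        gone = owner in said_bye or has_timed_out(
            last_heard[owner], now, reporting_interval)
        if not gone:
            kept.append((owner, rate, overhead))
    return kept, len(kept) != len(bounding_set)
```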

   A media sender MAY proactively initiate the equivalent to a TMMBR
   message to itself, when it is aware that its transmission path is
   more restrictive than the current limitations.  As a result, a TMMBN
   indicating the media source itself as the owner of a tuple is sent,
   thereby avoiding unnecessary TMMBR messages from other
   participants.  However, like any other participant, when the media
   sender becomes aware of changed limitations, it is required to change
   the tuple, and to send a corresponding TMMBN.

   Discussion

   Due to the unreliable nature of transport of TMMBR and TMMBN, the
   above rules may lead to the sending of TMMBR messages that appear to
   disobey those rules.  Furthermore, in multicast scenarios it can
   happen that more than one "non-owning" session participant may
   determine, rightly or wrongly, that its tuple belongs in the bounding
   set.  This is not critical for a number of reasons:

   a) If a TMMBR message is lost in transmission, either the media
      sender sends a new TMMBN message in response to some other media
      receiver or it does not send a new TMMBN message at all.  In the
      first case, the media receiver applies the incremental algorithm
      and, if it determines that its tuple should be part of the
      bounding set, sends out another TMMBR.  In the second case, it
      repeats the sending of a TMMBR unconditionally.  Either way, the
      media sender eventually gets the information it needs.

   b) Similarly, if a TMMBN message gets lost, the media receiver that
      has sent the corresponding TMMBR does not receive the notification
      and is expected to re-send the request and trigger the
      transmission of another TMMBN.

   c) If multiple competing TMMBR messages are sent by different session
      participants, then the algorithm can be applied taking all of
      these messages into account, and the resulting TMMBN provides the
      participants with an updated view of how their tuples compare with
      the bounded set.

   d) If more than one session participant happens to send TMMBR
      messages at the same time and with the same tuple component
      values, it does not matter which of those tuples is taken into the
      bounding set.  The losing session participant will determine,
      after applying the algorithm, that its tuple does not enter the
      bounding set, and will therefore stop sending its TMMBR.

   It is important to consider the security risks involved with faked
   TMMBRs.  See the security considerations in section 6.

   As indicated already, the feedback messages may be used in both
   multicast and unicast sessions in any of the specified topologies.
   However, for sessions with a large number of participants, using the
   lowest common denominator, as required by this mechanism, may not be
   the most suitable course of action.  Large sessions may need to
   consider other ways to adapt the bit rate to participants'
   capabilities, such as partitioning the session into different quality
   tiers or using some other method of achieving bit rate scalability.

4.2.1.3. Timing Rules
The first transmission of the TMMBR message MAY use early or immediate feedback in cases when timeliness is desirable. Any repetition of a request message SHOULD use regular RTCP mode for its transmission timing.
4.2.1.4. Handling in Translators and Mixers
Media translators and mixers will need to receive and respond to TMMBR messages as they are part of the chain that provides a certain media stream to the receiver. The mixer or translator may act locally on the TMMBR and thus generate a TMMBN to indicate that it has done so. Alternatively, in the case of a media translator it can forward the request, or in the case of a mixer generate one of its own and pass it forward. In the latter case, the mixer will need to send a TMMBN back to the original requestor to indicate that it is handling the request.

4.2.2. Temporary Maximum Media Stream Bit Rate Notification (TMMBN)

The Temporary Maximum Media Stream Bit Rate Notification is identified by RTCP packet type value PT=RTPFB and FMT=4. The FCI field of the TMMBN feedback message may contain zero, one, or more TMMBN FCI entries.
4.2.2.1. Message Format
   The Feedback Control Information (FCI) consists of zero, one, or more
   TMMBN FCI entries with the following syntax:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MxTBR Exp |  MxTBR Mantissa                 |Measured Overhead|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 3 - Syntax of an FCI Entry in the TMMBN Message

     SSRC (32 bits): The SSRC value of the "owner" of this tuple.

     MxTBR Exp (6 bits): The exponential scaling of the mantissa for the
              maximum total media bit rate value.  The value is an
              unsigned integer [0..63].

     MxTBR Mantissa (17 bits): The mantissa of the maximum total media
              bit rate value as an unsigned integer.

     Measured Overhead (9 bits): The measured average packet overhead
              value in bytes represented as an unsigned integer
              [0..511].

   Thus, the FCI within the TMMBN message contains entries indicating
   the bounding tuples.  For each tuple, the entry gives the owner by
   the SSRC, followed by the applicable maximum total media bit rate and
   overhead value.

   The length of the TMMBN message SHALL be set to 2+2*N where N is the
   number of TMMBN FCI entries.
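
   To illustrate the length rule, the following Python sketch assembles
   a complete TMMBN packet.  The common feedback header layout (version
   2, no padding, 5-bit FMT, packet type, 16-bit length, two SSRC
   fields) is the one defined in section 6.1 of [RFC4585]; the function
   name and the tuple representation are hypothetical.

```python
import struct

RTPFB = 205      # RTCP packet type for transport layer feedback
FMT_TMMBN = 4    # feedback message type assigned to TMMBN


def build_tmmbn(sender_ssrc, bounding_tuples):
    """Build a TMMBN packet from (owner_ssrc, exp, mantissa, overhead) tuples.

    An empty list is valid and yields a TMMBN without any FCI entry.
    """
    entries = list(bounding_tuples)
    length = 2 + 2 * len(entries)          # 32-bit words minus one
    first_byte = (2 << 6) | FMT_TMMBN      # V=2, P=0, FMT=4
    # "SSRC of media source" is not used and is set to 0.
    packet = struct.pack("!BBHII", first_byte, RTPFB, length, sender_ssrc, 0)
    for owner, exp, mantissa, overhead in entries:
        if not (0 <= exp <= 63 and 0 <= mantissa < (1 << 17)
                and 0 <= overhead <= 511):
            raise ValueError("field out of range")
        word = (exp << 26) | (mantissa << 9) | overhead
        packet += struct.pack("!II", owner, word)
    return packet
```

   The same length computation applies to TMMBR, where at least one FCI
   entry is required.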

4.2.2.2. Semantics
   This feedback message is used to notify the senders of any TMMBR
   message that one or more TMMBR messages have been received or that an
   owner has left the session.  It indicates to all participants the
   current set of bounding tuples and the "owners" of those tuples.

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the notification.  The "SSRC of media source"
   is not used and SHALL be set to 0.

   A TMMBN message SHALL be scheduled for transmission after the
   reception of a TMMBR message with an FCI entry identifying this media
   sender.  Only a single TMMBN SHALL be sent, even if more than one
   TMMBR message is received between the scheduling of the transmission
   and the actual transmission of the TMMBN message.  The TMMBN message
   indicates the bounding tuples and their owners at the time of
   transmitting the message.  The bounding tuples included SHALL be the
   set arrived at through application of the applicable algorithm of
   section 3.5.4.2 or an equivalent, applied to the previous bounding
   set, if any, and tuples received in TMMBR messages since the last
   TMMBN was transmitted.

   The reception of a TMMBR message SHALL still result in the
   transmission of a TMMBN message even if, after application of the
   algorithm, the newly reported TMMBR tuple is not accepted into the
   bounding set.  In such a case, the bounding tuples and their owners
   are not changed, unless the TMMBR was from an owner of a tuple within
   the previously calculated bounding set.  This procedure allows
   session participants that did not see the last TMMBN message to get a
   correct view of this media sender's state.

   As indicated in section 4.2.1.2, when a media sender determines that
   an "owner" of a bounding tuple has left the session, then that tuple
   is removed from the bounding set, and the media sender SHALL send a
   TMMBN message indicating the remaining bounding tuples.  If there are
   no remaining bounding tuples, a TMMBN without any FCI SHALL be sent
   to indicate this.  Without a remaining bounding tuple, the maximum
   media bit rate and maximum packet rate negotiated in session
   signaling, if any, apply.

     Note: if any media receivers remain in the session, this last will
     be a temporary situation.  The empty TMMBN will cause every
     remaining media receiver to determine that its limitation belongs
     in the bounding set and send a TMMBR in consequence.

   In unicast scenarios (i.e., where a single sender talks to a single
   receiver), the aforementioned algorithm to determine ownership
   degenerates to the media receiver becoming the "owner" of the one
   bounding tuple as soon as the media receiver has issued the first
   TMMBR message.

4.2.2.3. Timing Rules
The TMMBN acknowledgement SHOULD be sent as soon as allowed by the applied timing rules for the session. Immediate or early feedback mode SHOULD be used for these messages.
4.2.2.4. Handling by Translators and Mixers
As discussed in section 4.2.1.4, mixers or translators may need to issue TMMBN messages as responses to TMMBR messages for SSRCs handled by them.

4.3. Payload-Specific Feedback Messages

As specified by section 6.1 of RFC 4585 [RFC4585], Payload-Specific FB messages are identified by the RTCP packet type value PSFB (206). AVPF [RFC4585] defines three payload-specific feedback messages and one application layer feedback message. This memo specifies four additional payload-specific feedback messages. All are identified by means of the FMT parameter as follows:

   Assigned in [RFC4585]:

     1:     Picture Loss Indication (PLI)
     2:     Slice Lost Indication (SLI)
     3:     Reference Picture Selection Indication (RPSI)
     15:    Application layer FB message
     31:    reserved for future expansion of the number space

   Assigned in this memo:

     4:     Full Intra Request (FIR) Command
     5:     Temporal-Spatial Trade-off Request (TSTR)
     6:     Temporal-Spatial Trade-off Notification (TSTN)
     7:     Video Back Channel Message (VBCM)

   Unassigned:

         0: unassigned
      8-14: unassigned
     16-30: unassigned

   The following subsections define the new FCI formats for the
   payload-specific feedback messages.
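
   The FMT assignments listed in sections 4.2 and 4.3 can be summarised
   as a lookup on the packet type and FMT fields of the common feedback
   header.  The Python sketch below is illustrative only; the table and
   function names are hypothetical, and only the header fields needed
   for classification are inspected.

```python
RTPFB, PSFB = 205, 206

FEEDBACK_NAMES = {
    (RTPFB, 1): "Generic NACK",
    (RTPFB, 2): "reserved",
    (RTPFB, 3): "TMMBR",
    (RTPFB, 4): "TMMBN",
    (RTPFB, 31): "reserved",
    (PSFB, 1): "PLI",
    (PSFB, 2): "SLI",
    (PSFB, 3): "RPSI",
    (PSFB, 4): "FIR",
    (PSFB, 5): "TSTR",
    (PSFB, 6): "TSTN",
    (PSFB, 7): "VBCM",
    (PSFB, 15): "Application layer FB",
    (PSFB, 31): "reserved",
}


def classify_feedback(packet):
    """Name the feedback message carried in an RTCP feedback packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the common feedback header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    fmt = packet[0] & 0x1F     # low 5 bits of the first byte
    pt = packet[1]             # RTCP packet type
    if pt not in (RTPFB, PSFB):
        raise ValueError("not a feedback packet")
    return FEEDBACK_NAMES.get((pt, fmt), "unassigned")
```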

4.3.1. Full Intra Request (FIR)

The FIR message is identified by RTCP packet type value PT=PSFB and FMT=4. The FCI field MUST contain one or more FIR entries. Each entry applies to a different media sender, identified by its SSRC.
4.3.1.1. Message Format
   The Feedback Control Information (FCI) for the Full Intra Request
   consists of one or more FCI entries, the content of which is depicted
   in Figure 4.  The length of the FIR feedback message MUST be set to
   2+2*N, where N is the number of FCI entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Seq nr.       |    Reserved                                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 4 - Syntax of an FCI Entry in the FIR Message

     SSRC (32 bits): The SSRC value of the media sender that is
              requested to send a decoder refresh point.

     Seq nr. (8 bits): Command sequence number.  The sequence number
              space is unique for each pairing of the SSRC of command
              source and the SSRC of the command target.  The sequence
              number SHALL be increased by 1 modulo 256 for each new
              command.  A repetition SHALL NOT increase the sequence
              number.  The initial value is arbitrary.

     Reserved (24 bits): All bits SHALL be set to 0 by the sender and
              SHALL be ignored on reception.

   The semantics of this feedback message is independent of the RTP
   payload type.
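
   The sequence number rules can be illustrated with the Python sketch
   below.  One object stands for a single command source and keeps a
   separate counter per target SSRC.  The class and method names are
   hypothetical, and starting at 0 by default is merely one choice of
   the arbitrary initial value.

```python
import struct


def pack_fir_entry(target_ssrc, seq_nr):
    """Build one 8-byte FIR FCI entry; the 24 reserved bits are zero."""
    return struct.pack("!IB3x", target_ssrc, seq_nr & 0xFF)


class FirSequencer:
    """Tracks the command sequence number for each target SSRC."""

    def __init__(self):
        self._last = {}   # target SSRC -> last sequence number used

    def new_command(self, target_ssrc, initial=0):
        """Return the entry for a new command (sequence number + 1 mod 256)."""
        if target_ssrc in self._last:
            seq = (self._last[target_ssrc] + 1) % 256
        else:
            seq = initial % 256
        self._last[target_ssrc] = seq
        return pack_fir_entry(target_ssrc, seq)

    def repetition(self, target_ssrc):
        """Return the entry again without increasing the sequence number."""
        return pack_fir_entry(target_ssrc, self._last[target_ssrc])
```

   A receiver of FIR can use the unchanged sequence number to recognise
   a repetition of a command it has already seen.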

4.3.1.2. Semantics
   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the request, and the "SSRC of media source"
   is not used and SHALL be set to 0.  The SSRCs of the media senders to
   which the FIR command applies are in the corresponding FCI entries.
   A FIR message MAY contain requests to multiple media senders, using
   one FCI entry per target media sender.

   Upon reception of FIR, the encoder MUST send a decoder refresh point
   (see section 2.2) as soon as possible.

   The sender MUST consider congestion control as outlined in section 5,
   which MAY restrict its ability to send a decoder refresh point
   quickly.

   FIR SHALL NOT be sent as a reaction to picture losses -- it is
   RECOMMENDED to use PLI [RFC4585] instead.  FIR SHOULD be used only in
   situations where not sending a decoder refresh point would render the
   video unusable for the users.

   A typical example where sending FIR is appropriate is when, in a
   multipoint conference, a new user joins the session and no regular
   decoder refresh point interval is established.  Another example would
   be a video switching MCU that changes streams.  Here, normally, the
   MCU issues a FIR to the new sender so as to force it to emit a
   decoder refresh point.  The decoder refresh point normally includes a
   Freeze Picture Release (defined outside this specification), which
   re-starts the rendering process of the receivers.  Both techniques
   mentioned are commonly used in MCU-based multipoint conferences.

   Other RTP payload specifications such as RFC 2032 [RFC2032] already
   define a feedback mechanism for certain codecs.  An application
   supporting both schemes MUST use the feedback mechanism defined in
   this specification when sending feedback.  For backward-compatibility
   reasons, such an application SHOULD also be capable of receiving and
   reacting to the feedback scheme defined in the respective RTP payload
   format, if this is required by that payload format.

4.3.1.3. Timing Rules
The timing follows the rules outlined in section 3 of [RFC4585]. FIR commands MAY be used with early or immediate feedback. The FIR feedback message MAY be repeated. If using immediate feedback mode, the repetition SHOULD wait at least one RTT before being sent. In early or regular RTCP mode, the repetition is sent in the next regular RTCP packet.
4.3.1.4. Handling of FIR Message in Mixers and Translators
A media translator or a mixer performing media encoding of the content for which the session participant has issued a FIR is responsible for acting upon it. A mixer acting upon a FIR SHOULD NOT forward the message unaltered; instead, it SHOULD issue a FIR itself.
4.3.1.5. Remarks
   Currently, video appears to be the only useful application for FIR,
   as it appears to be the only RTP payload widely deployed that relies
   heavily on media prediction across RTP packet boundaries.  However,
   use of FIR could also reasonably be envisioned for other media types
   that share essential properties with compressed video, namely,
   cross-frame prediction (whatever a frame may be for that media type).
   One possible example may be the dynamic updates of MPEG-4 scene
   descriptions.  It is suggested that payload formats for such media
   types refer to FIR and other message types defined in this
   specification and in AVPF [RFC4585], instead of creating similar
   mechanisms in the payload specifications.  The payload specifications
   may have to explain how the payload-specific terminologies map to the
   video-centric terminology used herein.

   In conjunction with video codecs, FIR messages typically trigger the
   sending of full intra or IDR pictures.  Both are several times larger
   than predicted (inter) pictures.  Their size is independent of the
   time they are generated.  In most environments, especially when
   employing bandwidth-limited links, the use of an intra picture
   implies an allowed delay that is a significant multiple of the
   typical frame duration.  An example: if the sending frame rate is 10
   fps, and an intra picture is assumed to be 10 times as big as an
   inter picture, then a full second of latency has to be accepted.  In
   such an environment, there is no need for a particularly short delay
   in sending the FIR message.  Hence, waiting for the next possible
   time slot allowed by RTCP timing rules as per [RFC4585] should not
   have an overly negative impact on the system performance.

   Mandating a maximum delay for completing the sending of a decoder
   refresh point would be desirable from an application viewpoint, but
   is problematic from a congestion control point of view.  "As soon as
   possible" as mentioned above appears to be a reasonable compromise.

   In environments where the sender has no control over the codec (e.g.,
   when streaming pre-recorded and pre-coded content), the reaction to
   this command cannot be specified.  One suitable reaction of a sender
   would be to skip forward in the video bit stream to the next decoder
   refresh point.  In other scenarios, it may be preferable not to react
   to the command at all, e.g., when streaming to a large multicast
   group.  Other reactions may also be possible.  When deciding on a
   strategy, a sender could take into account factors such as the size
   of the receiving group, the "importance" of the sender of the FIR
   message (however "importance" may be defined in this specific
   application), the frequency of decoder refresh points in the content,
   and so on.  However, a session that predominantly handles pre-coded
   content is not expected to use FIR at all.

   The relationship between the Picture Loss Indication and FIR is as
   follows.  As discussed in section 6.3.1 of AVPF [RFC4585], a Picture
   Loss Indication informs the encoder about the loss of a picture and
   hence the likelihood of misalignment of the reference pictures
   between the encoder and decoder.  Such a scenario is normally related
   to losses in an ongoing connection.  In point-to-point scenarios, and
   without the presence of advanced error resilience tools, one possible
   option for an encoder consists in sending a decoder refresh point.
   However, there are other options.  One example is that the media
   sender ignores the PLI, because the embedded stream redundancy is
   likely to clean up the reproduced picture within a reasonable amount
   of time.  The FIR, in contrast, leaves a (real-time) encoder no
   choice but to send a decoder refresh point.  It does not allow the
   encoder to take into account any considerations such as the ones
   mentioned above.

4.3.2. Temporal-Spatial Trade-off Request (TSTR)

The TSTR feedback message is identified by RTCP packet type value PT=PSFB and FMT=5. The FCI field MUST contain one or more TSTR FCI entries.
4.3.2.1. Message Format
   The content of the FCI entry for the Temporal-Spatial Trade-off
   Request is depicted in Figure 5.  The length of the feedback message
   MUST be set to 2+2*N, where N is the number of FCI entries included.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Seq nr.      |  Reserved                           | Index   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 5 - Syntax of an FCI Entry in the TSTR Message

     SSRC (32 bits): The SSRC of the media sender that is requested to
              apply the trade-off value given in Index.

     Seq nr. (8 bits): Request sequence number.  The sequence number
              space is unique for each pairing of the SSRC of request
              source and the SSRC of the request target.  The sequence
              number SHALL be increased by 1 modulo 256 for each new
              command.  A repetition SHALL NOT increase the sequence
              number.  The initial value is arbitrary.

     Reserved (19 bits): All bits SHALL be set to 0 by the sender and
              SHALL be ignored on reception.

     Index (5 bits): An integer value between 0 and 31 that indicates
              the relative trade-off that is requested.  An index value
              of 0 indicates the highest possible spatial quality, while
              31 indicates the highest possible temporal resolution.
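
   The TSTR entry layout can be sketched in Python as follows: the
   second 32-bit word carries the sequence number in its top 8 bits, 19
   zero bits, and the index in its low 5 bits.  The function names are
   hypothetical.

```python
import struct


def pack_tstr_entry(ssrc, seq_nr, index):
    """Build one 8-byte TSTR FCI entry: SSRC, then Seq nr | Reserved | Index."""
    if not 0 <= seq_nr <= 255:
        raise ValueError("sequence number must fit in 8 bits")
    if not 0 <= index <= 31:
        raise ValueError("index must be between 0 and 31")
    # The 19 reserved bits between the two fields stay zero.
    return struct.pack("!II", ssrc, (seq_nr << 24) | index)


def unpack_tstr_entry(data):
    """Return (ssrc, seq_nr, index); reserved bits are ignored."""
    ssrc, word = struct.unpack("!II", data[:8])
    return ssrc, word >> 24, word & 0x1F
```

   An index of 0 asks for the highest spatial quality and 31 for the
   highest temporal resolution.
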
4.3.2.2. Semantics
A decoder can suggest a temporal-spatial trade-off level by sending a TSTR message to an encoder. If the encoder is capable of adjusting its temporal-spatial trade-off, it SHOULD take into account the received TSTR message for future coding of pictures. A value of 0 suggests a high spatial quality and a value of 31 suggests a high frame rate. The progression of values from 0 to 31 indicates monotonically a desire for higher frame rate. The index values do not correspond to precise values of spatial quality or frame rate.
   The reaction to the reception of more than one TSTR message by a
   media sender from different media receivers is left open to the
   implementation.  The selected trade-off SHALL be communicated to the
   media receivers by means of the TSTN message.

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the request, and the "SSRC of media source"
   is not used and SHALL be set to 0.  The SSRCs of the media senders to
   which the TSTR applies are in the corresponding FCI entries.

   A TSTR message MAY contain requests to multiple media senders, using
   one FCI entry per target media sender.
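As an illustration of how a complete TSTR message could be assembled, the following Python sketch prepends the common feedback header to one FCI entry per target. It assumes the header layout of section 6.1 of [RFC4585] (V=2, P=0, 5-bit FMT, 8-bit PT, 16-bit length) and the PSFB packet type value 206 registered there; the function name `build_tstr_message` is hypothetical.

```python
import struct
from typing import Iterable, Tuple

PT_PSFB = 206   # assumption: PSFB packet type value from RFC 4585
FMT_TSTR = 5    # FMT value given for TSTR in this memo


def build_tstr_message(sender_ssrc: int,
                       requests: Iterable[Tuple[int, int, int]]) -> bytes:
    """Build a TSTR message from (target_ssrc, seq_nr, index) tuples."""
    entries = list(requests)
    if not entries:
        raise ValueError("the FCI field needs at least one TSTR entry")
    fci = b""
    for target_ssrc, seq_nr, index in entries:
        if not (0 <= seq_nr <= 0xFF and 0 <= index <= 31):
            raise ValueError("Seq nr. or Index out of range")
        # Reserved bits between Seq nr. and Index stay zero.
        fci += struct.pack("!II", target_ssrc, (seq_nr << 24) | index)
    first_octet = (2 << 6) | FMT_TSTR      # V=2, P=0, FMT=5
    length = 2 + 2 * len(entries)          # 2+2*N, in 32-bit words minus one
    # "SSRC of media source" is not used for TSTR and is set to 0.
    header = struct.pack("!BBHII", first_octet, PT_PSFB, length, sender_ssrc, 0)
    return header + fci
```

With two targets the length field is 6, which matches a 28-octet packet: (6 + 1) * 4 = 28.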

4.3.2.3. Timing Rules
The timing follows the rules outlined in section 3 of [RFC4585]. This request message is not time critical and SHOULD be sent using regular RTCP timing. The message MAY be sent with early or immediate feedback timing only if it is known that the user interface requires quick feedback.
4.3.2.4. Handling of Message in Mixers and Translators
A mixer or media translator that encodes content sent to the session participant issuing the TSTR SHALL consider the request to determine if it can fulfill it by changing its own encoding parameters. A media translator unable to fulfill the request MAY forward the request unaltered towards the media sender. A mixer encoding for multiple session participants will need to consider the joint needs of these participants before generating a TSTR on its own behalf towards the media sender. See also the discussion in section 3.5.2.
4.3.2.5. Remarks
The term "spatial quality" does not necessarily refer to the resolution as measured by the number of pixels the reconstructed video is using. In fact, in most scenarios the video resolution stays constant during the lifetime of a session. However, all video compression standards have means to adjust the spatial quality at a given resolution, often influenced by the Quantizer Parameter or QP. A numerically low QP results in a good reconstructed picture quality, whereas a numerically high QP yields a coarse picture. The typical reaction of an encoder to this request is to change its rate control parameters to use a lower frame rate and a numerically lower (on average) QP, or vice versa. The precise mapping of Index value to
   frame rate and QP is intentionally left open here, as it depends on
   factors such as the compression standard employed, spatial
   resolution, content, bit rate, and so on.
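Because the mapping is left open, any concrete mapping is an implementation choice. The Python sketch below shows one purely hypothetical possibility: a linear interpolation between two rate-control operating points. The function name, the frame-rate range, and the QP range are all invented for illustration and are not suggested by this memo.

```python
from typing import Tuple


def index_to_targets(index: int,
                     min_fps: float = 5.0, max_fps: float = 30.0,
                     min_qp: int = 20, max_qp: int = 40) -> Tuple[float, int]:
    """Map a TSTR Index to a hypothetical (frame rate, average QP) target.

    Index 0 favors spatial quality (low frame rate, numerically low QP);
    Index 31 favors temporal resolution (high frame rate, numerically high QP).
    """
    if not 0 <= index <= 31:
        raise ValueError("Index must be between 0 and 31")
    t = index / 31.0
    fps = min_fps + t * (max_fps - min_fps)
    qp = round(min_qp + t * (max_qp - min_qp))
    return fps, qp
```

The only property this sketch takes from the text is monotonicity: a larger Index never yields a lower target frame rate.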

4.3.3. Temporal-Spatial Trade-off Notification (TSTN)

The TSTN message is identified by RTCP packet type value PT=PSFB and FMT=6. The FCI field SHALL contain one or more TSTN FCI entries.
4.3.3.1. Message Format
The content of an FCI entry for the Temporal-Spatial Trade-off Notification is depicted in Figure 6. The length of the TSTN message MUST be set to 2+2*N, where N is the number of FCI entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Seq nr.      |  Reserved                           | Index   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 6 - Syntax of the TSTN

   SSRC (32 bits): The SSRC of the source of the TSTR that resulted in this Notification.

   Seq nr. (8 bits): The sequence number value from the TSTR that is being acknowledged.

   Reserved (19 bits): All bits SHALL be set to 0 by the sender and SHALL be ignored on reception.

   Index (5 bits): The trade-off value the media sender is using henceforth.

      Informative note: The returned trade-off value (Index) may differ from the requested one, for example, in cases where a media encoder cannot tune its trade-off, or when pre-recorded content is used.
4.3.3.2. Semantics
This feedback message is used to acknowledge the reception of a TSTR. For each TSTR received targeted at the session participant, a TSTN FCI entry SHALL be sent in a TSTN feedback message. A single TSTN message MAY acknowledge multiple requests using multiple FCI entries. The index value included SHALL be the same in all FCI entries of the TSTN message. Including an FCI entry for each requestor allows each requesting entity to determine that the media sender received the request. The Notification SHALL also be sent in response to TSTR repetitions received. If the request receiver has received TSTR with several different sequence numbers from a single requestor, it SHALL only respond to the request with the highest (modulo 256) sequence number. Note that the highest sequence number may be a smaller integer value due to the wrapping of the field. Appendix A.1 of [RFC3550] has an algorithm for keeping track of the highest received sequence number for RTP packets; it could be adapted for this usage.

The TSTN SHALL include the Temporal-Spatial Trade-off index that will be used as a result of the request. This is not necessarily the same index as requested, as the media sender may need to aggregate requests from several requesting session participants. It may also have some other policies or rules that limit the selection.

Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the Notification, and the "SSRC of media source" is not used and SHALL be set to 0. The SSRCs of the requesting entities to which the Notification applies are in the corresponding FCI entries.
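The "highest (modulo 256) sequence number" rule can be sketched in Python as follows. This is a simplified half-window comparison, not the algorithm of Appendix A.1 of [RFC3550]; it assumes that two outstanding requests from one requestor are never more than 127 steps apart. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def seq_newer(a: int, b: int) -> bool:
    """True if 8-bit sequence number a is ahead of b, allowing for wrap."""
    return a != b and ((a - b) & 0xFF) < 128


def latest_requests(
        received: Iterable[Tuple[int, int, int]]) -> Dict[int, Tuple[int, int]]:
    """Keep, per requestor SSRC, the TSTR with the highest seq nr. mod 256.

    received holds (requestor_ssrc, seq_nr, index) tuples in arrival order.
    Repetitions (same seq nr.) leave the stored request unchanged.
    """
    latest: Dict[int, Tuple[int, int]] = {}
    for ssrc, seq_nr, index in received:
        current = latest.get(ssrc)
        if current is None or seq_newer(seq_nr, current[0]):
            latest[ssrc] = (seq_nr, index)
    return latest


def tstn_entries(latest: Dict[int, Tuple[int, int]],
                 selected_index: int) -> List[Tuple[int, int, int]]:
    """One (requestor_ssrc, acked_seq_nr, index) per requestor; same index in all."""
    return [(ssrc, seq_nr, selected_index)
            for ssrc, (seq_nr, _requested) in latest.items()]
```

For example, after requests numbered 254, 255, and 1 from one requestor, the entry to acknowledge is number 1, although it is the smallest integer of the three.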
4.3.3.3. Timing Rules
The timing follows the rules outlined in section 3 of [RFC4585]. This acknowledgement message is not extremely time critical and SHOULD be sent using regular RTCP timing.
4.3.3.4. Handling of TSTN in Mixers and Translators
A mixer or translator that acts upon a TSTR SHALL also send the corresponding TSTN. In cases where it needs to forward a TSTR itself, the notification message MAY need to be delayed until the TSTR has been responded to.
4.3.3.5. Remarks
None.

4.3.4. H.271 Video Back Channel Message (VBCM)

The VBCM is identified by RTCP packet type value PT=PSFB and FMT=7. The FCI field MUST contain one or more VBCM FCI entries.
4.3.4.1. Message Format
The syntax of an FCI entry within the VBCM indication is depicted in Figure 7.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Seq nr.       |0| Payload Type| Length                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    VBCM Octet String....      |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 7 - Syntax of an FCI Entry in the VBCM

   SSRC (32 bits): The SSRC value of the media sender that is requested to instruct its encoder to react to the VBCM.

   Seq nr. (8 bits): Command sequence number. The sequence number space is unique for pairing of the SSRC of the command source and the SSRC of the command target. The sequence number SHALL be increased by 1 modulo 256 for each new command. A repetition SHALL NOT increase the sequence number. The initial value is arbitrary.

   0: Must be set to 0 by the sender and should not be acted upon by the message receiver.

   Payload Type (7 bits): The RTP payload type for which the VBCM bit stream must be interpreted.

   Length (16 bits): The length of the VBCM octet string in octets exclusive of any padding octets.

   VBCM Octet String (variable length): This is the octet string generated by the decoder carrying a specific feedback sub-message.

   Padding (variable length): Bits set to 0 to make up a 32-bit boundary.
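The variable-length layout of Figure 7, including the padding to a 32-bit boundary, can be illustrated with the following Python sketch. The function names are hypothetical, and the H.271 octet string is treated as opaque bytes.

```python
import struct
from typing import Tuple


def pack_vbcm_entry(ssrc: int, seq_nr: int, payload_type: int,
                    octets: bytes) -> bytes:
    """Pack one VBCM FCI entry; Length excludes the padding octets."""
    if not 0 <= seq_nr <= 0xFF:
        raise ValueError("Seq nr. must fit in 8 bits")
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("Payload Type must fit in 7 bits")
    if len(octets) > 0xFFFF:
        raise ValueError("VBCM octet string too long for the Length field")
    # The bit preceding Payload Type is left at 0.
    header = struct.pack("!IBBH", ssrc, seq_nr, payload_type, len(octets))
    padding = b"\x00" * (-len(octets) % 4)
    return header + octets + padding


def unpack_vbcm_entry(data: bytes) -> Tuple[int, int, int, bytes, int]:
    """Return (ssrc, seq_nr, payload_type, octets, consumed_octets)."""
    ssrc, seq_nr, pt_octet, length = struct.unpack_from("!IBBH", data)
    end = 8 + length
    if len(data) < end:
        raise ValueError("truncated VBCM octet string")
    consumed = end + (-length % 4)      # skip padding to the next 32-bit word
    return ssrc, seq_nr, pt_octet & 0x7F, data[8:end], consumed
```

Returning the number of consumed octets lets a caller step through several FCI entries carried in one VBCM.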
4.3.4.2. Semantics
The "payload" of the VBCM indication carries different types of codec-specific, feedback information. The type of feedback information can be classified as a 'status report' (such as an indication that a bit stream was received without errors, or that a partial or complete picture or block was lost) or 'update requests' (such as complete refresh of the bit stream).

   Note: There are possible overlaps between the VBCM sub-messages and CCM/AVPF feedback messages, such as FIR. Please see section 3.5.3 for further discussion.

The different types of feedback sub-messages carried in the VBCM are indicated by the "payloadType" as defined in [H.271]. These sub-message types are reproduced below for convenience. "payloadType", in ITU-T Rec. H.271 terminology, refers to the sub-type of the H.271 message and should not be confused with an RTP payload type.

   Payload   Message Content
   Type
   ---------------------------------------------------------------------
   0         One or more pictures without detected bit stream error
             mismatch
   1         One or more pictures that are entirely or partially lost
   2         A set of blocks of one picture that is entirely or partially
             lost
   3         CRC for one parameter set
   4         CRC for all parameter sets of a certain type
   5         A "reset" request indicating that the sender should
             completely refresh the video bit stream as if no prior bit
             stream data had been received
   > 5       Reserved for future use by ITU-T

   Table 2: H.271 message types ("payloadTypes")

The bit string or the "payload" of a VBCM is of variable length and is self-contained and coded in a variable-length, binary format. The media sender necessarily has to be able to parse this optimized binary format to make use of VBCMs.

Each of the different types of sub-messages (indicated by payloadType) may have different semantics depending on the codec used.

Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the request, and the "SSRC of media source" is not used and SHALL be set to 0. The SSRCs of the media senders to which the VBCM applies are in the corresponding FCI entries. The sender of the VBCM MAY send H.271 messages to multiple media senders and MAY send more than one H.271 message to the same media sender within the same VBCM.
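For implementers, the H.271 "payloadType" values of Table 2 could be represented as an enumeration, as in the Python sketch below. The member names and the `describe_h271_type` helper are hypothetical labels chosen for readability; only the numeric values come from Table 2. How the payloadType is extracted from the VBCM octet string is defined in [H.271] and is not shown here.

```python
from enum import IntEnum


class H271PayloadType(IntEnum):
    """H.271 sub-message types from Table 2 (not RTP payload types)."""
    PICTURES_WITHOUT_ERROR = 0
    PICTURES_LOST = 1
    BLOCKS_LOST = 2
    CRC_ONE_PARAMETER_SET = 3
    CRC_ALL_PARAMETER_SETS = 4
    RESET_REQUEST = 5


def describe_h271_type(value: int) -> str:
    """Return the enum member name, or "RESERVED" for values above 5."""
    if value < 0:
        raise ValueError("payloadType cannot be negative")
    try:
        return H271PayloadType(value).name
    except ValueError:
        return "RESERVED"
```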

4.3.4.3. Timing Rules
The timing follows the rules outlined in section 3 of [RFC4585]. The different sub-message types may have different properties with regard to the timing of messages that should be used. If several different types are included in the same feedback packet, then the requirements of the sub-message type with the most stringent requirements should be followed.
4.3.4.4. Handling of Message in Mixers or Translators
The handling of a VBCM in a mixer or translator is sub-message type dependent.
4.3.4.5. Remarks
Please see section 3.5.3 for a discussion of the usage of H.271 messages and messages defined in AVPF [RFC4585] and this memo with similar functionality.

   Note: There has been some discussion whether the RTP payload type field in this message is needed. It will be needed if there is potentially more than one VBCM-capable RTP payload type in the same session, and the semantics of a given VBCM changes between payload types. For example, the picture identification mechanism in messages of H.271 type 0 is fundamentally different between H.263 and H.264 (although both use the same syntax). Therefore, the Payload Type field is justified here. There was a further comment that for TSTR and FIR such a need does not exist, because the semantics of TSTR and FIR are either loosely enough defined, or generic enough, to apply to all video payloads currently in existence/envisioned.

