RFC 5104

Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)

Pages: 64
Proposed Standard
Updated by: 7728 8082

4.  RTCP Receiver Report Extensions

   This memo specifies six new feedback messages.  The Full Intra
   Request (FIR), Temporal-Spatial Trade-off Request (TSTR), Temporal-
   Spatial Trade-off Notification (TSTN), and Video Back Channel Message
   (VBCM) are "Payload Specific Feedback Messages" as defined in section
   6.3 of AVPF [RFC4585].  The Temporary Maximum Media Stream Bit Rate
   Request (TMMBR) and Temporary Maximum Media Stream Bit Rate
   Notification (TMMBN) are "Transport Layer Feedback Messages" as
   defined in section 6.2 of AVPF.

   The new feedback messages are defined in the following subsections,
   following a similar structure to that in sections 6.2 and 6.3 of the
   AVPF specification [RFC4585].

4.1.  Design Principles of the Extension Mechanism

   RTCP was originally introduced as a channel to convey presence,
   reception quality statistics and hints on the desired media coding.
   A limited set of media control mechanisms was introduced in early RTP
   payload formats for video formats, for example, in RFC 2032 [RFC2032]
   (which was obsoleted by RFC 4587 [RFC4587]).  However, this
   specification, for the first time, suggests a two-way handshake for
   some of its messages.  There is a danger that this introduction
   could be misunderstood as a precedent for the use of RTCP as an RTP
   session control protocol.  To prevent such a misunderstanding, this
   subsection attempts to clarify the scope of the extensions specified
   in this memo, and it strongly suggests that future extensions follow
   the rationale spelled out here, or compellingly explain why they
   deviate from the rationale.

   In this memo, and in AVPF [RFC4585], the only messages included are
   those that:

   a) have comparatively strict real-time constraints, which prevent the
      use of mechanisms such as a SIP re-invite in most application
      scenarios (the real-time constraints are explained separately for
      each message where necessary);

   b) are multicast-safe in that the reaction to potentially
      contradicting feedback messages is specified, as necessary for
      each message; and

   c) are directly related to activities of a certain media codec, class
      of media codecs (e.g., video codecs), or a given RTP packet
      stream.

   In this memo, a two-way handshake is introduced only for messages for
   which:

   a) a notification or acknowledgement is required due to their nature.
      An analysis to determine whether this requirement exists has been
      performed separately for each message.

   b) the notification or acknowledgement cannot be easily derived from
      the media bit stream.

   All messages in AVPF [RFC4585] and in this memo present their
   contents in a simple, fixed binary format.  This accommodates media
   receivers that have not implemented higher control protocol
   functionalities (SDP, XML parsers, and such) in their media path.

   Messages that do not conform to the design principles just described
   are not an appropriate use of RTCP or of the Codec Control Framework
   defined in this document.

4.2.  Transport Layer Feedback Messages

   As specified in section 6.1 of RFC 4585 [RFC4585], transport layer
   feedback messages are identified by the RTCP packet type value RTPFB
   (205).

   In AVPF, one message of this category was defined.  This memo
   specifies two more such messages.  They are identified by means of
   the feedback message type (FMT) parameter as follows:

   Assigned in AVPF [RFC4585]:

      1:    Generic NACK
      31:   reserved for future expansion of the identifier number space

   Assigned in this memo:

      2:    reserved (see note below)
      3:    Temporary Maximum Media Stream Bit Rate Request (TMMBR)
      4:    Temporary Maximum Media Stream Bit Rate Notification (TMMBN)

          Note: early versions of AVPF [RFC4585] reserved FMT=2 for a
          code point that was later removed.  It has been pointed out
          that there may be implementations in the field using this
          value in accordance with the expired document.  As there is
          sufficient numbering space available, we mark FMT=2 as
          reserved so as to avoid possible interoperability problems
          with any such early implementations.

   Available for assignment:

      0:    unassigned
      5-30: unassigned

   The following subsection defines the formats of the Feedback Control
   Information (FCI) entries for the TMMBR and TMMBN messages,
   respectively, and specifies the associated behaviour at the media
   sender and receiver.

4.2.1.  Temporary Maximum Media Stream Bit Rate Request (TMMBR)

   The Temporary Maximum Media Stream Bit Rate Request is identified by
   RTCP packet type value PT=RTPFB and FMT=3.

   The FCI field of a Temporary Maximum Media Stream Bit Rate Request
   (TMMBR) message SHALL contain one or more FCI entries.

4.2.1.1.  Message Format

   The Feedback Control Information (FCI) consists of one or more TMMBR
   FCI entries with the following syntax:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MxTBR Exp |  MxTBR Mantissa                 |Measured Overhead|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 2 - Syntax of an FCI Entry in the TMMBR Message

     SSRC (32 bits): The SSRC value of the media sender that is
              requested to obey the new maximum bit rate.

     MxTBR Exp (6 bits): The exponential scaling of the mantissa for the
              maximum total media bit rate value.  The value is an
              unsigned integer [0..63].

     MxTBR Mantissa (17 bits): The mantissa of the maximum total media
              bit rate value as an unsigned integer.

     Measured Overhead (9 bits): The measured average packet overhead
              value in bytes.  The measurement SHALL be done according
              to the description in section 4.2.1.2. The value is an
              unsigned integer [0..511].

   The maximum total media bit rate (MxTBR) value in bits per second is
   calculated from the MxTBR exponent (exp) and mantissa in the
   following way:

      MxTBR = mantissa * 2^exp

   This allows for 17 bits of resolution in the range 0 to 131071*2^63
   (approximately 1.2*10^24).

   The length of the TMMBR feedback message SHALL be set to 2+2*N where
   N is the number of TMMBR FCI entries.
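
   The bit layout and the mantissa/exponent formula above can be
   illustrated with a minimal Python sketch.  The function names
   (encode_mxtbr, pack_tmmbr_fci, unpack_tmmbr_fci) are hypothetical,
   and rounding the bit rate down when it does not fit into 17 bits is
   an assumption of this sketch, chosen so that the encoded limit never
   exceeds the intended one; this memo does not mandate a rounding
   rule here.

```python
import struct

MANTISSA_MAX = 0x1FFFF  # 17-bit mantissa
OVERHEAD_MAX = 0x1FF    # 9-bit measured overhead


def encode_mxtbr(bitrate_bps):
    """Split a bit rate into (exp, mantissa) with mantissa * 2**exp <= bitrate_bps.

    Assumption of this sketch: low-order bits are dropped (round down).
    """
    if bitrate_bps < 0:
        raise ValueError("bit rate must be non-negative")
    exp = 0
    mantissa = bitrate_bps
    while mantissa > MANTISSA_MAX:
        mantissa >>= 1
        exp += 1
    if exp > 63:
        raise ValueError("bit rate does not fit the 6-bit exponent")
    return exp, mantissa


def pack_tmmbr_fci(ssrc, bitrate_bps, overhead):
    """Build one 8-byte FCI entry: SSRC, then Exp(6) | Mantissa(17) | Overhead(9)."""
    if not 0 <= overhead <= OVERHEAD_MAX:
        raise ValueError("overhead must fit in 9 bits")
    exp, mantissa = encode_mxtbr(bitrate_bps)
    word = (exp << 26) | (mantissa << 9) | overhead
    return struct.pack("!II", ssrc, word)  # network byte order


def unpack_tmmbr_fci(data):
    """Return (ssrc, bitrate_bps, overhead) from one 8-byte FCI entry."""
    ssrc, word = struct.unpack("!II", data[:8])
    exp = word >> 26
    mantissa = (word >> 9) & MANTISSA_MAX
    overhead = word & OVERHEAD_MAX
    return ssrc, mantissa << exp, overhead  # MxTBR = mantissa * 2^exp
```

   For example, a limit of 1000000 bits per second is represented as
   mantissa 125000 with exponent 3, and survives a pack/unpack round
   trip unchanged.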

4.2.1.2.  Semantics

   Behaviour at the Media Receiver (Sender of the TMMBR)

   TMMBR is used to indicate a transport-related limitation at the
   reporting entity acting as a media receiver.  TMMBR has the form of a
   tuple containing two components.  The first value is the highest bit
   rate per sender of a media stream, available at a receiver-chosen
   protocol layer, which the receiver currently supports in this RTP
   session.  The second value is the measured header overhead in bytes
   as defined in section 2.2 and measured at the chosen protocol layer
   in the packets received for the stream.  The measurement of the
   overhead is a running average that is updated for each packet
   received for this particular media source (SSRC), using the following
   formula:

       avg_OH (new) = 15/16*avg_OH (old) + 1/16*pckt_OH,

   where avg_OH is the running (exponentially smoothed) average and
   pckt_OH is the overhead observed in the latest packet.
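
   The running average above might be maintained per media source
   (SSRC) as in the following Python sketch.  The class name
   OverheadTracker is hypothetical.  Seeding the average with the first
   observed packet overhead, and rounding and clamping the result to
   fit the 9-bit Measured Overhead field, are assumptions of this
   sketch rather than rules stated in this memo.

```python
class OverheadTracker:
    """Exponentially smoothed per-packet overhead for one SSRC."""

    def __init__(self):
        self.avg_oh = None  # no packet seen yet

    def update(self, pckt_oh):
        """Fold the overhead (in bytes) of the latest packet into the average."""
        if self.avg_oh is None:
            # Assumption: start the average at the first observation.
            self.avg_oh = float(pckt_oh)
        else:
            # avg_OH (new) = 15/16*avg_OH (old) + 1/16*pckt_OH
            self.avg_oh = 15.0 / 16.0 * self.avg_oh + 1.0 / 16.0 * pckt_oh
        return self.avg_oh

    def field_value(self):
        """Value for the 9-bit Measured Overhead field (assumed rounding/clamping)."""
        if self.avg_oh is None:
            return 0
        return min(511, max(0, round(self.avg_oh)))
```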

   If a maximum bit rate has been negotiated through signaling, the
   maximum total media bit rate that the receiver reports in a TMMBR
   message MUST NOT exceed the negotiated value converted to a common
   basis (i.e., with overheads adjusted to bring it to the same
   reference protocol layer).

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the request, and the "SSRC of media source"
   is not used and SHALL be set to 0.  Within a particular TMMBR FCI
   entry, the SSRC field denotes the media sender that the tuple
   applies to.  This is useful in the multicast or
   translator topologies where the reporting entity may address all of
   the media senders in a single TMMBR message using multiple FCI
   entries.

   The media receiver SHALL save the contents of the latest TMMBN
   message received from each media sender.

   The media receiver MAY send a TMMBR FCI entry to a particular media
   sender under the following circumstances:

     o   before any TMMBN message has been received from that media
         sender;

     o   when the media receiver has been identified as the source of a
         bounding tuple within the latest TMMBN message received from
         that media sender, and the value of the maximum total media bit
         rate or the overhead relating to that media sender has changed;

     o   when the media receiver has not been identified as the source
         of a bounding tuple within the latest TMMBN message received
         from that media sender, and, after the media receiver applies
         the incremental algorithm from section 3.5.4.2 or a stricter
         equivalent, the media receiver's tuple relating to that media
         sender is determined to belong to the bounding set.

   A TMMBR FCI entry MAY be repeated in subsequent TMMBR messages if no
   Temporary Maximum Media Stream Bit Rate Notification (TMMBN) FCI has
   been received from the media sender at the time of transmission of
   the next RTCP packet.  The bit rate value of a TMMBR FCI entry MAY be
   changed from one TMMBR message to the next.  The overhead measurement
   SHALL be updated to the current value of avg_OH each time the entry
   is sent.

   If the value set by a TMMBR message is expected to be permanent, the
   TMMBR setting party SHOULD renegotiate the session parameters to
   reflect that using session setup signaling, e.g., a SIP re-invite.

   Behaviour at the Media Sender (Receiver of the TMMBR)

   When it receives a TMMBR message containing an FCI entry relating to
   it, the media sender SHALL use an initial or incremental algorithm as
   applicable to determine the bounding set of tuples based on the new
   information.  The algorithm used SHALL be at least as strict as the
   corresponding algorithm defined in section 3.5.4.2.  The media sender
   MAY accumulate TMMBRs over a small interval (relative to the RTCP
   sending interval) before making this calculation.

   Once it has determined the bounding set of tuples, the media sender
   MAY use any combination of packet rate and net media bit rate within
   the feasible region that these tuples describe to produce a lower
   total media stream bit rate, as it may need to address a congestion
   situation or other limiting factors.  See section 5 (congestion
   control) for more discussion.

   If the media sender concludes that it can increase the maximum total
   media bit rate value, it SHALL wait before actually doing so, for a
   period long enough to allow a media receiver to respond to the TMMBN
   if it determines that its tuple belongs in the bounding set.  This
   delay period is estimated by the formula:

      2 * RTT + T_Dither_Max,

   where RTT is the longest round trip time known to the media sender
   and T_Dither_Max is defined in section 3.4 of [RFC4585].  Even in
   point-to-point sessions, a media sender MUST obey the aforementioned
   rule, as it is not guaranteed that a participant is able to determine
   correctly whether all the sources are co-located in a single node,
   and are coordinated.
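
   As a small illustration of the waiting rule above, the following
   Python sketch computes the delay before a media sender may raise
   its bit rate.  The function name and parameters are hypothetical;
   the value of T_Dither_Max is taken as an input because it is
   defined in [RFC4585], not here.

```python
def increase_hold_off(known_rtts, t_dither_max):
    """Return 2 * RTT + T_Dither_Max, in the same time unit as the inputs.

    known_rtts:   round trip times known to the media sender
    t_dither_max: T_Dither_Max as determined per RFC 4585 (caller-supplied)
    """
    if not known_rtts:
        raise ValueError("at least one RTT estimate is needed")
    longest_rtt = max(known_rtts)  # the rule uses the longest known RTT
    return 2 * longest_rtt + t_dither_max
```

   With known round trip times of 50, 120, and 80 milliseconds, the
   120 millisecond value governs the delay.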

   A TMMBN message SHALL be sent by the media sender at the earliest
   possible point in time, in response to any TMMBR messages received
   since the last sending of TMMBN.  The TMMBN message indicates the
   calculated set of bounding tuples and the owners of those tuples at
   the time of the transmission of the message.

   An SSRC may time out according to the default rules for RTP session
   participants, i.e., the media sender has not received any RTP or RTCP
   packets from the owner for the last five regular reporting intervals.
   An SSRC may also explicitly leave the session, with the participant
   indicating this through the transmission of an RTCP BYE packet or
   using an external signaling channel.  If the media sender determines
   that the owner of a tuple in the bounding set has left the session,
   the media sender SHALL transmit a new TMMBN containing the previously
   determined set of bounding tuples but with the tuple belonging to the
   departed owner removed.

   A media sender MAY proactively initiate the equivalent to a TMMBR
   message to itself, when it is aware that its transmission path is
   more restrictive than the current limitations.  As a result, a TMMBN
   indicating the media source itself as the owner of a tuple is sent,
   thereby avoiding unnecessary TMMBR messages from other
   participants.  However, like any other participant, when the media
   sender becomes aware of changed limitations, it is required to change
   the tuple, and to send a corresponding TMMBN.

   Discussion

   Due to the unreliable nature of transport of TMMBR and TMMBN, the
   above rules may lead to the sending of TMMBR messages that appear to
   disobey those rules.  Furthermore, in multicast scenarios it can
   happen that more than one "non-owning" session participant may
   determine, rightly or wrongly, that its tuple belongs in the bounding
   set.  This is not critical for a number of reasons:

   a) If a TMMBR message is lost in transmission, either the media
      sender sends a new TMMBN message in response to some other media
      receiver or it does not send a new TMMBN message at all.  In the
      first case, the media receiver applies the incremental algorithm
      and, if it determines that its tuple should be part of the
      bounding set, sends out another TMMBR.  In the second case, it
      repeats the sending of a TMMBR unconditionally.  Either way, the
      media sender eventually gets the information it needs.

   b) Similarly, if a TMMBN message gets lost, the media receiver that
      has sent the corresponding TMMBR does not receive the notification
      and is expected to re-send the request and trigger the
      transmission of another TMMBN.

   c) If multiple competing TMMBR messages are sent by different session
      participants, then the algorithm can be applied taking all of
      these messages into account, and the resulting TMMBN provides the
      participants with an updated view of how their tuples compare with
      the bounding set.

   d) If more than one session participant happens to send TMMBR
      messages at the same time and with the same tuple component
      values, it does not matter which of those tuples is taken into the
      bounding set.  The losing session participant will determine,
      after applying the algorithm, that its tuple does not enter the
      bounding set, and will therefore stop sending its TMMBR.

   It is important to consider the security risks involved with faked
   TMMBRs.  See the security considerations in section 6.

   As indicated already, the feedback messages may be used in both
   multicast and unicast sessions in any of the specified topologies.
   However, for sessions with a large number of participants, using the
   lowest common denominator, as required by this mechanism, may not be
   the most suitable course of action.  Large sessions may need to
   consider other ways to adapt the bit rate to participants'
   capabilities, such as partitioning the session into different quality
   tiers or using some other method of achieving bit rate scalability.

4.2.1.3.  Timing Rules

   The first transmission of the TMMBR message MAY use early or
   immediate feedback in cases when timeliness is desirable.  Any
   repetition of a request message SHOULD use regular RTCP mode for its
   transmission timing.

4.2.1.4.  Handling in Translators and Mixers

   Media translators and mixers will need to receive and respond to
   TMMBR messages as they are part of the chain that provides a certain
   media stream to the receiver.  The mixer or translator may act
   locally on the TMMBR and thus generate a TMMBN to indicate that it
   has done so.  Alternatively, in the case of a media translator it can
   forward the request, or in the case of a mixer generate one of its
   own and pass it forward.  In the latter case, the mixer will need to
   send a TMMBN back to the original requestor to indicate that it is
   handling the request.

4.2.2.  Temporary Maximum Media Stream Bit Rate Notification (TMMBN)

   The Temporary Maximum Media Stream Bit Rate Notification is
   identified by RTCP packet type value PT=RTPFB and FMT=4.

   The FCI field of the TMMBN feedback message may contain zero, one, or
   more TMMBN FCI entries.

4.2.2.1.  Message Format

   The Feedback Control Information (FCI) consists of zero, one, or more
   TMMBN FCI entries with the following syntax:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MxTBR Exp |  MxTBR Mantissa                 |Measured Overhead|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 3 - Syntax of an FCI Entry in the TMMBN Message

     SSRC (32 bits): The SSRC value of the "owner" of this tuple.

     MxTBR Exp (6 bits): The exponential scaling of the mantissa for the
              maximum total media bit rate value.  The value is an
              unsigned integer [0..63].

     MxTBR Mantissa (17 bits): The mantissa of the maximum total media
              bit rate value as an unsigned integer.

     Measured Overhead (9 bits): The measured average packet overhead
              value in bytes represented as an unsigned integer
              [0..511].

   Thus, the FCI within the TMMBN message contains entries indicating
   the bounding tuples.  For each tuple, the entry gives the owner by
   the SSRC, followed by the applicable maximum total media bit rate and
   overhead value.

   The length of the TMMBN message SHALL be set to 2+2*N where N is the
   number of TMMBN FCI entries.

4.2.2.2.  Semantics

   This feedback message is used to notify the senders of any TMMBR
   message that one or more TMMBR messages have been received or that an
   owner has left the session.  It indicates to all participants the
   current set of bounding tuples and the "owners" of those tuples.

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the notification.  The "SSRC of media source"
   is not used and SHALL be set to 0.
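
   A complete TMMBN packet, including the common feedback header, could
   be assembled along the lines of the following Python sketch.  The
   function name build_tmmbn and the tuple representation are
   hypothetical.  The header layout used (version 2, no padding, FMT,
   packet type, length, and the two SSRC fields) is the common packet
   format of section 6.1 of [RFC4585]; the bounding set itself is
   assumed to have been computed elsewhere.

```python
import struct

RTPFB = 205      # transport layer feedback packet type
FMT_TMMBN = 4


def build_tmmbn(sender_ssrc, bounding_tuples):
    """Build a TMMBN packet.

    bounding_tuples: iterable of (owner_ssrc, exp, mantissa, overhead);
    an empty iterable yields a TMMBN without any FCI.
    """
    tuples = list(bounding_tuples)
    first_byte = (2 << 6) | FMT_TMMBN          # V=2, P=0, FMT=4
    length = 2 + 2 * len(tuples)               # 2+2*N, in 32-bit words minus one
    # "SSRC of media source" is not used and is set to 0.
    packet = struct.pack("!BBHII", first_byte, RTPFB, length, sender_ssrc, 0)
    for owner, exp, mantissa, overhead in tuples:
        if not (0 <= exp <= 63 and 0 <= mantissa <= 0x1FFFF
                and 0 <= overhead <= 0x1FF):
            raise ValueError("field out of range")
        word = (exp << 26) | (mantissa << 9) | overhead
        packet += struct.pack("!II", owner, word)
    return packet
```

   An empty list produces a 12-byte packet with a length field of 2,
   which corresponds to the case of zero FCI entries.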

   A TMMBN message SHALL be scheduled for transmission after the
   reception of a TMMBR message with an FCI entry identifying this media
   sender.  Only a single TMMBN SHALL be sent, even if more than one
   TMMBR message is received between the scheduling of the transmission
   and the actual transmission of the TMMBN message.  The TMMBN message
   indicates the bounding tuples and their owners at the time of
   transmitting the message.  The bounding tuples included SHALL be the
   set arrived at through application of the applicable algorithm of
   section 3.5.4.2 or an equivalent, applied to the previous bounding
   set, if any, and tuples received in TMMBR messages since the last
   TMMBN was transmitted.

   The reception of a TMMBR message SHALL still result in the
   transmission of a TMMBN message even if, after application of the
   algorithm, the newly reported TMMBR tuple is not accepted into the
   bounding set.  In such a case, the bounding tuples and their owners
   are not changed, unless the TMMBR was from an owner of a tuple within
   the previously calculated bounding set.  This procedure allows
   session participants that did not see the last TMMBN message to get a
   correct view of this media sender's state.

   As indicated in section 4.2.1.2, when a media sender determines that
   an "owner" of a bounding tuple has left the session, then that tuple
   is removed from the bounding set, and the media sender SHALL send a
   TMMBN message indicating the remaining bounding tuples.  If there are
   no remaining bounding tuples, a TMMBN without any FCI SHALL be sent
   to indicate this.  Without a remaining bounding tuple, the maximum
   media bit rate and maximum packet rate negotiated in session
   signaling, if any, apply.

     Note: if any media receivers remain in the session, this will
     be a temporary situation.  The empty TMMBN will cause every
     remaining media receiver to determine that its limitation belongs
     in the bounding set and send a TMMBR in consequence.

   In unicast scenarios (i.e., where a single sender talks to a single
   receiver), the aforementioned algorithm to determine ownership
   degenerates to the media receiver becoming the "owner" of the one
   bounding tuple as soon as the media receiver has issued the first
   TMMBR message.

4.2.2.3.  Timing Rules

   The TMMBN acknowledgement SHOULD be sent as soon as allowed by the
   applied timing rules for the session.  Immediate or early feedback
   mode SHOULD be used for these messages.

4.2.2.4.  Handling by Translators and Mixers

   As discussed in section 4.2.1.4, mixers or translators may need to
   issue TMMBN messages as responses to TMMBR messages for SSRCs handled
   by them.

4.3.  Payload-Specific Feedback Messages

   As specified by section 6.1 of RFC 4585 [RFC4585], Payload-Specific
   FB messages are identified by the RTCP packet type value PSFB (206).

   AVPF [RFC4585] defines three payload-specific feedback messages and
   one application layer feedback message.  This memo specifies four
   additional payload-specific feedback messages.  All are identified by
   means of the FMT parameter as follows:

   Assigned in [RFC4585]:

     1:     Picture Loss Indication (PLI)
     2:     Slice Loss Indication (SLI)
     3:     Reference Picture Selection Indication (RPSI)
     15:    Application layer FB message
     31:    reserved for future expansion of the number space

   Assigned in this memo:

     4:     Full Intra Request (FIR) Command
     5:     Temporal-Spatial Trade-off Request (TSTR)
     6:     Temporal-Spatial Trade-off Notification (TSTN)
     7:     Video Back Channel Message (VBCM)

   Unassigned:

         0: unassigned
      8-14: unassigned
     16-30: unassigned

   The following subsections define the new FCI formats for the
   payload-specific feedback messages.

4.3.1.  Full Intra Request (FIR)

   The FIR message is identified by RTCP packet type value PT=PSFB and
   FMT=4.

   The FCI field MUST contain one or more FIR entries.  Each entry
   applies to a different media sender, identified by its SSRC.

4.3.1.1.  Message Format

   The Feedback Control Information (FCI) for the Full Intra Request
   consists of one or more FCI entries, the content of which is depicted
   in Figure 4.  The length of the FIR feedback message MUST be set to
   2+2*N, where N is the number of FCI entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Seq nr.       |    Reserved                                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 4 - Syntax of an FCI Entry in the FIR Message

     SSRC (32 bits): The SSRC value of the media sender that is
              requested to send a decoder refresh point.

     Seq nr. (8 bits): Command sequence number.  The sequence number
              space is unique for each pairing of the SSRC of command
              source and the SSRC of the command target.  The sequence
              number SHALL be increased by 1 modulo 256 for each new
              command.  A repetition SHALL NOT increase the sequence
              number.  The initial value is arbitrary.

     Reserved (24 bits): All bits SHALL be set to 0 by the sender and
              SHALL be ignored on reception.

   The semantics of this feedback message is independent of the RTP
   payload type.

4.3.1.2.  Semantics

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the request, and the "SSRC of media source"
   is not used and SHALL be set to 0.  The SSRCs of the media senders to
   which the FIR command applies are in the corresponding FCI entries.
   A FIR message MAY contain requests to multiple media senders, using
   one FCI entry per target media sender.
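
   The FIR format and the sequence number rules can be sketched in
   Python as follows.  The class name FirRequester and its interface
   are hypothetical.  The sketch keeps one sequence number per target
   SSRC for a single command source, increments it modulo 256 for each
   new command, and leaves it unchanged for a repetition.  The common
   header layout is that of section 6.1 of [RFC4585].

```python
import struct

PSFB = 206     # payload-specific feedback packet type
FMT_FIR = 4


class FirRequester:
    """Builds FIR packets for one command source (hypothetical helper)."""

    def __init__(self, sender_ssrc, initial_seq=0):
        self.sender_ssrc = sender_ssrc
        self._initial_seq = initial_seq & 0xFF   # the initial value is arbitrary
        self._seq = {}                           # target SSRC -> last seq nr used

    def build(self, targets, repetition=False):
        """Return a FIR packet with one FCI entry per target SSRC."""
        targets = list(targets)
        if not targets:
            raise ValueError("FIR needs at least one FCI entry")
        entries = b""
        for target in targets:
            if target not in self._seq:
                self._seq[target] = self._initial_seq
            elif not repetition:
                # New command: increase by 1 modulo 256.
                self._seq[target] = (self._seq[target] + 1) % 256
            # Seq nr. in the top 8 bits, 24 reserved bits set to 0.
            entries += struct.pack("!II", target, self._seq[target] << 24)
        header = struct.pack("!BBHII", (2 << 6) | FMT_FIR, PSFB,
                             2 + 2 * len(targets), self.sender_ssrc, 0)
        return header + entries
```

   A repetition built with repetition=True is byte-for-byte identical
   to the packet it repeats, which lets the media sender recognize it
   as the same command.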

   Upon reception of FIR, the encoder MUST send a decoder refresh point
   (see section 2.2) as soon as possible.

   The sender MUST consider congestion control as outlined in section 5,
   which MAY restrict its ability to send a decoder refresh point
   quickly.

   FIR SHALL NOT be sent as a reaction to picture losses -- it is
   RECOMMENDED to use PLI [RFC4585] instead.  FIR SHOULD be used only in
   situations where not sending a decoder refresh point would render the
   video unusable for the users.

   A typical example where sending FIR is appropriate is when, in a
   multipoint conference, a new user joins the session and no regular
   decoder refresh point interval is established.  Another example would
   be a video switching MCU that changes streams.  Here, normally, the
   MCU issues a FIR to the new sender so as to force it to emit a
   decoder refresh point.  The decoder refresh point normally includes a
   Freeze Picture Release (defined outside this specification), which
   re-starts the rendering process of the receivers.  Both techniques
   mentioned are commonly used in MCU-based multipoint conferences.

   Other RTP payload specifications such as RFC 2032 [RFC2032] already
   define a feedback mechanism for certain codecs.  An application
   supporting both schemes MUST use the feedback mechanism defined in
   this specification when sending feedback.  For backward-compatibility
   reasons, such an application SHOULD also be capable of receiving and
   reacting to the feedback scheme defined in the respective RTP payload
   format, if this is required by that payload format.

4.3.1.3.  Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585].  FIR
   commands MAY be used with early or immediate feedback.  The FIR
   feedback message MAY be repeated.  If using immediate feedback mode,
   the repetition SHOULD wait at least one RTT before being sent.  In
   early or regular RTCP mode, the repetition is sent in the next
   regular RTCP packet.

4.3.1.4.  Handling of FIR Message in Mixers and Translators

   A media translator or a mixer performing media encoding of the
   content for which the session participant has issued a FIR is
   responsible for acting upon it.  A mixer acting upon a FIR SHOULD NOT
   forward the message unaltered; instead, it SHOULD issue a FIR itself.

4.3.1.5. Remarks

   Currently, video appears to be the only useful application for FIR,
   as it appears to be the only RTP payload widely deployed that relies
   heavily on media prediction across RTP packet boundaries.  However,
   use of FIR could also reasonably be envisioned for other media types
   that share essential properties with compressed video, namely,
   cross-frame prediction (whatever a frame may be for that media type).
   One possible example may be the dynamic updates of MPEG-4 scene
   descriptions.  It is suggested that payload formats for such media
   types refer to FIR and other message types defined in this
   specification and in AVPF [RFC4585], instead of creating similar
   mechanisms in the payload specifications.  The payload specifications
   may have to explain how the payload-specific terminologies map to the
   video-centric terminology used herein.

   In conjunction with video codecs, FIR messages typically trigger the
   sending of full intra or IDR pictures.  Both are several times larger
   than predicted (inter) pictures.  Their size is independent of the
   time they are generated.  In most environments, especially when
   employing bandwidth-limited links, the use of an intra picture
   implies an allowed delay that is a significant multiple of the
   typical frame duration.  An example: if the sending frame rate is 10
   fps, and an intra picture is assumed to be 10 times as big as an
   inter picture, then a full second of latency has to be accepted.  In
   such an environment, there is no need for a particularly short delay
   in sending the FIR message.  Hence, waiting for the next possible
   time slot allowed by RTCP timing rules as per [RFC4585] should not
   have an overly negative impact on the system performance.
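
   The arithmetic behind the example above can be written out as a
   short Python sketch.  It assumes, as a simplification not stated in
   this memo, that the link carries exactly one inter picture per frame
   interval, so that a picture k times as large occupies k frame
   intervals.  The function name is hypothetical.

```python
def intra_picture_delay(frame_rate_fps, intra_to_inter_ratio):
    """Seconds needed to send one intra picture.

    Assumption: the available bit rate just fits one inter picture per
    frame interval (1 / frame_rate_fps seconds).
    """
    frame_interval = 1.0 / frame_rate_fps
    return intra_to_inter_ratio * frame_interval
```

   With a frame rate of 10 fps and an intra picture 10 times the size
   of an inter picture, the function returns the full second mentioned
   in the text.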

   Mandating a maximum delay for completing the sending of a decoder
   refresh point would be desirable from an application viewpoint, but
   is problematic from a congestion control point of view.  "As soon as
   possible" as mentioned above appears to be a reasonable compromise.

   In environments where the sender has no control over the codec (e.g.,
   when streaming pre-recorded and pre-coded content), the reaction to
   this command cannot be specified.  One suitable reaction of a sender
   would be to skip forward in the video bit stream to the next decoder
   refresh point.  In other scenarios, it may be preferable not to react
   to the command at all, e.g., when streaming to a large multicast
   group.  Other reactions may also be possible.  When deciding on a
   strategy, a sender could take into account factors such as the size
   of the receiving group, the "importance" of the sender of the FIR
   message (however "importance" may be defined in this specific
   application), the frequency of decoder refresh points in the content,
   and so on.  However, a session that predominantly handles pre-coded
   content is not expected to use FIR at all.

   The relationship between the Picture Loss Indication and FIR is as
   follows.  As discussed in section 6.3.1 of AVPF [RFC4585], a Picture
   Loss Indication informs the encoder about the loss of a picture and
   hence the likelihood of misalignment of the reference pictures
   between the encoder and decoder.  Such a scenario is normally related
   to losses in an ongoing connection.  In point-to-point scenarios, and
   without the presence of advanced error resilience tools, one possible
   option for an encoder is to send a decoder refresh point.
   However, there are other options.  One example is that the media
   sender ignores the PLI, because the embedded stream redundancy is
   likely to clean up the reproduced picture within a reasonable amount
   of time.  The FIR, in contrast, leaves a (real-time) encoder no
   choice but to send a decoder refresh point.  It does not allow the
   encoder to take into account any considerations such as the ones
   mentioned above.

4.3.2.  Temporal-Spatial Trade-off Request (TSTR)

   The TSTR feedback message is identified by RTCP packet type value
   PT=PSFB and FMT=5.

   The FCI field MUST contain one or more TSTR FCI entries.

4.3.2.1.  Message Format

   The content of the FCI entry for the Temporal-Spatial Trade-off
   Request is depicted in Figure 5.  The length of the feedback message
   MUST be set to 2+2*N, where N is the number of FCI entries included.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Seq nr.      |  Reserved                           | Index   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 5 - Syntax of an FCI Entry in the TSTR Message

     SSRC (32 bits): The SSRC of the media sender that is requested to
              apply the trade-off value given in Index.

     Seq nr. (8 bits): Request sequence number.  The sequence number
              space is unique for each pairing of the SSRC of the request
              source and the SSRC of the request target.  The sequence
              number SHALL be increased by 1 modulo 256 for each new
              command.  A repetition SHALL NOT increase the sequence
              number.  The initial value is arbitrary.

     Reserved (19 bits): All bits SHALL be set to 0 by the sender and
              SHALL be ignored on reception.

     Index (5 bits): An integer value between 0 and 31 that indicates
              the relative trade-off that is requested.  An index value
              of 0 indicates the highest possible spatial quality, while
              31 indicates the highest possible temporal resolution.
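
   The layout in Figure 5 can be illustrated with a minimal Python
   sketch.  The function names are hypothetical and not part of this
   memo; the sketch only assumes network byte order and the field
   widths given above.

```python
import struct
from typing import Tuple


def pack_tstr_fci(ssrc: int, seq_nr: int, index: int) -> bytes:
    """Pack one 8-octet TSTR FCI entry following the Figure 5 layout."""
    if not 0 <= ssrc <= 0xFFFFFFFF:
        raise ValueError("SSRC must fit in 32 bits")
    if not 0 <= seq_nr <= 0xFF:
        raise ValueError("Seq nr. must fit in 8 bits")
    if not 0 <= index <= 31:
        raise ValueError("Index must be between 0 and 31")
    # Second word: Seq nr. (8 bits) | Reserved (19 bits, all 0) | Index (5 bits)
    word = (seq_nr << 24) | index
    return struct.pack("!II", ssrc, word)


def unpack_tstr_fci(entry: bytes) -> Tuple[int, int, int]:
    """Return (ssrc, seq_nr, index); the Reserved bits are ignored."""
    if len(entry) != 8:
        raise ValueError("a TSTR FCI entry is exactly 8 octets")
    ssrc, word = struct.unpack("!II", entry)
    return ssrc, word >> 24, word & 0x1F
```

   The parser masks out the Reserved bits rather than rejecting
   non-zero values, in line with the rule that they are ignored on
   reception.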

4.3.2.2.  Semantics

   A decoder can suggest a temporal-spatial trade-off level by sending a
   TSTR message to an encoder.  If the encoder is capable of adjusting
   its temporal-spatial trade-off, it SHOULD take into account the
   received TSTR message for future coding of pictures.  A value of 0
   suggests a high spatial quality and a value of 31 suggests a high
   frame rate.  The progression of values from 0 to 31 indicates
   monotonically a desire for higher frame rate.  The index values do
   not correspond to precise values of spatial quality or frame rate.

   The reaction to the reception of more than one TSTR message by a
   media sender from different media receivers is left open to the
   implementation.  The selected trade-off SHALL be communicated to the
   media receivers by means of the TSTN message.

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the request, and the "SSRC of media source"
   is not used and SHALL be set to 0.  The SSRCs of the media senders to
   which the TSTR applies are in the corresponding FCI entries.

   A TSTR message MAY contain requests to multiple media senders, using
   one FCI entry per target media sender.
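
   As an illustration, a complete TSTR message carrying several FCI
   entries could be assembled as sketched below.  The function name is
   hypothetical.  The sketch assumes the common feedback header layout
   of section 6.1 of [RFC4585] (version 2, no padding, PT=PSFB with the
   value 206) and does not cover compound RTCP packet construction.

```python
import struct

RTCP_VERSION = 2
PT_PSFB = 206   # payload-specific feedback packet type (RFC 4585)
FMT_TSTR = 5


def build_tstr(sender_ssrc, requests):
    """Build a TSTR message.

    requests is a list of (target_ssrc, seq_nr, index) tuples, one per
    media sender that is asked to change its trade-off.
    """
    if not requests:
        raise ValueError("the FCI field needs at least one entry")
    first_octet = (RTCP_VERSION << 6) | FMT_TSTR   # padding bit left at 0
    length = 2 + 2 * len(requests)                 # in 32-bit words minus one
    # "SSRC of media source" is not used by TSTR and is set to 0.
    packet = struct.pack("!BBHII", first_octet, PT_PSFB, length, sender_ssrc, 0)
    for target_ssrc, seq_nr, index in requests:
        if not 0 <= seq_nr <= 0xFF or not 0 <= index <= 31:
            raise ValueError("seq_nr or index out of range")
        packet += struct.pack("!II", target_ssrc, (seq_nr << 24) | index)
    return packet
```

   With two entries the length field is 6, which corresponds to a
   28-octet packet: 12 octets of header plus two 8-octet FCI entries.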

4.3.2.3.  Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585].
   This request message is not time critical and SHOULD be sent using
   regular RTCP timing.  The message MAY be sent with early or
   immediate feedback timing only if it is known that the user
   interface requires quick feedback.

4.3.2.4.  Handling of Message in Mixers and Translators

   A mixer or media translator that encodes content sent to the session
   participant issuing the TSTR SHALL consider the request to determine
   if it can fulfill it by changing its own encoding parameters.  A
   media translator unable to fulfill the request MAY forward the
   request unaltered towards the media sender.  A mixer encoding for
   multiple session participants will need to consider the joint needs
   of these participants before generating a TSTR on its own behalf
   towards the media sender.  See also the discussion in section 3.5.2.

4.3.2.5.  Remarks

   The term "spatial quality" does not necessarily refer to the
   resolution as measured by the number of pixels the reconstructed
   video is using.  In fact, in most scenarios the video resolution
   stays constant during the lifetime of a session.  However, all video
   compression standards have means to adjust the spatial quality at a
   given resolution, often influenced by the Quantizer Parameter or QP.
   A numerically low QP results in a good reconstructed picture quality,
   whereas a numerically high QP yields a coarse picture.  The typical
   reaction of an encoder to this request is to change its rate control
   parameters to use a lower frame rate and a numerically lower (on
   average) QP, or vice versa.  The precise mapping of Index value to
   frame rate and QP is intentionally left open here, as it depends on
   factors such as the compression standard employed, spatial
   resolution, content, bit rate, and so on.
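
   Because the mapping is left open, any concrete mapping is an
   implementation choice.  The sketch below shows one purely
   hypothetical possibility, a linear interpolation of the target frame
   rate between two assumed limits (min_fps and max_fps are
   illustrative parameters, not values from this memo).  It only
   demonstrates the monotonic relationship between Index and frame
   rate.

```python
def target_frame_rate(index, min_fps=5.0, max_fps=30.0):
    """Map a TSTR Index (0..31) to a target frame rate.

    Hypothetical mapping: Index 0 favors spatial quality (lowest frame
    rate), Index 31 favors temporal resolution (highest frame rate).
    """
    if not 0 <= index <= 31:
        raise ValueError("Index must be between 0 and 31")
    return min_fps + (max_fps - min_fps) * index / 31
```

   A real encoder would feed such a target into its rate control,
   which then adjusts the QP within the available bit rate.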

4.3.3.  Temporal-Spatial Trade-off Notification (TSTN)

   The TSTN message is identified by RTCP packet type value PT=PSFB and
   FMT=6.

   The FCI field SHALL contain one or more TSTN FCI entries.

4.3.3.1.  Message Format

   The content of an FCI entry for the Temporal-Spatial Trade-off
   Notification is depicted in Figure 6.  The length of the TSTN message
   MUST be set to 2+2*N, where N is the number of FCI entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Seq nr.      |  Reserved                           | Index   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 6 - Syntax of the TSTN

     SSRC (32 bits): The SSRC of the source of the TSTR that resulted in
              this Notification.

     Seq nr. (8 bits): The sequence number value from the TSTR that is
              being acknowledged.

     Reserved (19 bits): All bits SHALL be set to 0 by the sender and
              SHALL be ignored on reception.

     Index (5 bits): The trade-off value the media sender is using
              henceforth.

      Informative note: The returned trade-off value (Index) may differ
      from the requested one, for example, in cases where a media
      encoder cannot tune its trade-off, or when pre-recorded content is
      used.

4.3.3.2.  Semantics

   This feedback message is used to acknowledge the reception of a TSTR.
   For each TSTR received targeted at the session participant, a TSTN
   FCI entry SHALL be sent in a TSTN feedback message.  A single TSTN
   message MAY acknowledge multiple requests using multiple FCI entries.
   The index value included SHALL be the same in all FCI entries of the
   TSTN message.  Including an FCI for each requestor allows each
   requesting entity to determine that the media sender received the
   request.  The Notification SHALL also be sent in response to TSTR
   repetitions received.  If the request receiver has received TSTR with
   several different sequence numbers from a single requestor, it SHALL
   only respond to the request with the highest (modulo 256) sequence
   number.  Note that the highest sequence number may be a smaller
   integer value due to the wrapping of the field.  Appendix A.1 of
   [RFC3550] has an algorithm for keeping track of the highest received
   sequence number for RTP packets; it could be adapted for this usage.
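
   The wrap-around comparison can be sketched as follows.  This is a
   simplified serial-number comparison for the 8-bit field, not the
   algorithm of Appendix A.1 of [RFC3550] itself.  It assumes that the
   sequence numbers seen from one requestor span less than half of the
   number space (128 values).  The function names are hypothetical.

```python
def is_newer(seq, reference):
    """True if seq is ahead of reference in modulo-256 arithmetic."""
    return 0 < ((seq - reference) & 0xFF) < 128


def highest_seq(seqs):
    """Return the highest (modulo 256) sequence number from one requestor."""
    iterator = iter(seqs)
    highest = next(iterator)
    for seq in iterator:
        if is_newer(seq, highest):
            highest = seq
    return highest
```

   For example, after receiving the sequence numbers 254, 255, 0, and 1
   from one requestor, the request with sequence number 1 is the one to
   respond to, although 1 is the smaller integer.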

   The TSTN SHALL include the Temporal-Spatial Trade-off index that will
   be used as a result of the request.  This is not necessarily the same
   index as requested, as the media sender may need to aggregate
   requests from several requesting session participants.  It may also
   have some other policies or rules that limit the selection.

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the Notification, and the "SSRC of media
   source" is not used and SHALL be set to 0.  The SSRCs of the
   requesting entities to which the Notification applies are in the
   corresponding FCI entries.
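
   A TSTN acknowledging several requestors could be assembled as
   sketched below.  The function name is hypothetical, and the common
   header layout is assumed to be that of section 6.1 of [RFC4585]
   (version 2, no padding, PT=PSFB with the value 206).  Every FCI
   entry carries the same Index, namely the trade-off the media sender
   has selected.

```python
import struct

RTCP_VERSION = 2
PT_PSFB = 206   # payload-specific feedback packet type (RFC 4585)
FMT_TSTN = 6


def build_tstn(sender_ssrc, acks, index):
    """Build a TSTN message.

    acks is a list of (requestor_ssrc, seq_nr) tuples, one per TSTR
    being acknowledged; index is the trade-off value now in use.
    """
    if not acks:
        raise ValueError("the FCI field needs at least one entry")
    if not 0 <= index <= 31:
        raise ValueError("Index must be between 0 and 31")
    first_octet = (RTCP_VERSION << 6) | FMT_TSTN   # padding bit left at 0
    length = 2 + 2 * len(acks)
    # "SSRC of media source" is not used by TSTN and is set to 0.
    packet = struct.pack("!BBHII", first_octet, PT_PSFB, length, sender_ssrc, 0)
    for requestor_ssrc, seq_nr in acks:
        if not 0 <= seq_nr <= 0xFF:
            raise ValueError("Seq nr. must fit in 8 bits")
        # Echo the sequence number from the TSTR; same Index in every entry.
        packet += struct.pack("!II", requestor_ssrc, (seq_nr << 24) | index)
    return packet
```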

4.3.3.3.  Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585].
   This acknowledgement message is not extremely time critical and
   SHOULD be sent using regular RTCP timing.

4.3.3.4.  Handling of TSTN in Mixers and Translators

   A mixer or translator that acts upon a TSTR SHALL also send the
   corresponding TSTN.  In cases where it needs to forward a TSTR
   itself, the notification message MAY need to be delayed until the
   TSTR has been responded to.

4.3.3.5.  Remarks

   None.

4.3.4.  H.271 Video Back Channel Message (VBCM)

   The VBCM is identified by RTCP packet type value PT=PSFB and FMT=7.

   The FCI field MUST contain one or more VBCM FCI entries.

4.3.4.1.  Message Format

   The syntax of an FCI entry within the VBCM indication is depicted in
   Figure 7.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Seq nr.       |0| Payload Type| Length                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    VBCM Octet String....      |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 7 - Syntax of an FCI Entry in the VBCM

   SSRC (32 bits): The SSRC value of the media sender that is requested
          to instruct its encoder to react to the VBCM.

   Seq nr. (8 bits): Command sequence number.  The sequence number space
          is unique for each pairing of the SSRC of the command source
          and the SSRC of the command target.  The sequence number SHALL
          be increased by 1 modulo 256 for each new command.  A
          repetition SHALL NOT increase the sequence number.  The
          initial value is arbitrary.

   0: Must be set to 0 by the sender and should not be acted upon by the
          message receiver.

   Payload Type (7 bits): The RTP payload type for which the VBCM bit
          stream must be interpreted.

   Length (16 bits): The length of the VBCM octet string in octets
          exclusive of any padding octets.

   VBCM Octet String (variable length): This is the octet string
          generated by the decoder carrying a specific feedback sub-
          message.

   Padding (variable length): Bits set to 0 to make up a 32-bit
          boundary.
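
   The variable-length entry of Figure 7 can be illustrated with the
   following sketch.  The function names are hypothetical.  The H.271
   octet string is treated as opaque bytes; its internal format is
   defined in [H.271] and is not parsed here.

```python
import struct


def pack_vbcm_fci(ssrc, seq_nr, payload_type, octets):
    """Pack one VBCM FCI entry, zero-padded to a 32-bit boundary."""
    if not 0 <= seq_nr <= 0xFF:
        raise ValueError("Seq nr. must fit in 8 bits")
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("RTP payload type must fit in 7 bits")
    if len(octets) > 0xFFFF:
        raise ValueError("octet string too long for the 16-bit Length")
    # The bit preceding the payload type is 0, so the octet equals the type.
    header = struct.pack("!IBBH", ssrc, seq_nr, payload_type, len(octets))
    padding = b"\x00" * (-len(octets) % 4)
    return header + bytes(octets) + padding


def unpack_vbcm_fci(buf, offset=0):
    """Return ((ssrc, seq_nr, payload_type, octets), next_offset)."""
    ssrc, seq_nr, pt_octet, length = struct.unpack_from("!IBBH", buf, offset)
    start = offset + 8
    end = start + length          # Length excludes the padding octets
    if end > len(buf):
        raise ValueError("truncated VBCM FCI entry")
    next_offset = end + (-length % 4)
    return (ssrc, seq_nr, pt_octet & 0x7F, bytes(buf[start:end])), next_offset
```

   Returning the offset of the next entry allows a receiver to walk
   through several FCI entries in one VBCM, since each entry may have a
   different length.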

4.3.4.2.  Semantics

   The "payload" of the VBCM indication carries different types of
   codec-specific feedback information.  The type of feedback
   information can be classified as either a 'status report' (such as
   an indication that a bit stream was received without errors, or that
   a partial or complete picture or block was lost) or an 'update
   request' (such as a complete refresh of the bit stream).

          Note: There are possible overlaps between the VBCM sub-
          messages and CCM/AVPF feedback messages, such as FIR.  Please
          see section 3.5.3 for further discussion.

   The different types of feedback sub-messages carried in the VBCM are
   indicated by the "payloadType" as defined in [H.271].  These sub-
   message types are reproduced below for convenience.  "payloadType",
   in ITU-T Rec. H.271 terminology, refers to the sub-type of the H.271
   message and should not be confused with an RTP payload type.

   Payload          Message Content
   Type
   ---------------------------------------------------------------------
   0      One or more pictures without detected bit stream error
          mismatch
   1      One or more pictures that are entirely or partially lost
   2      A set of blocks of one picture that is entirely or partially
          lost
   3      CRC for one parameter set
   4      CRC for all parameter sets of a certain type
   5      A "reset" request indicating that the sender should completely
          refresh the video bit stream as if no prior bit stream data
          had been received
   > 5    Reserved for future use by ITU-T

   Table 2: H.271 message types ("payloadTypes")
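
   In an implementation, the values of Table 2 could be represented as
   an enumeration, as in the sketch below.  The identifier names are
   hypothetical and chosen for readability; only the numeric values
   come from the table.  How the payloadType is encoded inside the VBCM
   octet string is defined in [H.271] and is not shown.

```python
from enum import IntEnum


class H271PayloadType(IntEnum):
    """H.271 sub-message types as listed in Table 2 (names are illustrative)."""
    PICTURES_WITHOUT_ERROR = 0
    PICTURES_LOST = 1
    BLOCKS_LOST = 2
    PARAMETER_SET_CRC = 3
    ALL_PARAMETER_SETS_CRC = 4
    RESET_REQUEST = 5


def describe_payload_type(value):
    """Return the enumeration name, or "RESERVED" for values above 5."""
    if value < 0:
        raise ValueError("payloadType cannot be negative")
    try:
        return H271PayloadType(value).name
    except ValueError:
        return "RESERVED"
```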

   The bit string or the "payload" of a VBCM is of variable length and
   is self-contained and coded in a variable-length, binary format.  The
   media sender necessarily has to be able to parse this optimized
   binary format to make use of VBCMs.

   Each of the different types of sub-messages (indicated by
   payloadType) may have different semantics depending on the codec
   used.

   Within the common packet header for feedback messages (as defined in
   section 6.1 of [RFC4585]), the "SSRC of packet sender" field
   indicates the source of the request, and the "SSRC of media source"
   is not used and SHALL be set to 0.  The SSRCs of the media senders to
   which the VBCM applies are in the corresponding FCI entries.  The
   sender of the VBCM MAY send H.271 messages to multiple media senders
   and MAY send more than one H.271 message to the same media sender
   within the same VBCM.

4.3.4.3.  Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585].  The
   different sub-message types may have different properties with regard
   to the timing of messages that should be used.  If several different
   types are included in the same feedback packet, then the requirements
   for the sub-message type with the most stringent requirements should
   be followed.

4.3.4.4.  Handling of Message in Mixers or Translators

   The handling of a VBCM in a mixer or translator is sub-message type
   dependent.

4.3.4.5.  Remarks

   Please see section 3.5.3 for a discussion of the usage of H.271
   messages and messages defined in AVPF [RFC4585] and this memo with
   similar functionality.

     Note: There has been some discussion whether the RTP payload type
     field in this message is needed.  It will be needed if there is
     potentially more than one VBCM-capable RTP payload type in the same
     session, and the semantics of a given VBCM changes between payload
     types.  For example, the picture identification mechanism in
     messages of H.271 type 0 is fundamentally different between H.263
     and H.264 (although both use the same syntax).  Therefore, the
     payload type field is justified here.  There was a further comment
     that for TSTR and FIR such a need does not exist, because the
     semantics of TSTR and FIR are either loosely enough defined, or
     generic enough, to apply to all video payloads currently in
     existence/envisioned.
