4. RTCP Receiver Report Extensions This memo specifies six new feedback messages. The Full Intra Request (FIR), Temporal-Spatial Trade-off Request (TSTR), Temporal- Spatial Trade-off Notification (TSTN), and Video Back Channel Message (VBCM) are "Payload Specific Feedback Messages" as defined in section 6.3 of AVPF [RFC4585]. The Temporary Maximum Media Stream Bit Rate Request (TMMBR) and Temporary Maximum Media Stream Bit Rate Notification (TMMBN) are "Transport Layer Feedback Messages" as defined in section 6.2 of AVPF. The new feedback messages are defined in the following subsections, following a similar structure to that in sections 6.2 and 6.3 of the AVPF specification [RFC4585]. 4.1. Design Principles of the Extension Mechanism RTCP was originally introduced as a channel to convey presence, reception quality statistics and hints on the desired media coding. A limited set of media control mechanisms was introduced in early RTP payload formats for video formats, for example, in RFC 2032 [RFC2032] (which was obsoleted by RFC 4587 [RFC4587]). However, this specification, for the first time, suggests a two-way handshake for some of its messages. There is danger that this introduction could be misunderstood as a precedent for the use of RTCP as an RTP session control protocol. To prevent such a misunderstanding, this subsection attempts to clarify the scope of the extensions specified in this memo, and it strongly suggests that future extensions follow the rationale spelled out here, or compellingly explain why they divert from the rationale. In this memo, and in AVPF [RFC4585], only such messages have been included as: a) have comparatively strict real-time constraints, which prevent the use of mechanisms such as a SIP re-invite in most application scenarios (the real-time constraints are explained separately for each message where necessary); b) are multicast-safe in that the reaction to potentially contradicting feedback messages is specified, as necessary for each message; and c) are directly related to activities of a certain media codec, class of media codecs (e.g., video codecs), or a given RTP packet stream.
In this memo, a two-way handshake is introduced only for messages for which: a) a notification or acknowledgement is required due to their nature. An analysis to determine whether this requirement exists has been performed separately for each message. b) the notification or acknowledgement cannot be easily derived from the media bit stream. All messages in AVPF [RFC4585] and in this memo present their contents in a simple, fixed binary format. This accommodates media receivers that have not implemented higher control protocol functionalities (SDP, XML parsers, and such) in their media path. Messages that do not conform to the design principles just described are not an appropriate use of RTCP or of the Codec Control Framework defined in this document. 4.2. Transport Layer Feedback Messages As specified in section 6.1 of RFC 4585 [RFC4585], transport layer feedback messages are identified by the RTCP packet type value RTPFB (205). In AVPF, one message of this category had been defined. This memo specifies two more such messages. They are identified by means of the feedback message type (FMT) parameter as follows: Assigned in AVPF [RFC4585]: 1: Generic NACK 31: reserved for future expansion of the identifier number space Assigned in this memo: 2: reserved (see note below) 3: Temporary Maximum Media Stream Bit Rate Request (TMMBR) 4: Temporary Maximum Media Stream Bit Rate Notification (TMMBN) Note: early versions of AVPF [RFC4585] reserved FMT=2 for a code point that has later been removed. It has been pointed out that there may be implementations in the field using this value in accordance with the expired document. As there is sufficient numbering space available, we mark FMT=2 as reserved so to avoid possible interoperability problems with any such early implementations.
Available for assignment: 0: unassigned 5-30: unassigned The following subsection defines the formats of the Feedback Control Information (FCI) entries for the TMMBR and TMMBN messages, respectively, and specifies the associated behaviour at the media sender and receiver. 4.2.1. Temporary Maximum Media Stream Bit Rate Request (TMMBR) The Temporary Maximum Media Stream Bit Rate Request is identified by RTCP packet type value PT=RTPFB and FMT=3. The FCI field of a Temporary Maximum Media Stream Bit Rate Request (TMMBR) message SHALL contain one or more FCI entries. 22.214.171.124. Message Format The Feedback Control Information (FCI) consists of one or more TMMBR FCI entries with the following syntax: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SSRC | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MxTBR Exp | MxTBR Mantissa |Measured Overhead| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 2 - Syntax of an FCI Entry in the TMMBR Message SSRC (32 bits): The SSRC value of the media sender that is requested to obey the new maximum bit rate. MxTBR Exp (6 bits): The exponential scaling of the mantissa for the maximum total media bit rate value. The value is an unsigned integer [0..63]. MxTBR Mantissa (17 bits): The mantissa of the maximum total media bit rate value as an unsigned integer. Measured Overhead (9 bits): The measured average packet overhead value in bytes. The measurement SHALL be done according to the description in section 126.96.36.199. The value is an unsigned integer [0..511].
The maximum total media bit rate (MxTBR) value in bits per second is calculated from the MxTBR exponent (exp) and mantissa in the following way: MxTBR = mantissa * 2^exp This allows for 17 bits of resolution in the range 0 to 131072*2^63 (approximately 1.2*10^24). The length of the TMMBR feedback message SHALL be set to 2+2*N where N is the number of TMMBR FCI entries. 188.8.131.52. Semantics Behaviour at the Media Receiver (Sender of the TMMBR) TMMBR is used to indicate a transport-related limitation at the reporting entity acting as a media receiver. TMMBR has the form of a tuple containing two components. The first value is the highest bit rate per sender of a media stream, available at a receiver-chosen protocol layer, which the receiver currently supports in this RTP session. The second value is the measured header overhead in bytes as defined in section 2.2 and measured at the chosen protocol layer in the packets received for the stream. The measurement of the overhead is a running average that is updated for each packet received for this particular media source (SSRC), using the following formula: avg_OH (new) = 15/16*avg_OH (old) + 1/16*pckt_OH, where avg_OH is the running (exponentially smoothed) average and pckt_OH is the overhead observed in the latest packet. If a maximum bit rate has been negotiated through signaling, the maximum total media bit rate that the receiver reports in a TMMBR message MUST NOT exceed the negotiated value converted to a common basis (i.e., with overheads adjusted to bring it to the same reference protocol layer). Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the request, and the "SSRC of media source" is not used and SHALL be set to 0. Within a particular TMMBR FCI entry, the "SSRC of media source" in the FCI field denotes the media sender that the tuple applies to. This is useful in the multicast or translator topologies where the reporting entity may address all of the media senders in a single TMMBR message using multiple FCI entries.
The media receiver SHALL save the contents of the latest TMMBN message received from each media sender. The media receiver MAY send a TMMBR FCI entry to a particular media sender under the following circumstances: o before any TMMBN message has been received from that media sender; o when the media receiver has been identified as the source of a bounding tuple within the latest TMMBN message received from that media sender, and the value of the maximum total media bit rate or the overhead relating to that media sender has changed; o when the media receiver has not been identified as the source of a bounding tuple within the latest TMMBN message received from that media sender, and, after the media receiver applies the incremental algorithm from section 184.108.40.206 or a stricter equivalent, the media receiver's tuple relating to that media sender is determined to belong to the bounding set. A TMMBR FCI entry MAY be repeated in subsequent TMMBR messages if no Temporary Maximum Media Stream Bit Rate Notification (TMMBN) FCI has been received from the media sender at the time of transmission of the next RTCP packet. The bit rate value of a TMMBR FCI entry MAY be changed from one TMMBR message to the next. The overhead measurement SHALL be updated to the current value of avg_OH each time the entry is sent. If the value set by a TMMBR message is expected to be permanent, the TMMBR setting party SHOULD renegotiate the session parameters to reflect that using session setup signaling, e.g., a SIP re-invite. Behaviour at the Media Sender (Receiver of the TMMBR) When it receives a TMMBR message containing an FCI entry relating to it, the media sender SHALL use an initial or incremental algorithm as applicable to determine the bounding set of tuples based on the new information. The algorithm used SHALL be at least as strict as the corresponding algorithm defined in section 220.127.116.11. The media sender MAY accumulate TMMBRs over a small interval (relative to the RTCP sending interval) before making this calculation. Once it has determined the bounding set of tuples, the media sender MAY use any combination of packet rate and net media bit rate within the feasible region that these tuples describe to produce a lower
total media stream bit rate, as it may need to address a congestion situation or other limiting factors. See section 5 (congestion control) for more discussion. If the media sender concludes that it can increase the maximum total media bit rate value, it SHALL wait before actually doing so, for a period long enough to allow a media receiver to respond to the TMMBN if it determines that its tuple belongs in the bounding set. This delay period is estimated by the formula: 2 * RTT + T_Dither_Max, where RTT is the longest round trip time known to the media sender and T_Dither_Max is defined in section 3.4 of [RFC4585]. Even in point-to-point sessions, a media sender MUST obey the aforementioned rule, as it is not guaranteed that a participant is able to determine correctly whether all the sources are co-located in a single node, and are coordinated. A TMMBN message SHALL be sent by the media sender at the earliest possible point in time, in response to any TMMBR messages received since the last sending of TMMBN. The TMMBN message indicates the calculated set of bounding tuples and the owners of those tuples at the time of the transmission of the message. An SSRC may time out according to the default rules for RTP session participants, i.e., the media sender has not received any RTP or RTCP packets from the owner for the last five regular reporting intervals. An SSRC may also explicitly leave the session, with the participant indicating this through the transmission of an RTCP BYE packet or using an external signaling channel. If the media sender determines that the owner of a tuple in the bounding set has left the session, the media sender SHALL transmit a new TMMBN containing the previously determined set of bounding tuples but with the tuple belonging to the departed owner removed. A media sender MAY proactively initiate the equivalent to a TMMBR message to itself, when it is aware that its transmission path is more restrictive than the current limitations. As a result, a TMMBN indicating the media source itself as the owner of a tuple is being sent, thereby avoiding unnecessary TMMBR messages from other participants. However, like any other participant, when the media sender becomes aware of changed limitations, it is required to change the tuple, and to send a corresponding TMMBN.
Discussion Due to the unreliable nature of transport of TMMBR and TMMBN, the above rules may lead to the sending of TMMBR messages that appear to disobey those rules. Furthermore, in multicast scenarios it can happen that more than one "non-owning" session participant may determine, rightly or wrongly, that its tuple belongs in the bounding set. This is not critical for a number of reasons: a) If a TMMBR message is lost in transmission, either the media sender sends a new TMMBN message in response to some other media receiver or it does not send a new TMMBN message at all. In the first case, the media receiver applies the incremental algorithm and, if it determines that its tuple should be part of the bounding set, sends out another TMMBR. In the second case, it repeats the sending of a TMMBR unconditionally. Either way, the media sender eventually gets the information it needs. b) Similarly, if a TMMBN message gets lost, the media receiver that has sent the corresponding TMMBR does not receive the notification and is expected to re-send the request and trigger the transmission of another TMMBN. c) If multiple competing TMMBR messages are sent by different session participants, then the algorithm can be applied taking all of these messages into account, and the resulting TMMBN provides the participants with an updated view of how their tuples compare with the bounded set. d) If more than one session participant happens to send TMMBR messages at the same time and with the same tuple component values, it does not matter which of those tuples is taken into the bounding set. The losing session participant will determine, after applying the algorithm, that its tuple does not enter the bounding set, and will therefore stop sending its TMMBR. It is important to consider the security risks involved with faked TMMBRs. See the security considerations in section 6. As indicated already, the feedback messages may be used in both multicast and unicast sessions in any of the specified topologies. However, for sessions with a large number of participants, using the lowest common denominator, as required by this mechanism, may not be the most suitable course of action. Large sessions may need to consider other ways to adapt the bit rate to participants' capabilities, such as partitioning the session into different quality tiers or using some other method of achieving bit rate scalability.
18.104.22.168. Timing Rules The first transmission of the TMMBR message MAY use early or immediate feedback in cases when timeliness is desirable. Any repetition of a request message SHOULD use regular RTCP mode for its transmission timing. 22.214.171.124. Handling in Translators and Mixers Media translators and mixers will need to receive and respond to TMMBR messages as they are part of the chain that provides a certain media stream to the receiver. The mixer or translator may act locally on the TMMBR and thus generate a TMMBN to indicate that it has done so. Alternatively, in the case of a media translator it can forward the request, or in the case of a mixer generate one of its own and pass it forward. In the latter case, the mixer will need to send a TMMBN back to the original requestor to indicate that it is handling the request. 4.2.2. Temporary Maximum Media Stream Bit Rate Notification (TMMBN) The Temporary Maximum Media Stream Bit Rate Notification is identified by RTCP packet type value PT=RTPFB and FMT=4. The FCI field of the TMMBN feedback message may contain zero, one, or more TMMBN FCI entries. 126.96.36.199. Message Format The Feedback Control Information (FCI) consists of zero, one, or more TMMBN FCI entries with the following syntax: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SSRC | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MxTBR Exp | MxTBR Mantissa |Measured Overhead| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 3 - Syntax of an FCI Entry in the TMMBN Message SSRC (32 bits): The SSRC value of the "owner" of this tuple. MxTBR Exp (6 bits): The exponential scaling of the mantissa for the maximum total media bit rate value. The value is an unsigned integer [0..63].
MxTBR Mantissa (17 bits): The mantissa of the maximum total media bit rate value as an unsigned integer. Measured Overhead (9 bits): The measured average packet overhead value in bytes represented as an unsigned integer [0..511]. Thus, the FCI within the TMMBN message contains entries indicating the bounding tuples. For each tuple, the entry gives the owner by the SSRC, followed by the applicable maximum total media bit rate and overhead value. The length of the TMMBN message SHALL be set to 2+2*N where N is the number of TMMBN FCI entries. 188.8.131.52. Semantics This feedback message is used to notify the senders of any TMMBR message that one or more TMMBR messages have been received or that an owner has left the session. It indicates to all participants the current set of bounding tuples and the "owners" of those tuples. Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the notification. The "SSRC of media source" is not used and SHALL be set to 0. A TMMBN message SHALL be scheduled for transmission after the reception of a TMMBR message with an FCI entry identifying this media sender. Only a single TMMBN SHALL be sent, even if more than one TMMBR message is received between the scheduling of the transmission and the actual transmission of the TMMBN message. The TMMBN message indicates the bounding tuples and their owners at the time of transmitting the message. The bounding tuples included SHALL be the set arrived at through application of the applicable algorithm of section 184.108.40.206 or an equivalent, applied to the previous bounding set, if any, and tuples received in TMMBR messages since the last TMMBN was transmitted. The reception of a TMMBR message SHALL still result in the transmission of a TMMBN message even if, after application of the algorithm, the newly reported TMMBR tuple is not accepted into the bounding set. In such a case, the bounding tuples and their owners are not changed, unless the TMMBR was from an owner of a tuple within the previously calculated bounding set. This procedure allows session participants that did not see the last TMMBN message to get a correct view of this media sender's state.
As indicated in section 220.127.116.11, when a media sender determines that an "owner" of a bounding tuple has left the session, then that tuple is removed from the bounding set, and the media sender SHALL send a TMMBN message indicating the remaining bounding tuples. If there are no remaining bounding tuples, a TMMBN without any FCI SHALL be sent to indicate this. Without a remaining bounding tuple, the maximum media bit rate and maximum packet rate negotiated in session signaling, if any, apply. Note: if any media receivers remain in the session, this last will be a temporary situation. The empty TMMBN will cause every remaining media receiver to determine that its limitation belongs in the bounding set and send a TMMBR in consequence. In unicast scenarios (i.e., where a single sender talks to a single receiver), the aforementioned algorithm to determine ownership degenerates to the media receiver becoming the "owner" of the one bounding tuple as soon as the media receiver has issued the first TMMBR message. 18.104.22.168. Timing Rules The TMMBN acknowledgement SHOULD be sent as soon as allowed by the applied timing rules for the session. Immediate or early feedback mode SHOULD be used for these messages. 22.214.171.124. Handling by Translators and Mixers As discussed in section 126.96.36.199, mixers or translators may need to issue TMMBN messages as responses to TMMBR messages for SSRCs handled by them. 4.3. Payload-Specific Feedback Messages As specified by section 6.1 of RFC 4585 [RFC4585], Payload-Specific FB messages are identified by the RTCP packet type value PSFB (206). AVPF [RFC4585] defines three payload-specific feedback messages and one application layer feedback message. This memo specifies four additional payload-specific feedback messages. All are identified by means of the FMT parameter as follows:
Assigned in [RFC4585]: 1: Picture Loss Indication (PLI) 2: Slice Lost Indication (SLI) 3: Reference Picture Selection Indication (RPSI) 15: Application layer FB message 31: reserved for future expansion of the number space Assigned in this memo: 4: Full Intra Request (FIR) Command 5: Temporal-Spatial Trade-off Request (TSTR) 6: Temporal-Spatial Trade-off Notification (TSTN) 7: Video Back Channel Message (VBCM) Unassigned: 0: unassigned 8-14: unassigned 16-30: unassigned The following subsections define the new FCI formats for the payload-specific feedback messages. 4.3.1. Full Intra Request (FIR) The FIR message is identified by RTCP packet type value PT=PSFB and FMT=4. The FCI field MUST contain one or more FIR entries. Each entry applies to a different media sender, identified by its SSRC. 188.8.131.52. Message Format The Feedback Control Information (FCI) for the Full Intra Request consists of one or more FCI entries, the content of which is depicted in Figure 4. The length of the FIR feedback message MUST be set to 2+2*N, where N is the number of FCI entries. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SSRC | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Seq nr. | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 4 - Syntax of an FCI Entry in the FIR Message
SSRC (32 bits): The SSRC value of the media sender that is requested to send a decoder refresh point. Seq nr. (8 bits): Command sequence number. The sequence number space is unique for each pairing of the SSRC of command source and the SSRC of the command target. The sequence number SHALL be increased by 1 modulo 256 for each new command. A repetition SHALL NOT increase the sequence number. The initial value is arbitrary. Reserved (24 bits): All bits SHALL be set to 0 by the sender and SHALL be ignored on reception. The semantics of this feedback message is independent of the RTP payload type. 184.108.40.206. Semantics Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the request, and the "SSRC of media source" is not used and SHALL be set to 0. The SSRCs of the media senders to which the FIR command applies are in the corresponding FCI entries. A FIR message MAY contain requests to multiple media senders, using one FCI entry per target media sender. Upon reception of FIR, the encoder MUST send a decoder refresh point (see section 2.2) as soon as possible. The sender MUST consider congestion control as outlined in section 5, which MAY restrict its ability to send a decoder refresh point quickly. FIR SHALL NOT be sent as a reaction to picture losses -- it is RECOMMENDED to use PLI [RFC4585] instead. FIR SHOULD be used only in situations where not sending a decoder refresh point would render the video unusable for the users. A typical example where sending FIR is appropriate is when, in a multipoint conference, a new user joins the session and no regular decoder refresh point interval is established. Another example would be a video switching MCU that changes streams. Here, normally, the MCU issues a FIR to the new sender so to force it to emit a decoder refresh point. The decoder refresh point normally includes a Freeze Picture Release (defined outside this specification), which re-starts the rendering process of the receivers. Both techniques mentioned are commonly used in MCU-based multipoint conferences.
Other RTP payload specifications such as RFC 2032 [RFC2032] already define a feedback mechanism for certain codecs. An application supporting both schemes MUST use the feedback mechanism defined in this specification when sending feedback. For backward-compatibility reasons, such an application SHOULD also be capable of receiving and reacting to the feedback scheme defined in the respective RTP payload format, if this is required by that payload format. 220.127.116.11. Timing Rules The timing follows the rules outlined in section 3 of [RFC4585]. FIR commands MAY be used with early or immediate feedback. The FIR feedback message MAY be repeated. If using immediate feedback mode, the repetition SHOULD wait at least one RTT before being sent. In early or regular RTCP mode, the repetition is sent in the next regular RTCP packet. 18.104.22.168. Handling of FIR Message in Mixers and Translators A media translator or a mixer performing media encoding of the content for which the session participant has issued a FIR is responsible for acting upon it. A mixer acting upon a FIR SHOULD NOT forward the message unaltered; instead, it SHOULD issue a FIR itself. 22.214.171.124. Remarks Currently, video appears to be the only useful application for FIR, as it appears to be the only RTP payload widely deployed that relies heavily on media prediction across RTP packet boundaries. However, use of FIR could also reasonably be envisioned for other media types that share essential properties with compressed video, namely, cross-frame prediction (whatever a frame may be for that media type). One possible example may be the dynamic updates of MPEG-4 scene descriptions. It is suggested that payload formats for such media types refer to FIR and other message types defined in this specification and in AVPF [RFC4585], instead of creating similar mechanisms in the payload specifications. The payload specifications may have to explain how the payload-specific terminologies map to the video-centric terminology used herein. In conjunction with video codecs, FIR messages typically trigger the sending of full intra or IDR pictures. Both are several times larger than predicted (inter) pictures. Their size is independent of the time they are generated. In most environments, especially when employing bandwidth-limited links, the use of an intra picture implies an allowed delay that is a significant multiple of the typical frame duration. An example: if the sending frame rate is 10 fps, and an intra picture is assumed to be 10 times as big as an
inter picture, then a full second of latency has to be accepted. In such an environment, there is no need for a particularly short delay in sending the FIR message. Hence, waiting for the next possible time slot allowed by RTCP timing rules as per [RFC4585] should not have an overly negative impact on the system performance. Mandating a maximum delay for completing the sending of a decoder refresh point would be desirable from an application viewpoint, but is problematic from a congestion control point of view. "As soon as possible" as mentioned above appears to be a reasonable compromise. In environments where the sender has no control over the codec (e.g., when streaming pre-recorded and pre-coded content), the reaction to this command cannot be specified. One suitable reaction of a sender would be to skip forward in the video bit stream to the next decoder refresh point. In other scenarios, it may be preferable not to react to the command at all, e.g., when streaming to a large multicast group. Other reactions may also be possible. When deciding on a strategy, a sender could take into account factors such as the size of the receiving group, the "importance" of the sender of the FIR message (however "importance" may be defined in this specific application), the frequency of decoder refresh points in the content, and so on. However, a session that predominantly handles pre-coded content is not expected to use FIR at all. The relationship between the Picture Loss Indication and FIR is as follows. As discussed in section 6.3.1 of AVPF [RFC4585], a Picture Loss Indication informs the decoder about the loss of a picture and hence the likelihood of misalignment of the reference pictures between the encoder and decoder. Such a scenario is normally related to losses in an ongoing connection. In point-to-point scenarios, and without the presence of advanced error resilience tools, one possible option for an encoder consists in sending a decoder refresh point. However, there are other options. One example is that the media sender ignores the PLI, because the embedded stream redundancy is likely to clean up the reproduced picture within a reasonable amount of time. The FIR, in contrast, leaves a (real-time) encoder no choice but to send a decoder refresh point. It does not allow the encoder to take into account any considerations such as the ones mentioned above. 4.3.2. Temporal-Spatial Trade-off Request (TSTR) The TSTR feedback message is identified by RTCP packet type value PT=PSFB and FMT=5. The FCI field MUST contain one or more TSTR FCI entries.
126.96.36.199. Message Format The content of the FCI entry for the Temporal-Spatial Trade-off Request is depicted in Figure 5. The length of the feedback message MUST be set to 2+2*N, where N is the number of FCI entries included. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SSRC | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Seq nr. | Reserved | Index | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5 - Syntax of an FCI Entry in the TSTR Message SSRC (32 bits): The SSRC of the media sender that is requested to apply the trade-off value given in Index. Seq nr. (8 bits): Request sequence number. The sequence number space is unique for pairing of the SSRC of request source and the SSRC of the request target. The sequence number SHALL be increased by 1 modulo 256 for each new command. A repetition SHALL NOT increase the sequence number. The initial value is arbitrary. Reserved (19 bits): All bits SHALL be set to 0 by the sender and SHALL be ignored on reception. Index (5 bits): An integer value between 0 and 31 that indicates the relative trade-off that is requested. An index value of 0 indicates the highest possible spatial quality, while 31 indicates the highest possible temporal resolution. 188.8.131.52. Semantics A decoder can suggest a temporal-spatial trade-off level by sending a TSTR message to an encoder. If the encoder is capable of adjusting its temporal-spatial trade-off, it SHOULD take into account the received TSTR message for future coding of pictures. A value of 0 suggests a high spatial quality and a value of 31 suggests a high frame rate. The progression of values from 0 to 31 indicates monotonically a desire for higher frame rate. The index values do not correspond to precise values of spatial quality or frame rate.
The reaction to the reception of more than one TSTR message by a media sender from different media receivers is left open to the implementation. The selected trade-off SHALL be communicated to the media receivers by means of the TSTN message. Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the request, and the "SSRC of media source" is not used and SHALL be set to 0. The SSRCs of the media senders to which the TSTR applies are in the corresponding FCI entries. A TSTR message MAY contain requests to multiple media senders, using one FCI entry per target media sender. 184.108.40.206. Timing Rules The timing follows the rules outlined in section 3 of [RFC4585]. This request message is not time critical and SHOULD be sent using regular RTCP timing. Only if it is known that the user interface requires quick feedback, the message MAY be sent with early or immediate feedback timing. 220.127.116.11. Handling of Message in Mixers and Translators A mixer or media translator that encodes content sent to the session participant issuing the TSTR SHALL consider the request to determine if it can fulfill it by changing its own encoding parameters. A media translator unable to fulfill the request MAY forward the request unaltered towards the media sender. A mixer encoding for multiple session participants will need to consider the joint needs of these participants before generating a TSTR on its own behalf towards the media sender. See also the discussion in section 3.5.2. 18.104.22.168. Remarks The term "spatial quality" does not necessarily refer to the resolution as measured by the number of pixels the reconstructed video is using. In fact, in most scenarios the video resolution stays constant during the lifetime of a session. However, all video compression standards have means to adjust the spatial quality at a given resolution, often influenced by the Quantizer Parameter or QP. A numerically low QP results in a good reconstructed picture quality, whereas a numerically high QP yields a coarse picture. The typical reaction of an encoder to this request is to change its rate control parameters to use a lower frame rate and a numerically lower (on average) QP, or vice versa. The precise mapping of Index value to
frame rate and QP is intentionally left open here, as it depends on factors such as the compression standard employed, spatial resolution, content, bit rate, and so on. 4.3.3. Temporal-Spatial Trade-off Notification (TSTN) The TSTN message is identified by RTCP packet type value PT=PSFB and FMT=6. The FCI field SHALL contain one or more TSTN FCI entries. 22.214.171.124. Message Format The content of an FCI entry for the Temporal-Spatial Trade-off Notification is depicted in Figure 6. The length of the TSTN message MUST be set to 2+2*N, where N is the number of FCI entries. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SSRC | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Seq nr. | Reserved | Index | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 6 - Syntax of the TSTN SSRC (32 bits): The SSRC of the source of the TSTR that resulted in this Notification. Seq nr. (8 bits): The sequence number value from the TSTR that is being acknowledged. Reserved (19 bits): All bits SHALL be set to 0 by the sender and SHALL be ignored on reception. Index (5 bits): The trade-off value the media sender is using henceforth. Informative note: The returned trade-off value (Index) may differ from the requested one, for example, in cases where a media encoder cannot tune its trade-off, or when pre-recorded content is used.
126.96.36.199. Semantics This feedback message is used to acknowledge the reception of a TSTR. For each TSTR received targeted at the session participant, a TSTN FCI entry SHALL be sent in a TSTN feedback message. A single TSTN message MAY acknowledge multiple requests using multiple FCI entries. The index value included SHALL be the same in all FCI entries of the TSTN message. Including a FCI for each requestor allows each requesting entity to determine that the media sender received the request. The Notification SHALL also be sent in response to TSTR repetitions received. If the request receiver has received TSTR with several different sequence numbers from a single requestor, it SHALL only respond to the request with the highest (modulo 256) sequence number. Note that the highest sequence number may be a smaller integer value due to the wrapping of the field. Appendix A.1 of [RFC3550] has an algorithm for keeping track of the highest received sequence number for RTP packets; it could be adapted for this usage. The TSTN SHALL include the Temporal-Spatial Trade-off index that will be used as a result of the request. This is not necessarily the same index as requested, as the media sender may need to aggregate requests from several requesting session participants. It may also have some other policies or rules that limit the selection. Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the Notification, and the "SSRC of media source" is not used and SHALL be set to 0. The SSRCs of the requesting entities to which the Notification applies are in the corresponding FCI entries. 188.8.131.52. Timing Rules The timing follows the rules outlined in section 3 of [RFC4585]. This acknowledgement message is not extremely time critical and SHOULD be sent using regular RTCP timing. 184.108.40.206. Handling of TSTN in Mixers and Translators A mixer or translator that acts upon a TSTR SHALL also send the corresponding TSTN. In cases where it needs to forward a TSTR itself, the notification message MAY need to be delayed until the TSTR has been responded to. 220.127.116.11. Remarks None.
4.3.4. H.271 Video Back Channel Message (VBCM) The VBCM is identified by RTCP packet type value PT=PSFB and FMT=7. The FCI field MUST contain one or more VBCM FCI entries. 18.104.22.168. Message Format The syntax of an FCI entry within the VBCM indication is depicted in Figure 7. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SSRC | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Seq nr. |0| Payload Type| Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | VBCM Octet String.... | Padding | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 7 - Syntax of an FCI Entry in the VBCM SSRC (32 bits): The SSRC value of the media sender that is requested to instruct its encoder to react to the VBCM. Seq nr. (8 bits): Command sequence number. The sequence number space is unique for pairing of the SSRC of the command source and the SSRC of the command target. The sequence number SHALL be increased by 1 modulo 256 for each new command. A repetition SHALL NOT increase the sequence number. The initial value is arbitrary. 0: Must be set to 0 by the sender and should not be acted upon by the message receiver. Payload Type (7 bits): The RTP payload type for which the VBCM bit stream must be interpreted. Length (16 bits): The length of the VBCM octet string in octets exclusive of any padding octets. VBCM Octet String (variable length): This is the octet string generated by the decoder carrying a specific feedback sub- message. Padding (variable length): Bits set to 0 to make up a 32-bit boundary.
22.214.171.124. Semantics The "payload" of the VBCM indication carries different types of codec-specific, feedback information. The type of feedback information can be classified as a 'status report' (such as an indication that a bit stream was received without errors, or that a partial or complete picture or block was lost) or 'update requests' (such as complete refresh of the bit stream). Note: There are possible overlaps between the VBCM sub- messages and CCM/AVPF feedback messages, such as FIR. Please see section 3.5.3 for further discussion. The different types of feedback sub-messages carried in the VBCM are indicated by the "payloadType" as defined in [H.271]. These sub- message types are reproduced below for convenience. "payloadType", in ITU-T Rec. H.271 terminology, refers to the sub-type of the H.271 message and should not be confused with an RTP payload type. Payload Message Content Type --------------------------------------------------------------------- 0 One or more pictures without detected bit stream error mismatch 1 One or more pictures that are entirely or partially lost 2 A set of blocks of one picture that is entirely or partially lost 3 CRC for one parameter set 4 CRC for all parameter sets of a certain type 5 A "reset" request indicating that the sender should completely refresh the video bit stream as if no prior bit stream data had been received > 5 Reserved for future use by ITU-T Table 2: H.271 message types ("payloadTypes") The bit string or the "payload" of a VBCM is of variable length and is self-contained and coded in a variable-length, binary format. The media sender necessarily has to be able to parse this optimized binary format to make use of VBCMs. Each of the different types of sub-messages (indicated by payloadType) may have different semantics depending on the codec used. Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the request, and the "SSRC of media source"
is not used and SHALL be set to 0. The SSRCs of the media senders to which the VBCM applies are in the corresponding FCI entries. The sender of the VBCM MAY send H.271 messages to multiple media senders and MAY send more than one H.271 message to the same media sender within the same VBCM. 126.96.36.199. Timing Rules The timing follows the rules outlined in section 3 of [RFC4585]. The different sub-message types may have different properties in regards to the timing of messages that should be used. If several different types are included in the same feedback packet, then the requirements for the sub-message type with the most stringent requirements should be followed. 188.8.131.52. Handling of Message in Mixers or Translators The handling of a VBCM in a mixer or translator is sub-message type dependent. 184.108.40.206. Remarks Please see section 3.5.3 for a discussion of the usage of H.271 messages and messages defined in AVPF [RFC4585] and this memo with similar functionality. Note: There has been some discussion whether the RTP payload type field in this message is needed. It will be needed if there is potentially more than one VBCM-capable RTP payload type in the same session, and the semantics of a given VBCM changes between payload types. For example, the picture identification mechanism in messages of H.271 type 0 is fundamentally different between H.263 and H.264 (although both use the same syntax). Therefore, the payload field is justified here. There was a further comment that for TSTR and FIR such a need does not exist, because the semantics of TSTR and FIR are either loosely enough defined, or generic enough, to apply to all video payloads currently in existence/envisioned.