
RFC 7798

RTP Payload Format for High Efficiency Video Coding (HEVC)

Pages: 86
Proposed Standard
Part 4 of 4 – Pages 64 to 86

Top   ToC   RFC7798 - Page 64   prevText

7.2. SDP Parameters

The receiver MUST ignore any parameter unspecified in this memo.

7.2.1. Mapping of Payload Type Parameters to SDP

   The media type video/H265 string is mapped to fields in the Session
   Description Protocol (SDP) [RFC4566] as follows:

   o  The media name in the "m=" line of SDP MUST be video.

   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
      media subtype).

   o  The clock rate in the "a=rtpmap" line MUST be 90000.

   o  The OPTIONAL parameters profile-space, profile-id, tier-flag,
      level-id, interop-constraints, profile-compatibility-indicator,
      sprop-sub-layer-id, recv-sub-layer-id, max-recv-level-id, tx-mode,
      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc,
      max-fps, sprop-max-don-diff, sprop-depack-buf-nalus,
      sprop-depack-buf-bytes, depack-buf-cap, sprop-segmentation-id,
      sprop-spatial-segmentation-idc, dec-parallel-cap, and include-dph,
      when present, MUST be included in the "a=fmtp" line of SDP.  These
      parameters are expressed as a media type string, in the form of a
      semicolon-separated list of parameter=value pairs.

   o  The OPTIONAL parameters sprop-vps, sprop-sps, and sprop-pps, when
      present, MUST be included in the "a=fmtp" line of SDP or conveyed
      using the "fmtp" source attribute as specified in Section 6.3 of
      [RFC5576].  For a particular media format (i.e., RTP payload
      type), sprop-vps, sprop-sps, or sprop-pps MUST NOT be both included
      in the "a=fmtp" line of SDP and conveyed using the "fmtp" source
      attribute.  When included in the "a=fmtp" line of SDP, these
      parameters are expressed as a media type string, in the form of a
      semicolon-separated list of parameter=value pairs.  When conveyed
      in the "a=fmtp" line of SDP for a particular payload type, the
      parameters sprop-vps, sprop-sps, and sprop-pps MUST be applied to
      each SSRC with the payload type.  When conveyed using the "fmtp"
      source attribute, these parameters are only associated with the
      given source and payload type as parts of the "fmtp" source
      attribute.

         Informative note: Conveyance of sprop-vps, sprop-sps, and
         sprop-pps using the "fmtp" source attribute allows for out-of-
         band transport of parameter sets in topologies like Topo-Video-
         switch-MCU as specified in [RFC7667].

   An example of media representation in SDP is as follows:

      m=video 49170 RTP/AVP 98
      a=rtpmap:98 H265/90000
      a=fmtp:98 profile-id=1;
                sprop-vps=<video parameter sets data>
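   The "a=fmtp" mapping above can be illustrated with a small parser.
   This is a minimal sketch, not part of the payload format: the
   function name parse_fmtp is hypothetical, and it assumes the whole
   attribute arrives on a single line (the example above is wrapped
   only for presentation).

```python
def parse_fmtp(line):
    """Split one SDP "a=fmtp" line into (payload_type, {name: value}).

    Hypothetical helper for illustration; assumes the attribute is on one line.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    pt_text, _, params_text = line[len(prefix):].strip().partition(" ")
    params = {}
    for item in params_text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing or doubled semicolon
        # Split on the first "=" only, so "=" padding inside a value survives.
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return int(pt_text), params


pt, params = parse_fmtp("a=fmtp:98 profile-id=1; level-id=93; tx-mode=SRST")
```

   Splitting on the first "=" only is a deliberate choice: values such
   as sprop-vps may themselves end in "=" characters.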

7.2.2. Usage with SDP Offer/Answer Model

   When HEVC is offered over RTP using SDP in an offer/answer model
   [RFC3264] for negotiation for unicast usage, the following
   limitations and rules apply:

   o  The parameters identifying a media format configuration for HEVC
      are profile-space, profile-id, tier-flag, level-id,
      interop-constraints, profile-compatibility-indicator, and tx-mode.
      These media configuration parameters, except level-id, MUST be
      used symmetrically when the answerer does not include
      recv-sub-layer-id in the answer for the media format (payload
      type) or the included recv-sub-layer-id is equal to
      sprop-sub-layer-id in the offer.  The answerer MUST do one of the
      following:

      1) maintain all configuration parameters with the values remaining
         the same as in the offer for the media format (payload type),
         with the exception that the value of level-id is changeable as
         long as the highest level indicated by the answer is not higher
         than that indicated by the offer;

      2) include in the answer the recv-sub-layer-id parameter, with a
         value less than the sprop-sub-layer-id parameter in the offer,
         for the media format (payload type), and maintain all
         configuration parameters with the values being the same as
         signaled in the sprop-vps for the chosen sub-layer
         representation, with the exception that the value of level-id
         is changeable as long as the highest level indicated by the
         answer is not higher than the level indicated by the sprop-vps
         in offer for the chosen sub-layer representation; or

      3) remove the media format (payload type) completely (when one or
         more of the parameter values are not supported).

            Informative note: The above requirement for symmetric use
            does not apply for level-id, and does not apply for the
            other bitstream or RTP stream properties and capability
            parameters.

   o  The profile-compatibility-indicator, when offered as sendonly,
      describes bitstream properties.  The answerer MAY accept an RTP
      payload type even if the decoder is not capable of handling the
      profile indicated by the profile-space, profile-id, and interop-
      constraints parameters, but capable of any of the profiles
      indicated by the profile-space, profile-compatibility-indicator,
      and interop-constraints.  However, when the profile-compatibility-
      indicator is used in a recvonly or sendrecv media description, the
      bitstream using this RTP payload type is required to conform to
      all profiles indicated by profile-space, profile-compatibility-
      indicator, and interop-constraints.

   o  To simplify handling and matching of these configurations, the
      same RTP payload type number used in the offer SHOULD also be used
      in the answer, as specified in [RFC3264].

   o  The same RTP payload type number used in the offer for the media
      subtype H265 MUST be used in the answer when the answer includes
      recv-sub-layer-id.  When the answer does not include recv-sub-
      layer-id, the answer MUST NOT contain a payload type number used
      in the offer for the media subtype H265 unless the configuration
      is exactly the same as in the offer or the configuration in the
      answer only differs from that in the offer with a different value
      of level-id.  The answer MAY contain the recv-sub-layer-id
      parameter if an HEVC bitstream contains multiple operation points
      (using temporal scalability and sub-layers) and sprop-vps is
      included in the offer where information on sub-layers is present
      in the first video parameter set contained in sprop-vps.  If the
      sprop-vps is provided in an offer, an answerer MAY select a
      particular operation point indicated in the first video parameter
      set contained in sprop-vps.  When the answer includes a recv-sub-
      layer-id that is less than a sprop-sub-layer-id in the offer, all
      video parameter sets contained in the sprop-vps parameter in the
      SDP answer and all video parameter sets sent in-band for either
      the offerer-to-answerer direction or the answerer-to-offerer
      direction MUST be consistent with the first video parameter set in
      the sprop-vps parameter of the offer (see the semantics of sprop-
      vps in Section 7.1 of this document on one video parameter set
      being consistent with another video parameter set), and the
      bitstream sent in either direction MUST conform to the profile,
      tier, level, and constraints of the chosen sub-layer
      representation as indicated by the first profile_tier_level( )
      syntax structure in the first video parameter set in the sprop-vps
      parameter of the offer.

         Informative note: When an offerer receives an answer that does
         not include recv-sub-layer-id, it has to compare payload types
         not declared in the offer based on the media type (i.e.,
         video/H265) and the above media configuration parameters with
         any payload types it has already declared.  This will enable it
         to determine whether the configuration in question is new or if
         it is equivalent to configuration already offered, since a
         different payload type number may be used in the answer.  The
         ability to perform operation point selection enables a receiver
         to utilize the temporal scalable nature of an HEVC bitstream.

   o  The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
      sprop-depack-buf-bytes describe the properties of an RTP stream,
      and all RTP streams the RTP stream depends on, when present, that
      the offerer or the answerer is sending for the media format
      configuration.  This differs from the normal usage of the
      offer/answer parameters: normally such parameters declare the
      properties of the bitstream or RTP stream that the offerer or the
      answerer is able to receive.  When dealing with HEVC, the offerer
      assumes that the answerer will be able to receive media encoded
      using the configuration being offered.

         Informative note: The above parameters apply for any RTP
         stream and all RTP streams the RTP stream depends on, when
         present, sent by a declaring entity with the same
         configuration.  In other words, the applicability of the above
         parameters to RTP streams depends on the source endpoint.
         Rather than being bound to the payload type, the values may
         have to be applied to another payload type when being sent, as
         they apply for the configuration.

   o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb, max-
      br, max-tr, and max-tc MAY be used to declare further capabilities
      of the offerer or answerer for receiving.  These parameters MUST
      NOT be present when the direction attribute is sendonly.

   o  The capability parameter max-fps MAY be used to declare lower
      capabilities of the offerer or answerer for receiving.  This
      parameter MUST NOT be present when the direction attribute is
      sendonly.

   o  The capability parameter dec-parallel-cap MAY be used to declare
      additional decoding capabilities of the offerer or answerer for
      receiving.  Upon receiving such a declaration of a receiver, a
      sender MAY send a bitstream to the receiver utilizing those
      capabilities under the assumption that the bitstream fulfills the
      parallelism requirement.  A bitstream that is sent based on
      choosing a capability point with parallel tool type 'w' from dec-
      parallel-cap MUST have entropy_coding_sync_enabled_flag equal to 1
      and min_spatial_segmentation_idc equal to or larger than dec-
      parallel-cap.spatial-seg-idc of the capability point.  A bitstream
      that is sent based on choosing a capability point with parallel
      tool type 't' from dec-parallel-cap MUST have
      entropy_coding_sync_enabled_flag equal to 0 and
      min_spatial_segmentation_idc equal to or larger than dec-parallel-
      cap.spatial-seg-idc of the capability point.

   o  An offerer has to include the size of the de-packetization buffer,
      sprop-depack-buf-bytes, as well as sprop-max-don-diff and sprop-
      depack-buf-nalus, in the offer for an interleaved HEVC bitstream
      or for the MRST or MRMT transmission mode when sprop-max-don-diff
      is greater than 0 for at least one of the RTP streams.  To enable
      the offerer and answerer to inform each other about their
      capabilities for de-packetization buffering in receiving RTP
      streams, both parties are RECOMMENDED to include depack-buf-cap.
      For interleaved RTP streams or in MRST or MRMT, it is also
      RECOMMENDED to consider offering multiple payload types with
      different buffering requirements when the capabilities of the
      receiver are unknown.

   o  The capability parameter include-dph MAY be used to declare the
      capability to utilize decoded picture hash SEI messages and which
      types of hashes in any HEVC RTP streams received by the offerer or
      answerer.

   o  The sprop-vps, sprop-sps, or sprop-pps, when present (included in
      the "a=fmtp" line of SDP or conveyed using the "fmtp" source
      attribute as specified in Section 6.3 of [RFC5576]), are used for
      out-of-band transport of the parameter sets (VPS, SPS, or PPS,
      respectively).

   o  The answerer MAY use either out-of-band or in-band transport of
      parameter sets for the bitstream it is sending, regardless of
      whether out-of-band parameter sets transport has been used in the
      offerer-to-answerer direction.  Parameter sets included in an
      answer are independent of those parameter sets included in the
      offer, as they are used for decoding two different bitstreams, one
      from the answerer to the offerer and the other in the opposite
      direction.  In case some RTP streams are sent before the SDP
      offer/answer settles down, in-band parameter sets MUST be used for
      those RTP stream parts sent before the SDP offer/answer.

   o  The following rules apply to transport of parameter sets in the
      offerer-to-answerer direction.

      +  An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
         If none of these parameters is present in the offer, then only
         in-band transport of parameter sets is used.

      +  If the level to use in the offerer-to-answerer direction is
         equal to the default level in the offer, the answerer MUST be
         prepared to use the parameter sets included in sprop-vps,
         sprop-sps, and sprop-pps (either included in the "a=fmtp" line
         of SDP or conveyed using the "fmtp" source attribute) for
         decoding the incoming bitstream, e.g., by passing these
         parameter set NAL units to the video decoder before passing any
         NAL units carried in the RTP streams.  Otherwise, the answerer
         MUST ignore sprop-vps, sprop-sps, and sprop-pps (either
         included in the "a=fmtp" line of SDP or conveyed using the
         "fmtp" source attribute) and the offerer MUST transmit
         parameter sets in-band.

      +  In MRST or MRMT, the answerer MUST be prepared to use the
         parameter sets out-of-band transmitted for the RTP stream and
         all RTP streams the RTP stream depends on, when present, for
         decoding the incoming bitstream, e.g., by passing these
         parameter set NAL units to the video decoder before passing any
         NAL units carried in the RTP streams.

   o  The following rules apply to transport of parameter sets in the
      answerer-to-offerer direction.

      +  An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
         If none of these parameters is present in the answer, then only
         in-band transport of parameter sets is used.

      +  The offerer MUST be prepared to use the parameter sets included
         in sprop-vps, sprop-sps, and sprop-pps (either included in the
         "a=fmtp" line of SDP or conveyed using the "fmtp" source
         attribute) for decoding the incoming bitstream, e.g., by
         passing these parameter set NAL units to the video decoder
         before passing any NAL units carried in the RTP streams.

      +  In MRST or MRMT, the offerer MUST be prepared to use the
         parameter sets out-of-band transmitted for the RTP stream and
         all RTP streams the RTP stream depends on, when present, for
         decoding the incoming bitstream, e.g., by passing these
         parameter set NAL units to the video decoder before passing any
         NAL units carried in the RTP streams.

   o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using the
      "fmtp" source attribute as specified in Section 6.3 of [RFC5576],
      the receiver of the parameters MUST store the parameter sets
      included in sprop-vps, sprop-sps, and/or sprop-pps and associate
      them with the source given as part of the "fmtp" source attribute.
      Parameter sets associated with one source (given as part of the
      "fmtp" source attribute) MUST only be used to decode NAL units
      conveyed in RTP packets from the same source (given as part of the
      "fmtp" source attribute).  When this mechanism is in use, SSRC
      collision detection and resolution MUST be performed as specified
      in [RFC5576].
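   The symmetry rule for unicast answers can be sketched as a small
   classifier.  This is an illustration under stated assumptions, not a
   complete implementation: the function name and return labels are
   hypothetical, both inputs are assumed to be dictionaries of fmtp
   parameters with defaults already filled in, level-id alone stands in
   for "the highest level indicated", and the sprop-vps comparison
   required when a lower recv-sub-layer-id is chosen is left out because
   it needs a video parameter set parser.

```python
# Parameters that identify a media format configuration, minus level-id.
SYMMETRIC_PARAMS = (
    "profile-space", "profile-id", "tier-flag", "interop-constraints",
    "profile-compatibility-indicator", "tx-mode",
)


def classify_answer(offer, answer):
    """Classify an answered payload type against the offered one.

    Hypothetical helper: both dicts are assumed to hold fmtp parameters
    with absent parameters already replaced by their default values.
    """
    recv = answer.get("recv-sub-layer-id")
    sprop = offer.get("sprop-sub-layer-id")
    if recv is not None and sprop is not None and int(recv) < int(sprop):
        # A lower sub-layer was selected: the configuration has to be
        # checked against sprop-vps, which this sketch does not parse.
        return "operation-point"
    same_config = all(offer.get(p) == answer.get(p) for p in SYMMETRIC_PARAMS)
    # Simplification: compare level-id only (a larger value is a higher level).
    level_ok = int(answer["level-id"]) <= int(offer["level-id"])
    return "symmetric" if same_config and level_ok else "mismatch"


offer = {"profile-space": "0", "profile-id": "1", "tier-flag": "0",
         "level-id": "93", "tx-mode": "SRST", "sprop-sub-layer-id": "2"}
lower_level = dict(offer, **{"level-id": "90"})
result = classify_answer(offer, lower_level)
```

   A "mismatch" result corresponds to the case where the answerer would
   have to remove the media format (payload type) instead.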

   For bitstreams being delivered over multicast, the following rules
   apply:

      o  The media format configuration is identified by profile-space,
         profile-id, tier-flag, level-id, interop-constraints, profile-
         compatibility-indicator, and tx-mode.  These media format
         configuration parameters, including level-id, MUST be used
         symmetrically; that is, the answerer MUST either maintain all
         configuration parameters or remove the media format (payload
         type) completely.  Note that this implies that the level-id for
         offer/answer in multicast is not changeable.

      o  To simplify the handling and matching of these configurations,
         the same RTP payload type number used in the offer SHOULD also
         be used in the answer, as specified in [RFC3264].  An answer
         MUST NOT contain a payload type number used in the offer unless
         the configuration is the same as in the offer.

      o  Parameter sets received MUST be associated with the originating
         source and MUST only be used in decoding the incoming bitstream
         from the same source.

      o  The rules for other parameters are the same as above for
         unicast as long as the three above rules are obeyed.

   Table 1 lists the interpretation of all the parameters that MUST be
   used for the various combinations of offer, answer, and direction
   attributes.  Note that the two columns wherein the recv-sub-layer-id
   parameter is used only apply to answers, whereas the other columns
   apply to both offers and answers.

   Table 1.  Interpretation of parameters for various combinations of
   offers, answers, direction attributes, with and without recv-sub-
   layer-id.  Columns that do not indicate offer or answer apply to
   both.
                                       sendonly --+
         answer: recvonly, recv-sub-layer-id --+  |
           recvonly w/o recv-sub-layer-id --+  |  |
   answer: sendrecv, recv-sub-layer-id --+  |  |  |
     sendrecv w/o recv-sub-layer-id --+  |  |  |  |
                                      |  |  |  |  |
   profile-space                      C  D  C  D  P
   profile-id                         C  D  C  D  P
   tier-flag                          C  D  C  D  P
   level-id                           D  D  D  D  P
   interop-constraints                C  D  C  D  P
   profile-compatibility-indicator    C  D  C  D  P
   tx-mode                            C  C  C  C  P
   max-recv-level-id                  R  R  R  R  -
   sprop-max-don-diff                 P  P  -  -  P
   sprop-depack-buf-nalus             P  P  -  -  P
   sprop-depack-buf-bytes             P  P  -  -  P
   depack-buf-cap                     R  R  R  R  -
   sprop-segmentation-id              P  P  P  P  P
   sprop-spatial-segmentation-idc     P  P  P  P  P
   max-br                             R  R  R  R  -
   max-cpb                            R  R  R  R  -
   max-dpb                            R  R  R  R  -
   max-lsr                            R  R  R  R  -
   max-lps                            R  R  R  R  -
   max-tr                             R  R  R  R  -
   max-tc                             R  R  R  R  -
   max-fps                            R  R  R  R  -
   sprop-vps                          P  P  -  -  P
   sprop-sps                          P  P  -  -  P
   sprop-pps                          P  P  -  -  P
   sprop-sub-layer-id                 P  P  -  -  P
   recv-sub-layer-id                  X  O  X  O  -
   dec-parallel-cap                   R  R  R  R  -
   include-dph                        R  R  R  R  -

   Legend:

    C: configuration for sending and receiving bitstreams
    D: changeable configuration, same as C except possible
       to answer with a different but consistent value (see the
       semantics of the six parameters related to profile, tier,
       and level on these parameters being consistent)
    P: properties of the bitstream to be sent
    R: receiver capabilities
    O: operation point selection
    X: MUST NOT be present
    -: not usable, when present MUST be ignored

   Parameters used for declaring receiver capabilities are, in general,
   downgradable; i.e., they express the upper limit for a sender's
   possible behavior.  Thus, a sender MAY select to set its encoder
   using only lower/lesser or equal values of these parameters.

   When the answer does not include a recv-sub-layer-id that is less
   than the sprop-sub-layer-id in the offer, parameters declaring a
   configuration point are not changeable, with the exception of the
   level-id parameter for unicast usage, and these parameters express
   values a receiver expects to be used and MUST be used verbatim in the
   answer as in the offer.
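   Table 1 lends itself to a lookup structure.  The sketch below
   transcribes the table into one five-character string per parameter
   (one character per column, left to right) and filters out entries
   marked "-".  The names TABLE_1, interpretation, and usable_params are
   hypothetical, and deriving the column from the presence of
   recv-sub-layer-id is a simplification, since those columns apply to
   answers only.

```python
# Column order: sendrecv, sendrecv + recv-sub-layer-id (answer),
# recvonly, recvonly + recv-sub-layer-id (answer), sendonly.
TABLE_1 = {
    "profile-space": "CDCDP",
    "profile-id": "CDCDP",
    "tier-flag": "CDCDP",
    "level-id": "DDDDP",
    "interop-constraints": "CDCDP",
    "profile-compatibility-indicator": "CDCDP",
    "tx-mode": "CCCCP",
    "sprop-segmentation-id": "PPPPP",
    "sprop-spatial-segmentation-idc": "PPPPP",
    "recv-sub-layer-id": "XOXO-",
}
for _name in ("sprop-max-don-diff", "sprop-depack-buf-nalus",
              "sprop-depack-buf-bytes", "sprop-vps", "sprop-sps",
              "sprop-pps", "sprop-sub-layer-id"):
    TABLE_1[_name] = "PP--P"
for _name in ("max-recv-level-id", "depack-buf-cap", "max-br", "max-cpb",
              "max-dpb", "max-lsr", "max-lps", "max-tr", "max-tc",
              "max-fps", "dec-parallel-cap", "include-dph"):
    TABLE_1[_name] = "RRRR-"


def interpretation(param, direction, has_recv_sub_layer_id=False):
    """Return the Table 1 letter for a parameter in a given column."""
    if direction == "sendonly":
        column = 4
    elif direction in ("sendrecv", "recvonly"):
        base = 0 if direction == "sendrecv" else 2
        column = base + int(has_recv_sub_layer_id)
    else:
        raise ValueError("unsupported direction: " + direction)
    return TABLE_1[param][column]


def usable_params(params, direction):
    """Drop parameters that are unknown or marked '-' for this column."""
    has_recv = "recv-sub-layer-id" in params
    return {name: value for name, value in params.items()
            if name in TABLE_1
            and interpretation(name, direction, has_recv) != "-"}


kept = usable_params({"profile-id": "1", "max-br": "5000"}, "sendonly")
```

   Here max-br is dropped because receiver capabilities are not usable
   in a sendonly description.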

   When a sender's capabilities are declared with the configuration
   parameters, these parameters express a configuration that is
   acceptable for the sender to receive bitstreams.  In order to achieve
   high interoperability levels, it is often advisable to offer multiple
   alternative configurations.  It is impossible to offer multiple
   configurations in a single payload type.  Thus, when multiple
   configuration offers are made, each offer requires its own RTP
   payload type associated with the offer.  However, it is possible to
   offer multiple operation points using one configuration in a single
   payload type by including sprop-vps in the offer and recv-sub-layer-
   id in the answer.

   A receiver SHOULD understand all media type parameters, even if it
   only supports a subset of the payload format's functionality.  This
   ensures that a receiver is capable of understanding when an offer to
   receive media can be downgraded to what is supported by the receiver
   of the offer.

   An answerer MAY extend the offer with additional media format
   configurations.  However, to enable their usage, in most cases a
   second offer is required from the offerer to provide the bitstream
   property parameters that the media sender will use.  This also has
   the effect that the offerer has to be able to receive this media
   format configuration, not only to send it.

7.2.3. Usage in Declarative Session Descriptions

   When HEVC over RTP is offered with SDP in a declarative style, as in
   Real Time Streaming Protocol (RTSP) [RFC2326] or Session Announcement
   Protocol (SAP) [RFC2974], the following considerations are necessary.

      o  All parameters capable of indicating both bitstream properties
         and receiver capabilities are used to indicate only bitstream
         properties.  For example, in this case, the parameters
         profile-space, profile-id, tier-flag, and level-id declare the
         values used by the bitstream, not the capabilities for
         receiving bitstreams.  As a result, the following
         interpretation of the parameters MUST be used:

         + Declaring actual configuration or bitstream properties:
            - profile-space
            - profile-id
            - tier-flag
            - level-id
            - interop-constraints
            - profile-compatibility-indicator
            - tx-mode
            - sprop-vps
            - sprop-sps
            - sprop-pps
            - sprop-max-don-diff
            - sprop-depack-buf-nalus
            - sprop-depack-buf-bytes
            - sprop-segmentation-id
            - sprop-spatial-segmentation-idc

         + Not usable (when present, they MUST be ignored):
            - max-lps
            - max-lsr
            - max-cpb
            - max-dpb
            - max-br
            - max-tr
            - max-tc
            - max-fps
            - max-recv-level-id
            - depack-buf-cap
            - sprop-sub-layer-id
            - dec-parallel-cap
            - include-dph

      o  A receiver of the SDP is required to support all parameters and
         values of the parameters provided; otherwise, the receiver MUST
         reject (RTSP) or not participate in (SAP) the session.  It
         falls on the creator of the session to use values that are
         expected to be supported by the receiving application.
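   The two lists above can be applied mechanically to a declarative
   session description.  The following sketch separates parameters that
   declare bitstream properties from those that are ignored; the names
   are hypothetical, and silently dropping parameters that appear in
   neither list is an assumption of this example.

```python
DECLARED_PROPERTIES = frozenset({
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator", "tx-mode",
    "sprop-vps", "sprop-sps", "sprop-pps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
})
IGNORED_IN_DECLARATIVE = frozenset({
    "max-lps", "max-lsr", "max-cpb", "max-dpb", "max-br", "max-tr",
    "max-tc", "max-fps", "max-recv-level-id", "depack-buf-cap",
    "sprop-sub-layer-id", "dec-parallel-cap", "include-dph",
})


def split_declarative(params):
    """Return (bitstream properties, sorted names of ignored parameters)."""
    properties = {n: v for n, v in params.items() if n in DECLARED_PROPERTIES}
    ignored = sorted(n for n in params if n in IGNORED_IN_DECLARATIVE)
    return properties, ignored


props, ignored = split_declarative(
    {"profile-id": "1", "level-id": "93", "max-fps": "30"})
```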

7.2.4. Considerations for Parameter Sets

When out-of-band transport of parameter sets is used, parameter sets MAY still be additionally transported in-band unless explicitly disallowed by an application, and some of these additional parameter sets may update some of the out-of-band transported parameter sets. Update of a parameter set refers to the sending of a parameter set of the same type using the same parameter set ID but with different values for at least one other parameter of the parameter set.

7.2.5. Dependency Signaling in Multi-Stream Mode

If MRST or MRMT is used, the rules on signaling media decoding dependency in SDP as defined in [RFC5583] apply. The rules on "hierarchical or layered encoding" with multicast in Section 5.7 of [RFC4566] do not apply. This means that the notation for Connection Data "c=" SHALL NOT be used with more than one address, i.e., the sub-field <number of addresses> in the sub-field <connection-address> of the "c=" field, described in [RFC4566], must not be present. The order of session dependency is given from the RTP stream containing the lowest temporal sub-layer to the RTP stream containing the highest temporal sub-layer.
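   The restriction on the "c=" field can be checked with a few lines of
   code.  This sketch relies on the [RFC4566] connection-address syntax,
   in which an IP4 multicast address is written as
   <base address>/<ttl>[/<number of addresses>] and an IP6 multicast
   address as <base address>[/<number of addresses>].  The function name
   is hypothetical.

```python
def has_address_count(c_line):
    """Return True if a "c=" line carries a <number of addresses> sub-field."""
    if not c_line.startswith("c="):
        raise ValueError("not a c= line")
    fields = c_line[2:].split()
    if len(fields) != 3:
        raise ValueError("expected <nettype> <addrtype> <connection-address>")
    _nettype, addrtype, connection_address = fields
    slash_parts = connection_address.split("/")
    if addrtype == "IP4":
        return len(slash_parts) > 2  # address, TTL, then the count
    if addrtype == "IP6":
        return len(slash_parts) > 1  # address, then the count (no TTL)
    raise ValueError("unsupported address type: " + addrtype)


allowed = not has_address_count("c=IN IP4 233.252.0.1/127")
```

   A session description for MRST or MRMT would be rejected by such a
   check whenever has_address_count returns True for any "c=" line.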

8. Use with Feedback Messages

The following subsections define the use of the Picture Loss Indication (PLI), Slice Loss Indication (SLI), Reference Picture Selection Indication (RPSI), and Full Intra Request (FIR) feedback messages with HEVC. The PLI, SLI, and RPSI messages are defined in [RFC4585], and the FIR message is defined in [RFC5104].

8.1. Picture Loss Indication (PLI)

   As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a
   media sender indicates "the loss of an undefined amount of coded
   video data belonging to one or more pictures".  Without having any
   specific knowledge of the setup of the bitstream (such as use and
   location of in-band parameter sets, non-IDR decoder refresh points,
   picture structures, and so forth), a reaction to the reception of a
   PLI by an HEVC sender SHOULD be to send an IDR picture and relevant
   parameter sets, potentially with sufficient redundancy so as to
   ensure correct reception.  However, sometimes information about the
   bitstream structure is known.  For example, state could have been
   established outside of the mechanisms defined in this document that
   parameter sets are conveyed out of band only, and stay static for the
   duration of the session.  In that case, it is obviously unnecessary
   to send them in-band as a result of the reception of a PLI.  Other
   examples could be devised based on a priori knowledge of different
   aspects of the bitstream structure.  In all cases, the timing and
   congestion control mechanisms of RFC 4585 MUST be observed.

8.2. Slice Loss Indication (SLI)

   The SLI described in RFC 4585 can be used to indicate, to a sender,
   the loss of a number of Coding Tree Blocks (CTBs) in a CTB raster
   scan order of a picture.  In the SLI's Feedback Control Indication
   (FCI) field, the subfield "First" MUST be set to the CTB address of
   the first lost CTB.  Note that the CTB address is in CTB-raster-scan
   order of a picture.  For the first CTB of a slice segment, the CTB
   address is the value of slice_segment_address when present, or 0 when
   the value of first_slice_segment_in_pic_flag is equal to 1; both
   syntax elements are in the slice segment header.  The subfield
   "Number" MUST be set to the number of consecutive lost CTBs, again in
   CTB-raster-scan order of a picture.  Note that because both "First"
   and "Number" are counted in CTBs in CTB-raster-scan order of a
   picture, not in tile-scan order (which is the bitstream order of
   CTBs), multiple SLI messages may be needed to report the loss of one
   tile covering multiple CTB rows but less wide than the picture.

   The subfield "PictureID" MUST be set to the 6 least significant bits
   of a binary representation of the value of PicOrderCntVal, as defined
   in [HEVC], of the picture for which the lost CTBs are indicated.
   Note that for IDR pictures the syntax element slice_pic_order_cnt_lsb
   is not present, but then the value is inferred to be equal to 0.

   As described in RFC 4585, an encoder in a media sender can use this
   information to "clean up" the corrupted picture by sending intra
   information, while observing the constraints described in RFC 4585,
   for example, with respect to congestion control.  In many cases,
   error tracking is required to identify the corrupted region in the
   receiver's state (reference pictures) because of error import into
   uncorrupted regions of the picture through motion compensation.
   Reference-picture selection can also be used to "clean up" the
   corrupted picture, which is usually more efficient and less likely to
   generate congestion than sending intra information.

   In contrast to the video codecs contemplated in RFCs 4585 and 5104
   [RFC5104], in HEVC, the "macroblock size" is not fixed to 16x16 luma
   samples, but is variable.  That, however, does not create a
   conceptual difficulty with SLI, because the setting of the CTB size
   is a sequence-level functionality, and using a slice loss indication
   across CVS boundaries is meaningless as there is no prediction across
   sequence boundaries.  However, a proper use of SLI messages is not as
   straightforward as it was with older, fixed-macroblock-sized video
   codecs, as the state of the sequence parameter set (where the CTB
   size is located) has to be taken into account when interpreting the
   "First" subfield in the FCI.

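   The SLI field rules in Section 8.2 can be sketched as follows.  The
   32-bit FCI layout used here (13 bits "First", 13 bits "Number", 6
   bits "PictureID") is taken from RFC 4585; the function names and the
   tile geometry arguments are hypothetical and serve only to illustrate
   why a tile narrower than the picture needs one SLI per CTB row.

```python
import struct


def sli_fci(first_ctb, num_ctbs, pic_order_cnt_val):
    """Pack one SLI FCI entry: First (13 bits), Number (13), PictureID (6)."""
    if not 0 <= first_ctb < (1 << 13):
        raise ValueError("First does not fit in 13 bits")
    if not 0 <= num_ctbs < (1 << 13):
        raise ValueError("Number does not fit in 13 bits")
    picture_id = pic_order_cnt_val & 0x3F  # 6 least significant bits
    word = (first_ctb << 19) | (num_ctbs << 6) | picture_id
    return struct.pack("!I", word)  # network byte order


def tile_loss_runs(col0, row0, width, height, pic_width_in_ctbs):
    """Return (First, Number) pairs, one per CTB row of a lost tile.

    Assumes the tile is narrower than the picture, so rows are not
    contiguous in CTB-raster-scan order.
    """
    return [((row0 + r) * pic_width_in_ctbs + col0, width)
            for r in range(height)]


# A lost 3x2-CTB tile at column 2, row 1 of a picture 10 CTBs wide.
runs = tile_loss_runs(2, 1, 3, 2, 10)
messages = [sli_fci(first, number, 65) for first, number in runs]
```

   Each (First, Number) pair becomes its own FCI entry because the lost
   CTBs of consecutive rows are separated by intact CTBs in raster-scan
   order.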
8.3. Reference Picture Selection Indication (RPSI)

   Feedback-based reference picture selection has been shown as a
   powerful tool to stop temporal error propagation for improved error
   resilience [Girod99][Wang05].  In one approach, the decoder side
   tracks errors in the decoded pictures and informs the encoder side
   that a particular picture that has been decoded relatively earlier is
   correct and still present in the decoded picture buffer; it requests
   the encoder to use that correct picture-availability information when
   encoding the next picture, so as to stop further temporal error
   propagation.  For this approach, the decoder side should use the RPSI
   feedback message.

   Encoders can encode some long-term reference pictures as specified in
   H.264 or HEVC for purposes described in the previous paragraph
   without the need of a huge decoded picture buffer.  As shown in
   [Wang05], with a flexible reference picture management scheme, as in
   H.264 and HEVC, even a decoded picture buffer size of two picture
   storage buffers would work for the approach described in the previous
   paragraph.

   The field "Native RPSI bit string defined per codec" is a base16
   [RFC4648] representation of the 8 bits consisting of the 2 most
   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined in
   [HEVC], followed by the 32 bits representing the value of the
   PicOrderCntVal (in network byte order), as defined in [HEVC], for the
   picture that is indicated by the RPSI feedback message.

   The use of the RPSI feedback message as positive acknowledgement with
   HEVC is deprecated.  In other words, the RPSI feedback message MUST
   only be used as a reference picture selection request, such that it
   can also be used in multicast.
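   One possible reading of the "Native RPSI bit string" layout is
   sketched below: one byte holding nuh_layer_id in its 6 low bits,
   followed by PicOrderCntVal as a 32-bit value in network byte order,
   with the five bytes then encoded as base16.  The function name is
   hypothetical, and treating PicOrderCntVal as a signed 32-bit integer
   is an assumption of this example.

```python
import base64
import struct


def native_rpsi_bit_string(nuh_layer_id, pic_order_cnt_val):
    """Build the base16 form of the native RPSI bit string (sketch)."""
    if not 0 <= nuh_layer_id <= 63:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    # "!": network byte order, no padding. "B": 2 zero bits + 6-bit layer id.
    # "i": PicOrderCntVal, assumed here to be a signed 32-bit value.
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw)


encoded = native_rpsi_bit_string(0, 42)
```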

8.4. Full Intra Request (FIR)

The purpose of the FIR message is to force an encoder to send an independent decoder refresh point as soon as possible (observing, for example, the congestion-control-related constraints set out in RFC 5104). Upon reception of a FIR, a sender MUST send an IDR picture. Parameter sets MUST also be sent, except when there is a priori knowledge that the parameter sets have been correctly established. A typical example of that is an understanding between sender and receiver, established by means outside this document, that parameter sets are exclusively sent out-of-band.
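The sender-side rule can be summarized in a small sketch. The function name, the flag, and the string labels are hypothetical placeholders; a real sender would queue actual NAL units and would still have to respect congestion control.

```python
def fir_response(parameter_sets_established_out_of_band):
    """List what a sender transmits after receiving a FIR.

    The labels stand in for real NAL units and are illustrative only.
    """
    to_send = []
    if not parameter_sets_established_out_of_band:
        # Without a priori knowledge, parameter sets accompany the IDR.
        to_send.extend(["VPS", "SPS", "PPS"])
    # An IDR picture is sent in every case.
    to_send.append("IDR")
    return to_send


print(fir_response(False))
print(fir_response(True))
```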

9. Security Considerations

The scope of this Security Considerations section is limited to the payload format itself and to one feature of HEVC that may pose a particularly serious security risk if implemented naively. The payload format, in isolation, does not form a complete system. Implementers are advised to read and understand relevant security-related documents, especially those pertaining to RTP (see the Security Considerations section in [RFC3550]), and the security of the call-control stack chosen (that may make use of the media type registration of this memo). Implementers should also consider known security vulnerabilities of video coding and decoding implementations in general and avoid those.

Within this RTP payload format, and with the exception of the user data SEI message as described below, no security threats other than those common to RTP payload formats are known. In other words, neither the various media-plane-based mechanisms, nor the signaling part of this memo, seems to pose a security risk beyond those common to all RTP-based systems.

RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550], and in any applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP Does Not Mandate a Single Media Security Solution" [RFC7202] discusses, it is not an RTP payload format's responsibility to discuss or mandate what solutions are used to meet the basic security goals like confidentiality, integrity, and source authenticity for RTP in general. This responsibility lies with anyone using RTP in an application. They can find guidance on available security mechanisms and important considerations in "Options for Securing RTP Sessions" [RFC7201]. Applications SHOULD use one or more appropriate strong security mechanisms. The rest of this section discusses the security-impacting properties of the payload format itself.

Because the data compression used with this payload format is applied end-to-end, any encryption needs to be performed after compression.

A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the bitstream that are complex to decode and that cause the receiver to be overloaded. H.265 is particularly vulnerable to such
   attacks, as it is extremely simple to generate datagrams containing
   NAL units that affect the decoding process of many future NAL units.
   Therefore, the usage of data origin authentication and data integrity
   protection of at least the RTP packet is RECOMMENDED, for example,
   with SRTP [RFC3711].

   Like [H.264], HEVC includes a user data Supplemental Enhancement
   Information (SEI) message.  This SEI message allows inclusion of an
   arbitrary bitstring into the video bitstream.  Such a bitstring could
   include JavaScript, machine code, and other active content.  HEVC
   leaves the handling of this SEI message to the receiving system.  In
   order to avoid harmful side effects of the user data SEI message,
   decoder implementations cannot naively trust its content.  For
   example, it would be a bad and insecure implementation practice to
   forward any JavaScript a decoder implementation detects to a web
   browser.  The safest way to deal with user data SEI messages is to
   simply discard them, but that can have negative side effects on the
   quality of experience by the user.

   End-to-end security with authentication, integrity, or
   confidentiality protection will prevent a MANE from performing media-
   aware operations other than discarding complete packets.  In the case
   of confidentiality protection, it will even be prevented from
   discarding packets in a media-aware way.  To be allowed to perform
   such operations, a MANE is required to be a trusted entity that is
   included in the security context establishment.

10. Congestion Control

Congestion control for RTP SHALL be used in accordance with RTP [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551]. If best-effort service is being used, an additional requirement is that users of this payload format MUST monitor packet loss to ensure that the packet loss rate is within an acceptable range. Packet loss is considered acceptable if a TCP flow across the same network path, and experiencing the same network conditions, would achieve an average throughput, measured on a reasonable timescale, that is not less than the throughput all RTP streams combined are achieving. This condition can be satisfied by implementing congestion-control mechanisms to adapt the transmission rate or the number of layers subscribed for a layered multicast session, or by arranging for a receiver to leave the session if the loss rate is unacceptably high.

The bitrate adaptation necessary for obeying the congestion control principle is easily achievable when real-time encoding is used, for example, by adequately tuning the quantization parameter.
   However, when pre-encoded content is being transmitted, bandwidth
   adaptation requires the pre-coded bitstream to be tailored for such
   adaptivity.  The key mechanism available in HEVC is temporal
   scalability.  A media sender can remove NAL units belonging to higher
   temporal sub-layers (i.e., those NAL units with a high value of TID)
   until the sending bitrate drops to an acceptable range.  HEVC
   contains mechanisms that allow the lightweight identification of
   switching points in temporal enhancement layers, as discussed in
   Section 1.1.2 of this memo.  An HEVC media sender can send packets
   belonging to NAL units of temporal enhancement layers starting from
   these switching points to probe for available bandwidth and to
   utilize bandwidth that has been shown to be available.
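Dropping higher temporal sub-layers from pre-encoded content can be sketched as follows. The sketch assumes each NAL unit is available as a bytes object beginning with the two-byte HEVC NAL unit header, whose three least significant bits carry TID, and that a simple byte budget stands in for the acceptable bitrate. The function names and the budget parameter are hypothetical.

```python
def tid_of(nal_unit):
    """Return the TID value from the two-byte NAL unit header."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than its two-byte header")
    # TID occupies the three least significant bits of the second byte.
    return nal_unit[1] & 0x07


def thin_to_budget(nal_units, byte_budget):
    """Remove the highest temporal sub-layers until the budget is met.

    The lowest sub-layer (TID equal to 1) is never removed, so the
    result can still exceed the budget.
    """
    kept = list(nal_units)
    if not kept:
        return kept
    max_tid = max(tid_of(n) for n in kept)
    while max_tid > 1 and sum(len(n) for n in kept) > byte_budget:
        max_tid -= 1
        kept = [n for n in kept if tid_of(n) <= max_tid]
    return kept


# Three NAL units of type 1 with TID 1, 2, and 3 (100, 50, and 30 bytes).
base = bytes([0x02, 0x01]) + bytes(98)
mid = bytes([0x02, 0x02]) + bytes(48)
top = bytes([0x02, 0x03]) + bytes(28)
print([tid_of(n) for n in thin_to_budget([base, mid, top], 160)])
```

A real sender would apply this over a time window and reintroduce higher sub-layers only at the switching points mentioned above; the sketch leaves that part out.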

   The above mechanisms generally work within a defined profile and level
   and, therefore, no renegotiation of the channel is required.  Only
   when non-downgradable parameters (such as profile) are required to be
   changed does it become necessary to terminate and restart the RTP
   stream(s).  This may be accomplished by using different RTP payload
   types.

   MANEs MAY remove certain unusable packets from the RTP stream when
   that RTP stream was damaged due to previous packet losses.  This can
   help reduce the network load in certain special cases.  For example,
   MANEs can remove those FUs where the leading FUs belonging to the
   same NAL unit have been lost or those dependent slice segments when
   the leading slice segments belonging to the same slice have been
   lost, because the trailing FUs or dependent slice segments are
   meaningless to most decoders.  MANEs can also remove higher temporal
   scalable layers if the outbound transmission (from the MANE's
   viewpoint) experiences congestion.
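The FU case can be sketched as follows. The sketch assumes that packets are presented in RTP sequence-number order as (sequence number, payload) pairs, that a gap in sequence numbers means a loss, and that an FU payload starts with a two-byte payload header of type 49 followed by a one-byte FU header whose two most significant bits are the S and E bits. The function name is hypothetical, and dependent slice segments are not handled.

```python
FU_TYPE = 49  # payload header type value identifying a fragmentation unit


def drop_orphaned_fus(packets):
    """Forward packets, dropping FUs whose leading FUs were lost."""
    kept = []
    prev_seq = None
    chain_intact = False
    for seq, payload in packets:
        # RTP sequence numbers are 16 bits wide and wrap around.
        contiguous = prev_seq is not None and seq == (prev_seq + 1) & 0xFFFF
        prev_seq = seq
        if len(payload) < 3 or (payload[0] >> 1) & 0x3F != FU_TYPE:
            # Not an FU: forward unchanged and end any open FU chain.
            chain_intact = False
            kept.append((seq, payload))
            continue
        start = bool(payload[2] & 0x80)
        end = bool(payload[2] & 0x40)
        if start:
            chain_intact = True
        elif not contiguous:
            # An earlier FU of this NAL unit is missing.
            chain_intact = False
        if chain_intact:
            kept.append((seq, payload))
        if end:
            chain_intact = False
    return kept


def make_fu(flags):
    """Build a tiny FU payload carrying fragments of a type-1 NAL unit."""
    return bytes([FU_TYPE << 1, 0x01, flags | 0x01, 0xAA])


single = bytes([0x02, 0x01, 0xAA])
stream = [(10, single), (11, make_fu(0x80)), (13, make_fu(0x00)),
          (14, make_fu(0x40)), (15, make_fu(0x80)), (16, make_fu(0x40))]
print([seq for seq, _ in drop_orphaned_fus(stream)])
```

In the sample stream, packet 12 is missing, so the continuation and end fragments in packets 13 and 14 are dropped, while the later, complete NAL unit in packets 15 and 16 is forwarded.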

11. IANA Considerations

A new media type, as specified in Section 7.1 of this memo, has been registered with IANA.

12. References

12.1. Normative References

   [H.264]   ITU-T, "Advanced video coding for generic audiovisual
             services", ITU-T Recommendation H.264, April 2013.

   [HEVC]    ITU-T, "High efficiency video coding", ITU-T
             Recommendation H.265, April 2013.

   [ISO23008-2]
             ISO/IEC, "Information technology -- High efficiency coding
             and media delivery in heterogeneous environments -- Part 2:
             High efficiency video coding", ISO/IEC 23008-2, 2013.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119,
             DOI 10.17487/RFC2119, March 1997,
             <http://www.rfc-editor.org/info/rfc2119>.

   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
             with Session Description Protocol (SDP)", RFC 3264,
             DOI 10.17487/RFC3264, June 2002,
             <http://www.rfc-editor.org/info/rfc3264>.

   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
             Jacobson, "RTP: A Transport Protocol for Real-Time
             Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July
             2003, <http://www.rfc-editor.org/info/rfc3550>.

   [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
             Video Conferences with Minimal Control", STD 65, RFC 3551,
             DOI 10.17487/RFC3551, July 2003,
             <http://www.rfc-editor.org/info/rfc3551>.

   [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
             Norrman, "The Secure Real-time Transport Protocol (SRTP)",
             RFC 3711, DOI 10.17487/RFC3711, March 2004,
             <http://www.rfc-editor.org/info/rfc3711>.

   [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
             Description Protocol", RFC 4566, DOI 10.17487/RFC4566, July
             2006, <http://www.rfc-editor.org/info/rfc4566>.

   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
             "Extended RTP Profile for Real-time Transport Control
             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
             DOI 10.17487/RFC4585, July 2006,
             <http://www.rfc-editor.org/info/rfc4585>.

   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
             Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
             <http://www.rfc-editor.org/info/rfc4648>.

   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
             "Codec Control Messages in the RTP Audio-Visual Profile
             with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
             February 2008, <http://www.rfc-editor.org/info/rfc5104>.

   [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
             Real-time Transport Control Protocol (RTCP)-Based Feedback
             (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
             2008, <http://www.rfc-editor.org/info/rfc5124>.

   [RFC5234] Crocker, D., Ed., and P. Overell, "Augmented BNF for Syntax
             Specifications: ABNF", STD 68, RFC 5234,
             DOI 10.17487/RFC5234, January 2008,
             <http://www.rfc-editor.org/info/rfc5234>.

   [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
             Attributes in the Session Description Protocol (SDP)",
             RFC 5576, DOI 10.17487/RFC5576, June 2009,
             <http://www.rfc-editor.org/info/rfc5576>.

   [RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding
             Dependency in the Session Description Protocol (SDP)",
             RFC 5583, DOI 10.17487/RFC5583, July 2009,
             <http://www.rfc-editor.org/info/rfc5583>.

12.2. Informative References

   [3GPDASH] 3GPP, "Transparent end-to-end Packet-switched Streaming
             Service (PSS); Progressive Download and Dynamic Adaptive
             Streaming over HTTP (3GP-DASH)", 3GPP TS 26.247 12.1.0,
             December 2013.

   [3GPPFF]  3GPP, "Transparent end-to-end packet switched streaming
             service (PSS); 3GPP file format (3GP)", 3GPP TS 26.244
             12.20, December 2013.

   [CABAC]   Sole, J., Joshi, R., Nguyen, N., Ji, T., Karczewicz, M.,
             Clare, G., Henry, F., and Duenas, A., "Transform
             coefficient coding in HEVC", IEEE Transactions on Circuits
             and Systems for Video Technology, Vol. 22, No. 12,
             pp. 1765-1777, DOI 10.1109/TCSVT.2012.2223055, December
             2012.

   [Girod99] Girod, B. and Faerber, F., "Feedback-based error control
             for mobile video transmission", Proceedings of the IEEE,
             Vol. 87, No. 10, pp. 1707-1723, DOI 10.1109/5.790632,
             October 1999.

   [H.265.1] ITU-T, "Conformance specification for ITU-T H.265 high
             efficiency video coding", ITU-T Recommendation H.265.1,
             October 2014.

   [HEVCv2]  Flynn, D., Naccari, M., Rosewarne, C., Sharman, K., Sole,
             J., Sullivan, G. J., and T. Suzuki, "High Efficiency Video
             Coding (HEVC) Range Extensions text specification: Draft
             7", JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting,
             Valencia, Spain, March/April 2014.

   [IS014496-12]
             ISO/IEC, "Information technology - Coding of audio-visual
             objects - Part 12: ISO base media file format", ISO/IEC
             14496-12, 2015.

   [IS015444-12]
             ISO/IEC, "Information technology - JPEG 2000 image coding
             system - Part 12: ISO base media file format", ISO/IEC
             15444-12, 2015.

   [JCTVC-J0107]
             Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian, K.,
             "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107, 10th
             JCT-VC meeting, Stockholm, Sweden, July 2012.

   [MPEG2S]  ISO/IEC, "Information technology - Generic coding of moving
             pictures and associated audio information - Part 1:
             Systems", ISO International Standard 13818-1, 2013.

   [MPEGDASH] ISO/IEC, "Information technology - Dynamic adaptive
             streaming over HTTP (DASH) -- Part 1: Media presentation
             description and segment formats", ISO International
             Standard 23009-1, 2012.

   [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
             Streaming Protocol (RTSP)", RFC 2326, DOI 10.17487/RFC2326,
             April 1998, <http://www.rfc-editor.org/info/rfc2326>.

   [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session
             Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
             October 2000, <http://www.rfc-editor.org/info/rfc2974>.

   [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
             Flows", RFC 6051, DOI 10.17487/RFC6051, November 2010,
             <http://www.rfc-editor.org/info/rfc6051>.

   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
             Payload Format for H.264 Video", RFC 6184,
             DOI 10.17487/RFC6184, May 2011,
             <http://www.rfc-editor.org/info/rfc6184>.

   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. Eleftheriadis,
             "RTP Payload Format for Scalable Video Coding", RFC 6190,
             DOI 10.17487/RFC6190, May 2011,
             <http://www.rfc-editor.org/info/rfc6190>.

   [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
             Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
             <http://www.rfc-editor.org/info/rfc7201>.

   [RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP Framework:
             Why RTP Does Not Mandate a Single Media Security Solution",
             RFC 7202, DOI 10.17487/RFC7202, April 2014,
             <http://www.rfc-editor.org/info/rfc7202>.

   [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
             B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for
             Real-Time Transport Protocol (RTP) Sources", RFC 7656,
             DOI 10.17487/RFC7656, November 2015,
             <http://www.rfc-editor.org/info/rfc7656>.

   [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
             DOI 10.17487/RFC7667, November 2015,
             <http://www.rfc-editor.org/info/rfc7667>.

   [RTP-MULTI-STREAM]
             Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
             "Sending Multiple Media Streams in a Single RTP Session",
             Work in Progress, draft-ietf-avtcore-rtp-multi-stream-11,
             December 2015.

   [SDP-NEG] Holmberg, C., Alvestrand, H., and C. Jennings, "Negotiating
             Media Multiplexing Using Session Description Protocol
             (SDP)", Work in Progress,
             draft-ietf-mmusic-sdp-bundle-negotiation-25, January 2016.

   [Wang05]  Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video
             coding using flexible reference frames", Visual
             Communications and Image Processing 2005 (VCIP 2005),
             Beijing, China, July 2005.

Acknowledgements

Muhammed Coban and Marta Karczewicz are thanked for discussions on the specification of the use with feedback messages and other aspects in this memo. Jonathan Lennox and Jill Boyce are thanked for their contributions to the PACI design included in this memo. Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and Tom Kristensen are thanked for their contributions to signaling related to parallel processing. Magnus Westerlund, Jonathan Lennox, Bernard Aboba, Jonatan Samuelsson, Roni Even, Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, Ross Finlayson, Danny Hong, Bo Burman, Ben Campbell, Brian Carpenter, Qin Wu, Stephen Farrell, and Min Wang made valuable review comments that led to improvements.

Authors' Addresses

   Ye-Kui Wang
   Qualcomm Incorporated
   5775 Morehouse Drive
   San Diego, CA 92121
   United States
   Phone: +1-858-651-8345
   Email: yekui.wang@gmail.com

   Yago Sanchez
   Fraunhofer HHI
   Einsteinufer 37
   D-10587 Berlin
   Germany
   Phone: +49 30 31002-663
   Email: yago.sanchez@hhi.fraunhofer.de

   Thomas Schierl
   Fraunhofer HHI
   Einsteinufer 37
   D-10587 Berlin
   Germany
   Phone: +49-30-31002-227
   Email: thomas.schierl@hhi.fraunhofer.de

   Stephan Wenger
   Vidyo, Inc.
   433 Hackensack Ave., 7th floor
   Hackensack, NJ 07601
   United States
   Phone: +1-415-713-5473
   Email: stewe@stewe.org

   Miska M. Hannuksela
   Nokia Corporation
   P.O. Box 1000
   33721 Tampere
   Finland
   Phone: +358-7180-08000
   Email: miska.hannuksela@nokia.com