
RFC 3984

RTP Payload Format for H.264 Video

Pages: 83
Obsoleted by:  6184
Part 2 of 3 – Pages 31 to 62

ToP   noToC   RFC3984 - Page 31   prevText

6. Packetization Rules

The packetization modes are introduced in section 5.2. The packetization rules common to more than one of the packetization modes are specified in section 6.1. The packetization rules for the single NAL unit mode, the non-interleaved mode, and the interleaved mode are specified in sections 6.2, 6.3, and 6.4, respectively.

6.1. Common Packetization Rules

All senders MUST enforce the following packetization rules regardless of the packetization mode in use:

   o  Coded slice NAL units or coded slice data partition NAL units
      belonging to the same coded picture (and thus sharing the same
      RTP timestamp value) MAY be sent in any order permitted by the
      applicable profile defined in [1]; however, for delay-critical
      systems, they SHOULD be sent in their original coding order to
      minimize the delay.  Note that the coding order is not
      necessarily the scan order, but the order the NAL packets become
      available to the RTP stack.

   o  Parameter sets are handled in accordance with the rules and
      recommendations given in section 8.4.

   o  MANEs MUST NOT duplicate any NAL unit except for sequence or
      picture parameter set NAL units, as neither this memo nor the
      H.264 specification provides means to identify duplicated NAL
      units.  Sequence and picture parameter set NAL units MAY be
      duplicated to make their correct reception more probable, but
      any such duplication MUST NOT affect the contents of any active
      sequence or picture parameter set.  Duplication SHOULD be
      performed on the application layer and not by duplicating RTP
      packets (with identical sequence numbers).

   Senders using the non-interleaved mode and the interleaved mode MUST
   enforce the following packetization rule:

   o  MANEs MAY convert single NAL unit packets into one aggregation
      packet, convert an aggregation packet into several single NAL unit
      packets, or mix both concepts, in an RTP translator.  The RTP
      translator SHOULD take into account at least the following
      parameters: path MTU size, unequal protection mechanisms (e.g.,
      through packet-based FEC according to RFC 2733 [18], especially
      for sequence and picture parameter set NAL units and coded slice
      data partition A NAL units), bearable latency of the system, and
      buffering capabilities of the receiver.

      Informative note: An RTP translator is required to handle RTCP as
      per RFC 3550.
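
   The following is a minimal Python sketch of how a translator might build one STAP-A payload from several complete NAL units while staying within a path MTU budget. It assumes the STAP-A rules defined earlier in this memo: type value 24, the F bit set if any aggregated NAL unit has F set, NRI equal to the maximum NRI, and a 16-bit size field before each NAL unit. The function name build_stap_a and the max_payload parameter are hypothetical. Unequal protection, latency, and receiver buffering are not modeled.

```python
import struct

STAP_A = 24  # NAL unit type value that identifies a STAP-A


def build_stap_a(nal_units: list, max_payload: int) -> bytes:
    """Aggregate complete NAL units into a single STAP-A payload.

    max_payload (hypothetical) is the number of RTP payload bytes that
    fit in the path MTU, computed by the caller.
    """
    if not nal_units:
        raise ValueError("nothing to aggregate")
    # Aggregation header: F is set if any aggregated NAL unit has F set,
    # and NRI is the maximum NRI of the aggregated NAL units.
    f_bit = max(nal[0] & 0x80 for nal in nal_units)
    nri = max(nal[0] & 0x60 for nal in nal_units)
    payload = bytearray([f_bit | nri | STAP_A])
    for nal in nal_units:
        if len(nal) > 0xFFFF:
            raise ValueError("NAL unit does not fit a 16-bit size field")
        # Each NAL unit is preceded by its size in network byte order.
        payload += struct.pack("!H", len(nal))
        payload += nal
    if len(payload) > max_payload:
        raise ValueError("STAP-A would exceed the path MTU budget")
    return bytes(payload)
```

   When ValueError is raised, the translator would instead keep the NAL units in separate packets.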

6.2. Single NAL Unit Mode

This mode is in use when the value of the OPTIONAL packetization-mode MIME parameter is equal to 0, the packetization-mode is not present, or no other packetization mode is signaled by external means. All receivers MUST support this mode. It is primarily intended for low-delay applications that are compatible with systems using ITU-T Recommendation H.241 [15] (see section 12.1). Only single NAL unit packets MAY be used in this mode. STAPs, MTAPs, and FUs MUST NOT be used. The transmission order of single NAL unit packets MUST comply with the NAL unit decoding order.

6.3. Non-Interleaved Mode

This mode is in use when the value of the OPTIONAL packetization-mode MIME parameter is equal to 1 or the mode is turned on by external means. This mode SHOULD be supported. It is primarily intended for low-delay applications. Only single NAL unit packets, STAP-As, and FU-As MAY be used in this mode. STAP-Bs, MTAPs, and FU-Bs MUST NOT be used. The transmission order of NAL units MUST comply with the NAL unit decoding order.

6.4. Interleaved Mode

This mode is in use when the value of the OPTIONAL packetization-mode MIME parameter is equal to 2 or the mode is turned on by external means. Some receivers MAY support this mode. STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used. STAP-As and single NAL unit packets MUST NOT be used. The transmission order of packets and NAL units is constrained as specified in section 5.5.
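
The mode restrictions of sections 6.2 to 6.4 can be summarized as a lookup from packetization-mode to permitted payload types. The sketch below is illustrative only. It assumes the payload-structure type values defined in section 5 of this memo: 1-23 for single NAL unit packets, 24 STAP-A, 25 STAP-B, 26 MTAP16, 27 MTAP24, 28 FU-A, and 29 FU-B. The helper packet_allowed is hypothetical, and transmission order is not checked.

```python
from typing import Optional

# Payload structure types carried in the low five bits of the first
# payload octet; values 1-23 denote single NAL unit packets.
SINGLE_NAL = frozenset(range(1, 24))
STAP_A, STAP_B, MTAP16, MTAP24, FU_A, FU_B = 24, 25, 26, 27, 28, 29

ALLOWED_TYPES = {
    0: SINGLE_NAL,                                       # single NAL unit mode
    1: SINGLE_NAL | {STAP_A, FU_A},                      # non-interleaved mode
    2: frozenset({STAP_B, MTAP16, MTAP24, FU_A, FU_B}),  # interleaved mode
}


def packet_allowed(packetization_mode: Optional[int], first_octet: int) -> bool:
    """Return True if a payload of this type may be used in the given mode."""
    # An absent packetization-mode parameter means mode 0.
    mode = 0 if packetization_mode is None else packetization_mode
    if mode not in ALLOWED_TYPES:
        raise ValueError(f"unknown packetization-mode {mode}")
    return (first_octet & 0x1F) in ALLOWED_TYPES[mode]
```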

7. De-Packetization Process (Informative)

The de-packetization process is implementation dependent. Therefore, the following description should be seen as an example of a suitable implementation. Other schemes may be used as well, and optimizations relative to the described algorithms are likely possible.

Section 7.1 presents the de-packetization process for the single NAL unit and non-interleaved packetization modes, whereas section 7.2 describes the process for the interleaved mode. Section 7.3 includes additional decapsulation guidelines for intelligent receivers.

All normal RTP mechanisms related to buffer management apply. In particular, duplicated or outdated RTP packets (as indicated by the RTP sequence number and the RTP timestamp) are removed. To determine the exact time for decoding, factors such as a possible intentional delay to allow for proper inter-stream synchronization must be taken into account.

7.1. Single NAL Unit and Non-Interleaved Mode

The receiver includes a receiver buffer to compensate for transmission delay jitter. The receiver stores incoming packets in reception order into the receiver buffer. Packets are decapsulated in RTP sequence number order. If a decapsulated packet is a single NAL unit packet, the NAL unit contained in the packet is passed directly to the decoder. If a decapsulated packet is an STAP-A, the NAL units contained in the packet are passed to the decoder in the order in which they are encapsulated in the packet. If a decapsulated packet is an FU-A, all the fragments of the fragmented NAL unit are concatenated and passed to the decoder.

   Informative note: If the decoder supports Arbitrary Slice Order, coded slices of a picture can be passed to the decoder in any order regardless of their reception and transmission order.
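
The process above can be sketched in Python for packetization modes 0 and 1. The sketch makes three simplifying assumptions. First, payloads are already ordered by RTP sequence number. Second, loss detection is left out, so an FU-A whose start fragment was never seen is simply dropped. Third, the FU-A layout is the one defined earlier in this memo: an FU indicator octet followed by an FU header octet that carries the S and E bits and the original NAL unit type. The function name depacketize is hypothetical.

```python
import struct
from typing import Iterable, Iterator, Optional

STAP_A, FU_A = 24, 28


def depacketize(payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Yield NAL units from RTP payloads sorted by sequence number."""
    fragments: Optional[bytearray] = None
    for payload in payloads:
        if not payload:
            continue
        nal_type = payload[0] & 0x1F
        if 1 <= nal_type <= 23:
            # Single NAL unit packet: the payload is the NAL unit itself.
            yield payload
        elif nal_type == STAP_A:
            # Skip the STAP-A header, then read 16-bit size + NAL unit pairs
            # in the order in which they were encapsulated.
            offset = 1
            while offset + 2 <= len(payload):
                (size,) = struct.unpack_from("!H", payload, offset)
                offset += 2
                yield payload[offset:offset + size]
                offset += size
        elif nal_type == FU_A and len(payload) >= 2:
            indicator, fu_header = payload[0], payload[1]
            if fu_header & 0x80:  # S bit: first fragment
                # Rebuild the NAL unit header: F and NRI from the FU
                # indicator, type from the FU header.
                fragments = bytearray([(indicator & 0xE0) | (fu_header & 0x1F)])
            if fragments is None:
                continue  # no start fragment seen: drop this fragment
            fragments += payload[2:]
            if fu_header & 0x40:  # E bit: last fragment
                yield bytes(fragments)
                fragments = None
        # STAP-B, MTAPs, and FU-B are not allowed in these modes; ignored.
```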

7.2. Interleaved Mode

The general concept behind these de-packetization rules is to reorder NAL units from transmission order to the NAL unit decoding order.

The receiver includes a receiver buffer, which is used to compensate for transmission delay jitter and to reorder packets from transmission order to the NAL unit decoding order. In this section, the receiver operation is described under the assumption that there is no transmission delay jitter. To distinguish it from a practical receiver buffer that is also used for compensation of transmission delay jitter, the receiver buffer is hereafter called the deinterleaving buffer in this section. Receivers SHOULD also prepare for transmission delay jitter; i.e., either reserve separate buffers for transmission delay jitter buffering and deinterleaving buffering or use a receiver buffer for both transmission delay jitter and deinterleaving. Moreover, receivers SHOULD take transmission delay jitter into account in the buffering operation; e.g., by additional initial buffering before decoding and playback start.

This section is organized as follows: subsection 7.2.1 presents how to calculate the size of the deinterleaving buffer, and subsection 7.2.2 specifies how the receiver organizes received NAL units into the NAL unit decoding order.

7.2.1. Size of the Deinterleaving Buffer

When the SDP Offer/Answer model or any other capability exchange procedure is used in session setup, the properties of the received stream SHOULD be such that the receiver capabilities are not exceeded. In the SDP Offer/Answer model, the receiver can indicate its capabilities to allocate a deinterleaving buffer with the deint-buf-cap MIME parameter. The sender indicates the requirement for the deinterleaving buffer size with the sprop-deint-buf-req MIME parameter. It is therefore RECOMMENDED to set the deinterleaving buffer size, in terms of number of bytes, equal to or greater than the value of the sprop-deint-buf-req MIME parameter. See section 8.1 for further information on the deint-buf-cap and sprop-deint-buf-req MIME parameters and section 8.2.2 for further information on their use in the SDP Offer/Answer model.

When a declarative session description is used in session setup, the sprop-deint-buf-req MIME parameter signals the requirement for the deinterleaving buffer size. It is therefore RECOMMENDED to set the deinterleaving buffer size, in terms of number of bytes, equal to or greater than the value of the sprop-deint-buf-req MIME parameter.

7.2.2. Deinterleaving Process

There are two buffering states in the receiver: initial buffering and buffering while playing. Initial buffering occurs when the RTP session is initialized. After initial buffering, decoding and playback are started, and the buffering-while-playing mode is used.

Regardless of the buffering state, the receiver stores incoming NAL units, in reception order, in the deinterleaving buffer as follows. NAL units of aggregation packets are stored in the deinterleaving buffer individually. The value of DON is calculated and stored for all NAL units.

The receiver operation is described below with the help of the following functions and constants:

   o  Function AbsDON is specified in section 8.1.

   o  Function don_diff is specified in section 5.5.

   o  Constant N is the value of the OPTIONAL sprop-interleaving-depth
      MIME type parameter (see section 8.1) incremented by 1.

Initial buffering lasts until one of the following conditions is fulfilled:

   o  There are N VCL NAL units in the deinterleaving buffer.

   o  If sprop-max-don-diff is present, don_diff(m,n) is greater than
      the value of sprop-max-don-diff, in which n corresponds to the
      NAL unit having the greatest value of AbsDON among the received
      NAL units and m corresponds to the NAL unit having the smallest
      value of AbsDON among the received NAL units.

   o  Initial buffering has lasted for the duration equal to or greater
      than the value of the OPTIONAL sprop-init-buf-time MIME parameter.

The NAL units to be removed from the deinterleaving buffer are determined as follows:

   o  If the deinterleaving buffer contains at least N VCL NAL units,
      NAL units are removed from the deinterleaving buffer and passed
      to the decoder in the order specified below until the buffer
      contains N-1 VCL NAL units.

   o  If sprop-max-don-diff is present, all NAL units m for which
      don_diff(m,n) is greater than sprop-max-don-diff are removed from
      the deinterleaving buffer and passed to the decoder in the order
      specified below.  Herein, n corresponds to the NAL unit having the
      greatest value of AbsDON among the received NAL units.
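
   The buffering conditions above might be checked as in the following Python sketch. It assumes that AbsDON (section 8.1) has already been computed for every buffered NAL unit. As a simplification, it also assumes that don_diff(m,n) between the units with the smallest and greatest AbsDON equals the difference of their AbsDON values. BufferedNal, initial_buffering_done, and over_don_diff are hypothetical names, and elapsed and init_buf_time are taken to be in the same units.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BufferedNal:
    abs_don: int  # AbsDON of the NAL unit, assumed computed on reception
    is_vcl: bool  # True for coded slice and data partition NAL units


def initial_buffering_done(buf: List[BufferedNal], interleaving_depth: int,
                           max_don_diff: Optional[int],
                           init_buf_time: Optional[float],
                           elapsed: float) -> bool:
    """Check the three conditions that end initial buffering."""
    n = interleaving_depth + 1  # N = sprop-interleaving-depth + 1
    if sum(1 for u in buf if u.is_vcl) >= n:
        return True
    if max_don_diff is not None and buf:
        # Simplification: don_diff(m, n) taken as the AbsDON difference
        # between the units with the smallest and greatest AbsDON.
        spread = max(u.abs_don for u in buf) - min(u.abs_don for u in buf)
        if spread > max_don_diff:
            return True
    return init_buf_time is not None and elapsed >= init_buf_time


def over_don_diff(buf: List[BufferedNal], max_don_diff: int) -> List[BufferedNal]:
    """Units that must leave the buffer because of sprop-max-don-diff."""
    if not buf:
        return []
    greatest = max(u.abs_don for u in buf)
    return [u for u in buf if greatest - u.abs_don > max_don_diff]
```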

   The order in which NAL units are passed to the decoder is specified
   as follows:

   o  Let PDON be a variable that is initialized to 0 at the beginning
      of the RTP session.

   o  For each NAL unit associated with a value of DON, a DON distance
      is calculated as follows.  If the value of DON of the NAL unit is
      larger than the value of PDON, the DON distance is equal to DON -
      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
      + 1.

   o  NAL units are delivered to the decoder in ascending order of DON
      distance.  If several NAL units share the same value of DON
      distance, they can be passed to the decoder in any order.

   o  When a desired number of NAL units have been passed to the
      decoder, the value of PDON is set to the value of DON for the last
      NAL unit passed to the decoder.
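
   The DON distance rule and the PDON update translate directly into code. In this Python sketch, whose names are hypothetical, the deinterleaving buffer is a list of (DON, NAL unit) pairs. The caller decides how many units to pass on, for example as required by the removal conditions above.

```python
from typing import List, Tuple

NalEntry = Tuple[int, bytes]  # (DON, NAL unit bytes)


def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON (16-bit wrap-around)."""
    if don > pdon:
        return don - pdon
    return 65535 - pdon + don + 1


def deliver(buffer: List[NalEntry], pdon: int,
            count: int) -> Tuple[List[NalEntry], int]:
    """Remove `count` NAL units in ascending DON distance.

    Returns the delivered units and the updated PDON.
    """
    # sorted() is stable, so ties keep buffer order (any order is allowed).
    ordered = sorted(buffer, key=lambda entry: don_distance(entry[0], pdon))
    delivered = ordered[:count]
    for entry in delivered:
        buffer.remove(entry)
    if delivered:
        pdon = delivered[-1][0]  # PDON <- DON of the last unit passed on
    return delivered, pdon
```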

7.3. Additional De-Packetization Guidelines

The following additional de-packetization rules may be used to implement an operational H.264 de-packetizer:

   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
      coded slice data partitions A (DPAs).  If a lost DPA is found, a
      gateway may decide not to send the corresponding coded slice data
      partitions B and C, as their information is meaningless for H.264
      decoders.  In this way a MANE can reduce network load by
      discarding useless packets without parsing a complex bitstream.

   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
      FUs.  If a lost FU is found, a gateway may decide not to send the
      following FUs of the same fragmented NAL unit, as their
      information is meaningless for H.264 decoders.  In this way a
      MANE can reduce network load by discarding useless packets
      without parsing a complex bitstream.

   o  Intelligent receivers having to discard packets or NALUs should
      first discard all packets/NALUs in which the value of the NRI
      field of the NAL unit type octet is equal to 0.  This will
      minimize the impact on user experience and keep the reference
      pictures intact.  If more packets have to be discarded, then
      packets with a numerically lower NRI value should be discarded
      before packets with a numerically higher NRI value.  However,
      discarding any packets with an NRI bigger than 0 very likely leads
      to decoder drift and SHOULD be avoided.
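
   A possible discard policy following the guideline above is sketched in Python. It reads NRI from bits 6 and 5 of the first payload octet and drops packets with the lowest NRI first until a hypothetical byte budget is met. By default it stops before touching any packet with NRI greater than 0, so the result can still exceed the budget. The names thin and allow_reference_drop are illustrative.

```python
from typing import List


def nri(first_octet: int) -> int:
    """NRI field: bits 6 and 5 of the NAL unit type octet."""
    return (first_octet >> 5) & 0x03


def thin(packets: List[bytes], budget: int,
         allow_reference_drop: bool = False) -> List[bytes]:
    """Drop packets, lowest NRI first, until the total size fits in budget.

    Unless allow_reference_drop is True, only NRI == 0 packets are dropped,
    because dropping packets with NRI > 0 very likely causes decoder drift.
    The surviving packets keep their original order.
    """
    total = sum(len(p) for p in packets)
    dropped = set()
    # sorted() is stable: among equal NRI values, earlier packets go first.
    for i in sorted(range(len(packets)), key=lambda k: nri(packets[k][0])):
        if total <= budget:
            break
        if nri(packets[i][0]) > 0 and not allow_reference_drop:
            break
        dropped.add(i)
        total -= len(packets[i])
    return [p for i, p in enumerate(packets) if i not in dropped]
```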

8. Payload Format Parameters

This section specifies the parameters that MAY be used to select optional features of the payload format and certain features of the bitstream. The parameters are specified here as part of the MIME subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec. A mapping of the parameters into the Session Description Protocol (SDP) [5] is also provided for applications that use SDP. Equivalent parameters could be defined elsewhere for use with control protocols that do not use MIME or SDP.

Some parameters provide a receiver with the properties of the stream that will be sent. The names of all these parameters start with "sprop" for stream properties. Some of these "sprop" parameters are limited by other payload or codec configuration parameters. For example, the sprop-parameter-sets parameter is constrained by the profile-level-id parameter. The media sender, rather than the receiver, selects all "sprop" parameters. This uncommon characteristic of the "sprop" parameters may not be compatible with some signaling protocol concepts, in which case the use of these parameters SHOULD be avoided.

8.1. MIME Registration

The MIME subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is allocated from the IETF tree.

The receiver MUST ignore any unspecified parameter.

   Media Type name:     video

   Media subtype name:  H264

   Required parameters: none

   OPTIONAL parameters:
       profile-level-id:
                        A base16 [6] (hexadecimal) representation of
                        the following three bytes in the sequence
                        parameter set NAL unit specified in [1]: 1)
                        profile_idc, 2) a byte herein referred to as
                        profile-iop, composed of the values of
                        constraint_set0_flag, constraint_set1_flag,
                        constraint_set2_flag, and reserved_zero_5bits
                        in bit-significance order, starting from the
                        most significant bit, and 3) level_idc.  Note
                        that reserved_zero_5bits is required to be
                        equal to 0 in [1], but other values for it may
                        be specified in the future by ITU-T or ISO/IEC.

                        If the profile-level-id parameter is used to
                        indicate properties of a NAL unit stream, it
                        indicates the profile and level that a decoder
                        has to support in order to comply with [1] when
                        it decodes the stream.  The profile-iop byte
                        indicates whether the NAL unit stream also
                        obeys all constraints of the indicated profiles
                        as follows.  If bit 7 (the most significant
                        bit), bit 6, or bit 5 of profile-iop is equal
                        to 1, all constraints of the Baseline profile,
                        the Main profile, or the Extended profile,
                        respectively, are obeyed in the NAL unit
                        stream.

                        If the profile-level-id parameter is used for
                        capability exchange or session setup procedure,
                        it indicates the profile that the codec
                        supports and the highest level
                        supported for the signaled profile.  The
                        profile-iop byte indicates whether the codec
                        has additional limitations whereby only the
                        common subset of the algorithmic features and
                        limitations of the profiles signaled with the
                        profile-iop byte and of the profile indicated
                        by profile_idc is supported by the codec.  For
                        example, if a codec supports only the common
                        subset of the coding tools of the Baseline
                        profile and the Main profile at level 2.1 and
                        below, the profile-level-id becomes 42E015, in
                        which 42 stands for the Baseline profile, E0
                        indicates that only the common subset for all
                        profiles is supported, and 15 indicates level
                        2.1.
                            Informative note: Capability exchange and
                            session setup procedures should provide
                            means to list the capabilities for each
                            supported codec profile separately.  For
                            example, the one-of-N codec selection
                            procedure of the SDP Offer/Answer model can
                            be used (section 10.2 of [7]).

                        If no profile-level-id is present, the Baseline
                        Profile without additional constraints at Level
                        1 MUST be implied.
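
                        Below is a small Python sketch of reading profile-level-id, including the default implied when the parameter is absent. That default is 42000A: profile_idc 66 (Baseline), no profile-iop bits set, and level_idc 10 (Level 1). ProfileLevelId and parse_profile_level_id are hypothetical names. The level property simply divides level_idc by ten and does not handle any special-case level codes.

```python
from typing import List, NamedTuple, Optional


class ProfileLevelId(NamedTuple):
    profile_idc: int
    profile_iop: int
    level_idc: int

    @property
    def level(self) -> str:
        # level_idc is ten times the level number, e.g. 21 -> "2.1"
        major, minor = divmod(self.level_idc, 10)
        return str(major) if minor == 0 else f"{major}.{minor}"

    def constrained_profiles(self) -> List[str]:
        """Profiles flagged by bits 7, 6 and 5 of profile-iop."""
        names = [(0x80, "Baseline"), (0x40, "Main"), (0x20, "Extended")]
        return [name for bit, name in names if self.profile_iop & bit]


def parse_profile_level_id(value: Optional[str]) -> ProfileLevelId:
    if value is None:
        value = "42000A"  # Baseline, no additional constraints, Level 1
    if len(value) != 6:
        raise ValueError("profile-level-id must be 6 hex digits")
    raw = bytes.fromhex(value)  # accepts upper- and lower-case digits
    return ProfileLevelId(raw[0], raw[1], raw[2])
```

                        For the example above, parse_profile_level_id("42E015") yields profile_idc 66, profile-iop 0xE0, and level "2.1".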

       max-mbps, max-fs, max-cpb, max-dpb, and max-br:
                        These parameters MAY be used to signal the
                        capabilities of a receiver implementation.
                        These parameters MUST NOT be used for any other
                        purpose.  The profile-level-id parameter MUST
                        be present in the same receiver capability
                        description that contains any of these
                        parameters.  The level conveyed in the value of
                        the profile-level-id parameter MUST be such
                        that the receiver is fully capable of
                        supporting.  max-mbps, max-fs, max-cpb, max-
                        dpb, and max-br MAY be used to indicate
                        capabilities of the receiver that extend the
                        required capabilities of the signaled level, as
                        specified below.

                        When more than one parameter from the set (max-
                        mbps, max-fs, max-cpb, max-dpb, max-br) is
                        present, the receiver MUST support all signaled
                        capabilities simultaneously.  For example, if
                        both max-mbps and max-br are present, the
                        signaled level with the extension of both the
                        frame rate and bit rate is supported.  That is,
                        the receiver is able to decode NAL unit
                        streams in which the macroblock processing rate
                        is up to max-mbps (inclusive), the bit rate is
                        up to max-br (inclusive), the coded picture
                        buffer size is derived as specified in the
                        semantics of the max-br parameter below, and
                        other properties comply with the level
                        specified in the value of the profile-level-id
                        parameter.
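
                        The rule that all capabilities are supported simultaneously can be illustrated by applying every signaled max-* value to the level limits in one step. In this Python sketch, the Limits type, the function names, and the level values supplied by the caller are hypothetical; in practice those level values would come from Table A-1 of [1]. When max-br is present without max-cpb, MaxCPB is scaled as described in the max-br semantics below.

```python
from dataclasses import dataclass, replace
from fractions import Fraction
from typing import Dict, Union

Number = Union[int, Fraction]


@dataclass(frozen=True)
class Limits:
    max_mbps: Number  # macroblocks per second
    max_fs: Number    # macroblocks
    max_cpb: Number   # units of 1000 bits (VCL HRD)
    max_dpb: Number   # units of 1024 bytes
    max_br: Number    # units of 1000 bits per second (VCL HRD)


PARAMS = {"max-mbps": "max_mbps", "max-fs": "max_fs", "max-cpb": "max_cpb",
          "max-dpb": "max_dpb", "max-br": "max_br"}


def effective_limits(level: Limits, caps: Dict[str, int]) -> Limits:
    """Apply all signaled max-* parameters to the level limits at once."""
    updates = {}
    for name, value in caps.items():
        attr = PARAMS.get(name)
        if attr is None:
            continue  # not one of the max-* parameters
        if value < getattr(level, attr):
            raise ValueError(f"{name} is below the value for the signaled level")
        updates[attr] = value
    if "max-br" in caps and "max-cpb" not in caps:
        # MaxCPB scales with max-br when max-cpb itself is not signaled.
        updates["max_cpb"] = Fraction(level.max_cpb) * caps["max-br"] / level.max_br
    return replace(level, **updates)
```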

                        A receiver MUST NOT signal values of max-
                        mbps, max-fs, max-cpb, max-dpb, and max-br that
                        meet the requirements of a higher level,
                        referred to as level A herein, compared to the
                        level specified in the value of the profile-
                        level-id parameter, if the receiver can support
                        all the properties of level A.

                            Informative note: When the OPTIONAL MIME
                            type parameters are used to signal the
                            properties of a NAL unit stream, max-mbps,
                            max-fs, max-cpb, max-dpb, and max-br are
                            not present, and the value of profile-
                            level-id must always be such that the NAL
                            unit stream complies fully with the
                            specified profile and level.

       max-mbps:        The value of max-mbps is an integer indicating
                        the maximum macroblock processing rate in units
                        of macroblocks per second.  The max-mbps
                        parameter signals that the receiver is capable
                        of decoding video at a higher rate than is
                        required by the signaled level conveyed in the
                        value of the profile-level-id parameter.  When
                        max-mbps is signaled, the receiver MUST be able
                        to decode NAL unit streams that conform to the
                        signaled level, with the exception that the
                        MaxMBPS value in Table A-1 of [1] for the
                        signaled level is replaced with the value of
                        max-mbps.  The value of max-mbps MUST be
                        greater than or equal to the value of MaxMBPS
                        for the level given in Table A-1 of [1].
                        Senders MAY use this knowledge to send pictures
                        of a given size at a higher picture rate than
                        is indicated in the signaled level.

       max-fs:          The value of max-fs is an integer indicating
                        the maximum frame size in units of macroblocks.
                        The max-fs parameter signals that the receiver
                        is capable of decoding larger picture sizes
                        than are required by the signaled level conveyed
                        in the value of the profile-level-id parameter.
                        When max-fs is signaled, the receiver MUST be
                        able to decode NAL unit streams that conform to
                        the signaled level, with the exception that the
                        MaxFS value in Table A-1 of [1] for the
                        signaled level is replaced with the value of
                        max-fs.  The value of max-fs MUST be greater
                        than or equal to the value of MaxFS for the
                        level given in Table A-1 of [1].  Senders MAY
                        use this knowledge to send larger pictures at a
                        proportionally lower frame rate than is
                        indicated in the signaled level.
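
                        To illustrate the trade between picture size and picture rate: the highest frame rate for a picture is the effective macroblock rate divided by the frame size in macroblocks. The Python sketch assumes progressive frames whose dimensions are rounded up to whole 16x16 macroblocks. The function names are hypothetical, and the numbers in the usage note are arbitrary inputs rather than values quoted from Table A-1.

```python
def frame_size_in_mbs(width: int, height: int) -> int:
    """Frame size in 16x16 macroblocks, rounding partial macroblocks up."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def max_frame_rate(width: int, height: int, max_mbps: int, max_fs: int) -> float:
    """Highest frame rate for this picture size under the effective limits."""
    frame_mbs = frame_size_in_mbs(width, height)
    if frame_mbs > max_fs:
        raise ValueError("picture size exceeds max-fs")
    # The macroblock budget per second is shared among the frames sent.
    return max_mbps / frame_mbs
```

                        For instance, with max-mbps 11880 and max-fs 1584, a 352x288 picture (396 macroblocks) could be sent at 30 frames per second, while a 704x576 picture (1584 macroblocks) would be limited to 7.5 frames per second.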

       max-cpb:         The value of max-cpb is an integer indicating
                        the maximum coded picture buffer size in units
                        of 1000 bits for the VCL HRD parameters (see
                        A.3.1 item i of [1]) and in units of 1200 bits
                        for the NAL HRD parameters (see A.3.1 item j of
                        [1]).  The max-cpb parameter signals that the
                        receiver has more memory than the minimum
                        amount of coded picture buffer memory required
                        by the signaled level conveyed in the value of
                        the profile-level-id parameter.  When max-cpb
                        is signaled, the receiver MUST be able to
                        decode NAL unit streams that conform to the
                        signaled level, with the exception that the
                        MaxCPB value in Table A-1 of [1] for the
                        signaled level is replaced with the value of
                        max-cpb.  The value of max-cpb MUST be greater
                        than or equal to the value of MaxCPB for the
                        level given in Table A-1 of [1].  Senders MAY
                        use this knowledge to construct coded video
                        streams with greater variation of bit rate
                        than can be achieved with the
                        MaxCPB value in Table A-1 of [1].

                            Informative note: The coded picture buffer
                            is used in the hypothetical reference
                            decoder (Annex C) of H.264.  The use of the
                            hypothetical reference decoder is
                            recommended in H.264 encoders to verify
                            that the produced bitstream conforms to the
                            standard and to control the output bitrate.
                            Thus, the coded picture buffer is
                            conceptually independent of any other
                            potential buffers in the receiver,
                            including de-interleaving and de-jitter
                            buffers.  The coded picture buffer need not
                            be implemented in decoders as specified in
                            Annex C of H.264, but rather standard-
                            compliant decoders can have any buffering
                            arrangements provided that they can decode
                            standard-compliant bitstreams.  Thus, in
                            practice, the input buffer for video
                            decoder can be integrated with de-
                            interleaving and de-jitter buffers of the
                            receiver.
       max-dpb:         The value of max-dpb is an integer indicating
                        the maximum decoded picture buffer size in
                        units of 1024 bytes.  The max-dpb parameter
                        signals that the receiver has more memory than
                        the minimum amount of decoded picture buffer
                        memory required by the signaled level conveyed
                        in the value of the profile-level-id parameter.
                        When max-dpb is signaled, the receiver MUST be
                        able to decode NAL unit streams that conform to
                        the signaled level, with the exception that the
                        MaxDPB value in Table A-1 of [1] for the
                        signaled level is replaced with the value of
                        max-dpb.  Consequently, a receiver that signals
                        max-dpb MUST be capable of storing the
                        following number of decoded frames,
                        complementary field pairs, and non-paired
                        fields in its decoded picture buffer:

                        Min(1024 * max-dpb / ( PicWidthInMbs *
                        FrameHeightInMbs * 256 * ChromaFormatFactor ),
                        16)

                        PicWidthInMbs, FrameHeightInMbs, and
                        ChromaFormatFactor are defined in [1].

                        The value of max-dpb MUST be greater than or
                        equal to the value of MaxDPB for the level
                        given in Table A-1 of [1].  Senders MAY use
                        this knowledge to construct coded video streams
                        with improved compression.

                            Informative note: This parameter was added
                            primarily to complement a similar codepoint
                            in the ITU-T Recommendation H.245, so as to
                            facilitate signaling gateway designs.  The
                            decoded picture buffer stores reconstructed
                            samples and is a property of the video
                            decoder only.  There is no relationship
                            between the size of the decoded picture
                            buffer and the buffers used in RTP,
                            especially de-interleaving and de-jitter
                            buffers.
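
                        The frame count formula above can be evaluated exactly with rational arithmetic. This Python sketch assumes 4:2:0 sampling, for which ChromaFormatFactor is 1.5. It also rounds the quotient down to whole frames, since the formula itself does not state a rounding rule. The name dpb_frames is hypothetical.

```python
from fractions import Fraction


def dpb_frames(max_dpb: int, pic_width_in_mbs: int, frame_height_in_mbs: int,
               chroma_format_factor: Fraction = Fraction(3, 2)) -> int:
    """Decoded frames the receiver must be able to store for a given max-dpb."""
    # Bytes per decoded frame: 256 luma samples per macroblock, scaled by
    # ChromaFormatFactor (3/2 assumed here, i.e. 4:2:0 sampling).
    frame_bytes = pic_width_in_mbs * frame_height_in_mbs * 256 * chroma_format_factor
    # Assumption: a partial frame cannot be stored, so round down.
    whole_frames = int(Fraction(1024 * max_dpb) / frame_bytes)
    return min(whole_frames, 16)
```

                        For a CIF picture (22 x 18 macroblocks, 152064 bytes per frame), a max-dpb of 891 corresponds to exactly 6 frames.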

       max-br:          The value of max-br is an integer indicating
                        the maximum video bit rate in units of 1000
                        bits per second for the VCL HRD parameters (see
                        A.3.1 item i of [1]) and in units of 1200 bits
                        per second for the NAL HRD parameters (see
                        A.3.1 item j of [1]).

                        The max-br parameter signals that the video
                        decoder of the receiver is capable of decoding
                        video at a higher bit rate than is required by
                        the signaled level conveyed in the value of the
                        profile-level-id parameter.  The value of max-
                        br MUST be greater than or equal to the value
                        of MaxBR for the level given in Table A-1 of
                        [1].

                        When max-br is signaled, the video codec of the
                        receiver MUST be able to decode NAL unit
                        streams that conform to the signaled level,
                        conveyed in the profile-level-id parameter,
                        with the following exceptions in the limits
                        specified by the level:
                        o The value of max-br replaces the MaxBR value
                          of the signaled level (in Table A-1 of [1]).
                        o When the max-cpb parameter is not present,
                          the result of the following formula replaces
                          the value of MaxCPB in Table A-1 of [1]:
                          (MaxCPB of the signaled level) * max-br /
                          (MaxBR of the signaled level).

                        For example, if a receiver signals capability
                        for Level 1.2 with max-br equal to 1550, this
                        indicates a maximum video bitrate of 1550
                        kbits/sec for VCL HRD parameters, a maximum
                        video bitrate of 1860 kbits/sec for NAL HRD
                        parameters, and a CPB size of 4036458 bits
                        (1550000 / 384000 * 1000 * 1000).
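
                        The arithmetic of the example above can be reproduced in Python. The Level 1.2 values MaxBR = 384 and MaxCPB = 1000 are the ones implied by the example's own formula. Integer division truncates the CPB size, which matches the 4036458 bits quoted. BrCapability and max_br_capability are hypothetical names, and the NAL HRD CPB size is derived by the same scaling using the 1200-bit unit.

```python
from typing import NamedTuple, Optional


class BrCapability(NamedTuple):
    vcl_bit_rate: int  # bits per second
    nal_bit_rate: int  # bits per second
    vcl_cpb_bits: int
    nal_cpb_bits: int


def max_br_capability(max_br: int, level_max_br: int, level_max_cpb: int,
                      max_cpb: Optional[int] = None) -> BrCapability:
    if max_br < level_max_br:
        raise ValueError("max-br must not be below MaxBR of the signaled level")
    if max_cpb is None:
        # MaxCPB scaled by max-br / MaxBR; multiply first so integer
        # division truncates only once.
        vcl_cpb = level_max_cpb * 1000 * max_br // level_max_br
        nal_cpb = level_max_cpb * 1200 * max_br // level_max_br
    else:
        vcl_cpb, nal_cpb = max_cpb * 1000, max_cpb * 1200
    return BrCapability(max_br * 1000, max_br * 1200, vcl_cpb, nal_cpb)
```

                        max_br_capability(1550, 384, 1000) gives 1550000 and 1860000 bits per second, and CPB sizes of 4036458 bits (VCL HRD) and 4843750 bits (NAL HRD).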

                        The value of max-br MUST be greater than or
                        equal to the value of MaxBR for the signaled
                        level given in Table A-1 of [1].

                        Senders MAY use this knowledge to send higher
                        bitrate video as allowed in the level
                        definition of Annex A of H.264, to achieve
                        improved video quality.

                            Informative note: This parameter was added
                            primarily to complement a similar codepoint
                            in the ITU-T Recommendation H.245, so as to
                            facilitate signaling gateway designs.  No
                            assumption can be made from the value of
                            this parameter that the network is capable
                            of handling such bit rates at any given
                            time.  In particular, no conclusion can be
                            drawn that the signaled bit rate is
                            possible under congestion control
                            constraints.
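   The max-br arithmetic in the example above can be reproduced with a
   short sketch.  It assumes that max-cpb is absent and hard-codes only
   the Level 1.2 row of Table A-1 of [1] (MaxBR 384, MaxCPB 1000) that
   the example relies on; LEVEL_LIMITS and effective_limits are
   hypothetical names used for illustration only.

```python
# Sketch of the max-br rules, assuming max-cpb is NOT present.
# LEVEL_LIMITS is a hypothetical lookup holding only the Level 1.2 row
# of Table A-1 of [1]; other levels would have to be added.
LEVEL_LIMITS = {"1.2": {"MaxBR": 384, "MaxCPB": 1000}}


def effective_limits(level, max_br):
    """Return (VCL bit rate, NAL bit rate, CPB size in bits)."""
    limits = LEVEL_LIMITS[level]
    if max_br < limits["MaxBR"]:
        raise ValueError("max-br MUST be >= MaxBR of the signaled level")
    vcl_bps = max_br * 1000  # VCL HRD: units of 1000 bits/s
    nal_bps = max_br * 1200  # NAL HRD: units of 1200 bits/s
    # MaxCPB (units of 1000 bits) scaled by max-br / MaxBR, truncated.
    cpb_bits = limits["MaxCPB"] * 1000 * max_br // limits["MaxBR"]
    return vcl_bps, nal_bps, cpb_bits


print(effective_limits("1.2", 1550))  # (1550000, 1860000, 4036458)
```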

      redundant-pic-cap:
                        This parameter signals the capabilities of a
                        receiver implementation.  When equal to 0, the
                        parameter indicates that the receiver makes no
                        attempt to use redundant coded pictures to
                        correct incorrectly decoded primary coded
                        pictures; such a receiver is not capable of
                        using redundant slices, and therefore a sender
                        SHOULD avoid sending redundant slices to save
                        bandwidth.  When equal to 1, the receiver
                        is capable of decoding any such redundant slice
                        that covers a corrupted area in a primary
                        decoded picture (at least partly), and therefore
                        a sender MAY send redundant slices.  When the
                        parameter is not present, then a value of 0
                        MUST be used for redundant-pic-cap.  When
                        present, the value of redundant-pic-cap MUST be
                        either 0 or 1.

                        When the profile-level-id parameter is present
                        in the same capability signaling as the
                        redundant-pic-cap parameter, and the profile
                        indicated in profile-level-id is such that it
                        disallows the use of redundant coded pictures
                        (e.g., Main Profile), the value of redundant-
                        pic-cap MUST be equal to 0.  When a receiver
                        indicates redundant-pic-cap equal to 0, the
                        received stream SHOULD NOT contain redundant
                        coded pictures.

                            Informative note: Even if redundant-pic-cap
                            is equal to 0, the decoder is able to
                            ignore redundant coded pictures provided
                            that the decoder supports such a profile
                            (Baseline, Extended) in which redundant
                            coded pictures are allowed.

                            Informative note: Even if redundant-pic-cap
                            is equal to 1, the receiver may also choose
                            other error concealment strategies to
                            replace or complement decoding of redundant
                            slices.
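   The redundant-pic-cap rules can be sketched as follows.  This is a
   simplification, not a complete profile check: it assumes that only
   profile_idc 66 (Baseline) and 88 (Extended) permit redundant coded
   pictures and ignores the constraint flags in profile-level-id.  The
   helper names are hypothetical.

```python
# Simplifying assumption: only these profile_idc values permit
# redundant coded pictures; constraint flags are not examined.
PROFILES_WITH_REDUNDANT_PICTURES = {66, 88}  # Baseline, Extended


def redundant_pic_cap(fmtp):
    """Return the effective redundant-pic-cap value (0 or 1)."""
    value = int(fmtp.get("redundant-pic-cap", "0"))  # absent means 0
    if value not in (0, 1):
        raise ValueError("redundant-pic-cap MUST be 0 or 1")
    plid = fmtp.get("profile-level-id")
    if value == 1 and plid is not None:
        profile_idc = int(plid[0:2], 16)  # first byte of profile-level-id
        if profile_idc not in PROFILES_WITH_REDUNDANT_PICTURES:
            raise ValueError("profile disallows redundant coded pictures")
    return value


def may_send_redundant_slices(receiver_fmtp):
    # 0: the sender SHOULD avoid redundant slices; 1: it MAY send them.
    return redundant_pic_cap(receiver_fmtp) == 1
```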

       sprop-parameter-sets:
                        This parameter MAY be used to convey
                        any sequence and picture parameter set NAL
                        units (herein referred to as the initial
                        parameter set NAL units) that MUST precede any
                        other NAL units in decoding order.  The
                        parameter MUST NOT be used to indicate codec
                        capability in any capability exchange
                        procedure.  The value of the parameter is the
                        base64 [6] representation of the initial
                        parameter set NAL units as specified in
                        sections 7.3.2.1 and 7.3.2.2 of [1].  The
                        parameter sets are conveyed in decoding order,
                        and no framing of the parameter set NAL units
                        takes place.  A comma is used to separate any
                        pair of parameter sets in the list.  Note that
                        the number of bytes in a parameter set NAL unit
                        is typically less than 10, but a picture
                        parameter set NAL unit can contain several
                        hundreds of bytes.

                           Informative note: When several payload
                           types are offered in the SDP Offer/Answer
                           model, each with its own sprop-parameter-
                           sets parameter, then the receiver cannot
                           assume that those parameter sets do not use
                           conflicting storage locations (i.e.,
                           identical values of parameter set
                           identifiers).  Therefore, a receiver should
                           double-buffer all sprop-parameter-sets and
                           make them available to the decoder instance
                           that decodes a certain payload type.
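   Because the value is a comma-separated list of base64-encoded NAL
   units without further framing, it can be decoded with the Python
   standard library alone.  The sketch below uses the example value
   from section 8.2.1 and reads nal_unit_type from the low five bits of
   the NAL unit header (7 = sequence parameter set, 8 = picture
   parameter set).  The function names and the per-payload-type
   dictionary, which mirrors the informative note above, are
   hypothetical.

```python
import base64


def parse_sprop_parameter_sets(value):
    """Decode each comma-separated base64 item into raw NAL unit bytes."""
    return [base64.b64decode(item.strip(), validate=True)
            for item in value.split(",") if item.strip()]


def build_sprop_parameter_sets(nal_units):
    """Inverse operation: base64-encode NAL units in decoding order."""
    return ",".join(base64.b64encode(nal).decode("ascii") for nal in nal_units)


def nal_unit_type(nal):
    # Low five bits of the NAL unit header: 7 = SPS, 8 = PPS.
    return nal[0] & 0x1F


# Keep parameter sets separately per payload type, since identifiers
# used by different payload types may conflict.
sets_by_payload_type = {98: parse_sprop_parameter_sets("Z0IACpZTBYmI,aMljiA==")}
print([nal_unit_type(n) for n in sets_by_payload_type[98]])  # [7, 8]
```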

       parameter-add:   This parameter MAY be used to signal whether
                        the receiver of this parameter is allowed to
                        add parameter sets in its signaling response
                        using the sprop-parameter-sets MIME parameter.
                        The value of this parameter is either 0 or 1.
                        0 is equal to false; i.e., it is not allowed to
                        add parameter sets.  1 is equal to true; i.e.,
                        it is allowed to add parameter sets.  If the
                        parameter is not present, its value MUST be 1.

       packetization-mode:
                        This parameter signals the properties of an
                        RTP payload type or the capabilities of a
                        receiver implementation.  Only a single
                        configuration point can be indicated; thus,
                        when capabilities to support more than one
                        packetization-mode are declared, multiple
                        configuration points (RTP payload types) must
                        be used.

                        When the value of packetization-mode is equal
                        to 0 or packetization-mode is not present, the
                        single NAL unit mode, as defined in section 6.2
                        of RFC 3984, MUST be used.  This mode is in use
                        in standards using ITU-T Recommendation H.241
                        [15] (see section 12.1).  When the value of
                        packetization-mode is equal to 1, the non-
                        interleaved mode, as defined in section 6.3 of
                        RFC 3984, MUST be used.  When the value of
                        packetization-mode is equal to 2, the
                        interleaved mode, as defined in section 6.4 of
                        RFC 3984, MUST be used.  The value of
                        packetization-mode MUST be an integer in the
                        range of 0 to 2, inclusive.
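   A minimal sketch of how an implementation might map the parameter to
   a mode, assuming fmtp parameters have already been parsed into a
   dictionary of strings.  Since one payload type carries exactly one
   configuration point, offering all three modes takes three payload
   types; the numbers below follow the example in section 8.3.

```python
PACKETIZATION_MODES = {
    0: "single NAL unit mode",  # section 6.2
    1: "non-interleaved mode",  # section 6.3
    2: "interleaved mode",      # section 6.4
}


def packetization_mode(fmtp):
    raw = fmtp.get("packetization-mode")
    mode = 0 if raw is None else int(raw)  # absent means mode 0
    if mode not in PACKETIZATION_MODES:
        raise ValueError("packetization-mode MUST be 0, 1, or 2")
    return mode


# One payload type per configuration point.
offer = {98: {"packetization-mode": "0"},
         99: {"packetization-mode": "1"},
         100: {"packetization-mode": "2"}}
print({pt: PACKETIZATION_MODES[packetization_mode(f)] for pt, f in offer.items()})
```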

       sprop-interleaving-depth:
                        This parameter MUST NOT be present
                        when packetization-mode is not present or the
                        value of packetization-mode is equal to 0 or 1.
                        This parameter MUST be present when the value
                        of packetization-mode is equal to 2.

                        This parameter signals the properties of a NAL
                        unit stream.  It specifies the maximum number
                        of VCL NAL units that precede any VCL NAL unit
                        in the NAL unit stream in transmission order
                        and follow the VCL NAL unit in decoding order.
                        Consequently, it is guaranteed that receivers
                        can reconstruct NAL unit decoding order when
                        the buffer size for NAL unit decoding order
                        recovery is at least the value of sprop-
                        interleaving-depth + 1 in terms of VCL NAL
                        units.

                        The value of sprop-interleaving-depth MUST be
                        an integer in the range of 0 to 32767,
                        inclusive.

       sprop-deint-buf-req:
                        This parameter MUST NOT be present when
                        packetization-mode is not present or the value
                        of packetization-mode is equal to 0 or 1.  It
                        MUST be present when the value of
                        packetization-mode is equal to 2.

                        sprop-deint-buf-req signals the required size
                        of the deinterleaving buffer for the NAL unit
                        stream.  The value of the parameter MUST be
                        greater than or equal to the maximum buffer
                        occupancy (in units of bytes) required in such
                        a deinterleaving buffer that is specified in
                        section 7.2 of RFC 3984.  It is guaranteed that
                        receivers can perform the deinterleaving of
                        interleaved NAL units into NAL unit decoding
                        order, when the deinterleaving buffer size is
                        at least the value of sprop-deint-buf-req in
                        terms of bytes.

                        The value of sprop-deint-buf-req MUST be an
                        integer in the range of 0 to 4294967295,
                        inclusive.

                            Informative note: sprop-deint-buf-req
                            indicates the required size of the
                            deinterleaving buffer only.  When network
                            jitter can occur, an appropriately sized
                            jitter buffer has to be provisioned for
                            as well.
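   The presence and range rules for sprop-interleaving-depth and
   sprop-deint-buf-req can be checked mechanically.  The following is a
   sketch under the assumption that parameters arrive as a dictionary
   of strings; check_interleaving_params and reorder_buffer_vcl_units
   are hypothetical helpers.

```python
# (parameter name, inclusive upper bound) for the interleaving parameters.
INTERLEAVING_PARAMS = (("sprop-interleaving-depth", 32767),
                       ("sprop-deint-buf-req", 4294967295))


def check_interleaving_params(fmtp):
    """Return a list of rule violations (empty if none)."""
    mode = int(fmtp.get("packetization-mode", "0"))
    errors = []
    for name, upper in INTERLEAVING_PARAMS:
        present = name in fmtp
        if mode != 2 and present:
            errors.append(f"{name} MUST NOT be present with mode {mode}")
        elif mode == 2 and not present:
            errors.append(f"{name} MUST be present with mode 2")
        elif present and not 0 <= int(fmtp[name]) <= upper:
            errors.append(f"{name} outside 0..{upper}")
    return errors


def reorder_buffer_vcl_units(fmtp):
    # Buffer size, in VCL NAL units, sufficient to recover decoding order.
    return int(fmtp["sprop-interleaving-depth"]) + 1
```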

       deint-buf-cap:   This parameter signals the capabilities of a
                        receiver implementation and indicates the
                        amount of deinterleaving buffer space in units
                        of bytes that the receiver has available for
                        reconstructing the NAL unit decoding order.  A
                        receiver is able to handle any stream for which
                        the value of the sprop-deint-buf-req parameter
                        is smaller than or equal to this parameter.

                        If the parameter is not present, then a value
                        of 0 MUST be used for deint-buf-cap.  The value
                        of deint-buf-cap MUST be an integer in the
                        range of 0 to 4294967295, inclusive.

                            Informative note: deint-buf-cap indicates
                            the maximum possible size of the
                            deinterleaving buffer of the receiver only.
                            When network jitter can occur, an
                            appropriately sized jitter buffer has to
                            be provisioned for as well.
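   The compatibility test between deint-buf-cap and sprop-deint-buf-req
   reduces to a single comparison, with an absent deint-buf-cap treated
   as 0.  A sketch with hypothetical helper names; as the note above
   says, jitter buffering has to be provisioned separately and is not
   modeled here.

```python
def deint_buf_cap(receiver_fmtp):
    return int(receiver_fmtp.get("deint-buf-cap", "0"))  # absent means 0


def can_deinterleave(receiver_fmtp, stream_fmtp):
    """True if the receiver's deinterleaving buffer (bytes) fits the stream."""
    required = int(stream_fmtp["sprop-deint-buf-req"])
    return required <= deint_buf_cap(receiver_fmtp)
```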

       sprop-init-buf-time:
                        This parameter MAY be used to signal the
                        properties of a NAL unit stream.  The parameter
                        MUST NOT be present, if the value of
                        packetization-mode is equal to 0 or 1.

                        The parameter signals the initial buffering
                        time that a receiver MUST buffer before
                        starting decoding to recover the NAL unit
                        decoding order from the transmission order.
                        The parameter is the maximum value of
                        (transmission time of a NAL unit - decoding
                        time of the NAL unit), assuming reliable and
                        instantaneous transmission, the same
                        timeline for transmission and decoding, and
                        that decoding starts when the first packet
                        arrives.

                        An example of specifying the value of sprop-
                        init-buf-time follows.  A NAL unit stream is
                        sent in the following interleaved order, in
                        which the value corresponds to the decoding
                        time and the transmission order is from left to
                        right:

                        0  2  1  3  5  4  6  8  7 ...

                        Assuming a steady transmission rate of NAL
                        units, the transmission times are:

                        0  1  2  3  4  5  6  7  8 ...

                        Subtracting the decoding time from the
                        transmission time column-wise results in the
                        following series:

                        0 -1  1  0 -1  1  0 -1  1 ...

                        Thus, in terms of intervals of NAL unit
                        transmission times, the value of
                        sprop-init-buf-time in this
                        example is 1.

                        The parameter is coded as a non-negative base10
                        integer representation in clock ticks of a 90-
                        kHz clock.  If the parameter is not present,
                        then no initial buffering time value is
                        defined.  Otherwise the value of sprop-init-
                        buf-time MUST be an integer in the range of 0
                        to 4294967295, inclusive.

                        In addition to the signaled sprop-init-buf-
                        time, receivers SHOULD take into account the
                        transmission delay jitter buffering, including
                        buffering for the delay jitter caused by
                        mixers, translators, gateways, proxies,
                        traffic-shapers, and other network elements.
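   The worked example above can be reproduced in a few lines.  The
   sketch assumes a steady transmission rate, so the k-th NAL unit in
   transmission order is sent at time k.  The conversion to 90-kHz
   ticks uses a hypothetical interval of 1/30 second (3000 ticks) per
   NAL unit, chosen only for illustration.

```python
def init_buf_time(decoding_times, ticks_per_interval=1):
    """Max of (transmission time - decoding time) over the stream.

    decoding_times lists each NAL unit's decoding time, in transmission
    order, in units of the NAL unit transmission interval.
    """
    return max((k - d) * ticks_per_interval
               for k, d in enumerate(decoding_times))


order = [0, 2, 1, 3, 5, 4, 6, 8, 7]
print(init_buf_time(order))        # 1 (interval)
print(init_buf_time(order, 3000))  # 3000 (ticks, assuming 1/30 s intervals)
```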

       sprop-max-don-diff:
                        This parameter MAY be used to signal the
                        properties of a NAL unit stream.  It MUST NOT
                        be used to signal transmitter or receiver or
                        codec capabilities.  The parameter MUST NOT be
                        present if the value of packetization-mode is
                        equal to 0 or 1.  sprop-max-don-diff is an
                        integer in the range of 0 to 32767, inclusive.
                        If sprop-max-don-diff is not present, the value
                        of the parameter is unspecified.  sprop-max-
                        don-diff is calculated as follows:

                        sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
                                             for any i and any j > i,

                        where i and j indicate the index of the NAL
                        unit in the transmission order and AbsDON
                        denotes a decoding order number of the NAL
                        unit that does not wrap around to 0 after
                        65535.  In other words, AbsDON is calculated as
                        follows: Let m and n be consecutive NAL units
                        in transmission order.  For the very first NAL
                        unit in transmission order (whose index is 0),
                        AbsDON(0) = DON(0).  For other NAL units,
                        AbsDON is calculated as follows:

                        If DON(m) == DON(n),
                           AbsDON(n) = AbsDON(m)

                        If DON(m) < DON(n) and DON(n) - DON(m) < 32768,
                           AbsDON(n) = AbsDON(m) + DON(n) - DON(m)

                        If DON(m) > DON(n) and DON(m) - DON(n) >= 32768,
                           AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)

                        If DON(m) < DON(n) and DON(n) - DON(m) >= 32768,
                           AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))

                        If DON(m) > DON(n) and DON(m) - DON(n) < 32768,
                           AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))

                        where DON(i) is the decoding order number of
                        the NAL unit having index i in the transmission
                        order.  The decoding order number is specified
                        in section 5.5 of RFC 3984.

                            Informative note: Receivers may use sprop-
                            max-don-diff to trigger which NAL units in
                            the receiver buffer can be passed to the
                            decoder.
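   The AbsDON rules and the maximum above translate directly into code.
   In the sketch below, dons lists the 16-bit DON values in transmission
   order; the function names are hypothetical.  Instead of comparing all
   pairs (i, j), it keeps the running maximum of AbsDON over earlier
   NAL units, which gives the same result in linear time.

```python
def abs_don_sequence(dons):
    """Unwrap 16-bit DON values (transmission order) into AbsDON values."""
    if not dons:
        return []
    abs_dons = [dons[0]]  # AbsDON(0) = DON(0)
    for m_don, n_don in zip(dons, dons[1:]):
        prev = abs_dons[-1]
        if n_don == m_don:
            abs_dons.append(prev)
        elif m_don < n_don and n_don - m_don < 32768:
            abs_dons.append(prev + n_don - m_don)
        elif m_don > n_don and m_don - n_don >= 32768:
            abs_dons.append(prev + 65536 - m_don + n_don)
        elif m_don < n_don:  # n_don - m_don >= 32768
            abs_dons.append(prev - (m_don + 65536 - n_don))
        else:  # m_don > n_don and m_don - n_don < 32768
            abs_dons.append(prev - (m_don - n_don))
    return abs_dons


def sprop_max_don_diff(dons):
    """max{AbsDON(i) - AbsDON(j)} over i < j, or None for < 2 NAL units."""
    abs_dons = abs_don_sequence(dons)
    if len(abs_dons) < 2:
        return None
    best = None
    running_max = abs_dons[0]  # largest AbsDON among earlier NAL units
    for value in abs_dons[1:]:
        diff = running_max - value
        best = diff if best is None else max(best, diff)
        running_max = max(running_max, value)
    return best


print(sprop_max_don_diff([65534, 0, 65535, 1]))  # 1
```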

     max-rcmd-nalu-size:
                        This parameter MAY be used to signal the
                        capabilities of a receiver.  The parameter MUST
                        NOT be used for any other purposes.  The value
                        of the parameter indicates the largest NALU
                        size in bytes that the receiver can handle
                        efficiently.  The parameter value is a
                        recommendation, not a strict upper boundary.
                        The sender MAY create larger NALUs but must be
                        aware that the handling of these may come at a
                        higher cost than NALUs conforming to the
                        limitation.

                        The value of max-rcmd-nalu-size MUST be an
                        integer in the range of 0 to 4294967295,
                        inclusive.  If this parameter is not specified,
                        no known limitation to the NALU size exists.
                        Senders still have to consider the MTU size
                        available between the sender and the receiver
                        and SHOULD run MTU discovery for this purpose.

                        This parameter is motivated by, for example, an
                        IP to H.223 video telephony gateway, where
                        NALUs smaller than the H.223 transport data
                        unit will be more efficient.  A gateway may
                        terminate IP; thus, MTU discovery will normally
                        not work beyond the gateway.

                            Informative note: Setting this parameter to
                            a lower than necessary value may have a
                            negative impact.
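   One way a sender might combine max-rcmd-nalu-size with the path MTU
   when sizing NAL units is sketched below.  This is an illustration,
   not a requirement of this memo: it assumes IPv4 (20 bytes), UDP
   (8 bytes), and the fixed RTP header (12 bytes) with no CSRCs or
   header extensions, and it simply honors the recommendation even
   though larger NAL units are permitted.

```python
# Assumed per-packet overhead: IPv4 + UDP + fixed RTP header.
HEADER_OVERHEAD = 20 + 8 + 12


def target_nalu_size(path_mtu, max_rcmd_nalu_size=None):
    """Hypothetical helper: preferred NAL unit size in bytes."""
    budget = path_mtu - HEADER_OVERHEAD
    if max_rcmd_nalu_size is not None:
        # A recommendation, not a hard limit; this sketch just honors it.
        budget = min(budget, max_rcmd_nalu_size)
    return budget
```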

   Encoding considerations:
                        This type is only defined for transfer via RTP
                        (RFC 3550).

                        A file format of H.264/AVC video is defined in
                        [29].  This definition is utilized by other
                        file formats, such as the 3GPP multimedia file
                        format (MIME type video/3gpp) [30] or the MP4
                        file format (MIME type video/mp4).

   Security considerations:
                        See section 9 of RFC 3984.

   Public specification:
                        Please refer to RFC 3984 and its section 15.

   Additional information:
                        None

   File extensions:     none
   Macintosh file type code: none
   Object identifier or OID: none

   Person & email address to contact for further information:
                        stewe@stewe.org

   Intended usage:      COMMON

   Author:
                        stewe@stewe.org
   Change controller:
                        IETF Audio/Video Transport working group
                        delegated from the IESG.

8.2. SDP Parameters

8.2.1. Mapping of MIME Parameters to SDP

   The MIME media type video/H264 string is mapped to fields in the
   Session Description Protocol (SDP) [5] as follows:

   o  The media name in the "m=" line of SDP MUST be video.

   o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
      MIME subtype).

   o  The clock rate in the "a=rtpmap" line MUST be 90000.

   o  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap",
      "sprop-parameter-sets", "parameter-add", "packetization-mode",
      "sprop-interleaving-depth", "deint-buf-cap",
      "sprop-deint-buf-req", "sprop-init-buf-time",
      "sprop-max-don-diff", and "max-rcmd-nalu-size", when present, MUST
      be included in the "a=fmtp" line of SDP.  These parameters are
      expressed as a MIME media type string, in the form of a
      semicolon-separated list of parameter=value pairs.

   An example of media representation in SDP is as follows (Baseline
   Profile, Level 3.0, some of the constraints of the Main profile may
   not be obeyed):

      m=video 49170 RTP/AVP 98
      a=rtpmap:98 H264/90000
      a=fmtp:98 profile-level-id=42A01E;
                sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
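   A sketch of how a receiver might split the "a=fmtp" line of the
   example into parameters and decode the three bytes of
   profile-level-id (profile_idc, profile-iop, level_idc).  It assumes
   the attribute arrives on a single line, as in real SDP, and
   lower-cases parameter names; parse_fmtp and decode_profile_level_id
   are hypothetical helpers.

```python
def parse_fmtp(line):
    """Split an "a=fmtp:<pt> name=value; ..." line into (pt, params)."""
    head, _, rest = line.partition(" ")
    if not head.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    pt = int(head[len("a=fmtp:"):])
    params = {}
    for pair in rest.split(";"):
        if not pair.strip():
            continue
        name, _, value = pair.partition("=")  # split at the first "=" only
        params[name.strip().lower()] = value.strip()
    return pt, params


def decode_profile_level_id(plid):
    """Return (profile_idc, profile-iop, level_idc) from the base16 value."""
    profile_idc, profile_iop, level_idc = bytes.fromhex(plid)
    return profile_idc, profile_iop, level_idc


pt, params = parse_fmtp("a=fmtp:98 profile-level-id=42A01E; "
                        "sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==")
print(pt, decode_profile_level_id(params["profile-level-id"]))  # 98 (66, 160, 30)
```

   Here profile_idc 66 is the Baseline profile and level_idc 30 is
   Level 3.0, matching the description of the example.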

8.2.2. Usage with the SDP Offer/Answer Model

   When H.264 is offered over RTP using SDP in an Offer/Answer model [7]
   for negotiation for unicast usage, the following limitations and
   rules apply:

   o  The parameters identifying a media format configuration for H.264
      are "profile-level-id", "packetization-mode", and, if required by
      "packetization-mode", "sprop-deint-buf-req".  These three
      parameters MUST be used symmetrically; i.e., the answerer MUST
      either maintain all configuration parameters or remove the media
      format (payload type) completely, if one or more of the parameter
      values are not supported.

         Informative note: The requirement for symmetric use applies
         only for the above three parameters and not for the other
         stream properties and capability parameters.

      To simplify handling and matching of these configurations, the
      same RTP payload type number used in the offer SHOULD also be used
      in the answer, as specified in [7].  An answer MUST NOT contain a
      payload type number used in the offer unless the configuration
      ("profile-level-id", "packetization-mode", and, if present,
      "sprop-deint-buf-req") is the same as in the offer.

         Informative note: An offerer, when receiving the answer, has to
         compare payload types not declared in the offer based on media
         type (i.e., video/h264) and the above three parameters with any
         payload types it has already declared, in order to determine
         whether the configuration in question is new or equivalent to a
         configuration already offered.

   o  The parameters "sprop-parameter-sets", "sprop-deint-buf-req",
      "sprop-interleaving-depth", "sprop-max-don-diff", and "sprop-
      init-buf-time" describe the properties of the NAL unit stream that
      the offerer or answerer is sending for this media format
      configuration.  This differs from the normal usage of the
      Offer/Answer parameters: normally such parameters declare the
      properties of the stream that the offerer or the answerer is able
      to receive.  When dealing with H.264, the offerer assumes that the
      answerer will be able to receive media encoded using the
      configuration being offered.

         Informative note: The above parameters apply for any stream
         sent by the declaring entity with the same configuration; i.e.,
         they are dependent on their source.  Rather than being bound to
         the payload type, the values may have to be applied to another
         payload type when being sent, as they apply for the
         configuration.

   o  The capability parameters ("max-mbps", "max-fs", "max-cpb", "max-
      dpb", "max-br", "redundant-pic-cap", "max-rcmd-nalu-size") MAY be
      used to declare further capabilities.  Their interpretation
      depends on the direction attribute.  When the direction attribute
      is sendonly, then the parameters describe the limits of the RTP
      packets and the NAL unit stream that the sender is capable of
      producing.  When the direction attribute is sendrecv or recvonly,
      then the parameters describe the limitations of what the receiver
      accepts.

   o  As specified above, an offerer has to include the size of the
      deinterleaving buffer in the offer for an interleaved H.264
      stream.  To enable the offerer and answerer to inform each other
      about their capabilities for deinterleaving buffering, both
      parties are RECOMMENDED to include "deint-buf-cap".  This
      information MAY be used when the value for "sprop-deint-buf-req"
      is selected in a second round of offer and answer.  For
      interleaved streams, it is also RECOMMENDED to consider offering
      multiple payload types with different buffering requirements when
      the capabilities of the receiver are unknown.

   o  The "sprop-parameter-sets" parameter is used as described above.
      In addition, an answerer MUST maintain all parameter sets received
      in the offer in its answer.  Depending on the value of the
      "parameter-add" parameter, different rules apply: If "parameter-
      add" is false (0), the answer MUST NOT add any additional
      parameter sets.  If "parameter-add" is true (1), the answerer, in
      its answer, MAY add additional parameter sets to the "sprop-
      parameter-sets" parameter.  The answerer MUST also, independent of
      the value of "parameter-add", accept to receive a video stream
      using the sprop-parameter-sets it declared in the answer.

         Informative note: care must be taken when parameter sets are
         added not to cause overwriting of already transmitted parameter
         sets by using conflicting parameter set identifiers.
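   Two of the unicast rules above lend themselves to a short sketch:
   matching a payload type in the answer against the offered
   configurations, and checking the answer's "sprop-parameter-sets"
   against the offer's "parameter-add".  The sketch compares the three
   configuration parameters literally (it does not model the level
   downgrade discussed later for configuration parameters) and compares
   parameter sets by their base64 text.  All names are hypothetical.

```python
def config_key(fmtp):
    """Configuration point: profile-level-id, packetization-mode, and
    sprop-deint-buf-req when present."""
    plid = fmtp.get("profile-level-id")
    mode = int(fmtp.get("packetization-mode", "0"))  # absent means 0
    deint = fmtp.get("sprop-deint-buf-req")
    return (plid.upper() if plid is not None else None,
            mode,
            int(deint) if deint is not None else None)


def matching_offered_pt(offered, answer_fmtp):
    """Return an offered payload type with the same configuration, or None."""
    key = config_key(answer_fmtp)
    for pt, fmtp in offered.items():
        if config_key(fmtp) == key:
            return pt
    return None


def split_sets(fmtp):
    value = fmtp.get("sprop-parameter-sets", "")
    return [s.strip() for s in value.split(",") if s.strip()]


def answer_parameter_sets_ok(offer_fmtp, answer_fmtp):
    """The answer keeps every offered set and adds sets only if the
    offer's parameter-add is 1 (the value used when it is absent)."""
    offered, answered = split_sets(offer_fmtp), split_sets(answer_fmtp)
    if any(s not in answered for s in offered):
        return False
    added = [s for s in answered if s not in offered]
    add_allowed = offer_fmtp.get("parameter-add", "1") == "1"
    return add_allowed or not added
```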

   For streams being delivered over multicast, the following rules apply
   in addition:

   o  The stream properties parameters ("sprop-parameter-sets", "sprop-
      deint-buf-req", "sprop-interleaving-depth", "sprop-max-don-diff",
      and "sprop-init-buf-time") MUST NOT be changed by the answerer.
      Thus, a payload type can either be accepted unaltered or removed.

   o  The receiver capability parameters "max-mbps", "max-fs", "max-
      cpb", "max-dpb", "max-br", and "max-rcmd-nalu-size" MUST be
      supported by the answerer for all streams declared as sendrecv or
      recvonly; otherwise, one of the following actions MUST be
      performed: the media format is removed, or the session rejected.

   o  The receiver capability parameter redundant-pic-cap SHOULD be
      supported by the answerer for all streams declared as sendrecv or
      recvonly as follows:  The answerer SHOULD NOT include redundant
      coded pictures in the transmitted stream if the offerer indicated
      redundant-pic-cap equal to 0.  Otherwise (when redundant-pic-cap
      is equal to 1), it is beyond the scope of this memo to recommend
      how the answerer should use redundant coded pictures.

   Below are the complete lists of how the different parameters shall be
   interpreted in the different combinations of offer or answer and
   direction attribute.

   o  In offers and answers for which "a=sendrecv" or no direction
      attribute is used, or in offers and answers for which "a=recvonly"
      is used, the following interpretation of the parameters MUST be
      used.

      Declaring actual configuration or properties for receiving:

         - profile-level-id
         - packetization-mode

      Declaring actual properties of the stream to be sent (applicable
      only when "a=sendrecv" or no direction attribute is used):

         - sprop-deint-buf-req
         - sprop-interleaving-depth
         - sprop-parameter-sets
         - sprop-max-don-diff
         - sprop-init-buf-time

      Declaring receiver implementation capabilities:

         - max-mbps
         - max-fs
         - max-cpb
         - max-dpb
         - max-br
         - redundant-pic-cap
         - deint-buf-cap
         - max-rcmd-nalu-size

      Declaring how Offer/Answer negotiation shall be performed:

         - parameter-add

   o  In an offer or answer for which the direction attribute
      "a=sendonly" is included for the media stream, the following
      interpretation of the parameters MUST be used:

      Declaring actual configuration and properties of stream proposed
      to be sent:

         - profile-level-id
         - packetization-mode
         - sprop-deint-buf-req
         - sprop-max-don-diff
         - sprop-init-buf-time
         - sprop-parameter-sets
         - sprop-interleaving-depth

      Declaring the capabilities of the sender when it receives a
      stream:

         - max-mbps
         - max-fs
         - max-cpb
         - max-dpb
         - max-br
         - redundant-pic-cap
         - deint-buf-cap
         - max-rcmd-nalu-size

      Declaring how Offer/Answer negotiation shall be performed:

         - parameter-add
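   The two lists above amount to a lookup keyed on the parameter name
   and the direction attribute.  The sketch below encodes them; the
   role strings and parameter_role are hypothetical labels chosen for
   illustration, and a missing direction attribute is treated like
   "a=sendrecv", as the first list states.

```python
# Parameter groupings follow the lists above; role strings are illustrative.
CONFIG = {"profile-level-id", "packetization-mode"}
STREAM_PROPS = {"sprop-deint-buf-req", "sprop-interleaving-depth",
                "sprop-parameter-sets", "sprop-max-don-diff",
                "sprop-init-buf-time"}
CAPABILITIES = {"max-mbps", "max-fs", "max-cpb", "max-dpb", "max-br",
                "redundant-pic-cap", "deint-buf-cap", "max-rcmd-nalu-size"}


def parameter_role(name, direction=None):
    """Return how a parameter is interpreted, or None if not applicable."""
    direction = direction or "sendrecv"  # no attribute behaves like sendrecv
    if name == "parameter-add":
        return "negotiation rule"
    if direction in ("sendrecv", "recvonly"):
        if name in CONFIG:
            return "configuration for receiving"
        if name in STREAM_PROPS:
            # Only applicable when the declaring party also sends.
            return "properties of sent stream" if direction == "sendrecv" else None
        if name in CAPABILITIES:
            return "receiver capability"
    elif direction == "sendonly":
        if name in CONFIG or name in STREAM_PROPS:
            return "configuration proposed to be sent"
        if name in CAPABILITIES:
            return "capability of the sender when receiving"
    return None
```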

   Furthermore, the following considerations are necessary:

   o  Parameters used for declaring receiver capabilities are in general
      downgradable; i.e., they express the upper limit for a sender's
      possible behavior.  Thus, a sender MAY configure its encoder using
      only values that are lower than or equal to these parameters.
      "sprop-parameter-sets" MUST NOT be used in a sender's declaration
      of its capabilities, as the limits of the values that are carried
      inside the parameter sets are implicit with the profile and level
      used.

   o  Parameters declaring a configuration point are not downgradable,
      with the exception of the level part of the "profile-level-id"
      parameter.  These parameters express values that a receiver
      expects to be used, and the sender must use them verbatim.

   o  When a sender's capabilities are declared, and non-downgradable
      parameters are used in this declaration, then these parameters
      express a configuration that is acceptable.  In order to achieve
      high interoperability levels, it is often advisable to offer
      multiple alternative configurations; e.g., for the packetization
      mode.  It is impossible to offer multiple configurations in a
      single payload type.  Thus, when multiple configuration offers are
      made, each offer requires its own RTP payload type associated with
      the offer.

   o  A receiver SHOULD understand all MIME parameters, even if it only
      supports a subset of the payload format's functionality.  This
      ensures that a receiver is capable of understanding when an offer
      to receive media can be downgraded to what is supported by the
      receiver of the offer.

   o  An answerer MAY extend the offer with additional media format
      configurations.  However, to enable their usage, in most cases a
      second offer is required from the offerer to provide the stream
      properties parameters that the media sender will use.  This also
      has the effect that the offerer has to be able to receive this
      media format configuration, not only to send it.

   o  If an offerer wishes to have non-symmetric capabilities between
      sending and receiving, the offerer has to offer different RTP
      sessions; i.e., different media lines declared as "recvonly" and
      "sendonly", respectively.  This may have further implications on
      the system.
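   As an illustration of the exception for the level part of
   "profile-level-id", the following Python sketch splits a
   "profile-level-id" value into its three bytes (profile_idc,
   profile-iop, and level_idc) and lowers only the level byte, copying
   the other two verbatim.  The function names are hypothetical, and the
   sketch compares level_idc values numerically, which ignores the
   special signaling of Level 1b.

```python
def parse_profile_level_id(plid: str) -> tuple[int, int, int]:
    """Split a 6-hex-digit profile-level-id into (profile_idc, profile-iop, level_idc)."""
    if len(plid) != 6:
        raise ValueError("profile-level-id must be 6 hex digits")
    value = int(plid, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def downgrade_level(offered: str, max_level_idc: int) -> str:
    """Return a profile-level-id with only the level lowered (hypothetical helper).

    profile_idc and profile-iop are copied unchanged because they are not
    downgradable; the level byte is capped at max_level_idc.
    Level 1b is not handled by this sketch.
    """
    profile_idc, profile_iop, level_idc = parse_profile_level_id(offered)
    level_idc = min(level_idc, max_level_idc)
    return f"{profile_idc:02X}{profile_iop:02X}{level_idc:02X}"
```

   For example, an answerer limited to level_idc 0x0D would turn the
   offered "42A01E" into "42A00D", while an answerer supporting a higher
   level would return the offered value unchanged.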

8.2.3. Usage in Declarative Session Descriptions

   When H.264 over RTP is offered with SDP in a declarative style, as in
   RTSP [27] or SAP [28], the following considerations are necessary.

   o  All parameters capable of indicating the properties of both a NAL
      unit stream and a receiver are used to indicate the properties of
      a NAL unit stream.  For example, in this case, the parameter
      "profile-level-id" declares the values used by the stream, instead
      of the capabilities of the sender.  Consequently, the following
      interpretation of the parameters MUST be used:

      Declaring actual configuration or properties:

         - profile-level-id
         - sprop-parameter-sets
         - packetization-mode
         - sprop-interleaving-depth
         - sprop-deint-buf-req
         - sprop-max-don-diff
         - sprop-init-buf-time

      Not usable:

         - max-mbps
         - max-fs
         - max-cpb
         - max-dpb
         - max-br
         - redundant-pic-cap
         - max-rcmd-nalu-size
         - parameter-add
         - deint-buf-cap

   o  A receiver of the SDP is required to support all parameters and
      values of the parameters provided; otherwise, the receiver MUST
      reject (RTSP) or not participate in (SAP) the session.  It falls
      on the creator of the session to use values that are expected to
      be supported by the receiving application.
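   A creator of a declarative session description could check an fmtp
   parameter string against the "Not usable" list above before
   publishing it.  The sketch below is a hypothetical lint helper; it
   assumes the fmtp parameters have already been unwrapped onto a single
   line.

```python
# Parameters listed as "Not usable" for declarative session descriptions.
NOT_USABLE_IN_DECLARATIVE = frozenset({
    "max-mbps", "max-fs", "max-cpb", "max-dpb", "max-br",
    "redundant-pic-cap", "max-rcmd-nalu-size", "parameter-add",
    "deint-buf-cap",
})


def parse_fmtp(params: str) -> dict[str, str]:
    """Parse 'name=value; name=value' fmtp parameters into a dict."""
    result: dict[str, str] = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only: base64 values may end in '='.
        name, _, value = item.partition("=")
        result[name.strip().lower()] = value.strip()
    return result


def unusable_declarative_params(params: str) -> list[str]:
    """Return, sorted, the parameters a declarative description cannot use."""
    return sorted(set(parse_fmtp(params)) & NOT_USABLE_IN_DECLARATIVE)
```

   Applied to the interleaved-mode parameters of the offer in the
   following example, the helper would flag "deint-buf-cap", which
   describes a receiver capability rather than a stream property.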

8.3. Examples

   A SIP Offer/Answer exchange wherein both parties are expected to both
   send and receive could look like the following.  Only the media codec
   specific parts of the SDP are shown.  Some lines are wrapped due to
   text constraints.

     Offerer -> Answerer SDP message:

     m=video 49170 RTP/AVP 100 99 98
     a=rtpmap:98 H264/90000
     a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
     a=rtpmap:99 H264/90000
     a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
     a=rtpmap:100 H264/90000
     a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==;
               sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
               sprop-init-buf-time=102478; deint-buf-cap=128000

   The above offer presents the same codec configuration in three
   different packetization formats.  PT 98 represents single NALU mode,
   PT 99 non-interleaved mode; PT 100 indicates the interleaved mode.  In
   the interleaved mode case, the interleaving parameters that the
   offerer would use if the answer indicates support for PT 100 are also
   included.  In all three cases the parameter "sprop-parameter-sets"
   conveys the initial parameter sets that are required for the answerer
   when receiving a stream from the offerer when this configuration
   (profile-level-id and packetization mode) is accepted.  Note that the
   value for "sprop-parameter-sets", although identical in the example
   above, could be different for each payload type.
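   The value of "sprop-parameter-sets" is a comma-separated list of
   base64-encoded NAL units.  The following sketch decodes the list from
   the offer above and labels each NAL unit by the nal_unit_type field
   in the low five bits of its first byte (7 for a sequence parameter
   set, 8 for a picture parameter set).  The function name is
   hypothetical.

```python
import base64

# nal_unit_type values for parameter sets in the H.264 NAL unit header.
NAL_TYPE_NAMES = {7: "SPS", 8: "PPS"}


def decode_sprop_parameter_sets(value: str) -> list[tuple[str, bytes]]:
    """Decode a sprop-parameter-sets value into (label, NAL unit bytes) pairs."""
    units = []
    for encoded in value.split(","):
        # validate=True rejects characters outside the base64 alphabet.
        nalu = base64.b64decode(encoded.strip(), validate=True)
        nal_unit_type = nalu[0] & 0x1F  # low five bits of the NAL header byte
        label = NAL_TYPE_NAMES.get(nal_unit_type, f"type {nal_unit_type}")
        units.append((label, nalu))
    return units
```

   For the offer's value, the sketch yields one sequence parameter set
   followed by one picture parameter set.  Because the example strings
   are illustrative, fields decoded from them need not agree with
   "profile-level-id".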

     Answerer -> Offerer SDP message:

     m=video 49170 RTP/AVP 100 99 97
     a=rtpmap:97 H264/90000
     a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
               KyzFGleR
     a=rtpmap:99 H264/90000
     a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
               KyzFGleR; max-rcmd-nalu-size=3980
     a=rtpmap:100 H264/90000
     a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
               KyzFGleR; sprop-interleaving-depth=60;
               sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
               deint-buf-cap=128000; max-rcmd-nalu-size=3980

   As the Offer/Answer negotiation covers both sending and receiving
   streams, an offer indicates the exact parameters for what the offerer
   is willing to receive, whereas the answer indicates the same for what
   the answerer is willing to receive.  In this case, the offerer
   declared that it is willing to receive payload type 98.  The answerer
   accepts this by declaring an equivalent payload type 97; i.e., it has
   identical values for "profile-level-id" and "packetization-mode",
   and neither payload type carries "sprop-deint-buf-req".  This has the
   following implications for both the offerer and the answerer
   concerning the parameters that declare properties.  The offerer
   initially declared a certain value of the "sprop-parameter-sets" in
   the payload definition for PT=98.  However, as the answerer accepted
   this as PT=97, the values of "sprop-parameter-sets" in PT=98 must now
   be used instead when the offerer sends PT=97.  Similarly, when the
   answerer sends PT=98 to the offerer, it has to use the properties
   parameters it declared in PT=97.
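   The pairing of offered and answered payload types described above can
   be sketched as a lookup on the media format configuration.  This
   sketch is a simplification: it compares only "profile-level-id" and
   "packetization-mode" (treating an absent "packetization-mode" as 0,
   the single NAL unit mode), which suffices to pair the payload types
   in this example.  All helper names are hypothetical.

```python
def parse_fmtp(params: str) -> dict[str, str]:
    """Parse 'name=value; name=value' fmtp parameters into a dict."""
    result: dict[str, str] = {}
    for item in params.split(";"):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")  # first '=' only
            result[name.strip().lower()] = value.strip()
    return result


def config_point(params: dict[str, str]) -> tuple[str, str]:
    """Key used to pair payload types (simplified, see lead-in)."""
    # An absent packetization-mode means single NAL unit mode (0).
    return params["profile-level-id"].upper(), params.get("packetization-mode", "0")


def match_payload_types(offer: dict[int, str], answer: dict[int, str]) -> dict[int, int]:
    """Map each offered PT to the answered PT declaring the same configuration."""
    answered = {config_point(parse_fmtp(p)): pt for pt, p in answer.items()}
    matches = {}
    for pt, params in offer.items():
        key = config_point(parse_fmtp(params))
        if key in answered:
            matches[pt] = answered[key]
    return matches
```

   With the example above, PT 98 pairs with PT 97, so the offerer sends
   PT 97 while using the "sprop-parameter-sets" it declared under PT 98.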

   The answerer also accepts the reception of the two configurations
   that payload types 99 and 100 represent.  It provides the initial
   parameter sets for the answerer-to-offerer direction, as well as the
   buffering-related parameters that it will use when sending these
   payload types.  It also provides the offerer with its memory limit for
   deinterleaving operations by providing a "deint-buf-cap" parameter.
   This is only useful if the offerer decides on making a second offer,
   where it can take the new value into account.  The "max-rcmd-nalu-
   size" indicates that the answerer can efficiently process NALUs up to
   the size of 3980 bytes.  However, there is no guarantee that the
   network supports this size.
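   A sender that honors "max-rcmd-nalu-size" still has to respect the
   path MTU.  The sketch below, with a hypothetical function name, picks
   the smaller of the two limits for a NAL unit carried whole in one RTP
   packet.  It assumes RTP over UDP over IPv4 with no IP options, no
   CSRCs, and no RTP header extension (40 bytes of headers per packet),
   and a path MTU already known to the sender.

```python
# Assumed headers per packet: IPv4 (20 bytes, no options), UDP (8 bytes),
# and the RTP fixed header (12 bytes, no CSRCs or header extension).
HEADER_OVERHEAD = 20 + 8 + 12


def nalu_size_target(max_rcmd_nalu_size: int, path_mtu: int) -> int:
    """Largest NAL unit to put in one RTP packet under both limits."""
    return min(max_rcmd_nalu_size, path_mtu - HEADER_OVERHEAD)
```

   On a path with a 1500-byte MTU, the network limit (1460 bytes)
   rather than the answerer's 3980 bytes would govern.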

   Please note that the parameter sets in the above example do not
   represent a legal operation point of an H.264 codec.  The base64
   strings are only used for illustration.

8.4. Parameter Set Considerations

   The H.264 parameter sets are a fundamental part of the video codec
   and vital to its operation; see section 1.2.  Due to their
   characteristics and their importance for the decoding process, lost
   or erroneously transmitted parameter sets can hardly be concealed
   locally at the receiver.  A reference to a corrupt parameter set
   normally has fatal consequences for the decoding process.  Corruption
   could occur, for example, due to the erroneous transmission or loss
   of a parameter set data structure, but also due to the untimely
   transmission of a parameter set update.  Therefore, the following
   recommendations are provided as a guideline for the implementer of
   the RTP sender.

   Parameter set NALUs can be transported using three different
   principles:

   A. Using a session control protocol (out-of-band) prior to the
      actual RTP session.

   B. Using a session control protocol (out-of-band) during an ongoing
      RTP session.

   C. Within the RTP stream in the payload (in-band) during an ongoing
      RTP session.

   It is necessary to implement principles A and B within a session
   control protocol.  SIP and SDP can be used as described in the SDP
   Offer/Answer model and in the previous sections of this memo.  This
   section contains guidelines on how principles A and B must be
   implemented within session control protocols.  It is independent of
   the particular protocol used.  Principle C is supported by the RTP
   payload format defined in this specification.

   The picture and sequence parameter set NALUs SHOULD NOT be
   transmitted in the RTP payload unless reliable transport is provided
   for RTP, as a loss of a parameter set of either type will likely
   prevent decoding of a considerable portion of the corresponding RTP
   stream.  Thus, the transmission of parameter sets using a reliable
   session control protocol (i.e., usage of principle A or B above) is
   RECOMMENDED.

   In the rest of the section it is assumed that out-of-band signaling
   provides reliable transport of parameter set NALUs and that in-band
   transport does not.  If in-band signaling of parameter sets is used,
   the sender SHOULD take the error characteristics into account and use
   mechanisms to provide a high probability for delivering the parameter
   sets correctly.  Mechanisms that increase the probability for a
   correct reception include packet repetition, FEC, and retransmission.
   The use of an unreliable, out-of-band control protocol has
   disadvantages similar to those of in-band signaling (possible loss)
   and, in addition, may also lead to difficulties in the
   synchronization (see below).  Therefore, it is NOT RECOMMENDED.
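   Packet repetition of in-band parameter sets can be done at the
   application layer, in line with section 6.1, by re-inserting
   identical parameter set NAL units into the NAL unit stream so that
   the packetizer sends them in new RTP packets.  The sketch below
   repeats them ahead of every IDR access unit (nal_unit_type 5); this
   repetition point and the function names are assumptions, not
   requirements of this memo.

```python
from collections.abc import Iterable, Iterator

NAL_TYPE_IDR = 5  # coded slice of an IDR picture


def nal_unit_type(nalu: bytes) -> int:
    """Return the nal_unit_type field (low five bits of the first byte)."""
    return nalu[0] & 0x1F


def repeat_parameter_sets(access_units: Iterable[list[bytes]],
                          parameter_sets: list[bytes]) -> Iterator[bytes]:
    """Yield NAL units, re-sending the same SPS/PPS before each IDR access unit."""
    for access_unit in access_units:
        if any(nal_unit_type(nalu) == NAL_TYPE_IDR for nalu in access_unit):
            # Identical copies: the repetition never changes an active set.
            yield from parameter_sets
        yield from access_unit
```

   Repeating unchanged parameter sets at each IDR picture also gives a
   receiver that tunes in late a chance to obtain them before the next
   independently decodable picture.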

   Parameter sets MAY be added or updated during the lifetime of a
   session using principles B and C.  It is required that parameter sets
   are present at the decoder prior to the NAL units that refer to them.
   Updating or adding parameter sets can result in further problems,
   and therefore the following recommendations should be considered.

   -  When parameter sets are added or updated, principle C is
      vulnerable to transmission errors as described above, and
      therefore principle B is RECOMMENDED.

   -  When parameter sets are added or updated, care SHOULD be taken to
      ensure that any parameter set is delivered prior to its usage.  It
      is common that no synchronization is present between out-of-band
      signaling and in-band traffic.  If out-of-band signaling is used,
      it is RECOMMENDED that a sender does not start sending NALUs
      requiring the updated parameter sets prior to acknowledgement of
      delivery from the signaling protocol.

   -  When parameter sets are updated, the following synchronization
      issue should be taken into account.  When overwriting a parameter
      set at the receiver, the sender has to ensure that the parameter
      set in question is not needed by any NALU present in the network
      or receiver buffers.  Otherwise, decoding with a wrong parameter
      set may occur.  To lessen this problem, it is RECOMMENDED either
      to overwrite only those parameter sets that have not been used for
      a sufficiently long time (to ensure that all related NALUs have
      been consumed), or to add a new parameter set instead (which may
      have negative consequences for the efficiency of the video
      coding).

   -  When new parameter sets are added, previously unused parameter set
      identifiers are used.  This avoids the problem identified in the
      previous paragraph.  However, in a multiparty session, unless a
      synchronized control protocol is used, there is a risk that
      multiple entities try to add different parameter sets for the same
      identifier, which has to be avoided.

   -  Adding or modifying parameter sets by using both principles B and
      C in the same RTP session may lead to inconsistencies of the
      parameter sets because of the lack of synchronization between the
      control and the RTP channel.  Therefore, principles B and C MUST
      NOT both be used in the same session unless sufficient
      synchronization can be provided.
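   The recommendations above on delivery before use and on fresh
   identifiers can be combined into sender-side bookkeeping for
   out-of-band updates (principle B).  The class below is a hypothetical
   sketch: identifiers delivered before the session (principle A) start
   out as used and acknowledged, new parameter sets receive a previously
   unused identifier, and NAL units referring to a new identifier are
   held back until the signaling protocol acknowledges delivery.  The
   default of 256 identifiers matches the range of picture parameter set
   IDs in H.264; sequence parameter sets would use 32.  The sketch does
   not coordinate identifiers among multiple senders.

```python
from collections.abc import Iterable


class ParameterSetUpdateTracker:
    """Hypothetical sender-side bookkeeping for parameter set updates."""

    def __init__(self, initial_ids: Iterable[int] = (), max_ids: int = 256) -> None:
        self.max_ids = max_ids
        # Sets delivered before the session (principle A): used and delivered.
        self.used_ids: set[int] = set(initial_ids)
        self.acked_ids: set[int] = set(self.used_ids)

    def allocate_new_id(self) -> int:
        """Return a previously unused identifier instead of overwriting one."""
        for candidate in range(self.max_ids):
            if candidate not in self.used_ids:
                self.used_ids.add(candidate)
                return candidate
        raise RuntimeError("no unused parameter set identifier left")

    def on_delivery_ack(self, ps_id: int) -> None:
        """Record that the signaling protocol confirmed delivery of ps_id."""
        self.acked_ids.add(ps_id)

    def may_send_nalu_using(self, ps_id: int) -> bool:
        """True once parameter set ps_id is known to be at the receiver."""
        return ps_id in self.acked_ids
```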

   In some scenarios (e.g., when only the subset of this payload format
   specification corresponding to H.241 is used), it is not possible to
   employ out-of-band parameter set transmission.  In this case,
   parameter sets have to be transmitted in-band.  Here, the
   synchronization with the non-parameter-set-data in the bitstream is
   implicit, but the possibility of a loss has to be taken into account.
   The loss probability should be reduced using the mechanisms discussed
   above.

   -  When parameter sets are initially provided using principle A and
      then later added or updated in-band (principle C), there is a risk
      associated with updating the parameter sets delivered out-of-band.
      If receivers miss some in-band updates (for example, because of a
      loss or a late tune-in), those receivers will attempt to decode
      the bitstream using outdated parameters.  It is RECOMMENDED that
      parameter set IDs be partitioned between the out-of-band and in-
      band parameter sets.
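   A sender could enforce such a partitioning with a simple guard on its
   parameter set table.  The split below, picture parameter set IDs 0 to
   127 for out-of-band sets and 128 to 255 for in-band sets, is an
   arbitrary assumption for illustration, as is the function name.

```python
# Assumed split of picture parameter set IDs 0..255 (illustrative only).
OUT_OF_BAND_IDS = range(0, 128)
IN_BAND_IDS = range(128, 256)


def store_parameter_set(table: dict[int, bytes], ps_id: int,
                        nalu: bytes, in_band: bool) -> None:
    """Record a parameter set, refusing IDs that belong to the other partition."""
    allowed = IN_BAND_IDS if in_band else OUT_OF_BAND_IDS
    if ps_id not in allowed:
        kind = "in-band" if in_band else "out-of-band"
        raise ValueError(f"ID {ps_id} is outside the {kind} partition")
    table[ps_id] = nalu
```

   Because in-band sets never reuse an out-of-band ID, a receiver that
   missed an in-band set finds the referenced ID missing instead of
   silently decoding with an outdated out-of-band set of the same ID.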

   To allow for maximum flexibility and best performance from the H.264
   coder, it is recommended, if possible, to allow any sender to add its
   own parameter sets to be used in a session.  Setting the
   "parameter-add" parameter to false should only be done in cases where
   the session topology prevents a participant from adding its own
   parameter sets.
