RFC 6190

RTP Payload Format for Scalable Video Coding

Pages: 100
Proposed Standard
Part 2 of 4 – Pages 23 to 51

RFC6190 - Page 23

4.  RTP Payload Format

4.1.  RTP Header Usage

   In addition to Section 5.1 of [RFC6184], the following rules apply.

   o Setting of the M bit:

   The M bit of an RTP packet for which the packet payload is an NI-MTAP
   MUST be equal to 1 if the last NAL unit, in decoding order, of the
   access unit associated with the RTP timestamp is contained in the
   packet.

   o Setting of the RTP timestamp:

   For an RTP packet for which the packet payload is an empty NAL unit,
   the RTP timestamp MUST be set according to Section 4.10.

   For an RTP packet for which the packet payload is a PACSI NAL unit,
   the RTP timestamp MUST be equal to the NALU-time of the next non-
   PACSI NAL unit in transmission order.  Recall that the NALU-time of a
   NAL unit in an MTAP is defined in [RFC6184] as the value that the RTP
   timestamp would have if that NAL unit were transported in its own
   RTP packet.

   o Setting of the SSRC:

   For both SST and MST, the SSRC values MUST be set according to
   [RFC3550].

4.2.  NAL Unit Extension and Header Usage

4.2.1.  NAL Unit Extension

   This memo specifies a NAL unit extension mechanism to allow for
   introduction of new types of NAL units, beyond the three NAL unit
   types left undefined in [RFC6184] (i.e., 0, 30, and 31).  The
   extension mechanism utilizes the NAL unit type value 31 and is
   specified as follows.  When the NAL unit type value is equal to 31,
   the one-byte NAL unit header consisting of the F, NRI, and Type
   fields as specified in Section 1.1.3 is extended by one additional
   octet, which consists of a 5-bit field named Subtype and three 1-bit
   fields named J, K, and L, respectively.  The additional octet is
   shown in the following figure.

         +---------------+
         |0|1|2|3|4|5|6|7|
         +-+-+-+-+-+-+-+-+
         | Subtype |J|K|L|
         +---------------+

   The Subtype value determines the (extended) NAL unit type of this NAL
   unit.  The interpretation of the fields J, K, and L depends on the
   Subtype.  The semantics of the fields are as follows.

   When Subtype is equal to 1, the NAL unit is an empty NAL unit as
   specified in Section 4.10.  When Subtype is equal to 2, the NAL unit
   is an NI-MTAP NAL unit as specified in Section 4.7.1.  All other
   values of Subtype (0, 3-31) are reserved for future extensions, and
   receivers MUST ignore the entire NAL unit when Subtype is equal to
   any of these reserved values.
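
   As a non-normative illustration, the header split described above
   can be sketched as follows.  The names NalHeader, parse_nal_header,
   and is_reserved_subtype are hypothetical and not defined by this
   memo; the F/NRI/Type layout of the first octet is the one recalled
   in Section 1.1.3.

```python
from typing import NamedTuple, Optional


class NalHeader(NamedTuple):
    f: int
    nri: int
    nal_type: int
    # The following fields exist only when nal_type == 31.
    subtype: Optional[int] = None
    j: Optional[int] = None
    k: Optional[int] = None
    l: Optional[int] = None


def parse_nal_header(payload: bytes) -> NalHeader:
    """Split the one-octet header and, for Type 31, the extension octet."""
    if len(payload) < 1:
        raise ValueError("empty payload")
    b0 = payload[0]
    f, nri, nal_type = b0 >> 7, (b0 >> 5) & 0x03, b0 & 0x1F
    if nal_type != 31:
        return NalHeader(f, nri, nal_type)
    if len(payload) < 2:
        raise ValueError("Type 31 requires a second header octet")
    b1 = payload[1]
    # Extension octet: 5-bit Subtype, then the 1-bit fields J, K, and L.
    return NalHeader(f, nri, nal_type,
                     b1 >> 3, (b1 >> 2) & 1, (b1 >> 1) & 1, b1 & 1)


def is_reserved_subtype(header: NalHeader) -> bool:
    """True when a receiver is expected to ignore the entire NAL unit."""
    return header.nal_type == 31 and header.subtype not in (1, 2)
```

   A receiver built this way would drop the whole NAL unit whenever
   is_reserved_subtype() returns True, rather than attempting to
   interpret J, K, or L.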

4.2.2.  NAL Unit Header Usage

   The structure and semantics of the NAL unit header according to the
   H.264 specification [H.264] were introduced in Section 1.1.3.  This
   section specifies the extended semantics of the NAL unit header
   fields F, NRI, I, PRID, DID, QID, TID, U, and D, according to this
   memo.  When the Type field is equal to 31, the semantics of the
   fields in the extension NAL unit header were specified in Section
   4.2.1.

   The semantics of F specified in Section 5.3 of [RFC6184] also apply
   in this memo.  That is, a value of 0 for F indicates that the NAL
   unit type octet and payload should not contain bit errors or other
   syntax violations, whereas a value of 1 for F indicates that the NAL
   unit type octet and payload may contain bit errors or other syntax
   violations.  MANEs SHOULD set the F bit to indicate bit errors in the
   NAL unit.

   For NRI, for a bitstream conforming to one of the profiles defined in
   Annex A of [H.264] and transported using [RFC6184], the semantics
   specified in Section 5.3 of [RFC6184] apply, i.e., NRI also indicates
   the relative importance of NAL units.  For a bitstream conforming to
   one of the profiles defined in Annex G of [H.264] and transported
   using this memo, in addition to the semantics specified in Annex G of
   [H.264], NRI also indicates the relative importance of NAL units
   within a layer.

   For I, in addition to the semantics specified in Annex G of [H.264],
   according to this memo, MANEs MAY use this information to protect NAL
   units with I equal to 1 better than NAL units with I equal to 0.
   MANEs MAY also utilize information of NAL units with I equal to 1 to
   decide when to forward more packets for an RTP packet stream.  For
   example, when it is detected that spatial layer switching has
   happened such that the operation point has changed to a higher value
   of DID, MANEs MAY start to forward NAL units with the higher value of
   DID only after forwarding a NAL unit with I equal to 1 with the
   higher value of DID.
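
   The layer-switching example above can be sketched as a small gate,
   purely as an illustration of one possible MANE policy.  The class
   DidUpswitchGate and its methods are hypothetical, and the sketch
   models only a single pending switch to one higher DID value.

```python
from typing import Optional


class DidUpswitchGate:
    """Hold back a newly requested higher DID until a NAL unit with
    I == 1 and that DID has been forwarded (hypothetical MANE policy)."""

    def __init__(self, current_did: int) -> None:
        self.current_did = current_did
        self.target_did: Optional[int] = None

    def request_upswitch(self, did: int) -> None:
        """Record that the operation point moved to a higher DID."""
        if did > self.current_did:
            self.target_did = did

    def should_forward(self, did: int, i_bit: int) -> bool:
        """Decide whether a NAL unit with this DID and I bit is forwarded."""
        if did <= self.current_did:
            return True
        if self.target_did is not None and did == self.target_did and i_bit == 1:
            # The switch completes on the first I == 1 unit of the target DID.
            self.current_did = did
            self.target_did = None
            return True
        return False
```

   Since the behavior is a MAY, a MANE is free to use a different
   policy, for example switching immediately.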

   Note that, in the context of this section, "protecting a NAL unit"
   means any RTP or network transport mechanism that could improve the
   probability of successful delivery of the packet conveying the NAL
   unit, including applying a Quality of Service (QoS) enabled network,
   Forward Error Correction (FEC), retransmissions, and advanced
   scheduling behavior, whenever possible.

   For PRID, the semantics specified in Annex G of [H.264] apply.  Note
   that MANEs implementing unequal error protection MAY use this
   information to protect NAL units with smaller PRID values better than
   those with larger PRID values, for example, by including only the
   more important NAL units in a FEC protection mechanism.  The
   importance for the decoding process decreases as the PRID value
   increases.

   For DID, QID, or TID, in addition to the semantics specified in Annex
   G of [H.264], according to this memo, values of DID, QID, or TID
   indicate the relative importance in their respective dimension.  A
   lower value of DID, QID, or TID indicates a higher importance if the
   other two components are identical.  MANEs MAY use this information
   to protect more important NAL units better than less important NAL
   units.

   For U, in addition to the semantics specified in Annex G of [H.264],
   according to this memo, MANEs MAY use this information to protect NAL
   units with U equal to 1 better than NAL units with U equal to 0.

   For D, in addition to the semantics specified in Annex G of [H.264],
   according to this memo, MANEs MAY use this information to determine
   whether a given NAL unit is required for successfully decoding a
   certain Operation Point of the SVC bitstream, hence to decide whether
   to forward the NAL unit.

4.3.  Payload Structures

   The NAL unit structure is central to H.264/AVC, [RFC6184], as well as
   SVC and this memo.  In H.264/AVC and SVC, all coded bits for
   representing a video signal are encapsulated in NAL units.  In
   [RFC6184], each RTP packet payload is structured as a NAL unit, which
   contains one or a part of one NAL unit specified in H.264/AVC, or
   aggregates one or more NAL units specified in H.264/AVC.

   [RFC6184] specifies three basic payload structures (in Section 5.2 of
   [RFC6184]), namely, single NAL unit packet, aggregation packet, and
   fragmentation unit, as well as six new types (24 to 29) of NAL
   units.  The value of the
   Type field of the RTP packet payload header (i.e., the first byte of
   the payload) may be equal to any value from 1 to 23 for a single NAL
   unit packet, any value from 24 to 27 for an aggregation packet, and
   28 or 29 for a fragmentation unit.

   In addition to the NAL unit types defined originally for H.264/AVC,
   SVC defines three new NAL unit types specifically for SVC: coded
   slice in scalable extension NAL units (type 20), prefix NAL units
   (type 14), and subset sequence parameter set NAL units (type 15), as
   described in Section 1.1.

   This memo further introduces three new types of NAL units, PACSI NAL
   unit (NAL unit type 30) as specified in Section 4.9, empty NAL unit
   (type 31, subtype 1) as specified in Section 4.10, and NI-MTAP NAL
   unit (type 31, subtype 2) as specified in Section 4.7.1.

   The RTP packet payload structure in [RFC6184] is maintained with
   slight extensions in this memo, as follows.  Each RTP packet payload
   is still structured as a NAL unit, which contains one or a part of
   one NAL unit specified in H.264/AVC and SVC, or contains one PACSI
   NAL unit or one empty NAL unit, or aggregates zero or more NAL units
   specified in H.264/AVC and SVC, zero or one PACSI NAL unit, and zero
   or more empty NAL units.

   In this memo, one of the three basic payload structures,
   fragmentation unit, remains the same as in [RFC6184], and the other
   two, single NAL unit packet and aggregation packet, are extended as
   follows.  The value of the Type field of the payload header may be
   equal to any value from 1 to 23, inclusive, and 30 to 31, inclusive,
   for a single NAL unit packet, and any value from 24 to 27, inclusive,
   and 31, for an aggregation packet.  When the Type field of the
   payload header is equal to 31 and the Subtype field of the payload
   header is equal to 2, the packet is an aggregation packet (containing
   an NI-MTAP NAL unit).  When the Type field of the payload header is
   equal to 31 and the Subtype field of the payload header is equal to
   1, the packet is a single NAL unit packet (containing an empty NAL
   unit).

   Note that, in this memo, the length of the payload header varies
   depending on the value of the Type field in the first byte of the RTP
   packet payload.  If the value is equal to 14, 20, or 30, the first
   four bytes of the packet payload form the payload header; otherwise,
   if the value is equal to 31, the first two bytes of the payload form
   the payload header; otherwise, the payload header is the first byte
   of the packet payload.
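
   The rule above can be written as a short function.  This is a
   minimal sketch; the name payload_header_length is hypothetical, and
   the Type field is taken to be the low five bits of the first octet,
   as recalled in Section 1.1.3.

```python
def payload_header_length(first_octet: int) -> int:
    """Return the payload header length in bytes for an RTP payload
    whose first octet is first_octet."""
    nal_type = first_octet & 0x1F
    if nal_type in (14, 20, 30):
        return 4  # four-byte SVC NAL unit header
    if nal_type == 31:
        return 2  # one-byte header plus the Subtype/J/K/L octet
    return 1      # plain one-byte NAL unit header
```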

   Table 1 lists the NAL unit types introduced in SVC and this memo and
   where they are described in this memo.  Table 2 summarizes the basic
   payload structure types for all NAL unit types when they are directly
   used as RTP packet payloads according to this memo.  Table 3
   summarizes the NAL unit types allowed to be aggregated (i.e., used as
   aggregation units in aggregation packets) or fragmented (i.e.,
   carried in fragmentation units) according to this memo.

   Table 1.  NAL unit types introduced in SVC and this memo

   Type  Subtype  NAL Unit Name                Section Numbers
   -----------------------------------------------------------
   14     -       Prefix NAL unit                    1.1
   15     -       Subset sequence parameter set      1.1
   20     -       Coded slice in scalable extension  1.1
   30     -       PACSI NAL unit                     4.9
   31     0       reserved                           4.2.1
   31     1       Empty NAL unit                     4.10
   31     2       NI-MTAP                            4.7.1
   31     3-31    reserved                           4.2.1

   Table 2.  Basic payload structure types for all NAL unit
   types when they are directly used as RTP packet payloads

   Type   Subtype    Basic Payload Structure
   ------------------------------------------
   0      -          reserved
   1-23   -          Single NAL Unit Packet
   24-27  -          Aggregation Packet
   28-29  -          Fragmentation Unit
   30     -          Single NAL Unit Packet
   31     0          reserved
   31     1          Single NAL Unit Packet
   31     2          Aggregation Packet
   31     3-31       reserved
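
   Table 2 maps directly onto a lookup function.  The following is an
   illustrative sketch only; basic_payload_structure and the returned
   strings are hypothetical names chosen for this example.

```python
from typing import Optional


def basic_payload_structure(nal_type: int, subtype: Optional[int] = None) -> str:
    """Classify an RTP payload by its Type (and Subtype) per Table 2."""
    if not 0 <= nal_type <= 31:
        raise ValueError("Type is a 5-bit field")
    if 1 <= nal_type <= 23 or nal_type == 30:
        return "single NAL unit packet"
    if 24 <= nal_type <= 27:
        return "aggregation packet"
    if nal_type in (28, 29):
        return "fragmentation unit"
    if nal_type == 31 and subtype == 1:
        return "single NAL unit packet"  # empty NAL unit
    if nal_type == 31 and subtype == 2:
        return "aggregation packet"      # NI-MTAP
    return "reserved"                    # Type 0, or Type 31 with Subtype 0 or 3-31
```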

   Table 3.  Summary of the NAL unit types allowed to be
   aggregated or fragmented (yes = allowed, no = disallowed,
   - = not applicable/not specified)

   Type  Subtype STAP-A STAP-B MTAP16 MTAP24 FU-A FU-B NI-MTAP
   -------------------------------------------------------------
   0     -          -      -      -      -     -     -     -
   1-23  -        yes    yes    yes    yes   yes   yes   yes
   24-29 -         no     no     no     no    no    no    no
   30    -        yes    yes    yes    yes    no    no   yes
   31    0          -      -      -      -     -     -     -
   31    1        yes     no     no     no    no    no   yes
   31    2         no     no     no     no    no    no    no
   31    3-31       -      -      -      -     -     -     -

4.4.  Transmission Modes

   This memo enables transmission of an SVC bitstream over one or more
   RTP sessions.  If only one RTP session is used for transmission of
   the SVC bitstream, the transmission mode is referred to as single-
   session transmission (SST); otherwise (more than one RTP session is
   used for transmission of the SVC bitstream), the transmission mode is
   referred to as multi-session transmission (MST).

   SST SHOULD be used for point-to-point unicast scenarios, while MST
   SHOULD be used for point-to-multipoint multicast scenarios where
   different receivers require different operation points of the same
   SVC bitstream, to improve bandwidth utilization efficiency.

   If the OPTIONAL mst-mode media type parameter (see Section 7.1) is
   not present, SST MUST be used; otherwise (mst-mode is present), MST
   MUST be used.

4.5.  Packetization Modes

4.5.1.  Packetization Modes for Single-Session Transmission

   When SST is in use, Section 5.4 of [RFC6184] applies with the
   following extensions.

   The packetization modes specified in Section 5.4 of [RFC6184],
   namely, single NAL unit mode, non-interleaved mode, and interleaved
   mode, are also referred to as session packetization modes.  Table 4
   summarizes the allowed session packetization modes for SST.

   Table 4.  Summary of allowed session packetization modes
   (denoted as "Session Mode" for simplicity) for SST (yes =
   allowed, no = disallowed)

   Session Mode               Allowed
   -------------------------------------
   Single NAL Unit Mode         yes
   Non-Interleaved Mode         yes
   Interleaved Mode             yes

   For NAL unit types in the range of 0 to 29, inclusive, the NAL unit
   types allowed to be directly used as packet payloads for each session
   packetization mode are the same as specified in Section 5.4 of
   [RFC6184].  For other NAL unit types, which are newly introduced in
   this memo, the NAL unit types allowed to be directly used as packet
   payloads for each session packetization mode are summarized in Table
   5.

   Table 5.  New NAL unit types allowed to be directly used
   as packet payloads for each session packetization mode
   (yes = allowed, no = disallowed, - = not applicable/not specified)

   Type   Subtype    Single NAL    Non-Interleaved    Interleaved
                     Unit Mode           Mode             Mode
   -------------------------------------------------------------
   30     -            yes               no               no
   31     0              -                -                -
   31     1            yes              yes               no
   31     2             no              yes               no
   31     3-31           -                -                -

4.5.2.  Packetization Modes for Multi-Session Transmission

   For MST, this memo specifies four MST packetization modes:

   o  Non-interleaved timestamp based mode (NI-T);

   o  Non-interleaved cross-session decoding order number (CS-DON) based
      mode (NI-C);

   o  Non-interleaved combined timestamp and CS-DON mode (NI-TC); and

   o  Interleaved CS-DON (I-C) mode.

   These four modes differ in two ways.  First, they differ in terms of
   whether NAL units are required to be transmitted within each RTP
   session in decoding order (i.e., non-interleaved), or they are
   allowed to be transmitted in a different order (i.e., interleaved).
   Second, they differ in the mechanisms they provide in order to
   recover the correct decoding order of the NAL units across all RTP
   sessions involved.

   The NI-T, NI-C, and NI-TC modes do not allow interleaving, and are
   thus targeted for systems that require relatively low end-to-end
   latency, e.g., conversational systems.  The I-C mode allows
   interleaving and is thus targeted for systems that do not require
   very low end-to-end latency.  The benefits of interleaving are the
   same as those of the interleaved mode specified in [RFC6184].

   The NI-T mode uses timestamps to recover the decoding order of NAL
   units, whereas the NI-C and I-C modes both use the CS-DON mechanism
   (explained later) to do so.  The NI-TC mode provides both timestamps
   and the CS-DON method; receivers in this case may choose to use
   either method for performing decoding order recovery.  The MST
   packetization mode in use MUST be signaled by the value of the
   OPTIONAL mst-mode media type parameter.  The used MST packetization
   mode governs which session packetization modes are allowed in the
   associated RTP sessions, which in turn govern which NAL unit types
   are allowed to be directly used as RTP packet payloads.

   Table 6 summarizes the allowed session packetization modes for NI-T,
   NI-C, and NI-TC.  Table 7 summarizes the allowed session
   packetization modes for I-C.

   Table 6.  Summary of allowed session packetization modes
   (denoted as "Session Mode" for simplicity) for NI-T, NI-C, and
   NI-TC (yes = allowed, no = disallowed)

   Session Mode            Base Session    Enhancement Session
   -----------------------------------------------------------
   Single NAL Unit Mode         yes             no
   Non-Interleaved Mode         yes            yes
   Interleaved Mode              no             no

   Table 7.  Summary of allowed session packetization modes
   (denoted as "Session Mode" for simplicity) for I-C
   (yes = allowed, no = disallowed)

   Session Mode            Base Session    Enhancement Session
   -----------------------------------------------------------
   Single NAL Unit Mode          no             no
   Non-Interleaved Mode          no             no
   Interleaved Mode             yes            yes
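
   Tables 4, 6, and 7 can be condensed into one lookup, sketched below
   for illustration.  The function name and the mode strings are
   assumptions of this example; in particular, the strings "NI-T",
   "NI-C", "NI-TC", and "I-C" merely mirror the mode names used in this
   section, and None stands for SST (no mst-mode parameter present).

```python
from typing import FrozenSet, Optional

SINGLE = "single"
NON_INTERLEAVED = "non-interleaved"
INTERLEAVED = "interleaved"


def allowed_session_modes(mst_mode: Optional[str],
                          base_session: bool = True) -> FrozenSet[str]:
    """Return the session packetization modes allowed for a session."""
    if mst_mode is None:  # SST, Table 4
        return frozenset({SINGLE, NON_INTERLEAVED, INTERLEAVED})
    if mst_mode in ("NI-T", "NI-C", "NI-TC"):  # Table 6
        if base_session:
            return frozenset({SINGLE, NON_INTERLEAVED})
        return frozenset({NON_INTERLEAVED})
    if mst_mode == "I-C":  # Table 7
        return frozenset({INTERLEAVED})
    raise ValueError(f"unknown MST packetization mode: {mst_mode!r}")
```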

   For NAL unit types in the range of 0 to 29, inclusive, the NAL unit
   types allowed to be directly used as packet payloads for each session
   packetization mode are the same as specified in Section 5.4 of
   [RFC6184].  For other NAL unit types, which are newly introduced in
   this memo, the NAL unit types allowed to be directly used as packet
   payloads for each allowed session packetization mode for NI-T, NI-C,
   NI-TC, and I-C are summarized in Tables 8, 9, 10, and 11,
   respectively.

   Table 8.  New NAL unit types allowed to be directly used
   as packet payloads for each allowed session packetization
   mode when NI-T is in use (yes = allowed, no = disallowed,
   - = not applicable/not specified)

   Type   Subtype    Single NAL    Non-Interleaved
                     Unit Mode           Mode
   ---------------------------------------------------
   30     -            yes               no
   31     0              -                -
   31     1            yes              yes
   31     2             no              yes
   31     3-31           -                -

   Table 9.  New NAL unit types allowed to be directly used
   as packet payloads for each allowed session packetization
   mode when NI-C is in use (yes = allowed, no = disallowed,
   - = not applicable/not specified)

   Type   Subtype    Single NAL    Non-Interleaved
                     Unit Mode           Mode
   ---------------------------------------------------
   30     -            yes              yes
   31     0              -                -
   31     1             no               no
   31     2             no              yes
   31     3-31           -                -

   Table 10.  New NAL unit types allowed to be directly used
   as packet payloads for each allowed session packetization
   mode when NI-TC is in use (yes = allowed, no = disallowed,
   - = not applicable/not specified)

   Type   Subtype    Single NAL    Non-Interleaved
                     Unit Mode           Mode
   ---------------------------------------------------
   30     -            yes              yes
   31     0              -                -
   31     1            yes              yes
   31     2             no              yes
   31     3-31           -                -

   Table 11.  New NAL unit types allowed to be directly used
   as packet payloads for the allowed session packetization
   mode when I-C is in use (yes = allowed, no = disallowed,
   - = not applicable/not specified)

   Type   Subtype    Interleaved Mode
   ------------------------------------
   30     -               no
   31     0                -
   31     1               no
   31     2               no
   31     3-31             -

   When MST is in use and the MST packetization mode in use is NI-C,
   empty NAL units (type 31, subtype 1) MUST NOT be used, i.e., no RTP
   packet is allowed to contain one or more empty NAL units.

   When MST is in use and the MST packetization mode in use is I-C, both
   empty NAL units (type 31, subtype 1) and NI-MTAP NAL units (type 31,
   subtype 2) MUST NOT be used, i.e., no RTP packet is allowed to
   contain one or more empty NAL units or an NI-MTAP NAL unit.
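
   The "yes" entries of Tables 5 and 8 through 11 can be collected
   into one table-driven check, shown here as a sketch.  The
   identifiers and the "SST" label are hypothetical.  The check covers
   only the new NAL unit types (30 and 31); it does not verify that the
   session packetization mode itself is allowed for the session.

```python
from typing import Optional

_S = "single"
_NI = "non-interleaved"

# Session modes in which each new type may be used directly as a payload.
ALLOWED_NEW_TYPES = {
    "SST":   {(30, None): {_S},      (31, 1): {_S, _NI}, (31, 2): {_NI}},  # Table 5
    "NI-T":  {(30, None): {_S},      (31, 1): {_S, _NI}, (31, 2): {_NI}},  # Table 8
    "NI-C":  {(30, None): {_S, _NI}, (31, 1): set(),     (31, 2): {_NI}},  # Table 9
    "NI-TC": {(30, None): {_S, _NI}, (31, 1): {_S, _NI}, (31, 2): {_NI}},  # Table 10
    "I-C":   {(30, None): set(),     (31, 1): set(),     (31, 2): set()},  # Table 11
}


def new_type_allowed(tx_mode: str, session_mode: str,
                     nal_type: int, subtype: Optional[int] = None) -> bool:
    """True if the new NAL unit type may be used directly as a payload."""
    rules = ALLOWED_NEW_TYPES[tx_mode]
    # Reserved subtypes have no table entry and are therefore never allowed.
    return session_mode in rules.get((nal_type, subtype), set())
```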

4.6.  Single NAL Unit Packets

   Section 5.6 of [RFC6184] applies with the following extensions.

   The payload of a single NAL unit packet MAY be a PACSI NAL unit (Type
   30) or an empty NAL unit (Type 31 and Subtype 1), in addition to a
   NAL unit with NAL unit type equal to any value from 1 to 23,
   inclusive.

   If the Type field of the first byte of the payload is not equal to
   31, the payload header is the first byte of the payload.  Otherwise
   (i.e., the Type field of the first byte of the payload is equal to
   31), the payload header is the first two bytes of the payload.

4.7.  Aggregation Packets

   In addition to Section 5.7 of [RFC6184], the following applies in
   this memo.

4.7.1.  Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs)

   One new NAL unit type introduced in this memo is the non-interleaved
   multi-time aggregation packet (NI-MTAP).  An NI-MTAP consists of one
   or more non-interleaved multi-time aggregation units.

   The NAL units contained in NI-MTAPs MUST be aggregated in decoding
   order.

   A non-interleaved multi-time aggregation unit for the NI-MTAP
   consists of 16 bits of unsigned size information of the following NAL
   unit (in network byte order), and 16 bits (in network byte order) of
   timestamp offset (TS offset) for the NAL unit.  The structure is
   presented in Figure 1.  The starting or ending position of an
   aggregation unit within a packet may or may not be on a 32-bit word
   boundary.  The NAL units in the NI-MTAP are ordered in NAL unit
   decoding order.

   The Type field of the NI-MTAP MUST be set equal to "31".

   The F bit MUST be set to 0 if all the F bits of the aggregated NAL
   units are zero; otherwise, it MUST be set to 1.

   The value of NRI MUST be the maximum value of NRI across all NAL
   units carried in the NI-MTAP packet.

   The field Subtype MUST be equal to 2.

   If the field J is equal to 1, the optional DON field MUST be present
   for each of the non-interleaved multi-time aggregation units.  For
   SST, the J field MUST be equal to 0.  For MST, in the NI-T mode the J
   field MUST be equal to 0, whereas in the NI-C or NI-TC mode the J
   field MUST be equal to 1.  When the NI-C or NI-TC mode is in use, the
   DON field, when present, MUST represent the CS-DON value for the
   particular NAL unit as defined in Section 6.2.2.

   The fields K and L MUST be both equal to 0.
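
   The field rules above determine the two-octet NI-MTAP payload
   header completely.  The following sketch derives it from the first
   header octet of each aggregated NAL unit; build_ni_mtap_header and
   the mode strings ("SST", "NI-T", "NI-C", "NI-TC") are hypothetical
   names used only for this example.

```python
from typing import Sequence


def build_ni_mtap_header(first_octets: Sequence[int], tx_mode: str) -> bytes:
    """Build the NI-MTAP payload header from the first octet of every
    NAL unit that will be aggregated."""
    if not first_octets:
        raise ValueError("an NI-MTAP carries at least one aggregation unit")
    if tx_mode in ("SST", "NI-T"):
        j = 0
    elif tx_mode in ("NI-C", "NI-TC"):
        j = 1  # a DON (CS-DON) field follows each TS offset
    else:
        raise ValueError(f"NI-MTAP is not usable in mode {tx_mode!r}")
    # F is 0 only if every aggregated F bit is 0.
    f = 1 if any(octet >> 7 for octet in first_octets) else 0
    # NRI is the maximum NRI over all aggregated NAL units.
    nri = max((octet >> 5) & 0x03 for octet in first_octets)
    byte0 = (f << 7) | (nri << 5) | 31  # Type = 31
    byte1 = (2 << 3) | (j << 2)         # Subtype = 2, K = L = 0
    return bytes([byte0, byte1])
```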

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :        NAL unit size          |        TS offset              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        DON (optional)         |                               |
   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+    NAL unit                   |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 1.  Non-interleaved multi-time aggregation unit for NI-MTAP

   Let TS be the RTP timestamp of the packet carrying the NAL unit.
   Recall that the NALU-time of a NAL unit in an MTAP is defined in
   [RFC6184] as the value that the RTP timestamp would have if that NAL
   unit were transported in its own RTP packet.  The timestamp
   offset field MUST be set to a value equal to the value of the
   following formula:

      if NALU-time >= TS, TS offset = NALU-time - TS
      else, TS offset = NALU-time + (2^32 - TS)

   For the "earliest" multi-time aggregation unit in an NI-MTAP, the
   timestamp offset MUST be zero.  Hence, the RTP timestamp of the NI-
   MTAP itself is identical to the earliest NALU-time.

      Informative note: The "earliest" multi-time aggregation unit is
      the one that would have the smallest extended RTP timestamp among
      all the aggregation units of an NI-MTAP if the aggregation units
      were encapsulated in single NAL unit packets.  An extended
      timestamp is a timestamp that has more than 32 bits and is capable
      of counting the wraparound of the timestamp field, thus enabling
      one to determine the smallest value if the timestamp wraps.  Such
      an "earliest" aggregation unit may or may not be the first one in
      the order in which the aggregation units are encapsulated in an
      NI-MTAP.  The "earliest" NAL unit need not be the same as the
      first NAL unit in the NAL unit decoding order either.
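
   The timestamp offset formula is a subtraction modulo 2^32.  A
   minimal sketch follows; the function names are hypothetical, and
   rejecting offsets that do not fit the 16-bit TS offset field is an
   assumption of this example about how a sender might react.

```python
RTP_TS_MODULUS = 2 ** 32


def ts_offset(nalu_time: int, ts: int) -> int:
    """Compute the TS offset of a NAL unit relative to the RTP timestamp
    ts of the NI-MTAP that carries it."""
    if nalu_time >= ts:
        offset = nalu_time - ts
    else:
        offset = nalu_time + (RTP_TS_MODULUS - ts)  # timestamp wrapped
    if offset > 0xFFFF:
        raise ValueError("offset does not fit the 16-bit TS offset field")
    return offset


def nalu_time_from_offset(ts: int, offset: int) -> int:
    """Receiver side: recover the NALU-time from ts and the TS offset."""
    return (ts + offset) % RTP_TS_MODULUS
```

   For example, with TS equal to 2^32 - 200 and a NALU-time of 100
   (after wraparound), the offset is 300.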

   Figure 2 presents an example of an RTP packet that contains an NI-
   MTAP that contains two non-interleaved multi-time aggregation units,
   labeled as 1 and 2 in the figure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|  Type   | Subtype |J|K|L|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |        Non-interleaved multi-time aggregation unit #1         |
   :                                                               :
   |                                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                 |  Non-interleaved multi-time |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                             |
   |                      aggregation unit #2                      |
   :                                                               :
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2.  An RTP packet including an NI-MTAP containing two
   non-interleaved multi-time aggregation units
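
   A receiver-side walk over the layout of Figures 1 and 2 might look
   like the sketch below.  The function parse_ni_mtap is hypothetical.
   The sketch assumes that RTP padding has already been removed and
   that the NAL unit size field counts only the bytes of the NAL unit
   itself (not the TS offset or DON fields).

```python
import struct
from typing import List, Optional, Tuple


def parse_ni_mtap(payload: bytes,
                  rtp_timestamp: int) -> List[Tuple[int, Optional[int], bytes]]:
    """Return (NALU-time, DON or None, NAL unit bytes) for each
    aggregation unit, in the order in which the units are carried."""
    if len(payload) < 2 or payload[0] & 0x1F != 31 or payload[1] >> 3 != 2:
        raise ValueError("not an NI-MTAP (Type 31, Subtype 2)")
    has_don = (payload[1] >> 2) & 1  # J field
    pos = 2
    units = []
    while pos < len(payload):
        fixed = 6 if has_don else 4
        if pos + fixed > len(payload):
            raise ValueError("truncated aggregation unit header")
        size, offset = struct.unpack_from("!HH", payload, pos)
        pos += 4
        don = None
        if has_don:
            (don,) = struct.unpack_from("!H", payload, pos)
            pos += 2
        if pos + size > len(payload):
            raise ValueError("truncated NAL unit")
        nal_unit = payload[pos:pos + size]
        pos += size
        units.append(((rtp_timestamp + offset) % 2 ** 32, don, nal_unit))
    return units
```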

4.8.  Fragmentation Units (FUs)

   Section 5.8 of [RFC6184] applies.

      Informative note: In case a NAL unit with the four-byte SVC NAL
      unit header is fragmented, the three-byte SVC-specific header
      extension is considered as part of the NAL unit payload.  That is,
      the three-byte SVC-specific header extension is only available in
      the first fragment of the fragmented NAL unit.

4.9.  Payload Content Scalability Information (PACSI) NAL Unit

   Another new type of NAL unit specified in this memo is the payload
   content scalability information (PACSI) NAL unit.  The Type field of
   PACSI NAL units MUST be equal to 30 (a NAL unit type value left
   unspecified in [H.264] and [RFC6184]).  A PACSI NAL unit MAY be
   carried in a single NAL unit packet or an aggregation packet, and
   MUST NOT be fragmented.

   PACSI NAL units may be used for the following purposes:

   o  To enable MANEs to decide whether to forward, process, or discard
      aggregation packets, by checking in PACSI NAL units the
      scalability information and other characteristics of the
      aggregated NAL units, rather than looking into the aggregated NAL
      units themselves, which are defined by the video coding
      specification;

   o  To enable correct decoding order recovery in MST using the NI-C or
      NI-TC mode, with the help of the CS-DON information included in
      PACSI NAL units; and

   o  To improve resilience to packet losses, e.g., by utilizing the
      following data or information included in PACSI NAL units:
      repeated Supplemental Enhancement Information (SEI) messages,
      information regarding the start and end of layer representations,
      and the indices to layer representations of the lowest temporal
      subset.

   PACSI NAL units MAY be ignored in the NI-T mode without affecting the
   decoding order recovery process.

   When a PACSI NAL unit is present in an aggregation packet, the
   following applies.

   o  The PACSI NAL unit MUST be the first aggregated NAL unit in the
      aggregation packet.

   o  There MUST be at least one additional aggregated NAL unit in the
      aggregation packet.

   o  The RTP header fields and the payload header fields of the
      aggregation packet are set as if the PACSI NAL unit was not
      included in the aggregation packet.

   o  If the aggregation packet is an MTAP16, MTAP24, or NI-MTAP with
      the J field equal to 1, the decoding order number (DON) for the
      PACSI NAL unit MUST be set to indicate that the PACSI NAL unit has
      an identical DON to the first NAL unit in decoding order among the
      remaining NAL units in the aggregation packet.
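
   The first two placement rules, together with the limit of at most
   one PACSI NAL unit per aggregation packet stated in Section 4.3, can
   be checked mechanically.  The function pacsi_placement_ok below is a
   hypothetical sketch that takes the NAL unit types of the aggregated
   units in the order in which they appear in the packet.

```python
from typing import Sequence

PACSI_TYPE = 30


def pacsi_placement_ok(aggregated_types: Sequence[int]) -> bool:
    """Check where a PACSI NAL unit sits inside an aggregation packet."""
    count = sum(1 for t in aggregated_types if t == PACSI_TYPE)
    if count == 0:
        return True   # nothing to check
    if count > 1:
        return False  # zero or one PACSI NAL unit per packet
    if aggregated_types[0] != PACSI_TYPE:
        return False  # the PACSI NAL unit must come first
    # At least one additional aggregated NAL unit is required.
    return len(aggregated_types) >= 2
```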

   When a PACSI NAL unit is included in a single NAL unit packet, it is
   associated with the next non-PACSI NAL unit in transmission order,
   and the RTP header fields of the packet are set as if the next non-
   PACSI NAL unit in transmission order was included in a single NAL
   unit packet.

   The PACSI NAL unit structure is as follows.  The first four octets
   are exactly the same as the four-byte SVC NAL unit header discussed
   in Section 1.1.3.  They are followed by one octet containing several
   flags, then five optional octets, and finally zero or more SEI NAL
   units.  Each SEI NAL unit is preceded by a 16-bit unsigned size field
   (in network byte order) that indicates the size of the following NAL
   unit in bytes (excluding these two octets, but including the NAL unit
   header octet of the SEI NAL unit).  Figure 3 illustrates the PACSI
   NAL unit structure and an example of a PACSI NAL unit containing two
   SEI NAL units.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|  Type   |R|I|   PRID    |N| DID |  QID  | TID |U|D|O| RR|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |X|Y|T|A|P|C|S|E| TL0PICIDX (o) |        IDRPICID (o)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          DONC (o)             |        NAL unit size 1        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                 SEI NAL unit 1                                |
   |                                                               |
   |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |        NAL unit size 2        |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               |
   |            SEI NAL unit 2                                     |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3.  PACSI NAL unit structure.  Fields suffixed by
   "(o)" are OPTIONAL.

   The bits A, P, and C are specified only if the bit X is equal to 1.
   The bits S and E are specified, and the fields TL0PICIDX and IDRPICID
   are present, only if the bit Y is equal to 1.  The field DONC is
   present only if the bit T is equal to 1.  The field T MUST be equal
   to 0 if the PACSI NAL unit is contained in an STAP-B, MTAP16, MTAP24,
   or NI-MTAP with the J field equal to 1.
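
   The layout in Figure 3 can be illustrated with a small parser.  The
   following Python sketch is not part of the payload format; the
   function name parse_pacsi and the dictionary keys are hypothetical,
   and the input is assumed to be the complete PACSI NAL unit starting
   at its first header octet.

```python
import struct


def parse_pacsi(buf: bytes) -> dict:
    """Parse a PACSI NAL unit laid out as in Figure 3 (illustrative sketch)."""
    if len(buf) < 5:
        raise ValueError("PACSI NAL unit needs at least 5 octets")
    b0, b1, b2, b3, flags = buf[:5]
    out = {
        # Four-octet SVC NAL unit header.
        "F": b0 >> 7, "NRI": (b0 >> 5) & 0x3, "Type": b0 & 0x1F,
        "R": b1 >> 7, "I": (b1 >> 6) & 1, "PRID": b1 & 0x3F,
        "N": b2 >> 7, "DID": (b2 >> 4) & 0x7, "QID": b2 & 0xF,
        "TID": b3 >> 5, "U": (b3 >> 4) & 1, "D": (b3 >> 3) & 1,
        "O": (b3 >> 2) & 1, "RR": b3 & 0x3,
    }
    # Flag octet: X is the most significant bit, E the least significant.
    for i, name in enumerate("XYTAPCSE"):
        out[name] = (flags >> (7 - i)) & 1
    pos = 5
    if out["Y"]:  # TL0PICIDX (1 octet) and IDRPICID (2 octets) present
        out["TL0PICIDX"], out["IDRPICID"] = struct.unpack_from("!BH", buf, pos)
        pos += 3
    if out["T"]:  # DONC (2 octets) present
        (out["DONC"],) = struct.unpack_from("!H", buf, pos)
        pos += 2
    sei = []
    while pos < len(buf):  # zero or more size-prefixed SEI NAL units
        (size,) = struct.unpack_from("!H", buf, pos)
        pos += 2
        if pos + size > len(buf):
            raise ValueError("SEI NAL unit size exceeds PACSI payload")
        sei.append(buf[pos:pos + size])
        pos += size
    out["SEI"] = sei
    return out
```

   The sketch does not check the constraints on the X, Y, and T bits
   (for example, that T is 0 inside an STAP-B); a real receiver would
   apply those checks separately.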

   The values of the fields in PACSI NAL unit MUST be set as follows.

   o  The F bit MUST be set to 1 if the F bit in at least one of the
      remaining NAL units in the aggregation packet is equal to 1 (when
      the PACSI NAL unit is included in an aggregation packet) or if the
      next non-PACSI NAL unit in transmission order has the F bit equal
      to 1 (when the PACSI NAL unit is included in a single NAL unit
      packet).  Otherwise, the F bit MUST be set to 0.

   o  The NRI field MUST be set to the highest value of NRI field among
      all the remaining NAL units in the aggregation packet (when the
      PACSI NAL unit is included in an aggregation packet) or the value
      of the NRI field of the next non-PACSI NAL unit in transmission
      order (when the PACSI NAL unit is included in a single NAL unit
      packet).

   o  The Type field MUST be set to 30.

   o  The R bit MUST be set to 1.  Receivers MUST ignore the value of R.

   o  The I bit MUST be set to 1 if the I bit of at least one of the
      remaining NAL units in the aggregation packet is equal to 1 (when
      the PACSI NAL unit is included in an aggregation packet) or if the
      I bit of the next non-PACSI NAL unit in transmission order is
      equal to 1 (when the PACSI NAL unit is included in a single NAL
      unit packet).  Otherwise, the I bit MUST be set to 0.

   o  The PRID field MUST be set to the lowest value of the PRID values
      of the remaining NAL units in the aggregation packet (when the
      PACSI NAL unit is included in an aggregation packet) or the PRID
      value of the next non-PACSI NAL unit in transmission order (when
      the PACSI NAL unit is included in a single NAL unit packet).

   o  The N bit MUST be set to 1 if the N bit of all the remaining NAL
      units in the aggregation packet is equal to 1 (when the PACSI NAL
      unit is included in an aggregation packet) or if the N bit of the
      next non-PACSI NAL unit in transmission order is equal to 1 (when
      the PACSI NAL unit is included in a single NAL unit packet).
      Otherwise, the N bit MUST be set to 0.

   o  The DID field MUST be set to the lowest value of the DID values of
      the remaining NAL units in the aggregation packet (when the PACSI
      NAL unit is included in an aggregation packet) or the DID value of
      the next non-PACSI NAL unit in transmission order (when the PACSI
      NAL unit is included in a single NAL unit packet).

   o  The QID field MUST be set to the lowest value of the QID values of
      the remaining NAL units with the lowest value of DID in the
      aggregation packet (when the PACSI NAL unit is included in an
      aggregation packet) or the QID value of the next non-PACSI NAL
      unit in transmission order (when the PACSI NAL unit is included in
      a single NAL unit packet).

   o  The TID field MUST be set to the lowest value of the TID values of
      the remaining NAL units with the lowest value of DID in the
      aggregation packet (when the PACSI NAL unit is included in an
      aggregation packet) or the TID value of the next non-PACSI NAL
      unit in transmission order (when the PACSI NAL unit is included in
      a single NAL unit packet).

   o  The U bit MUST be set to 1 if the U bit of at least one of the
      remaining NAL units in the aggregation packet is equal to 1 (when
      the PACSI NAL unit is included in an aggregation packet) or if the
      U bit of the next non-PACSI NAL unit in transmission order is
      equal to 1 (when the PACSI NAL unit is included in a single NAL
      unit packet).  Otherwise, the U bit MUST be set to 0.

   o  The D bit MUST be set to 1 if the D value of all the remaining NAL
      units in the aggregation packet is equal to 1 (when the PACSI NAL
      unit is included in an aggregation packet) or if the D bit of the
      next non-PACSI NAL unit in transmission order is equal to 1 (when
      the PACSI NAL unit is included in a single NAL unit packet).
      Otherwise, the D bit MUST be set to 0.

   o  The O bit MUST be set to 1 if the O bit of at least one of the
      remaining NAL units in the aggregation packet is equal to 1 (when
      the PACSI NAL unit is included in an aggregation packet) or if the
      O bit of the next non-PACSI NAL unit in transmission order is
      equal to 1 (when the PACSI NAL unit is included in a single NAL
      unit packet).  Otherwise, the O bit MUST be set to 0.

   o  The RR field MUST be set to "11" (in binary form).  Receivers MUST
      ignore the value of RR.

   o  If the X bit is equal to 1, the bits A, P, and C are specified as
      below.  Otherwise, the bits A, P, and C are unspecified, and
      receivers MUST ignore the values of these bits.  The X bit SHOULD
      be identical for all the PACSI NAL units in all the RTP sessions
      carrying the same SVC bitstream.

   o  If the Y bit is equal to 1, the OPTIONAL fields TL0PICIDX and
      IDRPICID MUST be present and specified as below, and the bits S
      and E are also specified as below.  Otherwise, the fields
      TL0PICIDX and IDRPICID MUST NOT be present, while the S and E bits
      are unspecified and receivers MUST ignore the values of these
      bits.  The Y bit MUST be identical for all the PACSI NAL units in
      all the RTP sessions carrying the same SVC bitstream.  The Y bit
      MUST be equal to 0 when the parameter packetization-mode is equal
      to 2.

   o  If the T bit is equal to 1, the OPTIONAL field DONC MUST be
      present and specified as below.  Otherwise, the field DONC MUST
      NOT be present.  The field T MUST be equal to 0 if the PACSI NAL
      unit is contained in an STAP-B, MTAP16, MTAP24, or NI-MTAP.

   o  The A bit MUST be set to 1 if at least one of the remaining NAL
      units in the aggregation packet belongs to an anchor layer
      representation (when the PACSI NAL unit is included in an
      aggregation packet) or if the next non-PACSI NAL unit in
      transmission order belongs to an anchor layer representation (when
      the PACSI NAL unit is included in a single NAL unit packet).
      Otherwise, the A bit MUST be set to 0.

      Informative note: The A bit indicates whether CGS or spatial layer
      switching at a non-IDR layer representation (a layer
      representation with nal_unit_type not equal to 5 and idr_flag not
      equal to 1) can be performed.  With some picture coding structures
      a non-IDR intra layer representation can be used for random
      access.  Compared to using only IDR layer representations, higher
      coding efficiency can be achieved.  The H.264/AVC or SVC solution
      to indicate the random accessibility of a non-IDR intra layer
      representation is using a recovery point SEI message.  The A bit
      offers direct access to this information, without having to parse
      the recovery point SEI message, which may be buried deeply in an
      SEI NAL unit.  Furthermore, the SEI message may or may not be
      present in the bitstream.

   o  The P bit MUST be set to 1 if all the remaining NAL units in the
      aggregation packet have redundant_pic_cnt greater than 0 (when the
      PACSI NAL unit is included in an aggregation packet) or the next
      non-PACSI NAL unit in transmission order has redundant_pic_cnt
      greater than 0 (when the PACSI NAL unit is included in a single
      NAL unit packet).  Otherwise, the P bit MUST be set to 0.

      Informative note: The P bit indicates whether a packet can be
      discarded because it contains only redundant slice NAL units.
      Without this bit, the corresponding information can be obtained
      from the syntax element redundant_pic_cnt, which is contained in
      the variable-length coded slice header.

   o  The C bit MUST be set to 1 if at least one of the remaining NAL
      units in the aggregation packet belongs to an intra layer
      representation (when the PACSI NAL unit is included in an
      aggregation packet) or if the next non-PACSI NAL unit in
      transmission order belongs to an intra layer representation (when
      the PACSI NAL unit is included in a single NAL unit packet).
      Otherwise, the C bit MUST be set to 0.

      Informative note: The C bit indicates whether a packet contains
      intra slices, which may be the only packets to be forwarded, e.g.,
      when the network conditions are particularly adverse.

   o  The S bit MUST be set to 1, if the first NAL unit following the
      PACSI NAL unit in an aggregation packet is the first VCL NAL unit,
      in decoding order, of a layer representation (when the PACSI NAL
      unit is included in an aggregation packet) or if the next non-
      PACSI NAL unit in transmission order is the first VCL NAL unit, in
      decoding order, of a layer representation (when the PACSI NAL unit
      is included in a single NAL unit packet).  Otherwise, the S bit
      MUST be set to 0.

   o  The E bit MUST be set to 1, if the last NAL unit following the
      PACSI NAL unit in an aggregation packet is the last VCL NAL unit,
      in decoding order, of a layer representation (when the PACSI NAL
      unit is included in an aggregation packet) or if the next non-
      PACSI NAL unit in transmission order is the last VCL NAL unit, in
      decoding order, of a layer representation (when the PACSI NAL unit
      is included in a single NAL unit packet).  Otherwise, the E bit
      MUST be set to 0.

      Informative note: In an aggregation packet it is always possible
      to detect the beginning or end of a layer representation by
      detecting changes in the values of dependency_id, quality_id, and
      temporal_id in NAL unit headers, except from the first and last
      NAL units of a packet.  The S or E bits are used to provide this
      information, for both single NAL unit and aggregation packets, so
      that previous or following packets do not have to be examined.
      This enables MANEs to detect slice loss and take proper action
      such as requesting a retransmission as soon as possible, as well
      as to allow efficient playout buffer handling similarly to the M
      bit present in the RTP header.  The M bit in the RTP header still
      indicates the end of an access unit, not the end of a layer
      representation.

   o  When present, the TL0PICIDX field MUST be set equal to
      tl0_dep_rep_idx as specified in Annex G of [H.264] for the layer
      representation containing the first NAL unit following the PACSI
      NAL unit in the aggregation packet (when the PACSI NAL unit is
      included in an aggregation packet) or containing the next non-
      PACSI NAL unit in transmission order (when the PACSI NAL unit is
      included in a single NAL unit packet).

   o  When present, the IDRPICID field MUST be set equal to
      effective_idr_pic_id as specified in Annex G of [H.264] for the
      layer representation containing the first NAL unit following the
      PACSI NAL unit in the aggregation packet (when the PACSI NAL unit
      is included in an aggregation packet) or containing the next non-
      PACSI NAL unit in transmission order (when the PACSI NAL unit is
      included in a single NAL unit packet).

      Informative note: The TL0PICIDX and IDRPICID fields enable the
      detection of the loss of layer representations in the most
      important temporal layer (with temporal_id equal to 0) by
      receivers as well as MANEs.  SVC provides a solution that uses SEI
      messages, which are harder to parse and may or may not be present
      in the bitstream.  When the PACSI NAL unit is part of an NI-MTAP
      packet, it is possible to infer the correct values of
      tl0_dep_rep_idx and idr_pic_id for all layer representations
      contained in the NI-MTAP by following the rules that specify how
      these parameters are set as given in Annex G of [H.264] and by
      detecting the different layer representations contained in the
      NI-MTAP packet by detecting changes in the values of
      dependency_id, quality_id, and temporal_id in the NAL unit
      headers as well as
      using the S and E flags.  The only exception is if NAL units of an
      IDR picture are present in the NI-MTAP in a position other than
      the first NAL unit following the PACSI NAL unit, in which case the
      value of idr_pic_id cannot be inferred.  In this case the NAL unit
      has to be partially parsed to obtain the idr_pic_id.  Note that,
      due to the large size of IDR pictures, their inclusion in an NI-
      MTAP, and especially in a position other than the first NAL unit
      following the PACSI NAL unit, may be neither practical nor useful.

   o  When present, the field DONC indicates the cross-session decoding
      order number (CS-DON) for the first of the remaining NAL units in
      the aggregation packet (when the PACSI NAL unit is included in an
      aggregation packet) or the CS-DON of the next non-PACSI NAL unit
      in transmission order (when the PACSI NAL unit is included in a
      single NAL unit packet).  CS-DON is further discussed in Section
      4.11.
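
   The rules above for the first four octets of the PACSI NAL unit
   amount to simple aggregations over the header fields of the
   remaining NAL units.  The following Python sketch illustrates them
   under stated assumptions: the function name pacsi_header_fields is
   hypothetical, and each NAL unit is assumed to be given as a
   dictionary of its already-parsed header fields.  For a PACSI NAL
   unit in a single NAL unit packet, the list would contain only the
   next non-PACSI NAL unit in transmission order.

```python
def pacsi_header_fields(nalus):
    """Derive PACSI header fields from the NAL units the PACSI describes.

    nalus: non-empty list of dicts with the integer keys
    F, NRI, I, PRID, N, DID, QID, TID, U, D, O.
    """
    if not nalus:
        raise ValueError("at least one non-PACSI NAL unit is required")
    min_did = min(n["DID"] for n in nalus)
    # QID and TID are taken only from NAL units with the lowest DID.
    lowest_dep = [n for n in nalus if n["DID"] == min_did]
    return {
        "F": int(any(n["F"] for n in nalus)),    # 1 if any F bit is 1
        "NRI": max(n["NRI"] for n in nalus),     # highest NRI
        "Type": 30,
        "R": 1,
        "I": int(any(n["I"] for n in nalus)),    # 1 if any I bit is 1
        "PRID": min(n["PRID"] for n in nalus),   # lowest PRID
        "N": int(all(n["N"] for n in nalus)),    # 1 only if all N bits are 1
        "DID": min_did,                          # lowest DID
        "QID": min(n["QID"] for n in lowest_dep),
        "TID": min(n["TID"] for n in lowest_dep),
        "U": int(any(n["U"] for n in nalus)),    # 1 if any U bit is 1
        "D": int(all(n["D"] for n in nalus)),    # 1 only if all D bits are 1
        "O": int(any(n["O"] for n in nalus)),    # 1 if any O bit is 1
        "RR": 0b11,
    }
```

   The flag octet (X, Y, T, A, P, C, S, E) is not covered here, because
   its values depend on slice-level information that is not carried in
   the NAL unit header.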

   The PACSI NAL unit MAY include a subset of the SEI NAL units
   associated with the access unit to which the first non-PACSI NAL unit
   in the aggregation packet belongs, and MUST NOT contain SEI NAL units
   associated with any other access unit.

      Informative note:  In H.264/AVC and SVC, within each access unit,
      SEI NAL units must appear before any VCL NAL unit in decoding
      order.  Therefore, without using PACSI NAL units, SEI messages are
      typically only conveyed in the first of the packets carrying an
      access unit.  Senders may repeat SEI NAL units in PACSI NAL units,
      so that they are repeated in more than one packet and thus
      increase robustness against packet losses.  Receivers may use the
      repeated SEI messages in place of missing SEI messages.

   For a PACSI NAL unit included in an aggregation packet, an SEI
   message SHOULD NOT be included in the PACSI NAL unit and also
   included in one of the remaining NAL units contained in the same
   aggregation packet.

4.10.  Empty NAL unit

   An empty NAL unit MAY be included in a single NAL unit packet, an
   STAP-A or an NI-MTAP packet.  Empty NAL units MUST have an RTP
   timestamp (when transported in a single NAL unit packet) or NALU-
   time (when transported in an aggregation packet) that is associated
   with an access unit for which there exists at least one NAL unit of
   type 1, 5, or 20.  When MST is used, the type 1, 5, or 20 NAL unit
   may be in a different RTP session.  Empty NAL units may be used in
   the decoding order recovery process of the NI-T mode as described in
   Section 6.2.1.

   The packet structure is shown in the following figure.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|  Type   | Subtype |J|K|L|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4.  Empty NAL unit structure.

   The fields MUST be set as follows:

     F MUST be equal to 0
     NRI MUST be equal to 3
     Type MUST be equal to 31
     Subtype MUST be equal to 1
     J MUST be equal to 0
     K MUST be equal to 0
     L MUST be equal to 0
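
   With the field values above, an empty NAL unit is always the same
   two octets.  The following Python sketch packs the fields of Figure 4
   to show this; the function names are hypothetical and are used for
   illustration only.

```python
def build_empty_nal_unit() -> bytes:
    """Pack the two octets of an empty NAL unit as shown in Figure 4."""
    f, nri, nal_type = 0, 3, 31          # first octet: F | NRI | Type
    subtype, j, k, l = 1, 0, 0, 0        # second octet: Subtype | J | K | L
    first = (f << 7) | (nri << 5) | nal_type
    second = (subtype << 3) | (j << 2) | (k << 1) | l
    return bytes([first, second])


def is_empty_nal_unit(buf: bytes) -> bool:
    """Return True if buf is exactly an empty NAL unit."""
    return bytes(buf) == build_empty_nal_unit()
```

   Packing the fields this way yields the octets 0x7F and 0x08.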

4.11.  Decoding Order Number (DON)

   The DON concept is introduced in [RFC6184] and is used to recover the
   decoding order when interleaving is used within a single session.
   Section 5.5 of [RFC6184] applies when using SST.

   When using MST, it is necessary to recover the decoding order across
   the various RTP sessions regardless of whether interleaving is used.
   In addition to the timestamp mechanism described later, the CS-DON
   mechanism is an extension of the DON facility that can be used for
   this purpose, and is defined in the following section.

4.11.1.  Cross-Session DON (CS-DON) for Multi-Session Transmission

   The cross-session decoding order number (CS-DON) is a number that
   indicates the decoding order of NAL units across all RTP sessions
   involved in MST.  It is similar to the DON concept in [RFC6184], but
   contrary to [RFC6184] where the DON was used only for interleaved
   packetization, in this memo it is used not only in the interleaved
   MST mode (I-C) but also in two of the non-interleaved MST modes (NI-C
   and NI-TC).

   When the NI-C or NI-TC MST modes are in use, the packetization of
   each session MUST be as specified in Section 5.2.2.  In PACSI NAL
   units the CS-DON value is explicitly coded in the field DONC.  For
   non-PACSI NAL units the CS-DON value is derived as follows.  Let SN
   indicate the RTP sequence number of a packet.

   o  For each non-PACSI NAL unit carried in a session using the single
      NAL unit session packetization mode, the CS-DON value of the NAL
      unit is equal to (DONC_prev_PACSI + SN_diff - 1) % 65536, wherein
      "%" is the modulo operation, DONC_prev_PACSI is the DONC value of
      the previous PACSI NAL unit with the same NALU-time as the current
      NAL unit, and SN_diff is calculated as follows:

         if SN1 > SN2, SN_diff = SN1 - SN2
         else SN_diff = SN1 + 65536 - SN2

      where SN1 and SN2 are the SNs of the current NAL unit and the
      previous PACSI NAL unit with the same NALU-time, respectively.

   o  For non-PACSI NAL units carried in a session using the non-
      interleaved session packetization mode, the CS-DON value of each
      non-PACSI NAL unit is derived as follows.

         For a non-PACSI NAL unit in a single NAL unit packet, the
         following applies.

            If the previous PACSI NAL unit is contained in a single NAL
            unit packet, the CS-DON value of the NAL unit is calculated
            as above;

            otherwise (the previous PACSI NAL unit is contained in an
            STAP-A packet), the CS-DON value of the NAL unit is
            calculated as above, with DONC_prev_PACSI being replaced by
            the CS-DON value of the previous non-PACSI NAL unit in
            decoding order (i.e., the CS-DON value of the last NAL unit
            of the STAP-A packet).

         For a non-PACSI NAL unit in an STAP-A packet, the following
         applies.

            If the non-PACSI NAL unit is the first non-PACSI NAL unit in
            the STAP-A packet, the CS-DON value of the NAL unit is equal
            to DONC of the PACSI NAL unit in the STAP-A packet;

            otherwise (the non-PACSI NAL unit is not the first non-
            PACSI NAL unit in the STAP-A packet), the CS-DON value of
            the NAL unit is equal to: (the CS-DON value of the previous
            non-PACSI NAL unit in decoding order + 1) % 65536, wherein
            "%" is the modulo operation.

         For a non-PACSI NAL unit in a number of FU-A packets, the CS-
         DON value of the NAL unit is calculated the same way as when
         the single NAL unit session packetization mode is in use, with
         SN1 being the SN value of the first FU-A packet.

         For a non-PACSI NAL unit in an NI-MTAP packet, the CS-DON value
         is equal to the value of the DON field of the non-interleaved
         multi-time aggregation unit.
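
   The CS-DON derivation for non-PACSI NAL units can be sketched in
   Python as follows.  This is an illustration under stated
   assumptions, not a normative algorithm: the function names are
   hypothetical, and SN_diff is taken to be the forward distance, modulo
   65536, from the packet carrying the previous PACSI NAL unit (SN2) to
   the current packet (SN1), so the wraparound branch adds 65536 to SN1.

```python
def sn_diff(sn1: int, sn2: int) -> int:
    """Forward RTP sequence number distance from sn2 (PACSI) to sn1 (current)."""
    if sn1 > sn2:
        return sn1 - sn2
    return sn1 + 65536 - sn2  # assumed: sn1 has wrapped around past 65535


def cs_don_single(donc_prev_pacsi: int, sn1: int, sn2: int) -> int:
    """CS-DON of a NAL unit in a single NAL unit packet (or first FU-A)."""
    return (donc_prev_pacsi + sn_diff(sn1, sn2) - 1) % 65536


def cs_don_stap_a(donc: int, count: int) -> list:
    """CS-DON values of the non-PACSI NAL units in one STAP-A, in order.

    The first non-PACSI NAL unit takes the DONC of the PACSI NAL unit;
    each following one is the previous value plus 1, modulo 65536.
    """
    return [(donc + i) % 65536 for i in range(count)]
```

   For example, a NAL unit in the packet immediately following a
   single NAL unit packet with a PACSI NAL unit (SN_diff equal to 1)
   has a CS-DON equal to the DONC of that PACSI NAL unit.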

   When the I-C MST packetization mode is in use, the DON values derived
   according to [RFC6184] for all the NAL units in each of the RTP
   sessions MUST indicate CS-DON values.

5.  Packetization Rules

   Section 6 of [RFC6184] applies in this memo, with the following
   additions.

5.1.  Packetization Rules for Single-Session Transmission

   All receivers MUST support the single NAL unit packetization mode to
   provide backward compatibility to endpoints supporting only the
   single NAL unit mode of [RFC6184].  However, the use of single NAL
   unit packetization mode (packetization-mode equal to 0) SHOULD be
   avoided whenever possible, because encapsulating NAL units of small
   sizes in their own packets (e.g., small NAL units containing
   parameter sets, prefix NAL units, or SEI messages) is less efficient
   due to the packet header overhead.

   All receivers MUST support the non-interleaved mode.

      Informative note: The non-interleaved mode of [RFC6184] does allow
      an application to encapsulate a single NAL unit in a single RTP
      packet.  Historically, the single NAL unit mode has been included
      in [RFC6184] only for compatibility with ITU-T Rec. H.241 Annex A
      [H.241].  There is no point in carrying this historic ballast
      towards a new application space such as the one provided with SVC.
      The implementation complexity increase for supporting the
      additional mechanisms of the non-interleaved mode (namely, STAP-A
      and FU-A) is minor, whereas the benefits are significant.  As a
      result, the support of STAP-A and FU-A is required.  Additionally,
      support for two of the three NAL unit types defined in this memo,
      namely, empty NAL units and NI-MTAP is needed, as specified in
      Section 4.5.1.

   A NAL unit of small size SHOULD be encapsulated in an aggregation
   packet together with one or more other NAL units.  For example, non-
   VCL NAL units such as access unit delimiters, parameter sets, or SEI
   NAL units are typically small.

   A prefix NAL unit and the NAL unit with which it is associated, and
   which follows the prefix NAL unit in decoding order, SHOULD be
   included in the same aggregation packet whenever an aggregation
   packet is used for the associated NAL unit, unless this would violate
   session MTU constraints or if fragmentation units are used for the
   associated NAL unit.

      Informative note: Although the prefix NAL unit is ignored by an
      H.264/AVC decoder, it is necessary in the SVC decoding process.

      Given the small size of the prefix NAL unit, it is best if it is
      transported in the same RTP packet as its associated NAL unit.

   When only an H.264/AVC compatible subset of the SVC base layer is
   transmitted in an RTP session, the subset MUST be encapsulated
   according to [RFC6184].  This way, an [RFC6184] receiver will be able
   to receive the H.264/AVC compatible bitstream subset.

   When a set of layers including one or more SVC enhancement layers is
   transmitted in an RTP session, the set SHOULD be carried in one RTP
   stream that SHOULD be encapsulated according to this memo.

5.2.  Packetization Rules for Multi-Session Transmission

   When MST is used, the packetization rules specified in Section 5.1
   still apply.  In addition, the following packetization rules MUST be
   followed, to ensure that decoding order of NAL units carried in the
   sessions can be correctly recovered for each of the MST packetization
   modes using the de-packetization process specified in Section 6.2.

   The NI-T and NI-TC modes both use timestamps to recover the decoding
   order.  In order to be able to do so, it is necessary for the RTP
   packet stream to contain data for all sampling instances of a given
   RTP session in all enhancement RTP sessions that depend on the given
   RTP session.  The NI-C and I-C modes do not have this limitation, and
   use the CS-DON values as a means to explicitly indicate decoding
   order, either directly coded in PACSI NAL units, or inferred from
   them using the packetization rules.  It is noted that the NI-TC mode
   offers both alternatives and it is up to the receiver to select which
   one to use.

5.2.1.  NI-T/NI-TC Packetization Rules

   When using the NI-T mode and a PACSI NAL unit is present, the T bit
   MUST be equal to 0, i.e., the DONC field MUST NOT be present.

   When using the NI-T mode, the optional parameters sprop-mst-remux-
   buf-size, sprop-remux-buf-req, remux-buf-cap, sprop-remux-init-buf-
   time, sprop-mst-max-don-diff MUST NOT be present.

   When the NI-T or NI-TC MST mode is in use, the following applies.

   If one or more NAL units of an access unit of sampling time instance
   t are present in RTP session A, then one or more NAL units of the same
   access unit MUST be present in any enhancement RTP session that
   depends on RTP session A.

      Informative note: The mapping between RTP and NTP format
      timestamps is conveyed in RTCP SR packets.  In addition, the
      mechanisms for faster media timestamp synchronization discussed in
      [RFC6051] may be used to speed up the acquisition of the RTP-to-
      wall-clock mapping.

      Informative note: The rule above may require the insertion of NAL
      units, typically when temporal scalability is used, i.e., an
      enhancement RTP session does not contain any NAL units for an
      access unit with a particular NTP timestamp (media timestamp),
      which, however, is present in a lower enhancement RTP session or
      the base RTP session.  There are two ways to insert additional NAL
      units in order to satisfy this rule:

      - One option for adding additional NAL units is to use empty NAL
        units (defined in Section 4.10), which can be used by the
        process described in Section 6.2.1 for the access unit
        reordering process.

      - Additional NAL units may also be added by the encoder itself,
        for example, by transmitting coded data that simply instruct the
        decoder to repeat the previous picture.  This option, however,
        may be difficult to use with pre-encoded content.
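
   The rule above can be checked mechanically.  The following Python
   sketch, given for illustration only, lists the sampling instants at
   which a NAL unit (for example, an empty NAL unit) would have to be
   inserted into each enhancement RTP session.  The function name and
   both data structures are hypothetical; in particular, depends_on is
   assumed to list every session on which a session depends, directly
   or indirectly.

```python
def missing_access_units(sessions, depends_on):
    """Find sampling instants that lack data in dependent sessions.

    sessions:   dict mapping session id -> set of sampling instants for
                which the session carries at least one NAL unit.
    depends_on: dict mapping session id -> set of all session ids it
                depends on (assumed to include indirect dependencies).
    Returns a dict mapping session id -> sorted list of instants where
    a NAL unit must be inserted to satisfy the NI-T/NI-TC rule.
    """
    missing = {}
    for sid, deps in depends_on.items():
        required = set()
        for dep in deps:
            required |= sessions[dep]   # every instant present in a lower session
        gaps = sorted(required - sessions[sid])
        if gaps:
            missing[sid] = gaps
    return missing
```

   With temporal scalability, a base session carrying instants 0 to 3
   and an enhancement session carrying only instants 0 and 2 would
   yield instants 1 and 3 as the insertion points for the enhancement
   session.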

   If a packet must be inserted in order to satisfy the above rule,
   e.g., in case of a MANE generating multiple RTP streams out of a
   single RTP stream, the inserted packet must have an RTP timestamp
   that maps to the same wall-clock time (in NTP format) as the one of
   the RTP timestamp of any packet of the access unit present in any
   lower enhancement RTP session or the base RTP session.  This is easy
   to accomplish if the NAL unit or the packet can be inserted at the
   time of the RTP stream generation, since the media timestamp (NTP
   timestamp) must be the same for the inserted packet and the packet of
   the corresponding access unit.  If there is no knowledge of the media
   time at RTP stream generation or if the RTP streams are not generated
   at the same instance, this can be also applied later in the
   transmission process.  In this case the NTP timestamp of the inserted
   packet can be calculated as follows.

   Assume that a packet A2 of an access unit with RTP timestamp TS_A2 is
   present in base RTP session A, and that no packet of that access unit
   is present in enhancement RTP session B, as shown in Figure 5.  Thus,
   a packet B2 must be inserted into session B following the rule above.
   The most recent RTCP sender report in session A carries NTP timestamp
   NTP_A and the RTP timestamp TS_A.  The sender report in session B
   with an NTP timestamp lower than NTP_A carries NTP timestamp NTP_B
   and the RTP timestamp TS_B.

     RTP  session B:..B0........B1........(B2)......................

     RTCP session B:.....SR(NTP_B,TS_B).............................

     RTP  session A:..A0........A1........A2........................

     RTCP session A:..................SR(NTP_A,TS_A)................

     -----------------|--x------|-----x---|------------------------>
                                                              NTP time
     --------------------+<---------->+<->+------------------------>
                               t1       t2              RTP TS(B) time

   Figure 5.  Example calculation of RTP timestamp for packet
   insertion in an enhancement layer RTP session

   The vertical bars ("|") in the NTP time line in the figure above
   indicate that access unit data is present in at least one of the
   sessions.  The "x" marks indicate the times of the sender reports.
   The RTP timestamp time line for session B, shown right below the NTP
   time line, indicates two time segments, t1 and t2.  t1 is the time
   difference between the sender reports of the two sessions, expressed
   in RTP timestamp clock ticks, and t2 is the time difference from the
   session A sender report to the A2 packet, again expressed in RTP
   timestamp clock ticks.  The sum of these differences is added to the
   RTP timestamp of the sender report from session B in order to derive
   the correct RTP timestamp for the inserted packet B2.  In other
   words:

     TS_B2 = TS_B + t1 + t2

   Let toRTP() be a function that calculates the RTP time difference (in
   clock ticks of the used clock) given an NTP timestamp difference, and
   effRTPdiff() be a function that calculates the effective difference
   between two timestamps, including wraparounds:

     effRTPdiff( ts1, ts2 ):

         if( ts1 >= ts2 ) then
             effRTPdiff := ts1-ts2
         else
             effRTPdiff := (4294967296 + ts1) - ts2

   We have:

     t1 = toRTP(NTP_A - NTP_B) and t2 = effRTPdiff(TS_A2, TS_A)

   Hence, the RTP timestamp TS_B2 for the inserted packet B2 can be
   calculated as follows.

     TS_B2 =  TS_B + toRTP(NTP_A - NTP_B) +  effRTPdiff(TS_A2, TS_A)
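
   The calculation can be sketched in Python as follows.  This is an
   illustration under stated assumptions: the function names are
   hypothetical, NTP timestamps are assumed to be available as seconds,
   the 90 kHz RTP clock rate of [RFC6184] is used as the default, and
   the effective difference is taken as the forward distance from ts2 to
   ts1 modulo 2^32.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit values


def to_rtp(ntp_diff_seconds: float, clock_rate: int = 90000) -> int:
    """Convert an NTP time difference in seconds to RTP clock ticks."""
    return round(ntp_diff_seconds * clock_rate)


def eff_rtp_diff(ts1: int, ts2: int) -> int:
    """Forward distance from ts2 to ts1, allowing for 32-bit wraparound."""
    return (ts1 - ts2) % RTP_TS_MODULUS


def inserted_packet_timestamp(ts_b: int, ntp_a: float, ntp_b: float,
                              ts_a2: int, ts_a: int,
                              clock_rate: int = 90000) -> int:
    """RTP timestamp TS_B2 for a packet inserted into session B."""
    t1 = to_rtp(ntp_a - ntp_b, clock_rate)   # between the two sender reports
    t2 = eff_rtp_diff(ts_a2, ts_a)           # session A report to packet A2
    return (ts_b + t1 + t2) % RTP_TS_MODULUS
```

   For example, with TS_B = 1000, sender reports 0.5 seconds apart,
   TS_A = 500000, and TS_A2 = 503000, the sketch gives t1 = 45000,
   t2 = 3000, and TS_B2 = 49000.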

5.2.2.  NI-C/NI-TC Packetization Rules

   When the NI-C or NI-TC MST mode is in use, the following applies for
   each of the RTP sessions.

   o  For each single NAL unit packet containing a non-PACSI NAL unit,
      the previous packet, if present, MUST have the same RTP timestamp
      as the single NAL unit packet, and the following applies.

      o  If the NALU-time of the non-PACSI NAL unit is not equal to the
         NALU-time of the previous non-PACSI NAL unit in decoding order,
         the previous packet MUST contain a PACSI NAL unit containing
         the DONC field.

   o  In an STAP-A packet the first NAL unit in the STAP-A packet MUST
      be a PACSI NAL unit containing the DONC field.

   o  For an FU-A packet the previous packet MUST have the same RTP
      timestamp as the FU-A packet, and the following applies.

      o  If the FU-A packet is the start of the fragmented NAL unit, the
         following applies.

         o  If the NALU-time of the fragmented NAL unit is not equal to
            the NALU-time of the previous non-PACSI NAL unit in decoding
            order, the previous packet MUST contain a PACSI NAL unit
            containing the DONC field;

         o  Otherwise, (the NALU-time of the fragmented NAL unit is
            equal to the NALU-time of the previous non-PACSI NAL unit in
            decoding order), the previous packet MAY contain a PACSI NAL
            unit containing the DONC field.

      o  Otherwise, if the FU-A packet is the end of the fragmented NAL
         unit, the following applies.

         o  If the next non-PACSI NAL unit in decoding order has NALU-
            time equal to the NALU-time of the fragmented NAL unit, and
            is carried in a number of FU-A packets or a single NAL unit
            packet, the next packet MUST be a single NAL unit packet
            containing a PACSI NAL unit containing the DONC field.

      o  Otherwise (the FU-A packet is neither the start nor the end of
         the fragmented NAL unit), the previous packet MUST be an FU-A
         packet.

   o  For each single NAL unit packet containing a PACSI NAL unit, if
      present, the PACSI NAL unit MUST contain the DONC field.

   o  When the optional media type parameter sprop-mst-csdon-always-
      present is equal to 1, the session packetization mode in use MUST
      be the non-interleaved mode, and only STAP-A and NI-MTAP packets
      can be used.
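
   As a non-normative illustration, a subset of the rules above can be
   checked per RTP session as sketched below.  The Packet class, its
   field names, and check_session are hypothetical.  The sketch covers
   only the rules that do not depend on NALU-time comparisons in
   decoding order, and it assumes that an FU-A packet with no previous
   packet violates the same-timestamp rule, since that rule has no "if
   present" qualifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    kind: str                # "single", "stap-a", or "fu-a"
    timestamp: int           # RTP timestamp
    pacsi: bool = False      # single: payload is a PACSI NAL unit;
                             # stap-a: first NAL unit is a PACSI NAL unit
    donc: bool = False       # that PACSI NAL unit contains the DONC field
    fu_start: bool = False   # fu-a: first fragment of the NAL unit
    fu_end: bool = False     # fu-a: last fragment of the NAL unit


def check_session(packets: List[Packet]) -> List[str]:
    """Return violations of the NALU-time-independent rules only."""
    errors = []
    prev: Optional[Packet] = None
    for i, p in enumerate(packets):
        if p.kind == "single":
            if p.pacsi:
                if not p.donc:
                    errors.append(f"{i}: PACSI packet without DONC")
            elif prev is not None and prev.timestamp != p.timestamp:
                errors.append(f"{i}: timestamp differs from previous packet")
        elif p.kind == "stap-a":
            if not (p.pacsi and p.donc):
                errors.append(f"{i}: STAP-A does not start with PACSI and DONC")
        elif p.kind == "fu-a":
            # Assumption: a missing previous packet counts as a violation.
            if prev is None or prev.timestamp != p.timestamp:
                errors.append(f"{i}: FU-A timestamp differs from previous packet")
            middle = not p.fu_start and not p.fu_end
            if middle and (prev is None or prev.kind != "fu-a"):
                errors.append(f"{i}: middle FU-A not preceded by FU-A")
        prev = p
    return errors
```

   A session that opens each new timestamp with a PACSI NAL unit
   carrying DONC passes these checks; two consecutive single NAL unit
   packets with different timestamps do not.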

5.2.3.  I-C Packetization Rules

   When the I-C MST packetization mode is in use, the following applies.

   o  When a PACSI NAL unit is present, the T bit MUST be equal to 0,
      i.e., the DONC field is not present, and the Y bit MUST be equal
      to 0, i.e., the TL0PICIDX and IDRPICID fields are not present.
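
   As a non-normative illustration, the rule above can be checked on a
   raw PACSI NAL unit as sketched below.  The function name is
   hypothetical.  The sketch assumes the PACSI layout of Section 4.9:
   NAL unit type 30, a four-octet NAL unit header, and a following flag
   octet carrying the X, Y, T, A, P, C, S, and E bits from most to
   least significant bit.

```python
PACSI_NAL_TYPE = 30
FLAGS_OFFSET = 4  # assumed: one-octet NAL header plus three-octet extension


def ic_pacsi_ok(nal: bytes) -> bool:
    """Return True if the Y and T bits of a PACSI NAL unit are both 0."""
    if len(nal) <= FLAGS_OFFSET:
        raise ValueError("too short to be a PACSI NAL unit")
    if nal[0] & 0x1F != PACSI_NAL_TYPE:
        raise ValueError("not a PACSI NAL unit")
    flags = nal[FLAGS_OFFSET]
    y_bit = (flags >> 6) & 1  # 1 would signal TL0PICIDX and IDRPICID
    t_bit = (flags >> 5) & 1  # 1 would signal DONC
    return y_bit == 0 and t_bit == 0
```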

5.2.4.  Packetization Rules for Non-VCL NAL Units

   NAL units that do not directly encode video slices are known in H.264
   as non-VCL NAL units.  Non-VCL NAL units that are only used by, or
   only relevant to, enhancement RTP sessions SHOULD be sent in the
   lowest session to which they are relevant.

   Some senders, however, such as those sending pre-encoded data, may be
   unable to easily determine which non-VCL units are relevant to which
   session.  Thus, non-VCL NAL units MAY, instead, be sent in a session
   on which the session using these non-VCL NAL units depends (e.g., the
   base RTP session).

   If a non-VCL NAL unit is relevant to more than one RTP session, none
   of which depends on the others, the NAL unit MAY be sent in another
   session on which all these sessions depend.
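
   As a non-normative illustration, the set of sessions in which a
   non-VCL NAL unit could be carried under the rules above can be
   computed as sketched below.  The function names, the session names,
   and the dictionary that maps each session to the sessions it
   directly depends on are hypothetical.  The sketch reads "depends" as
   transitive, which is an assumption.

```python
def dependencies(deps, session):
    """All sessions that `session` depends on, directly or transitively."""
    seen = set()
    stack = list(deps.get(session, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(deps.get(s, ()))
    return seen


def permitted_sessions(deps, relevant):
    """Sessions that every relevant session either equals or depends on."""
    candidates = None
    for s in relevant:
        reach = dependencies(deps, s) | {s}
        candidates = reach if candidates is None else candidates & reach
    return candidates or set()
```

   With deps = {"enh1": ["base"], "enh2": ["base"]}, a NAL unit relevant
   to both "enh1" and "enh2" yields only "base", since neither
   enhancement session depends on the other.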

5.2.5.  Packetization Rules for Prefix NAL Units

   Section 5.1 of this memo applies, with the following addition.  If
   the base layer is sent in a base RTP session using [RFC6184], prefix
   NAL units MAY be sent in the lowest enhancement RTP session rather
   than in the base RTP session.
