4. RTP Payload Format
4.1. RTP Header Usage
In addition to Section 5.1 of [RFC6184], the following rules apply.
o Setting of the M bit:
The M bit of an RTP packet for which the packet payload is an NI-MTAP
MUST be equal to 1 if the last NAL unit, in decoding order, of the
access unit associated with the RTP timestamp is contained in the
packet.
o Setting of the RTP timestamp:
For an RTP packet for which the packet payload is an empty NAL unit,
the RTP timestamp MUST be set according to Section 4.10.
For an RTP packet for which the packet payload is a PACSI NAL unit,
the RTP timestamp MUST be equal to the NALU-time of the next non-
PACSI NAL unit in transmission order. Recall that the NALU-time of a
NAL unit in an MTAP is defined in [RFC6184] as the value that the RTP
timestamp would have if that NAL unit would be transported in its own
RTP packet.
o Setting of the SSRC:
For both SST and MST, the SSRC values MUST be set according to
[RFC3550].
4.2. NAL Unit Extension and Header Usage
4.2.1. NAL Unit Extension
This memo specifies a NAL unit extension mechanism to allow for
introduction of new types of NAL units, beyond the three NAL unit
types left undefined in [RFC6184] (i.e., 0, 30, and 31). The
extension mechanism utilizes the NAL unit type value 31 and is
specified as follows. When the NAL unit type value is equal to 31,
the one-byte NAL unit header consisting of the F, NRI, and Type
fields as specified in Section 1.1.3 is extended by one additional
octet, which consists of a 5-bit field named Subtype and three 1-bit
fields named J, K, and L, respectively. The additional octet is
shown in the following figure.
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
| Subtype |J|K|L|
+---------------+
The Subtype value determines the (extended) NAL unit type of this NAL
unit. The interpretation of the fields J, K, and L depends on the
Subtype. The semantics of the fields are as follows.
When Subtype is equal to 1, the NAL unit is an empty NAL unit as
specified in Section 4.10. When Subtype is equal to 2, the NAL unit
is an NI-MTAP NAL unit as specified in Section 4.7.1. All other
values of Subtype (0, 3-31) are reserved for future extensions, and
receivers MUST ignore the entire NAL unit when Subtype is equal to
any of these reserved values.
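Informative example: the following Python sketch decodes the two header octets described above. It is a minimal illustration, not a normative parser; the function name parse_extension_header and the keys of the returned dictionary are assumptions made for this example. Bit 0 in the figure is taken to be the most significant bit of the octet.

```python
def parse_extension_header(payload: bytes):
    """Decode the two-octet header of a NAL unit whose Type is 31.

    Returns None when Type is not 31, i.e. the extension octet is absent.
    """
    if not payload:
        raise ValueError("empty payload")
    first = payload[0]
    nal_type = first & 0x1F            # low 5 bits of the first octet
    if nal_type != 31:
        return None
    if len(payload) < 2:
        raise ValueError("Type 31 requires a second header octet")
    second = payload[1]
    subtype = second >> 3              # bits 0-4 of the additional octet
    kinds = {1: "empty NAL unit", 2: "NI-MTAP"}
    return {
        "F": first >> 7,
        "NRI": (first >> 5) & 0x03,
        "Subtype": subtype,
        "J": (second >> 2) & 1,
        "K": (second >> 1) & 1,
        "L": second & 1,
        # Subtype 0 and 3-31 are reserved; a receiver ignores such NAL units.
        "kind": kinds.get(subtype, "reserved"),
    }
```

A receiver following this section would drop the whole NAL unit whenever the "kind" entry is "reserved".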
4.2.2. NAL Unit Header Usage
The structure and semantics of the NAL unit header according to the
H.264 specification [H.264] were introduced in Section 1.1.3. This
section specifies the extended semantics of the NAL unit header
fields F, NRI, I, PRID, DID, QID, TID, U, and D, according to this
memo. When the Type field is equal to 31, the semantics of the
fields in the extension NAL unit header are as specified in Section
4.2.1.
The semantics of F specified in Section 5.3 of [RFC6184] also apply
in this memo. That is, a value of 0 for F indicates that the NAL
unit type octet and payload should not contain bit errors or other
syntax violations, whereas a value of 1 for F indicates that the NAL
unit type octet and payload may contain bit errors or other syntax
violations. MANEs SHOULD set the F bit to indicate bit errors in the
NAL unit.
For NRI, for a bitstream conforming to one of the profiles defined in
Annex A of [H.264] and transported using [RFC6184], the semantics
specified in Section 5.3 of [RFC6184] apply, i.e., NRI also indicates
the relative importance of NAL units. For a bitstream conforming to
one of the profiles defined in Annex G of [H.264] and transported
using this memo, in addition to the semantics specified in Annex G of
[H.264], NRI also indicates the relative importance of NAL units
within a layer.
For I, in addition to the semantics specified in Annex G of [H.264],
according to this memo, MANEs MAY use this information to protect NAL
units with I equal to 1 better than NAL units with I equal to 0.
MANEs MAY also utilize information of NAL units with I equal to 1 to
decide when to forward more packets for an RTP packet stream. For
example, when it is detected that spatial layer switching has
happened such that the operation point has changed to a higher value
of DID, MANEs MAY start to forward NAL units with the higher value of
DID only after forwarding a NAL unit with I equal to 1 with the
higher value of DID.
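Informative example: one way a MANE might realize the switching rule in the preceding example is sketched below in Python. The class LayerSwitchGate and its methods are hypothetical names introduced here, and the logic is a deliberate simplification: it tracks a single target DID and ignores QID, TID, and inter-layer dependencies.

```python
class LayerSwitchGate:
    """Hypothetical MANE helper: delay up-switching until I == 1 is seen."""

    def __init__(self, current_did: int):
        self.current_did = current_did   # highest DID currently forwarded
        self.target_did = current_did    # DID the MANE wants to forward

    def switch_to(self, did: int) -> None:
        """Request a new operation point; down-switching applies at once."""
        self.target_did = did
        if did < self.current_did:
            self.current_did = did

    def should_forward(self, did: int, i_bit: int) -> bool:
        """Decide whether a NAL unit with the given DID and I bit is forwarded."""
        if did <= self.current_did:
            return True
        if did == self.target_did and i_bit == 1:
            # First NAL unit with I == 1 at the higher DID: start forwarding.
            self.current_did = did
            return True
        return False
```

In this sketch, NAL units of the higher layer are dropped until one with I equal to 1 arrives, after which every NAL unit up to the new DID is forwarded.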
Note that, in the context of this section, "protecting a NAL unit"
means any RTP or network transport mechanism that could improve the
probability of successful delivery of the packet conveying the NAL
unit, including applying a Quality of Service (QoS) enabled network,
Forward Error Correction (FEC), retransmissions, and advanced
scheduling behavior, whenever possible.
For PRID, the semantics specified in Annex G of [H.264] apply. Note
that MANEs implementing unequal error protection MAY use this
information to protect NAL units with smaller PRID values better than
those with larger PRID values, for example, by including only the
more important NAL units in a FEC protection mechanism. The
importance for the decoding process decreases as the PRID value
increases.
For DID, QID, or TID, in addition to the semantics specified in Annex
G of [H.264], according to this memo, values of DID, QID, or TID
indicate the relative importance in their respective dimension. A
lower value of DID, QID, or TID indicates a higher importance if the
other two components are identical. MANEs MAY use this information
to protect more important NAL units better than less important NAL
units.
For U, in addition to the semantics specified in Annex G of [H.264],
according to this memo, MANEs MAY use this information to protect NAL
units with U equal to 1 better than NAL units with U equal to 0.
For D, in addition to the semantics specified in Annex G of [H.264],
according to this memo, MANEs MAY use this information to determine
whether a given NAL unit is required for successfully decoding a
certain Operation Point of the SVC bitstream, hence to decide whether
to forward the NAL unit.
4.3. Payload Structures
The NAL unit structure is central to H.264/AVC, [RFC6184], as well as
SVC and this memo. In H.264/AVC and SVC, all coded bits for
representing a video signal are encapsulated in NAL units. In
[RFC6184], each RTP packet payload is structured as a NAL unit, which
contains one or a part of one NAL unit specified in H.264/AVC, or
aggregates one or more NAL units specified in H.264/AVC.
[RFC6184] specifies three basic payload structures (in Section 5.2 of
[RFC6184]): single NAL unit packet, aggregation packet, and
fragmentation unit, as well as six new types (24 to 29) of NAL units.
The value of the
Type field of the RTP packet payload header (i.e., the first byte of
the payload) may be equal to any value from 1 to 23 for a single NAL
unit packet, any value from 24 to 27 for an aggregation packet, and
28 or 29 for a fragmentation unit.
In addition to the NAL unit types defined originally for H.264/AVC,
SVC defines three new NAL unit types specifically for SVC: coded
slice in scalable extension NAL units (type 20), prefix NAL units
(type 14), and subset sequence parameter set NAL units (type 15), as
described in Section 1.1.
This memo further introduces three new types of NAL units, PACSI NAL
unit (NAL unit type 30) as specified in Section 4.9, empty NAL unit
(type 31, subtype 1) as specified in Section 4.10, and NI-MTAP NAL
unit (type 31, subtype 2) as specified in Section 4.7.1.
The RTP packet payload structure in [RFC6184] is maintained with
slight extensions in this memo, as follows. Each RTP packet payload
is still structured as a NAL unit, which contains one or a part of
one NAL unit specified in H.264/AVC and SVC, or contains one PACSI
NAL unit or one empty NAL unit, or aggregates zero or more NAL units
specified in H.264/AVC and SVC, zero or one PACSI NAL unit, and zero
or more empty NAL units.
In this memo, one of the three basic payload structures,
fragmentation unit, remains the same as in [RFC6184], and the other
two, single NAL unit packet and aggregation packet, are extended as
follows. The value of the Type field of the payload header may be
equal to any value from 1 to 23, inclusive, and 30 to 31, inclusive,
for a single NAL unit packet, and any value from 24 to 27, inclusive,
and 31, for an aggregation packet. When the Type field of the
payload header is equal to 31 and the Subtype field of the payload
header is equal to 2, the packet is an aggregation packet (containing
an NI-MTAP NAL unit). When the Type field of the payload header is
equal to 31 and the Subtype field of the payload header is equal to
1, the packet is a single NAL unit packet (containing an empty NAL
unit).
Note that, in this memo, the length of the payload header varies
depending on the value of the Type field in the first byte of the RTP
packet payload. If the value is equal to 14, 20, or 30, the first
four bytes of the packet payload form the payload header; otherwise,
if the value is equal to 31, the first two bytes of the payload form
the payload header; otherwise, the payload header is the first byte
of the packet payload.
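Informative example: the payload header length rule above can be sketched in Python as follows. The function name payload_header_length is an assumption made for this illustration.

```python
def payload_header_length(payload: bytes) -> int:
    """Return the payload header length in bytes, based on the Type field."""
    if not payload:
        raise ValueError("empty payload")
    nal_type = payload[0] & 0x1F       # Type is the low 5 bits of the first byte
    if nal_type in (14, 20, 30):
        return 4                       # four-byte SVC NAL unit header
    if nal_type == 31:
        return 2                       # first byte plus Subtype/J/K/L octet
    return 1
```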
Table 1 lists the NAL unit types introduced in SVC and this memo and
where they are described in this memo. Table 2 summarizes the basic
payload structure types for all NAL unit types when they are directly
used as RTP packet payloads according to this memo. Table 3
summarizes the NAL unit types allowed to be aggregated (i.e., used as
aggregation units in aggregation packets) or fragmented (i.e.,
carried in fragmentation units) according to this memo.
Table 1. NAL unit types introduced in SVC and this memo
Type  Subtype  NAL Unit Name                      Section Numbers
-----------------------------------------------------------------
14    -        Prefix NAL unit                    1.1
15    -        Subset sequence parameter set      1.1
20    -        Coded slice in scalable extension  1.1
30    -        PACSI NAL unit                     4.9
31    0        reserved                           4.2.1
31    1        Empty NAL unit                     4.10
31    2        NI-MTAP                            4.7.1
31    3-31     reserved                           4.2.1
Table 2. Basic payload structure types for all NAL unit
types when they are directly used as RTP packet payloads
Type   Subtype  Basic Payload Structure
------------------------------------------
0      -        reserved
1-23   -        Single NAL Unit Packet
24-27  -        Aggregation Packet
28-29  -        Fragmentation Unit
30     -        Single NAL Unit Packet
31     0        reserved
31     1        Single NAL Unit Packet
31     2        Aggregation Packet
31     3-31     reserved
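Informative example: the mapping in Table 2 can be expressed as a small lookup, sketched here in Python. The function name basic_payload_structure and the returned strings are assumptions made for this illustration.

```python
def basic_payload_structure(nal_type: int, subtype=None) -> str:
    """Map a Type value (and Subtype, for Type 31) to its Table 2 entry."""
    if not 0 <= nal_type <= 31:
        raise ValueError("Type is a 5-bit field")
    if 1 <= nal_type <= 23 or nal_type == 30:
        return "single NAL unit packet"
    if 24 <= nal_type <= 27:
        return "aggregation packet"
    if nal_type in (28, 29):
        return "fragmentation unit"
    if nal_type == 31:
        if subtype == 1:
            return "single NAL unit packet"   # empty NAL unit
        if subtype == 2:
            return "aggregation packet"       # NI-MTAP
    return "reserved"                         # Type 0, or Type 31 with Subtype 0, 3-31
```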
Table 3. Summary of the NAL unit types allowed to be
aggregated or fragmented (yes = allowed, no = disallowed,
- = not applicable/not specified)
Type   Subtype  STAP-A  STAP-B  MTAP16  MTAP24  FU-A  FU-B  NI-MTAP
-------------------------------------------------------------------
0      -        -       -       -       -       -     -     -
1-23   -        yes     yes     yes     yes     yes   yes   yes
24-29  -        no      no      no      no      no    no    no
30     -        yes     yes     yes     yes     no    no    yes
31     0        -       -       -       -       -     -     -
31     1        yes     no      no      no      no    no    yes
31     2        no      no      no      no      no    no    no
31     3-31     -       -       -       -       -     -     -
4.4. Transmission Modes
This memo enables transmission of an SVC bitstream over one or more
RTP sessions. If only one RTP session is used for transmission of
the SVC bitstream, the transmission mode is referred to as single-
session transmission (SST); otherwise (more than one RTP session is
used for transmission of the SVC bitstream), the transmission mode is
referred to as multi-session transmission (MST).
SST SHOULD be used for point-to-point unicast scenarios, while MST
SHOULD be used for point-to-multipoint multicast scenarios where
different receivers require different operation points of the same
SVC bitstream, to improve bandwidth utilization efficiency.
If the OPTIONAL mst-mode media type parameter (see Section 7.1) is
not present, SST MUST be used; otherwise (mst-mode is present), MST
MUST be used.
4.5. Packetization Modes
4.5.1. Packetization Modes for Single-Session Transmission
When SST is in use, Section 5.4 of [RFC6184] applies with the
following extensions.
The packetization modes specified in Section 5.4 of [RFC6184],
namely, single NAL unit mode, non-interleaved mode, and interleaved
mode, are also referred to as session packetization modes. Table 4
summarizes the allowed session packetization modes for SST.
Table 4. Summary of allowed session packetization modes
(denoted as "Session Mode" for simplicity) for SST (yes =
allowed, no = disallowed)
Session Mode            Allowed
-------------------------------------
Single NAL Unit Mode    yes
Non-Interleaved Mode    yes
Interleaved Mode        yes
For NAL unit types in the range of 0 to 29, inclusive, the NAL unit
types allowed to be directly used as packet payloads for each session
packetization mode are the same as specified in Section 5.4 of
[RFC6184]. For other NAL unit types, which are newly introduced in
this memo, the NAL unit types allowed to be directly used as packet
payloads for each session packetization mode are summarized in Table
5.
Table 5. New NAL unit types allowed to be directly used
as packet payloads for each session packetization mode
(yes = allowed, no = disallowed, - = not applicable/not specified)
Type  Subtype  Single NAL   Non-Interleaved  Interleaved
               Unit Mode    Mode             Mode
-------------------------------------------------------------
30    -        yes          no               no
31    0        -            -                -
31    1        yes          yes              no
31    2        no           yes              no
31    3-31     -            -                -
4.5.2. Packetization Modes for Multi-Session Transmission
For MST, this memo specifies four MST packetization modes:
o Non-interleaved timestamp based mode (NI-T);
o Non-interleaved cross-session decoding order number (CS-DON) based
mode (NI-C);
o Non-interleaved combined timestamp and CS-DON mode (NI-TC); and
o Interleaved CS-DON (I-C) mode.
These four modes differ in two ways. First, they differ in terms of
whether NAL units are required to be transmitted within each RTP
session in decoding order (i.e., non-interleaved), or they are
allowed to be transmitted in a different order (i.e., interleaved).
Second, they differ in the mechanisms they provide in order to
recover the correct decoding order of the NAL units across all RTP
sessions involved.
The NI-T, NI-C, and NI-TC modes do not allow interleaving, and are
thus targeted for systems that require relatively low end-to-end
latency, e.g., conversational systems. The I-C mode allows
interleaving and is thus targeted for systems that do not require
very low end-to-end latency. The benefits of interleaving are the
same as those of the interleaved mode specified in [RFC6184].
The NI-T mode uses timestamps to recover the decoding order of NAL
units, whereas the NI-C and I-C modes both use the CS-DON mechanism
(explained later) to do so. The NI-TC mode provides both timestamps
and the CS-DON method; receivers in this case may choose to use
either method for performing decoding order recovery. The MST
packetization mode in use MUST be signaled by the value of the
OPTIONAL mst-mode media type parameter. The used MST packetization
mode governs which session packetization modes are allowed in the
associated RTP sessions, which in turn govern which NAL unit types
are allowed to be directly used as RTP packet payloads.
Table 6 summarizes the allowed session packetization modes for NI-T,
NI-C, and NI-TC. Table 7 summarizes the allowed session
packetization modes for I-C.
Table 6. Summary of allowed session packetization modes
(denoted as "Session Mode" for simplicity) for NI-T, NI-C, and
NI-TC (yes = allowed, no = disallowed)
Session Mode            Base Session    Enhancement Session
-----------------------------------------------------------
Single NAL Unit Mode    yes             no
Non-Interleaved Mode    yes             yes
Interleaved Mode        no              no
Table 7. Summary of allowed session packetization modes
(denoted as "Session Mode" for simplicity) for I-C
(yes = allowed, no = disallowed)
Session Mode            Base Session    Enhancement Session
-----------------------------------------------------------
Single NAL Unit Mode    no              no
Non-Interleaved Mode    no              no
Interleaved Mode        yes             yes
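Informative example: Tables 6 and 7 can be captured as a lookup keyed by MST packetization mode and session role, sketched here in Python. The dictionary layout, the key strings (which reuse the mode abbreviations of this section), and the function name session_mode_allowed are assumptions made for this illustration.

```python
# Session packetization modes allowed for NI-T, NI-C, and NI-TC (Table 6).
_NON_INTERLEAVED_MST = {
    "base": {"single NAL unit", "non-interleaved"},
    "enhancement": {"non-interleaved"},
}
# Session packetization modes allowed for I-C (Table 7).
_INTERLEAVED_MST = {
    "base": {"interleaved"},
    "enhancement": {"interleaved"},
}
ALLOWED_SESSION_MODES = {
    "NI-T": _NON_INTERLEAVED_MST,
    "NI-C": _NON_INTERLEAVED_MST,
    "NI-TC": _NON_INTERLEAVED_MST,
    "I-C": _INTERLEAVED_MST,
}


def session_mode_allowed(mst_mode: str, session: str, session_mode: str) -> bool:
    """Check a session packetization mode against Tables 6 and 7."""
    return session_mode in ALLOWED_SESSION_MODES[mst_mode][session]
```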
If the Type field of the first byte of the payload is not equal to
31, the payload header is the first byte of the payload. Otherwise,
(the Type field of the first byte of the payload is equal to 31), the
payload header is the first two bytes of the payload.
4.7. Aggregation Packets
In addition to Section 5.7 of [RFC6184], the following applies in
this memo.
4.7.1. Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs)
One new NAL unit type introduced in this memo is the non-interleaved
multi-time aggregation packet (NI-MTAP). An NI-MTAP consists of one
or more non-interleaved multi-time aggregation units.
The NAL units contained in NI-MTAPs MUST be aggregated in decoding
order.
A non-interleaved multi-time aggregation unit for the NI-MTAP
consists of 16 bits of unsigned size information of the following NAL
unit (in network byte order), and 16 bits (in network byte order) of
timestamp offset (TS offset) for the NAL unit. The structure is
presented in Figure 1. The starting or ending position of an
aggregation unit within a packet may or may not be on a 32-bit word
boundary. The NAL units in the NI-MTAP are ordered in NAL unit
decoding order.
The Type field of the NI-MTAP MUST be set equal to "31".
The F bit MUST be set to 0 if all the F bits of the aggregated NAL
units are zero; otherwise, it MUST be set to 1.
The value of NRI MUST be the maximum value of NRI across all NAL
units carried in the NI-MTAP packet.
The field Subtype MUST be equal to 2.
If the field J is equal to 1, the optional DON field MUST be present
for each of the non-interleaved multi-time aggregation units. For
SST, the J field MUST be equal to 0. For MST, in the NI-T mode the J
field MUST be equal to 0, whereas in the NI-C or NI-TC mode the J
field MUST be equal to 1. When the NI-C or NI-TC mode is in use, the
DON field, when present, MUST represent the CS-DON value for the
particular NAL unit as defined in Section 6.2.2.
The fields K and L MUST be both equal to 0.
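Informative example: the rules above for the F, NRI, Type, Subtype, J, K, and L fields can be sketched in Python as follows. The function name ni_mtap_payload_header, the J_BY_MODE table, and the use of the string "SST" as a mode key are assumptions made for this illustration; each element of nal_units is assumed to be a complete NAL unit starting with its header octet.

```python
# J is 0 for SST and NI-T, and 1 for NI-C and NI-TC.
J_BY_MODE = {"SST": 0, "NI-T": 0, "NI-C": 1, "NI-TC": 1}


def ni_mtap_payload_header(nal_units, mode: str) -> bytes:
    """Build the two-octet payload header of an NI-MTAP."""
    if not nal_units:
        raise ValueError("an NI-MTAP carries at least one aggregation unit")
    if mode not in J_BY_MODE:
        raise ValueError("NI-MTAP is not used in mode %r" % mode)
    # F is 0 only if every aggregated NAL unit has F equal to 0.
    f = 1 if any(n[0] & 0x80 for n in nal_units) else 0
    # NRI is the maximum NRI across the aggregated NAL units.
    nri = max((n[0] >> 5) & 0x03 for n in nal_units)
    first = (f << 7) | (nri << 5) | 31            # Type = 31
    second = (2 << 3) | (J_BY_MODE[mode] << 2)    # Subtype = 2, K = L = 0
    return bytes([first, second])
```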
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:        NAL unit size          |        TS offset              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        DON (optional)         |                               |
|-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NAL unit              |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1. Non-interleaved multi-time aggregation unit for NI-MTAP
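Informative example: the aggregation unit layout of Figure 1 can be serialized and parsed as sketched below in Python. The function names are assumptions made for this illustration. The sketch further assumes that the size field counts only the octets of the NAL unit itself (not the size, TS offset, or DON fields), and that the parser is handed the NI-MTAP body after the two-octet payload header with any RTP padding already removed.

```python
import struct


def pack_ni_mtap_unit(nal_unit: bytes, ts_offset: int, don=None) -> bytes:
    """Serialize one aggregation unit: size, TS offset, optional DON, NAL unit."""
    if len(nal_unit) > 0xFFFF or not 0 <= ts_offset <= 0xFFFF:
        raise ValueError("size and TS offset are 16-bit fields")
    out = struct.pack("!HH", len(nal_unit), ts_offset)   # network byte order
    if don is not None:                                  # present only if J == 1
        out += struct.pack("!H", don & 0xFFFF)
    return out + nal_unit


def parse_ni_mtap_units(body: bytes, j_flag: int):
    """Return a list of (ts_offset, don, nal_unit) tuples in decoding order."""
    units, pos = [], 0
    header_len = 6 if j_flag else 4
    while pos < len(body):
        if pos + header_len > len(body):
            raise ValueError("truncated aggregation unit header")
        size, ts_offset = struct.unpack_from("!HH", body, pos)
        don = struct.unpack_from("!H", body, pos + 4)[0] if j_flag else None
        pos += header_len
        if pos + size > len(body):
            raise ValueError("truncated NAL unit")
        units.append((ts_offset, don, body[pos:pos + size]))
        pos += size
    return units
```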
Let TS be the RTP timestamp of the packet carrying the NAL unit.
Recall that the NALU-time of a NAL unit in an MTAP is defined in
[RFC6184] as the value that the RTP timestamp would have if that NAL
unit would be transported in its own RTP packet. The timestamp
offset field MUST be set to a value equal to the value of the
following formula:
if NALU-time >= TS, TS offset = NALU-time - TS
else, TS offset = NALU-time + (2^32 - TS)
For the "earliest" multi-time aggregation unit in an NI-MTAP, the
timestamp offset MUST be zero. Hence, the RTP timestamp of the NI-
MTAP itself is identical to the earliest NALU-time.
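Informative example: the timestamp offset formula above can be sketched in Python as follows. The function name ts_offset is an assumption made for this illustration. The two branches of the formula are equivalent to a subtraction modulo 2^32, and the range check reflects the 16-bit width of the TS offset field.

```python
def ts_offset(nalu_time: int, ts: int) -> int:
    """Compute the TS offset of a NAL unit relative to RTP timestamp ts."""
    if nalu_time >= ts:
        offset = nalu_time - ts
    else:
        # The 32-bit RTP timestamp wrapped between ts and nalu_time.
        offset = nalu_time + (2**32 - ts)
    if offset > 0xFFFF:
        raise ValueError("offset does not fit in the 16-bit TS offset field")
    return offset
```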
Informative note: The "earliest" multi-time aggregation unit is
the one that would have the smallest extended RTP timestamp among
all the aggregation units of an NI-MTAP if the aggregation units
were encapsulated in single NAL unit packets. An extended
timestamp is a timestamp that has more than 32 bits and is capable
of counting the wraparound of the timestamp field, thus enabling
one to determine the smallest value if the timestamp wraps. Such
an "earliest" aggregation unit may or may not be the first one in
the order in which the aggregation units are encapsulated in an
NI-MTAP. The "earliest" NAL unit need not be the same as the
first NAL unit in the NAL unit decoding order either.
Figure 2 presents an example of an RTP packet that contains an NI-
MTAP that contains two non-interleaved multi-time aggregation units,
labeled as 1 and 2 in the figure.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   | Subtype |J|K|L|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|        Non-interleaved multi-time aggregation unit #1         |
:                                                               :
|                                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                 | Non-interleaved multi-time  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                             |
|                      aggregation unit #2                      |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2. An RTP packet including an NI-MTAP containing two
non-interleaved multi-time aggregation units
4.8. Fragmentation Units (FUs)
Section 5.8 of [RFC6184] applies.
Informative note: In case a NAL unit with the four-byte SVC NAL
unit header is fragmented, the three-byte SVC-specific header
extension is considered as part of the NAL unit payload. That is,
the three-byte SVC-specific header extension is only available in
the first fragment of the fragmented NAL unit.
4.9. Payload Content Scalability Information (PACSI) NAL Unit
Another new type of NAL unit specified in this memo is the payload
content scalability information (PACSI) NAL unit. The Type field of
PACSI NAL units MUST be equal to 30 (a NAL unit type value left
unspecified in [H.264] and [RFC6184]). A PACSI NAL unit MAY be
carried in a single NAL unit packet or an aggregation packet, and
MUST NOT be fragmented.
PACSI NAL units may be used for the following purposes:
o To enable MANEs to decide whether to forward, process, or discard
aggregation packets, by checking in PACSI NAL units the
scalability information and other characteristics of the
aggregated NAL units, rather than looking into the aggregated NAL
units themselves, which are defined by the video coding
specification;
o To enable correct decoding order recovery in MST using the NI-C or
NI-TC mode, with the help of the CS-DON information included in
PACSI NAL units; and
o To improve resilience to packet losses, e.g., by utilizing the
following data or information included in PACSI NAL units:
repeated Supplemental Enhancement Information (SEI) messages,
information regarding the start and end of layer representations,
and the indices to layer representations of the lowest temporal
subset.
PACSI NAL units MAY be ignored in the NI-T mode without affecting the
decoding order recovery process.
When a PACSI NAL unit is present in an aggregation packet, the
following applies.
o The PACSI NAL unit MUST be the first aggregated NAL unit in the
aggregation packet.
o There MUST be at least one additional aggregated NAL unit in the
aggregation packet.
o The RTP header fields and the payload header fields of the
aggregation packet are set as if the PACSI NAL unit was not
included in the aggregation packet.
o If the aggregation packet is an MTAP16, MTAP24, or NI-MTAP with
the J field equal to 1, the decoding order number (DON) for the
PACSI NAL unit MUST be set to indicate that the PACSI NAL unit has
an identical DON to the first NAL unit in decoding order among the
remaining NAL units in the aggregation packet.
When a PACSI NAL unit is included in a single NAL unit packet, it is
associated with the next non-PACSI NAL unit in transmission order,
and the RTP header fields of the packet are set as if the next non-
PACSI NAL unit in transmission order was included in a single NAL
unit packet.
The PACSI NAL unit structure is as follows. The first four octets
are exactly the same as the four-byte SVC NAL unit header discussed
in Section 1.1.3. They are followed by one octet containing several
flags, then five optional octets, and finally zero or more SEI NAL
units. Each SEI NAL unit is preceded by a 16-bit unsigned size field
(in network byte order) that indicates the size of the following NAL
unit in bytes (excluding these two octets, but including the NAL unit
header octet of the SEI NAL unit). Figure 3 illustrates the PACSI
NAL unit structure and an example of a PACSI NAL unit containing two
SEI NAL units.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |R|I|   PRID    |N| DID |  QID  | TID |U|D|O| RR|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|X|Y|T|A|P|C|S|E| TL0PICIDX (o) |         IDRPICID (o)          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           DONC (o)            |        NAL unit size 1        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                        SEI NAL unit 1                         |
|                                                               |
|               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |        NAL unit size 2        |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                        SEI NAL unit 2                         |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3. PACSI NAL unit structure. Fields suffixed by
"(o)" are OPTIONAL.
The bits A, P, and C are specified only if the bit X is equal to 1.
The bits S and E are specified, and the fields TL0PICIDX and IDRPICID
are present, only if the bit Y is equal to 1. The field DONC is
present only if the bit T is equal to 1. The field T MUST be equal
to 0 if the PACSI NAL unit is contained in an STAP-B, MTAP16, MTAP24,
or NI-MTAP with the J field equal to 1.
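Informative example: a PACSI NAL unit laid out as in Figure 3 can be decoded as sketched below in Python. The function name parse_pacsi and the keys of the returned dictionary are assumptions made for this illustration; the sketch only extracts fields and does not check the value constraints placed on them by this memo.

```python
import struct


def parse_pacsi(nal: bytes) -> dict:
    """Decode the header, flags, optional fields, and SEI NAL units of a PACSI."""
    if len(nal) < 5:
        raise ValueError("PACSI needs four header octets plus the flags octet")
    b0, b1, b2, b3, flags = nal[:5]
    if b0 & 0x1F != 30:
        raise ValueError("not a PACSI NAL unit (Type != 30)")
    out = {
        "F": b0 >> 7, "NRI": (b0 >> 5) & 0x03,
        "I": (b1 >> 6) & 1, "PRID": b1 & 0x3F,
        "N": b2 >> 7, "DID": (b2 >> 4) & 0x07, "QID": b2 & 0x0F,
        "TID": b3 >> 5, "U": (b3 >> 4) & 1, "D": (b3 >> 3) & 1,
        "O": (b3 >> 2) & 1,
        "X": flags >> 7, "Y": (flags >> 6) & 1, "T": (flags >> 5) & 1,
    }
    pos = 5
    if out["X"]:                       # A, P, C are specified only if X == 1
        out.update(A=(flags >> 4) & 1, P=(flags >> 3) & 1, C=(flags >> 2) & 1)
    if out["Y"]:                       # S, E, TL0PICIDX, IDRPICID need Y == 1
        out.update(S=(flags >> 1) & 1, E=flags & 1)
        out["TL0PICIDX"], out["IDRPICID"] = struct.unpack_from("!BH", nal, pos)
        pos += 3
    if out["T"]:                       # DONC is present only if T == 1
        (out["DONC"],) = struct.unpack_from("!H", nal, pos)
        pos += 2
    sei = []
    while pos < len(nal):              # zero or more size-prefixed SEI NAL units
        (size,) = struct.unpack_from("!H", nal, pos)
        pos += 2
        if pos + size > len(nal):
            raise ValueError("truncated SEI NAL unit")
        sei.append(nal[pos:pos + size])
        pos += size
    out["SEI"] = sei
    return out
```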
The values of the fields in PACSI NAL unit MUST be set as follows.
o The F bit MUST be set to 1 if the F bit in at least one of the
remaining NAL units in the aggregation packet is equal to 1 (when
the PACSI NAL unit is included in an aggregation packet) or if the
next non-PACSI NAL unit in transmission order has the F bit equal
to 1 (when the PACSI NAL unit is included in a single NAL unit
packet). Otherwise, the F bit MUST be set to 0.
o The NRI field MUST be set to the highest value of NRI field among
all the remaining NAL units in the aggregation packet (when the
PACSI NAL unit is included in an aggregation packet) or the value
of the NRI field of the next non-PACSI NAL unit in transmission
order (when the PACSI NAL unit is included in a single NAL unit
packet).
o The Type field MUST be set to 30.
o The R bit MUST be set to 1. Receivers MUST ignore the value of R.
o The I bit MUST be set to 1 if the I bit of at least one of the
remaining NAL units in the aggregation packet is equal to 1 (when
the PACSI NAL unit is included in an aggregation packet) or if the
I bit of the next non-PACSI NAL unit in transmission order is
equal to 1 (when the PACSI NAL unit is included in a single NAL
unit packet). Otherwise, the I bit MUST be set to 0.
o The PRID field MUST be set to the lowest value of the PRID values
of the remaining NAL units in the aggregation packet (when the
PACSI NAL unit is included in an aggregation packet) or the PRID
value of the next non-PACSI NAL unit in transmission order (when
the PACSI NAL unit is included in a single NAL unit packet).
o The N bit MUST be set to 1 if the N bit of all the remaining NAL
units in the aggregation packet is equal to 1 (when the PACSI NAL
unit is included in an aggregation packet) or if the N bit of the
next non-PACSI NAL unit in transmission order is equal to 1 (when
the PACSI NAL unit is included in a single NAL unit packet).
Otherwise, the N bit MUST be set to 0.
o The DID field MUST be set to the lowest value of the DID values of
the remaining NAL units in the aggregation packet (when the PACSI
NAL unit is included in an aggregation packet) or the DID value of
the next non-PACSI NAL unit in transmission order (when the PACSI
NAL unit is included in a single NAL unit packet).
o The QID field MUST be set to the lowest value of the QID values of
the remaining NAL units with the lowest value of DID in the
aggregation packet (when the PACSI NAL unit is included in an
aggregation packet) or the QID value of the next non-PACSI NAL
unit in transmission order (when the PACSI NAL unit is included in
a single NAL unit packet).
o The TID field MUST be set to the lowest value of the TID values of
the remaining NAL units with the lowest value of DID in the
aggregation packet (when the PACSI NAL unit is included in an
aggregation packet) or the TID value of the next non-PACSI NAL
unit in transmission order (when the PACSI NAL unit is included in
a single NAL unit packet).
o The U bit MUST be set to 1 if the U bit of at least one of the
remaining NAL units in the aggregation packet is equal to 1 (when
the PACSI NAL unit is included in an aggregation packet) or if the
U bit of the next non-PACSI NAL unit in transmission order is
equal to 1 (when the PACSI NAL unit is included in a single NAL
unit packet). Otherwise, the U bit MUST be set to 0.
o The D bit MUST be set to 1 if the D value of all the remaining NAL
units in the aggregation packet is equal to 1 (when the PACSI NAL
unit is included in an aggregation packet) or if the D bit of the
next non-PACSI NAL unit in transmission order is equal to 1 (when
the PACSI NAL unit is included in a single NAL unit packet).
Otherwise, the D bit MUST be set to 0.
o The O bit MUST be set to 1 if the O bit of at least one of the
remaining NAL units in the aggregation packet is equal to 1 (when
the PACSI NAL unit is included in an aggregation packet) or if the
O bit of the next non-PACSI NAL unit in transmission order is
equal to 1 (when the PACSI NAL unit is included in a single NAL
unit packet). Otherwise, the O bit MUST be set to 0.
o The RR field MUST be set to "11" (in binary form). Receivers MUST
ignore the value of RR.
o If the X bit is equal to 1, the bits A, P, and C are specified as
below. Otherwise, the bits A, P, and C are unspecified, and
receivers MUST ignore the values of these bits. The X bit SHOULD
be identical for all the PACSI NAL units in all the RTP sessions
carrying the same SVC bitstream.
o If the Y bit is equal to 1, the OPTIONAL fields TL0PICIDX and
IDRPICID MUST be present and specified as below, and the bits S
and E are also specified as below. Otherwise, the fields
TL0PICIDX and IDRPICID MUST NOT be present, while the S and E bits
are unspecified and receivers MUST ignore the values of these
bits. The Y bit MUST be identical for all the PACSI NAL units in
all the RTP sessions carrying the same SVC bitstream. The Y bit
MUST be equal to 0 when the parameter packetization-mode is equal
to 2.
o If the T bit is equal to 1, the OPTIONAL field DONC MUST be
present and specified as below. Otherwise, the field DONC MUST
NOT be present. The field T MUST be equal to 0 if the PACSI NAL
unit is contained in an STAP-B, MTAP16, MTAP24, or NI-MTAP with the
J field equal to 1.
o The A bit MUST be set to 1 if at least one of the remaining NAL
units in the aggregation packet belongs to an anchor layer
representation (when the PACSI NAL unit is included in an
aggregation packet) or if the next non-PACSI NAL unit in
transmission order belongs to an anchor layer representation (when
the PACSI NAL unit is included in a single NAL unit packet).
Otherwise, the A bit MUST be set to 0.
Informative note: The A bit indicates whether CGS or spatial layer
switching at a non-IDR layer representation (a layer
representation with nal_unit_type not equal to 5 and idr_flag not
equal to 1) can be performed. With some picture coding structures
a non-IDR intra layer representation can be used for random
access. Compared to using only IDR layer representations, higher
coding efficiency can be achieved. The H.264/AVC or SVC solution
to indicate the random accessibility of a non-IDR intra layer
representation is using a recovery point SEI message. The A bit
offers direct access to this information, without having to parse
the recovery point SEI message, which may be buried deeply in an
SEI NAL unit. Furthermore, the SEI message may or may not be
present in the bitstream.
o The P bit MUST be set to 1 if all the remaining NAL units in the
aggregation packet have redundant_pic_cnt greater than 0 (when the
PACSI NAL unit is included in an aggregation packet) or the next
non-PACSI NAL unit in transmission order has redundant_pic_cnt
greater than 0 (when the PACSI NAL unit is included in a single
NAL unit packet). Otherwise, the P bit MUST be set to 0.
Informative note: The P bit indicates whether a packet can be
discarded because it contains only redundant slice NAL units.
Without this bit, the corresponding information can be obtained
from the syntax element redundant_pic_cnt, which is contained in
the variable-length coded slice header.
o The C bit MUST be set to 1 if at least one of the remaining NAL
units in the aggregation packet belongs to an intra layer
representation (when the PACSI NAL unit is included in an
aggregation packet) or if the next non-PACSI NAL unit in
transmission order belongs to an intra layer representation (when
the PACSI NAL unit is included in a single NAL unit packet).
Otherwise, the C bit MUST be set to 0.
Informative note: The C bit indicates whether a packet contains
intra slices, which may be the only packets to be forwarded, e.g.,
when the network conditions are particularly adverse.
o The S bit MUST be set to 1 if the first NAL unit following the
PACSI NAL unit in an aggregation packet is the first VCL NAL unit,
in decoding order, of a layer representation (when the PACSI NAL
unit is included in an aggregation packet) or if the next non-
PACSI NAL unit in transmission order is the first VCL NAL unit, in
decoding order, of a layer representation (when the PACSI NAL unit
is included in a single NAL unit packet). Otherwise, the S bit
MUST be set to 0.
o The E bit MUST be set to 1 if the last NAL unit following the
PACSI NAL unit in an aggregation packet is the last VCL NAL unit,
in decoding order, of a layer representation (when the PACSI NAL
unit is included in an aggregation packet) or if the next non-
PACSI NAL unit in transmission order is the last VCL NAL unit, in
decoding order, of a layer representation (when the PACSI NAL unit
is included in a single NAL unit packet). Otherwise, the E bit
MUST be set to 0.
Informative note: In an aggregation packet it is always possible
to detect the beginning or end of a layer representation by
detecting changes in the values of dependency_id, quality_id, and
temporal_id in NAL unit headers, except for the first and last
NAL units of a packet. The S or E bits are used to provide this
information, for both single NAL unit and aggregation packets, so
that previous or following packets do not have to be examined.
This enables MANEs to detect slice loss and take proper action
such as requesting a retransmission as soon as possible, as well
as to allow efficient playout buffer handling similarly to the M
bit present in the RTP header. The M bit in the RTP header still
indicates the end of an access unit, not the end of a layer
representation.
o When present, the TL0PICIDX field MUST be set equal to
tl0_dep_rep_idx as specified in Annex G of [H.264] for the layer
representation containing the first NAL unit following the PACSI
NAL unit in the aggregation packet (when the PACSI NAL unit is
included in an aggregation packet) or containing the next non-
PACSI NAL unit in transmission order (when the PACSI NAL unit is
included in a single NAL unit packet).
o When present, the IDRPICID field MUST be set equal to
effective_idr_pic_id as specified in Annex G of [H.264] for the
layer representation containing the first NAL unit following the
PACSI NAL unit in the aggregation packet (when the PACSI NAL unit
is included in an aggregation packet) or containing the next non-
PACSI NAL unit in transmission order (when the PACSI NAL unit is
included in a single NAL unit packet).
Informative note: The TL0PICIDX and IDRPICID fields enable the
detection of the loss of layer representations in the most
important temporal layer (with temporal_id equal to 0) by
receivers as well as MANEs. SVC provides a solution that uses SEI
messages, which are harder to parse and may or may not be present
in the bitstream. When the PACSI NAL unit is part of an NI-MTAP
packet, it is possible to infer the correct values of
tl0_dep_rep_idx and idr_pic_id for all layer representations
contained in the NI-MTAP by following the rules that specify how
these parameters are set as given in Annex G of [H.264] and by
detecting the different layer representations contained in the NI-
MTAP packet by detecting changes in the values of dependency_id,
quality_id, and temporal_id in the NAL unit headers as well as
using the S and E flags. The only exception is if NAL units of an
IDR picture are present in the NI-MTAP in a position other than
the first NAL unit following the PACSI NAL unit, in which case the
value of idr_pic_id cannot be inferred. In this case the NAL unit
has to be partially parsed to obtain the idr_pic_id. Note that,
due to the large size of IDR pictures, their inclusion in an NI-
MTAP, and especially in a position other than the first NAL unit
following the PACSI NAL unit, may be neither practical nor useful.
o When present, the field DONC indicates the cross-session decoding
order number (CS-DON) for the first of the remaining NAL units in
the aggregation packet (when the PACSI NAL unit is included in an
aggregation packet) or the CS-DON of the next non-PACSI NAL unit
in transmission order (when the PACSI NAL unit is included in a
single NAL unit packet). CS-DON is further discussed in Section
4.11.
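Informative example: The rules for the A, P, C, S, and E bits can be summarized in a short sender-side sketch. The NalInfo record and the pacsi_flags() function are hypothetical names introduced here for illustration only; they assume that the packetizer already knows, for each NAL unit following the PACSI NAL unit, the properties the rules refer to. For a PACSI NAL unit in a single NAL unit packet, the list would contain only the next non-PACSI NAL unit in transmission order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NalInfo:
    """Hypothetical per-NAL-unit summary kept by a packetizer."""
    anchor: bool                  # belongs to an anchor layer representation
    intra: bool                   # belongs to an intra layer representation
    redundant_pic_cnt: int        # value taken from the slice header
    first_vcl_of_layer_rep: bool  # first VCL NAL unit of its layer representation
    last_vcl_of_layer_rep: bool   # last VCL NAL unit of its layer representation


def pacsi_flags(remaining: List[NalInfo]) -> Dict[str, int]:
    """Derive the A, P, C, S, and E bits from the NAL units that follow
    the PACSI NAL unit, listed in the order they appear."""
    if not remaining:
        raise ValueError("at least one non-PACSI NAL unit is required")
    return {
        "A": int(any(n.anchor for n in remaining)),                 # at least one
        "P": int(all(n.redundant_pic_cnt > 0 for n in remaining)),  # all of them
        "C": int(any(n.intra for n in remaining)),                  # at least one
        "S": int(remaining[0].first_vcl_of_layer_rep),              # first NAL unit
        "E": int(remaining[-1].last_vcl_of_layer_rep),              # last NAL unit
    }
```

Note that A and C use "at least one" semantics, whereas P requires the condition to hold for every remaining NAL unit.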
The PACSI NAL unit MAY include a subset of the SEI NAL units
associated with the access unit to which the first non-PACSI NAL unit
in the aggregation packet belongs, and MUST NOT contain SEI NAL units
associated with any other access unit.
Informative note: In H.264/AVC and SVC, within each access unit,
SEI NAL units must appear before any VCL NAL unit in decoding
order. Therefore, without using PACSI NAL units, SEI messages are
typically only conveyed in the first of the packets carrying an
access unit. Senders may repeat SEI NAL units in PACSI NAL units,
so that they are repeated in more than one packet and thus
increase robustness against packet losses. Receivers may use the
repeated SEI messages in place of missing SEI messages.
For a PACSI NAL unit included in an aggregation packet, an SEI
message SHOULD NOT be included in the PACSI NAL unit and also
included in one of the remaining NAL units contained in the same
aggregation packet.
4.10. Empty NAL unit
An empty NAL unit MAY be included in a single NAL unit packet, an
STAP-A or an NI-MTAP packet. Empty NAL units MUST have an RTP
timestamp (when transported in a single NAL unit packet) or NALU-
time (when transported in an aggregation packet) that is associated
with an access unit for which there exists at least one NAL unit of
type 1, 5, or 20. When MST is used, the type 1, 5, or 20 NAL unit
may be in a different RTP session. Empty NAL units may be used in
the decoding order recovery process of the NI-T mode as described in
Section 5.2.1.
The packet structure is shown in the following figure.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   | Subtype |J|K|L|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4. Empty NAL unit structure.
The fields MUST be set as follows:
F MUST be equal to 0
NRI MUST be equal to 3
Type MUST be equal to 31
Subtype MUST be equal to 1
J MUST be equal to 0
K MUST be equal to 0
L MUST be equal to 0
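Informative example: With the field values listed above, the two octets of an empty NAL unit can be assembled as in the following sketch. The function names are hypothetical; bit 0 of each octet is taken to be the most significant bit, as in the figures of this memo.

```python
def build_empty_nal_unit() -> bytes:
    """Assemble the two octets of an empty NAL unit (Type 31, Subtype 1)."""
    f, nri, nal_type = 0, 3, 31
    subtype, j, k, l = 1, 0, 0, 0
    first = (f << 7) | (nri << 5) | nal_type              # F | NRI | Type
    second = (subtype << 3) | (j << 2) | (k << 1) | l     # Subtype | J | K | L
    return bytes([first, second])


def is_empty_nal_unit(data: bytes) -> bool:
    """Check whether the first two octets match the empty NAL unit header."""
    return len(data) >= 2 and data[:2] == build_empty_nal_unit()
```

Under these assumptions the resulting octets are 0x7F followed by 0x08.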
4.11. Decoding Order Number (DON)
The DON concept is introduced in [RFC6184] and is used to recover the
decoding order when interleaving is used within a single session.
Section 5.5 of [RFC6184] applies when using SST.
When using MST, it is necessary to recover the decoding order across
the various RTP sessions regardless of whether interleaving is used.
In addition to the timestamp mechanism described later, the CS-DON
mechanism is an extension of the DON facility that can be used for
this purpose, and is defined in the following section.
4.11.1. Cross-Session DON (CS-DON) for Multi-Session Transmission
The cross-session decoding order number (CS-DON) is a number that
indicates the decoding order of NAL units across all RTP sessions
involved in MST. It is similar to the DON concept in [RFC6184], but
contrary to [RFC6184] where the DON was used only for interleaved
packetization, in this memo it is used not only in the interleaved
MST mode (I-C) but also in two of the non-interleaved MST modes (NI-C
and NI-TC).
When the NI-C or NI-TC MST modes are in use, the packetization of
each session MUST be as specified in Section 5.2.2. In PACSI NAL
units the CS-DON value is explicitly coded in the field DONC. For
non-PACSI NAL units the CS-DON value is derived as follows. Let SN
indicate the RTP sequence number of a packet.
o For each non-PACSI NAL unit carried in a session using the single
NAL unit session packetization mode, the CS-DON value of the NAL
unit is equal to (DONC_prev_PACSI + SN_diff - 1) % 65536, wherein
"%" is the modulo operation, DONC_prev_PACSI is the DONC value of
the previous PACSI NAL unit with the same NALU-time as the current
NAL unit, and SN_diff is calculated as follows:
if SN1 > SN2, SN_diff = SN1 - SN2
else SN_diff = SN1 + 65536 - SN2
where SN1 and SN2 are the SNs of the current NAL unit and the
previous PACSI NAL unit with the same NALU-time, respectively.
o For non-PACSI NAL units carried in a session using the non-
interleaved session packetization mode, the CS-DON value of each
non-PACSI NAL unit is derived as follows.
For a non-PACSI NAL unit in a single NAL unit packet, the
following applies.
If the previous PACSI NAL unit is contained in a single NAL
unit packet, the CS-DON value of the NAL unit is calculated
as above;
otherwise (the previous PACSI NAL unit is contained in an
STAP-A packet), the CS-DON value of the NAL unit is
calculated as above, with DONC_prev_PACSI being replaced by
the CS-DON value of the previous non-PACSI NAL unit in
decoding order (i.e., the CS-DON value of the last NAL unit
of the STAP-A packet).
For a non-PACSI NAL unit in an STAP-A packet, the following
applies.
If the non-PACSI NAL unit is the first non-PACSI NAL unit in
the STAP-A packet, the CS-DON value of the NAL unit is equal
to DONC of the PACSI NAL unit in the STAP-A packet;
otherwise (the non-PACSI NAL unit is not the first non-
PACSI NAL unit in the STAP-A packet), the CS-DON value of
the NAL unit is equal to: (the CS-DON value of the previous
non-PACSI NAL unit in decoding order + 1) % 65536, wherein
"%" is the modulo operation.
For a non-PACSI NAL unit in a number of FU-A packets, the CS-
DON value of the NAL unit is calculated the same way as when
the single NAL unit session packetization mode is in use, with
SN1 being the SN value of the first FU-A packet.
For a non-PACSI NAL unit in an NI-MTAP packet, the CS-DON value
is equal to the value of the DON field of the non-interleaved
multi-time aggregation unit.
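Informative example: The CS-DON derivation for non-PACSI NAL units can be sketched as follows. The function names are hypothetical. SN_diff is computed here as the forward distance, modulo 65536, from the sequence number of the packet carrying the previous PACSI NAL unit to the sequence number of the current packet, which covers sequence number wraparound. For a NAL unit carried in FU-A packets, the sequence number of the first FU-A packet would be passed as sn_current.

```python
from typing import List

SN_MOD = 1 << 16  # RTP sequence numbers and CS-DON values are 16 bits wide


def sn_diff(sn_current: int, sn_pacsi: int) -> int:
    """Forward distance from the PACSI packet to the current packet."""
    return (sn_current - sn_pacsi) % SN_MOD


def cs_don_single_nal(donc_prev_pacsi: int, sn_current: int, sn_pacsi: int) -> int:
    """CS-DON of a NAL unit in a single NAL unit packet (or the first FU-A)."""
    return (donc_prev_pacsi + sn_diff(sn_current, sn_pacsi) - 1) % SN_MOD


def cs_dons_stap_a(donc: int, count: int) -> List[int]:
    """CS-DON values of the `count` non-PACSI NAL units in an STAP-A whose
    leading PACSI NAL unit carries `donc`: the first takes DONC, and each
    following NAL unit takes the previous value plus one."""
    return [(donc + i) % SN_MOD for i in range(count)]
```

For instance, a NAL unit in the packet immediately following the PACSI packet (SN_diff equal to 1) takes the DONC value itself.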
When the I-C MST packetization mode is in use, the DON values derived
according to [RFC6184] for all the NAL units in each of the RTP
sessions MUST indicate CS-DON values.
5. Packetization Rules
Section 6 of [RFC6184] applies in this memo, with the following
additions.
5.1. Packetization Rules for Single-Session Transmission
All receivers MUST support the single NAL unit packetization mode to
provide backward compatibility to endpoints supporting only the
single NAL unit mode of [RFC6184]. However, the use of single NAL
unit packetization mode (packetization-mode equal to 0) SHOULD be
avoided whenever possible, because encapsulating NAL units of small
sizes in their own packets (e.g., small NAL units containing
parameter sets, prefix NAL units, or SEI messages) is less efficient
due to the packet header overhead.
All receivers MUST support the non-interleaved mode.
Informative note: The non-interleaved mode of [RFC6184] does allow
an application to encapsulate a single NAL unit in a single RTP
packet. Historically, the single NAL unit mode has been included
in [RFC6184] only for compatibility with ITU-T Rec. H.241 Annex A
[H.241]. There is no point in carrying this historic ballast
towards a new application space such as the one provided with SVC.
The implementation complexity increase for supporting the
additional mechanisms of the non-interleaved mode (namely, STAP-A
and FU-A) is minor, whereas the benefits are significant. As a
result, the support of STAP-A and FU-A is required. Additionally,
support for two of the three NAL unit types defined in this memo,
namely, empty NAL units and NI-MTAP, is needed, as specified in
Section 4.5.1.
A NAL unit of small size SHOULD be encapsulated in an aggregation
packet together with one or more other NAL units. For example, non-
VCL NAL units such as access unit delimiters, parameter sets, or SEI
NAL units are typically small.
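Informative example: Aggregating small NAL units into one STAP-A payload, as defined in [RFC6184], can be sketched as follows. The function name and the max_payload default of 1400 octets are assumptions made for illustration; an implementation would derive the limit from the session MTU. The sketch covers the SST case only; the additional MST rules of Section 5.2 are not applied.

```python
import struct
from typing import List

STAP_A_TYPE = 24  # NAL unit type value of STAP-A in RFC 6184


def build_stap_a(nal_units: List[bytes], max_payload: int = 1400) -> bytes:
    """Pack NAL units (each including its one-octet header) into an STAP-A."""
    if not nal_units or any(len(n) == 0 for n in nal_units):
        raise ValueError("need at least one non-empty NAL unit")
    f = 0
    nri = 0
    for nalu in nal_units:
        f |= nalu[0] >> 7                      # F is set if any aggregated F bit is set
        nri = max(nri, (nalu[0] >> 5) & 0x3)   # NRI is the maximum of all NRI values
    payload = bytearray([(f << 7) | (nri << 5) | STAP_A_TYPE])
    for nalu in nal_units:
        # Each aggregation unit is a 16-bit size in network byte order plus the NAL unit.
        payload += struct.pack("!H", len(nalu)) + nalu
    if len(payload) > max_payload:
        raise ValueError("aggregated payload exceeds the assumed size limit")
    return bytes(payload)
```

Each aggregated NAL unit costs two octets of size field instead of a full RTP header, which is where the efficiency gain for small NAL units comes from.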
A prefix NAL unit and the NAL unit with which it is associated, and
which follows the prefix NAL unit in decoding order, SHOULD be
included in the same aggregation packet whenever an aggregation
packet is used for the associated NAL unit, unless this would violate
session MTU constraints or if fragmentation units are used for the
associated NAL unit.
Informative note: Although the prefix NAL unit is ignored by an
H.264/AVC decoder, it is necessary in the SVC decoding process.
Given the small size of the prefix NAL unit, it is best if it is
transported in the same RTP packet as its associated NAL unit.
When only an H.264/AVC compatible subset of the SVC base layer is
transmitted in an RTP session, the subset MUST be encapsulated
according to [RFC6184]. This way, an [RFC6184] receiver will be able
to receive the H.264/AVC compatible bitstream subset.
When a set of layers including one or more SVC enhancement layers is
transmitted in an RTP session, the set SHOULD be carried in one RTP
stream that SHOULD be encapsulated according to this memo.
5.2. Packetization Rules for Multi-Session Transmission
When MST is used, the packetization rules specified in Section 5.1
still apply. In addition, the following packetization rules MUST be
followed, to ensure that decoding order of NAL units carried in the
sessions can be correctly recovered for each of the MST packetization
modes using the de-packetization process specified in Section 6.2.
The NI-T and NI-TC modes both use timestamps to recover the decoding
order. In order to be able to do so, it is necessary for the RTP
packet stream to contain data for all sampling instances of a given
RTP session in all enhancement RTP sessions that depend on the given
RTP session. The NI-C and I-C modes do not have this limitation, and
use the CS-DON values as a means to explicitly indicate decoding
order, either directly coded in PACSI NAL units, or inferred from
them using the packetization rules. It is noted that the NI-TC mode
offers both alternatives and it is up to the receiver to select which
one to use.
5.2.1. NI-T/NI-TC Packetization Rules
When using the NI-T mode and a PACSI NAL unit is present, the T bit
MUST be equal to 0, i.e., the DONC field MUST NOT be present.
When using the NI-T mode, the optional parameters sprop-mst-remux-
buf-size, sprop-remux-buf-req, remux-buf-cap, sprop-remux-init-buf-
time, sprop-mst-max-don-diff MUST NOT be present.
When the NI-T or NI-TC MST mode is in use, the following applies.
If one or more NAL units of an access unit of sampling time instance
t is present in RTP session A, then one or more NAL units of the same
access unit MUST be present in any enhancement RTP session that
depends on RTP session A.
Informative note: The mapping between RTP and NTP format
timestamps is conveyed in RTCP SR packets. In addition, the
mechanisms for faster media timestamp synchronization discussed in
[RFC6051] may be used to speed up the acquisition of the RTP-to-
wall-clock mapping.
Informative note: The rule above may require the insertion of NAL
units, typically when temporal scalability is used, i.e., an
enhancement RTP session does not contain any NAL units for an
access unit with a particular NTP timestamp (media timestamp),
which, however, is present in a lower enhancement RTP session or
the base RTP session. There are two ways to insert additional NAL
units in order to satisfy this rule:
- One option for adding additional NAL units is to use empty NAL
units (defined in Section 4.10), which can be used by the
process described in Section 6.2.1 for the access unit
reordering process.
- Additional NAL units may also be added by the encoder itself,
for example, by transmitting coded data that simply instruct the
decoder to repeat the previous picture. This option, however,
may be difficult to use with pre-encoded content.
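Informative example: Determining where NAL units have to be inserted can be sketched as follows. The data model is hypothetical: each RTP session is identified by a name, `times` maps a session to the set of sampling time instances for which it carries data, and `depends_on` maps a session to the sessions it directly depends on. The sketch only reports the missing sampling time instances; whether an empty NAL unit or encoder-generated data is inserted is left open.

```python
from typing import Dict, Hashable, List, Set


def all_dependencies(session: Hashable,
                     depends_on: Dict[Hashable, List[Hashable]]) -> Set[Hashable]:
    """Return every session that `session` depends on, directly or indirectly."""
    seen: Set[Hashable] = set()
    stack = list(depends_on.get(session, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def required_insertions(times: Dict[Hashable, Set[int]],
                        depends_on: Dict[Hashable, List[Hashable]]
                        ) -> Dict[Hashable, Set[int]]:
    """For each session, list the sampling time instances that are present in
    a session it depends on but for which it carries no NAL unit itself."""
    result: Dict[Hashable, Set[int]] = {}
    for session, own_times in times.items():
        needed: Set[int] = set()
        for dep in all_dependencies(session, depends_on):
            needed |= times.get(dep, set())
        result[session] = needed - own_times
    return result
```

With a base session A carrying instances {0, 1, 2, 3} and an enhancement session B (depending on A) carrying {0, 2}, the sketch reports {1, 3} for session B.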
If a packet must be inserted in order to satisfy the above rule,
e.g., in case of a MANE generating multiple RTP streams out of a
single RTP stream, the inserted packet must have an RTP timestamp
that maps to the same wall-clock time (in NTP format) as the one of
the RTP timestamp of any packet of the access unit present in any
lower enhancement RTP session or the base RTP session. This is easy
to accomplish if the NAL unit or the packet can be inserted at the
time of the RTP stream generation, since the media timestamp (NTP
timestamp) must be the same for the inserted packet and the packet of
the corresponding access unit. If there is no knowledge of the media
time at RTP stream generation or if the RTP streams are not generated
at the same instance, this can be also applied later in the
transmission process. In this case the NTP timestamp of the inserted
packet can be calculated as follows.
Assume that a packet A2 of an access unit with RTP timestamp TS_A2 is
present in base RTP session A, and that no packet of that access unit
is present in enhancement RTP session B, as shown in Figure 5. Thus,
a packet B2 must be inserted into session B following the rule above.
The most recent RTCP sender report in session A carries NTP timestamp
NTP_A and the RTP timestamp TS_A. The sender report in session B
whose NTP timestamp is lower than NTP_A carries the NTP timestamp
NTP_B and the RTP timestamp TS_B.
RTP session B:..B0........B1........(B2)......................
RTCP session B:.....SR(NTP_B,TS_B).............................
RTP session A:..A0........A1........A2........................
RTCP session A:..................SR(NTP_A,TS_A)................
-----------------|--x------|-----x---|------------------------>
                                                       NTP time
--------------------+<---------->+<->+------------------------>
                          t1      t2             RTP TS(B) time
Figure 5. Example calculation of RTP timestamp for packet
insertion in an enhancement layer RTP session
The vertical bars ("|") in the NTP time line in the figure above
indicate that access unit data is present in at least one of the
sessions. The "x" marks indicate the times of the sender reports.
The RTP timestamp time line for session B, shown right below the NTP
time line, indicates two time segments, t1 and t2. t1 is the time
difference between the sender reports of the two sessions,
expressed in RTP timestamp clock ticks, and t2 is the time difference
from the session A sender report to the A2 packet, again expressed in
RTP timestamp clock ticks. The sum of these differences is added to
the RTP timestamp of the sender report from session B in order to
derive the correct RTP timestamp for the inserted packet B2. In
other words:
TS_B2 = TS_B + t1 + t2
Let toRTP() be a function that calculates the RTP time difference (in
clock ticks of the used clock) given an NTP timestamp difference, and
effRTPdiff() be a function that calculates the effective difference
between two timestamps, including wraparounds:
effRTPdiff( ts1, ts2 ):
   if( ts1 >= ts2 ) then
      effRTPdiff := ts1 - ts2
   else
      effRTPdiff := (4294967296 + ts1) - ts2
We have:
t1 = toRTP(NTP_A - NTP_B) and t2 = effRTPdiff(TS_A2, TS_A)
Hence, the RTP timestamp TS_B2 for the inserted packet B2 can be
calculated as follows.
TS_B2 = TS_B + toRTP(NTP_A - NTP_B) + effRTPdiff(TS_A2, TS_A)
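Informative example: The calculation of TS_B2 can be sketched as follows. The function names are hypothetical. The sketch assumes that NTP timestamps are given as 64-bit values in 32.32 fixed-point format, that NTP_A is not lower than NTP_B, and that a 90 kHz RTP clock is used unless another rate is passed. The wraparound-aware difference is computed modulo 2^32, and the result is also reduced modulo 2^32.

```python
RTP_MOD = 1 << 32  # RTP timestamps are 32 bits wide


def eff_rtp_diff(ts1: int, ts2: int) -> int:
    """Clock ticks elapsed from ts2 to ts1, allowing for one wraparound."""
    return (ts1 - ts2) % RTP_MOD


def to_rtp(ntp_diff: int, clock_rate: int = 90000) -> int:
    """Convert a difference of 64-bit NTP timestamps (32.32 fixed point)
    into RTP clock ticks, rounding down."""
    return (ntp_diff * clock_rate) >> 32


def inserted_timestamp(ts_b: int, ntp_a: int, ntp_b: int,
                       ts_a2: int, ts_a: int, clock_rate: int = 90000) -> int:
    """RTP timestamp TS_B2 for a packet inserted into session B."""
    t1 = to_rtp(ntp_a - ntp_b, clock_rate)   # between the two sender reports
    t2 = eff_rtp_diff(ts_a2, ts_a)           # from session A's report to packet A2
    return (ts_b + t1 + t2) % RTP_MOD
```

As a usage example under these assumptions, if the two sender reports are half a second apart (45000 ticks at 90 kHz) and packet A2 lies 496 ticks after the session A sender report, the inserted packet is stamped TS_B + 45496.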
5.2.2. NI-C/NI-TC Packetization Rules
When the NI-C or NI-TC MST mode is in use, the following applies for
each of the RTP sessions.
o For each single NAL unit packet containing a non-PACSI NAL unit,
the previous packet, if present, MUST have the same RTP timestamp
as the single NAL unit packet, and the following applies.
o If the NALU-time of the non-PACSI NAL unit is not equal to the
NALU-time of the previous non-PACSI NAL unit in decoding order,
the previous packet MUST contain a PACSI NAL unit containing
the DONC field.
o In an STAP-A packet the first NAL unit in the STAP-A packet MUST
be a PACSI NAL unit containing the DONC field.
o For an FU-A packet the previous packet MUST have the same RTP
timestamp as the FU-A packet, and the following applies.
o If the FU-A packet is the start of the fragmented NAL unit, the
following applies.
o If the NALU-time of the fragmented NAL unit is not equal to
the NALU-time of the previous non-PACSI NAL unit in decoding
order, the previous packet MUST contain a PACSI NAL unit
containing the DONC field;
o Otherwise, (the NALU-time of the fragmented NAL unit is
equal to the NALU-time of the previous non-PACSI NAL unit in
decoding order), the previous packet MAY contain a PACSI NAL
unit containing the DONC field.
o Otherwise, if the FU-A packet is the end of the fragmented NAL
unit, the following applies.
o If the next non-PACSI NAL unit in decoding order has NALU-
time equal to the NALU-time of the fragmented NAL unit, and
is carried in a number of FU-A packets or a single NAL unit
packet, the next packet MUST be a single NAL unit packet
containing a PACSI NAL unit containing the DONC field.
o Otherwise (the FU-A packet is neither the start nor the end
of the fragmented NAL unit), the previous packet MUST be an
FU-A packet.
o For each single NAL unit packet containing a PACSI NAL unit, if
present, the PACSI NAL unit MUST contain the DONC field.
o When the optional media type parameter sprop-mst-csdon-always-
present is equal to 1, the session packetization mode in use MUST
be the non-interleaved mode, and only STAP-A and NI-MTAP packets
can be used.
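Informative example: Two of the rules above can be checked on a single packet without looking at its neighbors: an STAP-A packet has to start with a PACSI NAL unit containing the DONC field, and a PACSI NAL unit carried in a single NAL unit packet has to contain the DONC field. The following sketch checks only these two rules; the Nal and Packet records are hypothetical, and the rules that refer to previous or next packets are not covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Nal:
    """Hypothetical summary of one NAL unit."""
    is_pacsi: bool
    has_donc: bool = False  # meaningful for PACSI NAL units only


@dataclass
class Packet:
    """Hypothetical summary of one RTP packet payload."""
    kind: str               # "single", "stap-a", "fu-a", or "ni-mtap"
    nal_units: List[Nal]


def check_packet(pkt: Packet) -> List[str]:
    """Return a list of violations of the per-packet NI-C/NI-TC rules."""
    errors: List[str] = []
    if not pkt.nal_units:
        return ["packet carries no NAL unit"]
    first = pkt.nal_units[0]
    if pkt.kind == "stap-a" and not (first.is_pacsi and first.has_donc):
        errors.append("STAP-A must start with a PACSI NAL unit containing DONC")
    if pkt.kind == "single" and first.is_pacsi and not first.has_donc:
        errors.append("PACSI NAL unit in a single NAL unit packet must contain DONC")
    return errors
```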
5.2.3. I-C Packetization Rules
When the I-C MST packetization mode is in use, the following applies.
o When a PACSI NAL unit is present, the T bit MUST be equal to 0,
i.e., the DONC field is not present, and the Y bit MUST be equal
to 0, i.e., the TL0PICIDX and IDRPICID are not present.
5.2.4. Packetization Rules for Non-VCL NAL Units
NAL units that do not directly encode video slices are known in H.264
as non-VCL NAL units. Non-VCL units that are only used by, or only
relevant to, enhancement RTP sessions SHOULD be sent in the lowest
session to which they are relevant.
Some senders, however, such as those sending pre-encoded data, may be
unable to easily determine which non-VCL units are relevant to which
session. Thus, non-VCL NAL units MAY, instead, be sent in a session
on which the session using these non-VCL NAL units depends (e.g., the
base RTP session).
If a non-VCL unit is relevant to more than one RTP session, none of
which depends on the others, the NAL unit MAY be sent in another
session on which all these sessions depend.
5.2.5. Packetization Rules for Prefix NAL Units
Section 5.1 of this memo applies, with the following addition. If
the base layer is sent in a base RTP session using [RFC6184], prefix
NAL units MAY be sent in the lowest enhancement RTP session rather
than in the base RTP session.