RFC 8088

How to Write an RTP Payload Format

5.  Designing Payload Formats

   The best summary of payload format design is KISS (Keep It Simple,
   Stupid).  A simple payload format is easier to review for
   correctness, easier to implement, and has low complexity.
   Unfortunately, contradictory requirements sometimes make it hard to
   do things simply.  Complexity issues and problems that occur for RTP
   payload formats are:

   Too many configurations:  Contradictory requirements lead to the
      result that one configuration is created for each conceivable
      case.  Such contradictory requirements are often between
      functionality and bandwidth.  This outcome has two big
      disadvantages: first, all configurations need to be implemented.
      Second, the user application must select the most suitable
      configuration.  Selecting the best configuration can be very
      difficult and, in negotiating applications, this can create
      interoperability problems.  The recommendation is to try to select
      a very limited set of configurations (preferably one) that perform
      well for the most common cases and are capable of handling the
      other cases, but maybe not that well.

   Hard to implement:  Certain payload formats may become difficult to
      implement both correctly and efficiently.  This needs to be
      considered in the design.

   Interaction with general mechanisms:  Special solutions may create
      issues with deployed tools for RTP, such as tools for more robust
      transport of RTP.  For example, a requirement for an unbroken
      sequence number space creates issues for mechanisms that rely on
      payload type switching to interleave media-independent resilience
      within a stream.

5.1.  Features of RTP Payload Formats

   There are a number of common features in RTP payload formats.  There
   is no general requirement to support these features; instead, their
   applicability must be considered for each payload format.  In fact,
   it may be that certain features are not even applicable.

5.1.1.  Aggregation

   Aggregation allows for the inclusion of multiple Application Data
   Units (ADUs) within the same RTP payload.  This is commonly supported
   for codecs that produce ADUs of sizes smaller than the IP MTU.  One
   reason for the use of aggregation is the reduction of header overhead
   (IP/UDP/RTP headers).  When relating the ADU size to the MTU size,
   remember that the MTU may be significantly larger than 1500 bytes.
   An MTU of 9000 bytes is available today, and an MTU of 64 kB may be
   available in the future.  Many speech codecs produce ADUs of a few
   fixed sizes.  Video encoders may generally produce ADUs of quite
   flexible sizes; thus, the need for aggregation may be smaller.  But
   some codecs produce small ADUs mixed with large ones, for example,
   H.264 Supplemental Enhancement Information (SEI) messages.  Sending
   individual SEI messages in separate packets is not efficient
   compared to combining them with other ADUs.  Also, some small
   ADUs are, within the media domain, semantically coupled to the larger
   ADUs (for example, in-band parameter sets in H.264 [RFC6184]).  In
   such cases, aggregation is sensible, even if not required from a
   payload/header overhead viewpoint.  There also exist cases when the
   ADUs are pre-produced and can't be adapted to a specific network's
   MTU.  Instead, their packetization needs to be adapted to the
   network.  All the above factors should be taken into account when
   deciding on the inclusion of aggregation and weighing its benefits
   against the complexity of defining it (which can be significant,
   especially when aggregation is performed over ADUs with different
   playback times).

   The main disadvantages of aggregation, beyond implementation
   complexity, are the extra delay introduced (due to buffering until a
   sufficient number of ADUs have been collected at the sender) and the
   reduced robustness against packet loss.  Aggregation also introduces
   buffering requirements at the receiver.
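
   As an illustration (not part of any specific payload format), the
   following sketch shows a greedy sender-side aggregator.  The payload
   budget and the 2-byte length prefix per ADU are assumptions; real
   formats define their own aggregation headers.

      # Illustrative sketch only.  "max_payload" would be derived from
      # the path MTU minus the IP/UDP/RTP header overhead.
      def aggregate(adus, max_payload=1400):
          payloads, current, size = [], [], 0
          for adu in adus:
              needed = 2 + len(adu)         # hypothetical length prefix
              if current and size + needed > max_payload:
                  payloads.append(current)  # emit, start a new packet
                  current, size = [], 0
              current.append(adu)
              size += needed
          if current:
              payloads.append(current)
          return payloads

   With these assumptions, forty 160-byte speech frames fit into five
   aggregate packets instead of forty single-ADU packets, saving
   35 * (20 + 8 + 12) = 1400 bytes of IP/UDP/RTP header overhead.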

5.1.2.  Fragmentation

   If the real-time media format has the property that it may produce
   ADUs that are larger than common MTU sizes, then fragmentation
   support should be considered.  An RTP payload format may always fall
   back on IP fragmentation; however, as discussed in RFC 2736, this has
   some drawbacks.  Perhaps the most important reason to avoid IP
   fragmentation is that IP fragmented packets commonly are discarded in
   the network, especially by NATs or firewalls.  The usage of
   fragmentation at the RTP payload format level allows for more
   efficient usage of RTP packet loss recovery mechanisms.  It may, in
   some cases, also allow better usage of partial ADUs by fragmenting
   at media-specific boundaries.  In use cases where the ADUs are
   pre-produced and can't be adapted to the network's MTU size, support
   for fragmentation can be crucial.
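
   A minimal sketch of payload-level fragmentation, assuming a
   hypothetical 1-byte fragmentation header with start and end flags
   (real formats, e.g., the H.264 fragmentation units in [RFC6184],
   define their own headers and semantics):

      # Illustrative sketch only: split one large ADU into fragments
      # that each fit an assumed maximum payload size.
      def fragment(adu, max_payload=1400):
          chunk_size = max_payload - 1      # leave room for the header
          chunks = [adu[i:i + chunk_size]
                    for i in range(0, len(adu), chunk_size)]
          packets = []
          for i, chunk in enumerate(chunks):
              start = 1 if i == 0 else 0
              end = 1 if i == len(chunks) - 1 else 0
              header = bytes([(start << 7) | (end << 6)])
              packets.append(header + chunk)
          return packets

   The start/end flags let the receiver detect whether it has a
   complete ADU or must discard (or, for some codecs, partially use) a
   truncated one.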

5.1.3.  Interleaving and Transmission Rescheduling

   Interleaving has been implemented in a number of payload formats to
   allow for less quality reduction when packet loss occurs.  When
   losses are bursty and several consecutive packets are lost, the
   impact on quality can be quite severe.  Interleaving is used to
   convert that burst loss to several spread-out individual packet
   losses.  It can also be used when several ADUs are aggregated in the
   same packet.  The loss of an RTP packet with several ADUs in the
   payload has the same effect as a burst loss would have had if the
   ADUs had been transmitted in individual packets.  To reduce the
   burstiness of the loss, the data present in an aggregated payload
   may be interleaved, thus spreading the loss over a longer time
   period.
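
   The following sketch (illustrative only, with an assumed interleave
   depth) shows how a simple block interleaver groups ADUs so that the
   loss of one aggregate packet costs every n-th ADU rather than a
   consecutive run of them:

      # ADUs 0..11 with depth 3 become the aggregates [0, 3, 6, 9],
      # [1, 4, 7, 10], and [2, 5, 8, 11]; losing one packet then costs
      # every third ADU instead of four consecutive ones, at the price
      # of extra buffering delay at both sender and receiver.
      def interleave(adus, depth):
          return [adus[i::depth] for i in range(depth)]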

   A requirement for doing interleaving within an RTP payload format is
   the aggregation of multiple ADUs.  For formats that do not use
   aggregation, there is still a possibility of implementing a
   transmission order rescheduling mechanism.  That has the effect that
   the packets transmitted consecutively originate from different points
   in the RTP stream.  This can be used to mitigate burst losses, which
   may be useful if one transmits packets at frequent intervals.
   However, it may also be used to transmit more significant data
   earlier in combination with RTP retransmission to allow for more
   graceful degradation and increased possibility to receive the most
   important data, e.g., intra frames of video.

   The drawback of interleaving is the significantly increased
   transmission buffering delay, making it less useful for low-delay
   applications.  It may also create significant buffering requirements
   on the receiver.  That buffering is also problematic, as it is
   usually difficult to indicate when a receiver may start consuming
   data and still avoid buffer underrun caused by the interleaving
   mechanism
   itself.  Transmission rescheduling is only useful in a few specific
   cases, as in streaming with retransmissions.  The potential gains
   must be weighed against the complexity of these schemes.

5.1.4.  Media Back Channels

   A few RTP payload formats have implemented back channels within the
   media format.  Those have been for specific features, like the AMR
   [RFC4867] codec mode request (CMR) field.  The CMR field is used in
   the operation of gateways to circuit-switched voice to allow an IP
   terminal to react to the circuit-switched network's need for a
   specific encoder mode.  A common motivation for media back channels
   is the need to have signaling in direct relation to the media or the
   media path.

   If back channels are considered for an RTP payload format, they
   should address specific requirements that cannot easily be satisfied
   by more generic mechanisms within RTP or RTCP.

5.1.5.  Media Scalability

   Some codecs support various types of media scalability, i.e., some
   data of an RTP stream may be removed to adapt the media's
   properties,
   such as bitrate and quality.  The adaptation may be applied in the
   following dimensions of the media:

   Temporal:  For most video codecs it is possible to adapt the frame
      rate without any specific definition of a temporal scalability
      mode, e.g., for H.264 [RFC6184].  In these cases, the sender
      changes which frames it delivers, and the RTP timestamp makes the
      frame interval and each frame's relative capture time clear.
      H.264 Scalable Video Coding (SVC) [RFC6190] has more explicit
      support for temporal scalability.

   Spatial:  Video codecs supporting scalability may adapt the
      resolution, e.g., in SVC [RFC6190].

   Quality:  The quality of the encoded stream may be scaled by
      adapting the accuracy of the coding process, as is, e.g.,
      possible with the Signal to Noise Ratio (SNR) fidelity
      scalability of SVC [RFC6190].

   At the time of writing this document, codecs that support
   scalability are seeing a bit of a revival.  It has been realized
   that getting the required functionality for supporting the features
   of such media streams into the RTP framework is quite challenging.
   One of the recent examples of layered and scalable codecs is SVC
   [RFC6190].

   SVC is a good example of a payload format supporting media
   scalability features, the basics of which are already included in
   RTP.  A layered codec supports the dropping of data parts of an RTP
   stream, i.e., RTP packets may not be transmitted or forwarded to a
   client in order to adapt the RTP stream's bitrate as well as the
   received encoded stream's quality, while still providing
   a decodable subset of the encoded stream to a client.  One example
   of using the scalability feature may be an RTP Mixer (Multipoint
   Control Unit) [RFC7667], which controls the rate and quality sent out
   to participants in a communication based on dropping RTP packets or
   removing part of the payload.  Another example may be a transport
   channel, which allows for differentiation in Quality of Service (QoS)
   parameters based on RTP sessions in a multicast session.  In such a
   case, the more important packets of the scalable encoded stream (base
   layer) may get better QoS parameters than the less important packets
   (enhancement layer) in order to provide some kind of graceful
   degradation.  The scalability features required for allowing an
   adaptive transport, as described in the two examples above, are based
   on RTP multiplexing in order to identify the packets to be dropped or
   transmitted/forwarded.  The multiplexing features defined for
   Scalable Video Coding [RFC6190] are:

      Single Session Transmission (SST), where all media layers of the
      media are transported as a single synchronization source (SSRC) in
      a single RTP session; as well as

      Multi-Session Transmission (MST), which should more accurately be
      called multi-stream transmission, where different media layers or
      a set of media layers are transported in different RTP streams,
      i.e., using multiple sources (SSRCs).

   In the first case (SST), additional in-band as well as out-of-band
   signaling is required in order to allow identification of packets
   belonging to a specific media layer.  Furthermore, an adaptation of
   the encoded stream requires dropping of specific packets in order to
   provide the client with a compliant encoded stream.  In case of using
   encryption, it is typically required for an adapting network device
   to be in the security context to allow packet dropping and providing
   an intact RTP session to the client.  This typically requires the
   network device to be an RTP mixer.

   In general, having a media-unaware network device dropping excessive
   packets will be more problematic than having a Media-Aware Network
   Entity (MANE).  First, there is the need to understand the media
   format and know which ADUs or payloads belong to layers that no
   other layer will depend on after the dropping.  Second, if the MANE
   can work as an RTP mixer or translator, it can rewrite the RTP and
   RTCP in such a way that the receiver will not suspect unintentional
   RTP packet losses needing repair actions.  This matters because the
   receiver can't determine whether a lost packet was an important
   base-layer packet or one of the less important extension-layer
   packets.

   In the second case (MST), the RTP packet streams can be sent using a
   single RTP session or multiple RTP sessions, and thus multiple
   transport flows, e.g., on different multicast groups.  When the
   streams are transmitted in different RTP sessions, the out-of-band
   signaling typically provides enough information to identify the
   media layers and their properties.
   The decision on dropping packets is based on the Network Address that
   identifies the RTP session to be dropped.  In order to allow correct
   data provisioning to a decoder after reception from different
   sessions, data realignment mechanisms are required.  In some cases,
   existing generic tools, as described below, can be employed to enable
   such realignment; when those generic mechanisms are sufficient, they
   should be used.  For example, "Rapid Synchronisation for RTP Flows"
   [RFC6051] uses existing RTP mechanisms, i.e., the NTP timestamp, to
   ensure timely inter-session synchronization.  Another is the
   signaling feature for indicating dependencies of RTP sessions in SDP,
   as defined in the Media Decoding Dependency Grouping in SDP
   [RFC5583].

   Using MST within a single RTP session is also possible and allows a
   MANE to handle streams at the stream level instead of looking deeper
   into the packets.  However, transport flow-level properties will be
   the same unless packet-based mechanisms like Diffserv are used.

   When QoS settings, e.g., Diffserv markings, are used to ensure that
   the extension layers are dropped prior to the base layer, the
   receiving endpoint has the benefit in MST of knowing which layer or
   set of layers the missing packets belong to, as each layer is bound
   to a different RTP session or RTP packet stream (SSRC), thus
   explicitly indicating the importance of the loss.
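
   As a hedged illustration of the MST case, a middlebox that knows the
   SSRC-to-layer mapping from out-of-band signaling can adapt the
   bitrate without parsing payloads.  The packet representation and the
   mapping below are assumptions for the sketch, not part of [RFC6190]:

      # Illustrative sketch only: forward the base layer and as many
      # enhancement layers as the target allows, selected by SSRC.
      def forward(packets, ssrc_to_layer, max_layer):
          # packets: iterable of (ssrc, payload) tuples
          return [p for p in packets
                  if ssrc_to_layer[p[0]] <= max_layer]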

5.1.6.  High Packet Rates

   Some media codecs require high packet rates; in these cases, the RTP
   sequence number wraps too quickly.  As a rule of thumb, it must not
   be possible to wrap the sequence number space within at least three
   RTCP reporting intervals.  As the reporting interval can vary widely
   due to configuration and session properties, and also must take into
   account the randomization of the interval, one can use the TCP
   maximum segment lifetime (MSL), i.e., 2 minutes, in one's
   considerations.  If earlier wrapping may occur, then the payload
   format should specify an extended sequence number field to allow the
   receiver to determine where a specific payload belongs in the
   sequence, even in the face of extensive reordering.  The RTP payload
   format for uncompressed video [RFC4175] can be used as an example for
   such a field.
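
   A rough way to judge whether this matters: with the 2-minute MSL
   yardstick, the 65536-value sequence number space wraps within the
   window at rates above roughly 65536 / 120 ~= 546 packets per second.
   When the payload header carries the high 16 bits explicitly (as
   [RFC4175] does), the receiver simply recombines them; the sketch
   below is illustrative only:

      # Illustrative sketch: combine the extension field from the
      # payload header with the 16-bit RTP sequence number into a
      # 32-bit extended sequence number.
      def extended_seq(high16, rtp_seq16):
          return (high16 << 16) | rtp_seq16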

   RTCP is also affected by high packet rates.  For RTCP mechanisms
   that do not use extended counters, there is a significant risk that
   the counters wrap multiple times between RTCP reports or feedback
   messages, thus producing uncertainty about which packet(s) are
   referenced.  The payload format designer can't affect the RTCP
   packet formats used and their design, but can note these
   considerations when configuring RTCP bandwidth and reporting
   intervals to avoid wrapping issues.

5.2.  Selecting Timestamp Definition

   The RTP timestamp is an important part of the payload format design
   and has two design choices associated with it.  The first is the
   definition that determines what the timestamp value in a particular
   RTP packet will be; the second is which timestamp rate should be
   used.

   The timestamp definition needs to explicitly define what the
   timestamp value in the RTP packet represents for a particular
   payload format.  Two common definitions are used: for discretely
   sampled media, like video frames, the sampling time of the earliest
   included video frame that the data represent (fully or partially) is
   used; for continuous media like audio, the sampling time of the
   earliest sample that the payload data represent is used.  There
   exist cases where more elaborate or other definitions are used.
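
   As an illustration with assumed parameters (16 kHz audio with 20 ms
   frames; 25 frames per second video at the common 90 kHz rate), the
   two definitions translate into timestamp values as follows:

      # Illustrative sketch only.
      AUDIO_RATE = 16000
      AUDIO_TICKS = AUDIO_RATE * 20 // 1000   # 320 ticks per 20 ms frame

      def audio_timestamp(first_ts, frame_index):
          # sampling time of the earliest sample in the payload
          return (first_ts + frame_index * AUDIO_TICKS) % 2**32

      VIDEO_RATE = 90000
      VIDEO_TICKS = VIDEO_RATE // 25          # 3600 ticks per frame

      def video_timestamp(first_ts, frame_index):
          # every packet carrying parts of the same frame uses this value
          return (first_ts + frame_index * VIDEO_TICKS) % 2**32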

   RTP payload formats with a timestamp definition that results in no or
   little correlation between the media time instance and its
   transmission time cause the RTCP jitter calculation to become
   unusable due to the errors introduced on the sender side.  A common
   example is a payload format for a video codec where the RTP timestamp
   represents the capture time of the video frame, but frames are large
   enough that multiple RTP packets need to be sent for each frame
   spread across the framing interval.  It should be noted whether or
   not the payload format has this property.

   An RTP payload format also needs to define what timestamp rates, or
   clock rates (as they are also called), may be used.  Depending on
   the RTP payload format, this may be a single rate, multiple rates,
   or theoretically any rate.  So, what needs to be considered when
   selecting a rate?

   The rate needs to be selected so that one can determine where in the
   timeline of the media a particular sample (e.g., an individual audio
   sample or a video frame) or set of samples (e.g., audio frames)
   belongs, to enable correct synchronization of this data with
   previous frames, including over periods of discontinuous
   transmission or other irregularities.

   For audio, it is common to require audio sample accuracy.  Thus, one
   commonly selects the input sampling rate as the timestamp rate.  This
   can, however, be challenging for audio codecs that support multiple
   different sampling frequencies, either as codec input or used
   internally but affecting output, for example, the frame duration.
   Depending on how one expects to use these different sampling rates,
   one can allow multiple timestamp rates, each matching a particular
   codec input or sampling rate.  However, due to the issues with using
   multiple different RTP timestamp rates for the same source (SSRC)
   [RFC7160], this should be avoided if one expects to need to switch
   between modes.

   An alternative, then, is to find a common denominator frequency
   between the different modes, e.g., Opus [RFC7587], which uses
   48 kHz.  If the different modes use or can use a common input/output
   frequency, then selecting this frequency also needs to be
   considered.  However, it is important to consider all aspects, as
   the case of AMR-WB+ [RFC4352] illustrates.  AMR-WB+'s RTP timestamp
   rate has the very unusual value of 72 kHz, despite the fact that
   output normally is at a sample rate of 48 kHz.  The design is
   motivated by the media codec's production of a large range of
   different frame lengths measured in time.  The 72 kHz timestamp rate
   is the smallest value found that makes every frame the codec can
   produce an integer number of RTP timestamp ticks long.  This way, a
   receiver can always correctly place the frames in relation to any
   other frame, even when the frame length changes.  The downside is
   that the decoder outputs for certain frame lengths are, in fact,
   partial samples.  The result is that the output in samples from the
   codec will vary from frame to frame, potentially making
   implementation more difficult.
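
   The reasoning can be illustrated with a small sketch using
   hypothetical frame durations (not AMR-WB+'s actual set): the chosen
   rate is the smallest one at which every possible frame duration is a
   whole number of timestamp ticks.

      # Illustrative sketch only, with made-up frame durations.
      from fractions import Fraction

      durations = [Fraction(1, 50), Fraction(1, 75)]   # 20 ms, 13.3 ms

      def smallest_integer_tick_rate(durations):
          rate = 1
          while any((d * rate).denominator != 1 for d in durations):
              rate += 1
          return rate

      print(smallest_integer_tick_rate(durations))     # -> 150 Hz here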

   Video codecs have commonly used 90 kHz; the reason is that this is a
   common denominator of the usually used frame rates, such as 24, 25,
   30, 50, and 60 Hz, as well as NTSC's odd 29.97 Hz.  There does,
   however, exist at least one exception: the payload format for SMPTE
   292M video [RFC3497] uses a clock rate of 148.5 MHz.  The reason
   here is that the timestamp then identifies the exact start sample
   within a video frame.
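
   A quick check (illustrative only) of why 90 kHz works, including
   NTSC's 30000/1001 Hz rate:

      # Each common frame rate gives an integer number of ticks per
      # frame at a 90 kHz clock.
      from fractions import Fraction

      for fps in (24, 25, 30, 50, 60, Fraction(30000, 1001)):
          print(fps, 90000 / Fraction(fps))
      # -> 3750, 3600, 3000, 1800, 1500, and 3003 ticks per frame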

   Timestamp rates below 1000 Hz are not appropriate, because they
   result in too low a resolution in the RTCP measurements that are
   expressed in RTP timestamps.  This is the main reason that the text
   RTP payload formats, like T.140 [RFC4103], use 1000 Hz.

6.  Noteworthy Aspects in Payload Format Design

   This section provides a few examples of payload formats that are
   worth noting for good or bad design in general or in specific
   details.

6.1.  Audio Payloads

   The AMR [RFC4867], AMR-WB [RFC4867], EVRC [RFC3558], and SMV
   [RFC3558] payload formats are all quite similar.  They are all for
   frame-based
   audio codecs and use a table of contents structure.  Each frame has a
   table of contents entry that indicates the type of the frame and if
   additional frames are present.  This is quite flexible, but produces
   unnecessary overhead if the ADU is of fixed size and if, when
   aggregating multiple ADUs, they are commonly of the same type.  In
   that case, a solution like the one in AMR-WB+ [RFC4352] may be more
   suitable.

   The RTP payload format for MIDI [RFC6295] contains some interesting
   features.  MIDI is an audio format sensitive to packet losses, as the
   loss of a "note off" command will result in a note being stuck in an
   "on" state.  To counter this, a recovery journal is defined that
   provides a summarized state that allows the receiver to recover from
   packet losses quickly.  It also uses RTCP and the reported highest
   sequence number to be able to prune the state the recovery journal
   needs to contain.  These features appear limited in applicability to
   media formats that are highly stateful and primarily use symbolic
   media representations.

   There exists a security concern with variable bitrate audio and
   speech codecs that change their payload length based on the input
   data.  This can leak information, especially in structured
   communication like a speech recognition prompt service that asks
   people to enter information verbally.  This issue also exists to some
   degree for discontinuous transmission as that allows the length of
   phrases to be determined.  The issue is further discussed in
   "Guidelines for the Use of Variable Bit Rate Audio with Secure RTP"
   [RFC6562], which needs to be read by anyone writing an RTP payload
   format for an audio or speech codec with these properties.

6.2.  Video

   The definition of RTP payload formats for video has seen an evolution
   from the early ones such as H.261 [RFC4587] towards the latest for
   VP8 [RFC7741] and H.265/HEVC [RFC7798].

   The H.264 RTP payload format [RFC3984] can be seen as a smorgasbord
   of functionality, some of it, such as the interleaving, being pretty
   advanced.  The reason for this was to ensure that the majority of
   applications considered by the ITU-T and MPEG that can be supported
   by RTP are indeed supported.  This has created a payload format that
   rarely is fully implemented.  Despite that, no major issues with
   interoperability have been reported, with one exception, namely the
   Offer/Answer and parameter signaling, which resulted in a revised
   specification [RFC6184].  However, complaints about its complexity
   are common.

   The RTP payload format for uncompressed video [RFC4175] must be
   mentioned in this context as it contains a special feature not
   commonly seen in RTP payload formats.  Due to the high bitrate and
   thus packet rate of uncompressed video (gigabits rather than megabits
   per second), the payload format includes a field to extend the RTP
   sequence number since the normal 16-bit one can wrap in less than a
   second.  [RFC4175] also specifies a registry of different color sub-
   samplings that can be reused in other video RTP payload formats.

   Both the H.264 and the uncompressed video formats enable the
   implementer to fulfill the goals of application-level framing, i.e.,
   each individual RTP packet's payload can be independently decoded
   and its content used to create a video frame (or part of one),
   irrespective of whether preceding packets have been lost (see
   Section 4) [RFC2736].  For uncompressed video, this is
   straightforward, as each pixel is represented independently of the
   others and its location in the video frame is known.  H.264 is more
   dependent on the actual implementation, the configuration of the
   video encoder, and the usage of the RTP payload format.

   The common challenge with video is that, in most cases, a single
   compressed video frame doesn't fit into a single IP packet.  Thus,
   the compressed representation of a video frame needs to be split
   over multiple packets.  This can be done unintelligently with a
   basic payload-level fragmentation method, or in a more integrated
   way by interfacing with the encoder's possibilities to create ADUs
   that are independent and fit the MTU for the RTP packet.  The latter
   is more robust and commonly recommended unless strong packet loss
   recovery mechanisms are used and a sufficient delay budget for the
   repair exists.  Commonly, both payload-level fragmentation and an
   explanation of how tailored ADUs can be created are needed in a
   video payload format.  Also, the handling of crucial metadata, like
   H.264 parameter sets, needs to be considered, as decoding is not
   possible without receiving the parameter sets used.

6.3.  Text

   Only a single text format has been standardized in the IETF, namely
   T.140 [RFC4103].  The 3GPP Timed Text format [RFC4396] should be
   considered to be text, even though in the end it was registered as a
   video format.  It was registered in that part of the tree because it
   deals with decorated text, usable for subtitles and other
   embellishments of video.  However, it has many of the properties
   that text formats generally have.

   The RTP payload format for T.140 was designed with high reliability
   in mind, as real-time text commonly is an extremely low bitrate
   application.  Thus, it recommends the use of RFC 2198 with many
   generations of redundancy.  However, the format failed to provide a
   text-block-specific sequence number and instead relies on the RTP one
   to detect loss.  This makes detection of missing text blocks
   unnecessarily difficult and hinders deployment with other robustness
   mechanisms that would involve switching the payload type, as that may
   result in erroneous error marking in the T.140 text stream.

6.4.  Application

   At the time of writing, the application content type contains two
   media types that aren't RTP transport robustness tools, such as FEC
   [RFC3009] [RFC5109] [RFC6015] [RFC6682] and RTP retransmission
   [RFC4588].

   The first one is H.224 [RFC4573], which enables far-end camera
   control over RTP.  This is not an IETF-defined RTP format, only an
   IETF-performed registration.

   The second one is "RTP Payload Format for Society of Motion Picture
   and Television Engineers (SMPTE) ST 336 Encoded Data" [RFC6597],
   which carries generic key length value (KLV) triplets.  These
   triplets may contain arbitrary binary metadata associated with video
   transmissions.  It has a very basic fragmentation mechanism that
   requires reception without packet loss, not only of the triplet
   itself but also of one packet before and after the sequence of
   fragmented KLV triplets, to ensure correct reception.  Specific KLV
   triplets themselves may have recommendations on how to handle
   incomplete ones,
   allowing the use and repair of them.  In general, the application
   using such a mechanism must be robust to errors and also use some
   combination of application-level repetition, RTP-level transport
   robustness tools, and network-level requirements to achieve low
   levels of packet loss rates and repair of KLV triplets.

   An author should consider applying for a media subtype under the
   application media type (application/<foo>) when the payload format is
   of a generic nature or does not clearly match any of the media types
   described above (audio, video, or text).  However, existing
   limitations in, for example, SDP, have resulted in generic
   mechanisms normally being registered in all media types, so that
   they can be associated with any of the media types existing in an
   RTP session.

7.  Important Specification Sections

   A number of sections in the payload format draft need special
   consideration.  These include the Security Considerations and IANA
   Considerations sections that are required in all drafts.  Payload
   formats are also strongly recommended to have the media format
   description and congestion control considerations.  The included RTP
   payload format template (Appendix A) contains sample text for some of
   these sections.

7.1.  Media Format Description

   The intention of this section is to enable reviewers and other
   readers to get an overview of the capabilities and major properties
   of the media format.  It should be kept short and concise and is not
   a complete replacement for reading the media format specification.

   The actual specification of the RTP payload format generally uses
   normative references to the codec format specification to define how
   codec data elements are included in the payload format.  This
   normative reference can be to anything that has sufficient stability
   for a normative reference.  There exists no formal requirement that
   the codec format specification be publicly available or free to
   access.  However, it significantly helps in the review process if
   that specification is made available to any reviewer.  There exist
   RTP payload format RFCs for open-source project specifications, for
   individual companies' proprietary formats, and for formats from a
   large variety of standards development organizations and industry
   forums.

7.2.  Security Considerations

   All Internet-Drafts require a Security Considerations section.  The
   Security Considerations section in an RTP payload format needs to
   concentrate on the security properties this particular format has.
   Some payload formats have very few specific issues or properties and
   can fully fall back on the security considerations for RTP in general
   and those of the profile being used.  Because those documents are
   always applicable, a reference to these is normally placed first in
   the Security Considerations section.  There is suggested text in the
   template below.

   The security issues of confidentiality, integrity protection, replay
   protection, and source authentication are common issues for all
   payload formats.  These should be solved by mechanisms external to
   the payload and do not need any special consideration in the payload
   format except for a reminder on these issues.  There exist
   exceptions, such as payload formats that include security
   functionality, like ISMAcrypt [ISMACrypt2].  The reasons for this
   division are further documented in "Securing the RTP Protocol
   Framework: Why RTP Does Not Mandate a Single Media Security
   Solution" [RFC7202].  For a survey of available mechanisms to meet
   these goals, review "Options for Securing RTP Sessions" [RFC7201].
   This also includes key-exchange mechanisms for the security
   mechanisms, which can be either integrated or separate.  The choice
   of key management can
   have significant impact on the security properties of the RTP-based
   application.  Suitable stock text to inform people about this is
   included in the template.

   Potential security issues with an RTP payload format and the media
   encoding that need to be considered if they are applicable:

   1.  The decoding of the payload format or its media results in
       substantial non-uniformity, either in output or in complexity to
       perform the decoding operation.  For example, a generic non-
       destructive compression algorithm may provide an output of almost
       an infinite size for a very limited input, thus consuming memory
       or storage space out of proportion with what the receiving
       application expected.  Such inputs can cause some sort of
       disruption, i.e., a denial-of-service attack on the receiver side
       by preventing that host from performing usable work.  Certain
       decoding operations may also vary in the amount of processing
       needed to perform those operations depending on the input.  This
       may also be a security risk if it is possible to raise processing
       load significantly above nominal simply by designing a malicious
        input sequence.  If such potential attacks exist, this must be
       made clear in the Security Considerations section to make
       implementers aware of the need to take precautions against such
       behavior.

   2.  The inclusion of active content in the media format or its
       transport.  "Active content" means scripts, etc., that allow an
       attacker to perform potentially arbitrary operations on the
        receiver.  Most active content has limited possibilities to
        access the system or perform operations outside a protected
        sandbox.
       RFC 4855 [RFC4855] has a requirement that it be noted in the
       media types registration whether or not the payload format
       contains active content.  If the payload format has active
       content, it is strongly recommended that references to any
       security model applicable for such content are provided.  A
       boilerplate text for "no active content" is included in the
       template.  This must be changed if the format actually carries
       active content.

   3.  Some media formats allow for the carrying of "user data", or
       types of data which are not known at the time of the
       specification of the payload format.  Such data may be a security
       risk and should be mentioned.

   4.  Audio or Speech codecs supporting variable bitrate based on
       'audio/speech' input or having discontinuous transmission support
       must consider the issues discussed in "Guidelines for the Use of
       Variable Bit Rate Audio with Secure RTP" [RFC6562].

   Suitable stock text for the Security Considerations section is
   provided in the template in Appendix A.  However, authors do need to
   actively consider any security issues from the start.  Failure to
   address these issues may block approval and publication.

7.3.  Congestion Control

   RTP and its profiles do discuss congestion control.  There is
   ongoing work in the IETF on both a basic circuit-breaker mechanism
   [RFC8083], which uses basic RTCP messages and is intended to prevent
   persistent congestion, and a more capable congestion avoidance /
   bitrate adaptation mechanism in the RMCAT WG.

   Congestion control is an important issue in any usage in networks
   that are not dedicated.  For that reason, it is recommended that all
   RTP payload format documents discuss the possibilities that exist to
   regulate the bitrate of the transmissions using the described RTP
   payload format.  Some formats may have limited or step-wise
   regulation of bitrate.  Such limiting factors should be discussed.

7.4.  IANA Considerations

   Since all RTP payload formats contain a media type specification,
   they also need an IANA Considerations section.  The media type name
   must be registered, and this is done by requesting that IANA register
   that media name.  When that registration request is written, it shall
   also be requested that the media type is included under the "RTP
   Payload Format media types" subregistry of the RTP registry
   (http://www.iana.org/assignments/rtp-parameters).

   Parameters for the payload format need to be included in this
   registration and can be specified as required or optional ones.  The
   format of these parameters should be such that they can be included
   in the SDP attribute "a=fmtp" string (see Section 6 of [RFC4566]),
   which is the common mapping.  Some parameters, such as "Channel",
   are normally mapped to the rtpmap attribute instead; see Section 3
   of
   [RFC4855].
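
   As an illustrative (and purely hypothetical) example of the common
   mapping, an H.264 stream [RFC6184] might be described in SDP with
   the media type name and clock rate in "a=rtpmap" and the format
   parameters in "a=fmtp":

      m=video 49170 RTP/AVP 96
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42e01e; packetization-mode=1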

   In addition to the above request for media type registration, some
   payload formats may have parameters where, in the future, new
   parameter values need to be added.  In these cases, a registry for
   that parameter must be created.  This is done by defining the
   registry in the IANA Considerations section.  BCP 26 [BCP26]
   provides guidelines for specifying such registries.  Care should be
   taken when
   defining the policy for new registrations.

   Before specifying a new registry, it is worth checking the existing
   ones in the IANA "MIME Media Type Sub-Parameter Registries".  For
   example, video formats that need a media parameter expressing color
   sub-sampling may be able to reuse those defined for 'video/raw'
   [RFC4175].

8.  Authoring Tools

   This section provides information about some tools that may be used.
   Don't feel pressured to follow these recommendations.  There exist a
   number of alternatives, including the ones listed at
   <http://tools.ietf.org>.  But these suggestions are worth checking
   out before deciding that the grass is greener somewhere else.

   Note that these options relate to the old text-only RFC format and
   do not cover tools for the new RFC format that had, at the time of
   publication, only recently been approved; see [RFC7990].

8.1.  Editing Tools

   There are many choices when it comes to tools to choose for authoring
   Internet-Drafts.  However, in the end, they need to be able to
   produce a draft that conforms to the Internet-Draft requirements.  If
   you don't have any previous experience with authoring Internet-
   Drafts, xml2rfc does have some advantages.  It helps by creating a
   lot of the necessary boilerplate in accordance with the latest rules,
   thus reducing the effort.  It also speeds up publication after
   approval as the RFC Editor can use the source XML document to produce
   the RFC more quickly.

   Another common choice is to use Microsoft Word and a suitable
   template (see [RFC5385]) to produce the draft and print that to file
   using the generic text printer.  It has some advantages when it comes
   to spell checking and change bars.  However, Word may also produce
   some problems, like changing formatting, and inconsistent results
   between what one sees in the editor and in the generated text
   document, at least according to the author's personal experience.

8.2.  Verification Tools

   There are a few tools that are very good to know about when writing a
   draft.  These help check and verify parts of one's work.  These tools
   can be found at <http://tools.ietf.org>.

   o  I-D Nits checker (https://tools.ietf.org/tools/idnits/).  It
      checks that the boilerplate and some other things that are easily
      verifiable by machine are okay in your draft.  Always use it
      before submitting a draft to avoid direct refusal in the
      submission step.

   o  ABNF Parser and verification (https://tools.ietf.org/tools/bap/
      abnf.cgi).  Checks that your ABNF parses correctly and warns about
      loose ends, like undefined symbols.  However, the actual content
      can only be verified by humans knowing what it intends to
      describe.

   o  RFC diff (https://tools.ietf.org/rfcdiff).  A diff tool that is
      optimized for drafts and RFCs.  For example, it does not point out
      that the footer and header have moved in relation to the text on
      every page.

9.  Security Considerations

   As this is an Informational RFC about writing drafts that are
   intended to become RFCs, there are no direct security considerations.
   However, the document does discuss the writing of Security
   Considerations sections and what should be particularly considered
   when specifying RTP payload formats.


