RFC 8088


How to Write an RTP Payload Format

3.3.  Important RTP Details

   This section reviews a number of features and concepts that are
   available in RTP, independent of the payload format.  The RTP payload
   format can make use of these when appropriate, and even affect the
   behavior (RTP timestamp and marker bit), but it is important to note
   that not all features and concepts are relevant to every payload
   format.  This section does not remove the necessity to read up on
   RTP.  However, it does point out a few important details to remember
   when designing a payload format.

3.3.1.  The RTP Session

   The definition of the RTP session from RFC 3550 is:

      An association among a set of participants communicating with RTP.
      A participant may be involved in multiple RTP sessions at the same
      time.  In a multimedia session, each medium is typically carried
      in a separate RTP session with its own RTCP packets unless the
      encoding itself multiplexes multiple media into a single data
      stream.  A participant distinguishes multiple RTP sessions by
      reception of different sessions using different pairs of
      destination transport addresses, where a pair of transport
      addresses comprises one network address plus a pair of ports for
      RTP and RTCP.  All participants in an RTP session may share a
      common destination transport address pair, as in the case of IP
      multicast, or the pairs may be different for each participant, as
      in the case of individual unicast network addresses and port
      pairs.  In the unicast case, a participant may receive from all
      other participants in the session using the same pair of ports, or
      may use a distinct pair of ports for each.

      The distinguishing feature of an RTP session is that each session
      maintains a full, separate space of SSRC identifiers (defined
      next).  The set of participants included in one RTP session
      consists of those that can receive an SSRC identifier transmitted
      by any one of the participants either in RTP as the SSRC or a CSRC
      (also defined below) or in RTCP.  For example, consider a three-
      party conference implemented using unicast UDP with each
      participant receiving from the other two on separate port pairs.
      If each participant sends RTCP feedback about data received from
      one other participant only back to that participant, then the
      conference is composed of three separate point-to-point RTP
      sessions.  If each participant provides RTCP feedback about its
      reception of one other participant to both of the other
      participants, then the conference is composed of one multi-party
      RTP session.  The latter case simulates the behavior that would
      occur with IP multicast communication among the three
      participants.

      The RTP framework allows the variations defined here, but a
      particular control protocol or application design will usually
      impose constraints on these variations.

3.3.2.  RTP Header

   The RTP header contains a number of fields.  Two fields always
   require additional specification by the RTP payload format, namely
   the RTP timestamp and the marker bit.  Certain RTP payload formats
   also use the RTP sequence number to realize certain functionalities,
   primarily related to the order of their application data units.  The
   payload type indicates which payload format is in use.  The SSRC
   distinguishes RTP packets from multiple senders and media sources,
   identifying the RTP stream.  Finally, [RFC5285] specifies how
   to transport payload format independent metadata relating to the RTP
   packet or stream.

   Marker Bit:  A single bit normally used to provide important
      indications.  In audio, it is normally used to indicate the start
      of a talk burst.  This enables jitter buffer adaptation prior to
      the beginning of the burst with minimal audio quality impact.  In
      video, the marker bit is normally used to indicate the last packet
      that is part of a frame.  This enables a decoder to finish decoding the
      picture, where it otherwise may need to wait for the next packet
      to explicitly know that the frame is finished.

   Timestamp:  The RTP timestamp indicates the time instance the media
      sample belongs to.  For discrete media like video, it normally
      indicates when the media (frame) was sampled.  For continuous
      media, it normally indicates the first time instance the media
      present in the payload represents.  For audio, this is the
      sampling time of the first sample.  All RTP payload formats must
      specify the meaning of the timestamp value and the clock rates
      allowed.  Selecting a timestamp rate is an active design choice
      and is further discussed in Section 5.2.

      Discontinuous Transmission (DTX), which is common among speech
      codecs, typically results in gaps or jumps in the timestamp values
      because there is no media payload to transmit and the next used
      timestamp value represents the actual sampling time of the data
      transmitted.

   Sequence Number:  The sequence number is monotonically increasing and
      is set as the packet is sent.  This property is used in many
      payload formats to recover the order of everything from the whole
      stream down to fragments of application data units (ADUs) and the
      order they need to be decoded.  Discontinuous transmissions do not
      result in gaps in the sequence number, as it is monotonically
      increasing for each sent RTP packet.

   Payload Type:  The payload type is used to indicate, on a per-packet
      basis, which format is used.  The binding between a payload type
      number and a payload format and its configuration is dynamic and
      RTP session specific.  The configuration information can
      be bound to a payload type value by out-of-band signaling
      (Section 3.4).  An example of this would be video decoder
      configuration information.  Commonly, the same payload type is
      used for a media stream for the whole duration of a session.
      However, in some cases it may be necessary to change the payload
      format or its configuration during the session.

   SSRC:  The synchronization source (SSRC) identifier is normally not
      used by a payload format other than to identify the RTP timestamp
      and sequence number space a packet belongs to, allowing
      simultaneous reception of multiple media sources.  However, some
      of the RTP mechanisms for improving resilience to packet loss use
      multiple SSRCs to separate original data and repair or redundant
      data, as does multi-stream transmission of scalable codecs.

   Header Extensions:  RTP payload formats often need to include
      metadata relating to the payload data being transported.  Such
      metadata is sent as a payload header, at the start of the payload
      section of the RTP packet.  The RTP packet also includes space for
      a header extension [RFC5285]; this can be used to transport
      payload format independent metadata, for example, an SMPTE time
      code for the packet [RFC5484].  The RTP header extensions are not
      intended to carry headers that relate to a particular payload
      format, and must not contain information needed in order to decode
      the payload.

   The remaining fields do not commonly influence the RTP payload
   format.  The padding bit is worth clarifying as it indicates that one
   or more bytes are appended after the RTP payload.  This padding must
   be removed by a receiver before payload format processing can occur.
   Thus, it is completely separate from any padding that may occur
   within the payload format itself.
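
   The header fields above can be illustrated with a minimal parsing
   sketch.  It assumes the fixed header layout of RFC 3550; the names
   RtpPacket and parse_rtp are hypothetical and chosen only for this
   illustration.  The sketch skips any CSRC list and header extension
   and removes RTP-level padding before the payload is handed to
   payload format processing.

```python
import struct
from typing import NamedTuple


class RtpPacket(NamedTuple):
    """Fields a payload format typically cares about (illustrative only)."""
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse the fixed RTP header and strip RTP-level padding."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count  # skip the CSRC list
    if has_extension:
        # Extension header: 16-bit identifier, 16-bit length in 32-bit words.
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if has_padding:
        # The last octet gives the padding length, counting itself.
        pad = packet[-1]
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid padding length")
        end -= pad
    if offset > end:
        raise ValueError("truncated packet")
    return RtpPacket(bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc,
                     packet[offset:end])
```

   Only the returned payload is passed on to the payload format; any
   padding defined inside the payload format itself is a separate
   matter and is not touched here.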

3.3.3.  RTP Multiplexing

   RTP has three multiplexing points that are used for different
   purposes.  A proper understanding of this is important to correctly
   use them.

   The first one is separation of RTP streams of different types or
   usages, which is accomplished using different RTP sessions.  So, for
   example, in the common multimedia session with audio and video, RTP
   commonly multiplexes audio and video in different RTP sessions.  To
   achieve this separation, transport-level functionalities are used,
   normally UDP port numbers.  Different RTP sessions can also be used
   to realize layered scalability as it allows a receiver to select one
   or more layers for multicast RTP sessions simply by joining the
   multicast groups over which the desired layers are transported.  This
   separation also allows different Quality of Service (QoS) to be
   applied to different media types.  Use of multiple transport flows
   has potential issues due to NAT and firewall traversal.  The choices
   of how one applies RTP sessions as well as transport flows can affect
   the transport properties an RTP media stream experiences.

   The next multiplexing point is separation of different RTP streams
   within an RTP session.  Here, RTP uses the SSRC to identify
   individual sources of RTP streams.  An example of individual media
   sources would be the capture of different microphones that are
   carried in an RTP session for audio, independently of whether they
   are connected to the same host or different hosts.  There also exist
   cases where a single media source is transmitted using multiple RTP
   streams.  For each SSRC, a unique RTP sequence number and timestamp
   space is used.

   The third multiplexing point is the RTP header payload type field.
   The payload type identifies what format the content in the RTP
   payload has.  This includes different payload format configurations,
   different codecs, and also usage of robustness mechanisms like the
   one described in RFC 2198 [RFC2198].
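
   As a rough sketch of the second and third multiplexing points, the
   following groups the packets of one RTP session by SSRC and labels
   each with the format bound to its payload type.  Separation into
   RTP sessions (for example, by UDP port) is assumed to have happened
   already.  The function name demultiplex and the payload_formats
   mapping are hypothetical; in practice, the binding comes from
   signaling.

```python
from collections import defaultdict


def demultiplex(packets, payload_formats):
    """Group packets of one RTP session by SSRC.

    packets: iterable of (ssrc, payload_type, sequence_number, payload).
    payload_formats: hypothetical mapping of payload type number to a
        format name, as established by out-of-band signaling.
    """
    streams = defaultdict(list)
    for ssrc, payload_type, seq, payload in packets:
        fmt = payload_formats.get(payload_type)
        if fmt is None:
            continue  # payload type not understood: ignore the packet
        # Each SSRC has its own sequence number and timestamp space, so
        # sequence numbers are only compared within one SSRC.
        streams[ssrc].append((seq, fmt, payload))
    return dict(streams)
```

   Note that identical sequence numbers under different SSRCs do not
   conflict, since each SSRC is kept as its own stream.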

3.3.4.  RTP Synchronization

   There are several types of synchronization, and we will here describe
   how RTP handles the different types:

   Intra-media:  The synchronization within a media stream from a
      synchronization source (SSRC) is accomplished using the RTP
      timestamp field.  Each RTP packet carries the RTP timestamp, which
      specifies the position in time of the media payload contained in
      this packet relative to the content of other RTP packets in the
      same RTP stream (i.e., a given SSRC).  This is especially useful
      in cases of discontinuous transmissions.  Discontinuities can be
      caused by network conditions; when extensive losses occur the RTP
      timestamp tells the receiver how much later than previously
      received media the present media should be played out.

   Inter-media:  Applications commonly have a desire to use several
      media sources, possibly of different media types, at the same
      time.  Thus, there exists a need to synchronize different media
      from the same endpoint.  This puts two requirements on RTP: the
      possibility to determine which media are from the same endpoint
      and if they should be synchronized with each other and the
      functionality to facilitate the synchronization itself.

   The first step in inter-media synchronization is to determine which
   SSRCs in each session should be synchronized with each other.  This
   is accomplished by comparing the CNAME fields in the RTCP source
   description (SDES) packets.  SSRCs with the same CNAME sent in any of
   multiple RTP sessions can be synchronized.
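
   The CNAME comparison can be sketched as follows.  The input format,
   a list of (session, SSRC, CNAME) tuples collected from RTCP SDES
   packets, and the function name sync_groups are assumptions made for
   this illustration only.

```python
from collections import defaultdict


def sync_groups(sdes_reports):
    """Return the streams that share a CNAME and can be synchronized.

    sdes_reports: iterable of (session, ssrc, cname) tuples, where
        session is any hashable label for the RTP session.
    """
    by_cname = defaultdict(set)
    for session, ssrc, cname in sdes_reports:
        by_cname[cname].add((session, ssrc))
    # A CNAME seen on a single stream has nothing to synchronize with.
    return {cname: sorted(members)
            for cname, members in by_cname.items() if len(members) > 1}
```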

   The actual RTCP mechanism for inter-media synchronization is based on
   the idea that each RTP stream provides a position on the media
   specific time line (measured in RTP timestamp ticks) and a common
   reference time line.  The common reference time line is expressed in
   RTCP as a wall-clock time in the Network Time Protocol (NTP) format.
   It is important to notice that the wall-clock time is not required to
   be synchronized between hosts, for example, by using NTP [RFC5905].
   It can even have nothing at all to do with the actual time; for
   example, the host system's up-time can be used for this purpose.  The
   important factor is that all media streams from a particular source
   that are being synchronized use the same reference clock to derive
   their relative RTP timestamp time scales.  The type of reference
   clock and its timebase can be signaled using RTP Clock Source
   Signaling [RFC7273].

   Figure 1 illustrates how, if one receives RTCP Sender Report (SR)
   packet P1 for one RTP stream and RTCP SR packet P2 for the other RTP
   stream, one can calculate the corresponding RTP timestamp values
   for any arbitrary point in time T.  However, to be able to do that,
   it is also required to know the RTP timestamp rates for each RTP
   stream currently used in the sessions.

   TS1   --+---------------+------->
           |               |
          P1               |
           |               |
   NTP  ---+-----+---------T------>
                 |         |
                P2         |
                 |         |
   TS2  ---------+---------+---X-->

   Figure 1: RTCP Synchronization

   Assume that medium 1 uses an RTP timestamp clock rate of 16 kHz, and
   medium 2 uses a clock rate of 90 kHz.  Then, TS1 and TS2 for point T
   can be calculated in the following way: TS1(T) = TS1(P1) + 16000 *
   (NTP(T)-NTP(P1)) and TS2(T) = TS2(P2) + 90000 * (NTP(T)-NTP(P2)).
   This calculation is useful as it allows the implementation to
   generate a common synchronization point for which all time values are
   provided (TS1(T), TS2(T) and T).  So, when one wishes to calculate
   the NTP time that the timestamp value present in packet X corresponds
   to, one can do that in the following way: NTP(X) = NTP(T) + (TS2(X) -
   TS2(T))/90000.
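
   The mapping between the RTP timestamp and reference time lines can
   be written as a small sketch.  It assumes that the NTP values have
   already been converted to seconds as floating-point numbers and that
   RTP timestamps are 32-bit values that may wrap; the function names
   are hypothetical.

```python
def rtp_ts_at(ntp_t, sr_ntp, sr_rtp_ts, clock_rate):
    """RTP timestamp at reference time ntp_t, given one Sender Report.

    Implements TS(T) = TS(P) + clock_rate * (NTP(T) - NTP(P)), reduced
    modulo 2**32 because the RTP timestamp is a 32-bit field.
    """
    return (sr_rtp_ts + round(clock_rate * (ntp_t - sr_ntp))) % 2**32


def ntp_of(rtp_ts, ref_ntp, ref_rtp_ts, clock_rate):
    """Reference time of rtp_ts, given a known (ref_ntp, ref_rtp_ts) pair.

    Implements NTP(X) = NTP(T) + (TS(X) - TS(T)) / clock_rate.  The
    difference is taken as a signed 32-bit value so that a timestamp
    wrap between the two points is handled.
    """
    diff = (rtp_ts - ref_rtp_ts + 2**31) % 2**32 - 2**31
    return ref_ntp + diff / clock_rate
```

   For example, with SR P1 at NTP time 100.0 carrying TS1 = 1000 and a
   16 kHz clock, rtp_ts_at(101.0, 100.0, 1000, 16000) gives the value
   of TS1 at T = 101.0.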

   Improved signaling for layered codecs and fast tune-in have been
   specified in "Rapid Synchronization for RTP Flows" [RFC6051].

   Leap seconds are extra seconds added or seconds removed to keep our
   clocks in sync with the earth's rotation.  Adding or removing seconds
   can impact the reference clock as discussed in "RTP and Leap Seconds"
   [RFC7164]; also, in cases where the RTP timestamp values are derived
   using the wall clock during the leap second event, errors can occur.
   Implementations need to consider leap seconds and should consider the
   recommendations in [RFC7164].

3.4.  Signaling Aspects

   RTP payload formats are used in the context of application signaling
   protocols such as SIP [RFC3261] using the Session Description
   Protocol (SDP) [RFC4566] with Offer/Answer [RFC3264], RTSP [RFC7826],
   or the Session Announcement Protocol [RFC2974].  These examples all
   use out-of-band signaling to indicate which type of RTP streams are
   desired to be used in the session and how they are configured.  To be
   able to declare or negotiate the media format and RTP payload
   packetization, the payload format must be given an identifier.  In
   addition to the identifier, many payload formats also have the need
   to signal further configuration information out-of-band for the RTP
   payloads prior to the media transport session.

   The above examples of session-establishing protocols all use SDP, but
   other session description formats may be used.  For example, there
   was discussion of a new XML-based session description format within
   the IETF (SDP-NG).  In the end, the proposal did not get beyond draft
   protocol specification because of the enormous installed base of SDP
   implementations.  However, to avoid locking the usage of RTP to SDP
   based out-of-band signaling, the payload formats are identified using
   a separate definition format for the identifier and associated
   parameters.  That format is the media type.

3.4.1.  Media Types

   Media types [RFC6838] are identifiers originally created for
   identifying media formats included in email.  In this usage, they
   were known as MIME types, where the expansion of the MIME acronym
   includes the word "mail".  The term "media type" was introduced to
   reflect a broader usage, which includes HTTP [RFC7231], Message
   Session Relay Protocol (MSRP) [RFC4975], and many other protocols to
   identify arbitrary content carried within the protocols.  Media types
   also provide a media hierarchy that fits RTP payload formats well.
   Media type names are of two parts and consist of content type and
   sub-type separated with a slash, e.g., 'audio/PCMA' or 'video/
   h263-2000'.  It is important to choose the correct content-type when
   creating the media type identifying an RTP payload format.  However,
   in most cases, there is little doubt what content type the format
   belongs to.  Guidelines for choosing the correct media type and
   registration rules for media type names are provided in "Media Type
   Specifications and Registration Procedures" [RFC6838].  The
   additional rules for media types for RTP payload formats are provided
   in "Media Type Registration of RTP Payload Formats" [RFC4855].

   Registration of the RTP payload name is something that is required to
   avoid name collision in the future.  Note that "x-" names are not
   suitable for any documented format as they have the same problem with
   name collision and can't be registered.  The list of already-
   registered media types is maintained by IANA in its media types
   registry.

   Media types are allowed any number of parameters, which may be
   required or optional for that media type.  They are always specified
   in the form "name=value".  There exist no restrictions on how the
   value is defined from the media type's perspective, except that
   parameters must have a value.  However, the usage of media types in
   SDP, etc., has resulted in the following restrictions that need to be
   followed to make media types usable for identifying RTP payload
   formats:

   1.  Arbitrary binary content in the parameters is allowed, but it
       needs to be encoded so that it can be placed within text-based
       protocols.  Base64 [RFC4648] is recommended, but for shorter
       content Base16 [RFC4648] may be more appropriate as it is simpler
       to interpret for humans.  This needs to be explicitly stated when
       defining a media type parameter with binary values.

   2.  The end of the value needs to be easily found when parsing a
       message.  Thus, parameter values that are continuous and not
       interrupted by common text separators, such as space and
       semicolon characters, are recommended.  If that is not possible,
       some type of escaping should be used.  Usage of quote (") is
       recommended; do not forget to provide a method of encoding any
       character used for quoting inside the quoted element.

   3.  A common representation form for the media type and its
       parameters is on a single line.  In that case, the media type is
       followed by a semicolon-separated list of the parameter value
       pairs, e.g.:

       audio/amr octet-align=0; mode-set=0,2,5,7; mode-change-period=2
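
   A minimal sketch of handling the single-line form is shown below.
   It assumes that values contain no semicolons (that is, no quoting is
   in use) and that the media type is separated from the first
   parameter by whitespace or a semicolon.  The function names are
   hypothetical.  The second helper shows the Base64 and Base16
   encodings of binary values using the Python standard library.

```python
import base64
import re


def parse_media_type(line):
    """Split 'type/subtype name=value; name=value' into its parts."""
    parts = re.split(r"[;\s]+", line.strip(), maxsplit=1)
    content_type, _, subtype = parts[0].partition("/")
    rest = parts[1] if len(parts) > 1 else ""
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep or not value.strip():
            # Parameters must have a value.
            raise ValueError(f"parameter without a value: {item!r}")
        params[name.strip()] = value.strip()
    return content_type, subtype, params


def encode_binary(value: bytes, use_base16: bool = False) -> str:
    """Encode a binary parameter value for use in a text-based protocol."""
    encoder = base64.b16encode if use_base16 else base64.b64encode
    return encoder(value).decode("ascii")
```

   Applied to the example line above, parse_media_type returns the
   content type "audio", the subtype "amr", and a dictionary of the
   three parameters with their values as strings.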

3.4.2.  Mapping to SDP

   Since SDP [RFC4566] is so commonly used as an out-of-band signaling
   protocol, a mapping of the media type into SDP exists.  The details
   on how to map the media type and its parameters into SDP are
   described in [RFC4855].  However, this is not sufficient to explain
   how certain parameters must be interpreted, for example, in the
   context of Offer/Answer negotiation [RFC3264].

3.4.2.1.  The Offer/Answer Model

   The Offer/Answer (O/A) model allows SIP to negotiate which media
   formats and payload formats are to be used in a session and how they
   are to be configured.  However, O/A does not define a default
   behavior; instead, it points out the need to define how parameters
   behave.  To make things even more complex, the direction of media
   within a session has an impact on these rules, so that some cases may
   require separate descriptions for RTP streams that are send-only,
   receive-only, or both sent and received as identified by the SDP
   attributes a=sendonly, a=recvonly, and a=sendrecv.  In addition, the
   usage of multicast adds further limitations as the same RTP stream is
   delivered to all participants.  If those multicast-imposed
   restrictions are too limiting for unicast, then separate rules for
   unicast and multicast will be required.

   The simplest and most common O/A interpretation is that a parameter
   is defined to be declarative; i.e., the SDP Offer/Answer sending
   agent can declare a value and that has no direct impact on the other
   agent's values.  This declared value applies to all media that are
   going to be sent to the declaring entity.  For example, most video
   codecs have a level parameter that tells the other participants the
   highest complexity the video decoder supports.  The level parameter
   can be declared independently by two participants in a unicast
   session as it will be the media sender's responsibility to transmit a
   video stream that fulfills the limitation the other side has
   declared.  However, in multicast, it will be necessary to send a
   stream that follows the limitation of the weakest receiver, i.e., the
   one that supports the lowest level.  To simplify the negotiation in
   these cases, it is common to require any answerer to a multicast
   session to take a yes or no approach to parameters.

   A "negotiated" parameter is a different case, for which both sides
   need to agree on its value.  Such a parameter requires the answerer
   to either accept it as it is offered or remove the payload type the
   parameter belonged to from its answer.  The removal of the payload
   type from the answer indicates to the offerer the lack of support for
   the parameter values presented.  An unfortunate implication of the
   need to use complete payload types to indicate each possible
   configuration, so as to maximize the chances of achieving
   interoperability, is that the number of necessary payload types can
   quickly grow large.  This is one reason to limit the total number of
   sets of capabilities that may be implemented.
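
   The difference between declarative and negotiated parameters can be
   sketched from the answerer's point of view.  Everything in this
   sketch is hypothetical: the data layout, the function name
   build_answer, and the parameter names used when calling it.  It also
   ignores default values for absent parameters, which a real payload
   format would need to define.

```python
def build_answer(offer, local):
    """Build the answerer's payload type list from an offer.

    offer: {payload_type: (format_name, {param: value})}
    local: {format_name: {"negotiated": {param: acceptable_values},
                          "declarative": {param: own_value}}}
    """
    answer = {}
    for pt, (fmt, params) in offer.items():
        caps = local.get(fmt)
        if caps is None:
            continue  # format not supported at all
        negotiated = caps["negotiated"]
        if any(name not in params or params[name] not in acceptable
               for name, acceptable in negotiated.items()):
            # A negotiated parameter cannot be changed in the answer,
            # so the whole payload type is removed instead.
            continue
        # Negotiated values are echoed as offered; declarative values
        # are the answerer's own and need not match the offer.
        reply = {name: params[name] for name in negotiated}
        reply.update(caps["declarative"])
        answer[pt] = (fmt, reply)
    return answer
```

   Because each alternative value of a negotiated parameter has to be
   offered as its own payload type, the offer in such a scheme grows
   with the number of configurations, which is the scaling issue noted
   above.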

   The most problematic parameters are those that relate to the
   media the entity sends.  They do not really fit the O/A model, but
   can be shoehorned in.  Examples of such parameters can be found in
   the H.264 video codec's payload format [RFC6184], where the name of
   all parameters with this property starts with "sprop-".  The issue
   with these parameters is that they declare properties for an RTP
   stream that the other party may not accept.  The best one can make of
   the situation is to explain the assumption that the other party will
   accept the same parameter value for the media it will receive as the
   offerer of the session has proposed.  If the answerer needs to change
   any declarative parameter relating to streams it will receive, then
   the offerer may be required to make a new offer to update the
   parameter values for its outgoing RTP stream.

   Another issue to consider is the send-only RTP streams in offers.
   Parameters that relate to what the answering entity is willing to
   receive have no meaning other than to provide a template for the
   answer.  It is worth pointing out in the specification that these
   really provide a set of parameter values that the sender recommends.
   Note that send-only streams in answers will need to indicate the
   offerer's parameters to ensure that the offerer can match the answer
   to the offer.

   A further issue with Offer/Answer that complicates things is that the
   answerer is allowed to renumber the payload types between offer and
   answer.  This is not recommended, but allowed for support of gateways
   to the ITU conferencing suite.  This means that it must be possible
   to bind answers for payload types to the payload types in the offer
   even when the payload type number has been changed, and some of the
   proposed payload types have been removed.  This binding must normally
   be done by matching the configurations originally offered against
   those in the answer.  This may require specification in the payload
   format of which parameters constitute a configuration, for
   example, as done in Section 8.2.2 of the H.264 RTP Payload format
   [RFC6184], which states: "The parameters identifying a media format
   configuration for H.264 are profile-level-id and packetization-mode".

3.4.2.2.  Declarative Usage in RTSP and SAP

   SAP (Session Announcement Protocol) [RFC2974] was experimentally used
   for announcing multicast sessions.  Similar but better protocols are
   using SDP in a declarative style to configure multicast-based
   applications.  Independently of the usage of Source-Specific
   Multicast (SSM) [RFC3569] or Any-Source Multicast (ASM), the SDP
   provided by these configuration delivery protocols applies to all
   participants.  All media that is sent to the session must follow the
   RTP stream definition as specified by the SDP.  This enables everyone
   to receive the session if they support the configuration.  Here, SDP
   provides a one-way channel with no possibility to affect the
   configuration that the session creator has decided upon.  Any RTP
   payload format that requires parameters for the send direction and
   that needs individual values per implementation or instance will fail
   in a SAP session for a multicast session allowing anyone to send.

   Real-Time Streaming Protocol (RTSP) [RFC7826] allows the negotiation
   of transport parameters for RTP streams that are part of a streaming
   session between a server and client.  RTSP has divided the transport
   parameters from the media configuration.  SDP is commonly used for
   media configuration in RTSP and is sent to the client prior to
   session establishment, either through use of the DESCRIBE method or
   by means of an out-of-band channel like HTTP, email, etc.  The SDP is
   used to determine which RTP streams and what formats are being used
   prior to session establishment.

   Thus, both SAP and RTSP use SDP to configure receivers and senders
   with a predetermined configuration for an RTP stream including the
   payload format and any of its parameters.  All parameters are used in
   a declarative fashion.  This can result in different treatment of
   parameters between Offer/Answer and declarative usage in RTSP and
   SAP.  Any such difference will need to be spelled out by the payload
   format specification.

3.5.  Transport Characteristics

   The general channel characteristics that RTP flows experience are
   documented in Section 3 of "Guidelines for Writers of RTP Payload
   Format Specifications" [RFC2736].  The discussion below provides
   additional information.

3.5.1.  Path MTU

   At the time of writing, the most common IP Maximum Transmission Unit
   (MTU) in commonly deployed link layers is 1500 bytes (Ethernet data
   payload).  However, there exist both links with smaller MTUs and
   links with much larger MTUs.  An example of links with a small MTU
   is older-generation cellular links.  Certain parts of the
   Internet already support an IP MTU of 8000 bytes or more, but these
   are limited islands.  The most likely places to find MTUs larger than
   1500 bytes are within enterprise networks, university networks, data
   centers, storage networks, and over high capacity (10 Gbps or more)
   links.  There is a slow, ongoing evolution towards larger MTU sizes.
   However, at the same time, it has become common to use tunneling
   protocols, often multiple ones, whose overhead when added together
   can shrink the MTU significantly.  Thus, there exists a need both to
   consider limited MTUs and to enable support of larger MTUs.  This
   should be considered in the design, especially in regard to features
   such as aggregation of independently decodable data units.
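
   A rough sketch of the resulting payload budget, and of a greedy
   aggregation of data units within it, is given below.  The header
   sizes are the minimum IPv4, IPv6, UDP, and RTP header sizes; SRTP
   overhead, IP options, CSRC lists, and RTP header extensions are not
   counted.  The function names and the per-unit overhead of 2 bytes
   (standing in for a length field in an aggregation format) are
   assumptions made for this illustration.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, without extension headers
UDP_HEADER = 8
RTP_HEADER = 12   # fixed header only


def max_rtp_payload(path_mtu, ipv6=False, tunnel_overhead=0):
    """Bytes left for the RTP payload in one unfragmented IP packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    budget = path_mtu - tunnel_overhead - ip_header - UDP_HEADER - RTP_HEADER
    if budget <= 0:
        raise ValueError("no room left for an RTP payload")
    return budget


def aggregate(adu_sizes, budget, per_adu_overhead=2):
    """Greedily group ADUs, in order, into packets within the budget.

    Returns a list of packets, each a list of the ADU sizes it carries.
    """
    packets, current, used = [], [], 0
    for size in adu_sizes:
        cost = size + per_adu_overhead
        if cost > budget:
            raise ValueError("ADU exceeds the budget and needs fragmentation")
        if used + cost > budget:
            packets.append(current)
            current, used = [], 0
        current.append(size)
        used += cost
    if current:
        packets.append(current)
    return packets
```

   With a 1500-byte MTU over IPv4 and no tunneling, this leaves 1460
   bytes for the RTP payload; each layer of tunneling reduces that
   figure further.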

3.5.2.  Different Queuing Algorithms

   Routers and switches on the network path between an IP sender and a
   particular receiver can exhibit different behaviors affecting the
   end-to-end characteristics.  One of the more important aspects of
   this is queuing behavior.  Routers and switches have some amount of
   queuing to handle temporary bursts of data that is designated to leave
   the switch or router on the same egress link.  A queue, when not
   empty, results in an increased path delay.

   The implementation of the queuing affects the delay and also how
   congestion signals (Explicit Congestion Notification (ECN) [RFC6679]
   or packet drops) are provided to the flow.  Other aspects are whether
   the flow shares the queue with other flows and how the implementation
   affects the flow interaction.  This becomes important, for example,
   when real-time flows interact with long-lived TCP flows.  TCP has a
   built-in behavior in its congestion control that strives to fill the
   buffer; thus, all flows sharing the buffer experience the delay
   build-up.

   A common, but quite poor, queue-handling mechanism is tail-drop,
   i.e., only drop packets when the incoming packet doesn't fit in the
   queue.  If a bad queuing algorithm is combined with too much queue
   space, the queuing time can grow to be very significant and can even
   become multiple seconds.  This is called "bufferbloat" [BLOAT].
   Active Queue Management (AQM) is a term covering mechanisms that try
   to do something smarter by actively managing the queue, for example,
   sending congestion signals earlier by dropping packets earlier in the
   queue.  The behavior also affects the flow interactions.  For
   example, Random Early Detection (RED) [RED] selects which packet(s)
   to drop randomly.  This gives flows that have more packets in the
   queue a higher probability to experience the packet loss (congestion
   signal).  There is ongoing work in the IETF WG AQM to find suitable
   mechanisms to recommend for implementation and reduce the use of
   tail-drop.
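
   Two small calculations may help to make the effects above concrete.
   The first gives the delay added by a standing queue; the second is
   a simplified version of the linear drop probability ramp of classic
   RED, leaving out its averaging of the queue size and its per-packet
   count correction.  The function and parameter names are assumptions
   made for this sketch.

```python
def queuing_delay_seconds(queued_bytes, link_rate_bps):
    """Time needed to drain the queue onto the egress link."""
    return queued_bytes * 8 / link_rate_bps


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Simplified RED: drop probability for a given average queue size.

    Below min_th nothing is dropped, at or above max_th everything is
    dropped, and in between the probability rises linearly to max_p.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

   For instance, a full 1,000,000-byte tail-drop buffer in front of a
   2 Mbps link adds 4 seconds of delay, which is the kind of figure
   associated with bufferbloat.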

3.5.3.  Quality of Service

   The best-effort Internet provides no guarantees for the path's
   properties.  QoS mechanisms are intended to provide the possibility
   to bound the path properties.  Where Diffserv [RFC2475] markings
   affect the queuing and forwarding behaviors of routers, the mechanism
   provides only statistical guarantees and requires care in how many
   marked packets of different types enter the network.  Flow-based
   QoS, like IntServ [RFC1633], has the potential for stricter
   guarantees as the properties are agreed on by each hop on the path,
   at the cost of per-flow state in the network.

4.  Standardization Process for an RTP Payload Format

   This section discusses the recommended process to produce an RTP
   payload format in the described venues.  This is to document the best
   current practice on how to get a well-designed and specified payload
   format as quickly as possible.  For specifications that are defined
   by standards bodies other than the IETF, the primary milestone is the
   registration of the media type for the RTP payload format.  For
   proprietary media formats, the primary goal depends on whether
   interoperability is desired at the RTP level.  However, there is also
   the issue of ensuring best possible quality of any specification.

4.1.  IETF

   For all standardized media formats, it is recommended that the
   payload format be specified in the IETF.  The main reason is to
   provide an openly available RTP payload format specification that has
   been reviewed by people experienced with RTP payload formats.  At the
   time of writing, this work is done in the PAYLOAD Working Group (WG),
   but that may change in the future.

4.1.1.  Steps from Idea to Publication

   There are a number of steps that an RTP payload format should go
   through from the initial idea until it is published.  This also
   documents the process that the PAYLOAD WG applies when working with
   RTP payload formats.

   Idea:   Determine the need for an RTP payload format as an IETF
      specification.

   Initial effort:   Using this document as a guideline, one should be
      able to get started on the work.  If one's media codec doesn't fit
      any of the common design patterns or one has problems
      understanding what the most suitable way forward is, then one
      should contact the PAYLOAD WG and/or the WG Chairs.  The goal of
      this stage is to have an initial individual draft.  This draft
      needs to focus on the introductory parts that describe the real-
      time media format and the basic idea on how to packetize it.  Not
      all the details are required to be filled in.  However, the
      security chapter is not something that one should skip, even
      initially.  From the start, it is important to consider any
      serious security risks that need to be solved.  The first step is
      completed when one has a draft that is sufficiently detailed for a
      first review by the WG.  The less confident one is of the
      solution, the less work should be spent on details; instead,
      concentrate on the codec properties and what is required to make
      the packetization work.

   Submission of the first version:   When one has performed the above,
      one submits the draft as an individual draft.  This can be done at
      any time, except for a period prior to an IETF meeting (see
      important dates related to the next IETF meeting for draft
      submission cutoff date).  When the Internet-Draft announcement has
      been sent out on the draft announcement list, forward it to the
      PAYLOAD WG and request that it be reviewed.  In the email, outline
      any issues the authors currently have with the design.

   Iterative improvements:   Taking the feedback received into account,
      one updates the draft and tries to resolve issues.  New revisions of
      the draft can be submitted at any time (again except for a short
      period before meetings).  It is recommended to submit a new
      version whenever one has made major updates or has new issues that
      are easiest to discuss in the context of a new draft version.

   Becoming a WG document:   Given that the definition of RTP payload
      formats is part of the PAYLOAD WG's charter, RTP payload formats
      that are going to be published as Standards Track RFCs need to
      become WG documents.  Becoming a WG document means that the WG
      Chairs or an appointed document shepherd are responsible for
      administrative handling, for example, issuing publication
      requests.  However, be aware that making a document into a WG
      document changes the formal ownership and responsibility from the
      individual authors to the WG.  The initial authors normally
      continue being the document editors, unless unusual circumstances
      occur.  The PAYLOAD WG accepts new RTP payload formats based on
      their suitability and document maturity.  The document maturity is
      a requirement to ensure that there are dedicated document editors
      and that there exists a good solution.

   Iterative improvements:  The updates and review cycles continue until
      the draft has reached the level of maturity suitable for
      publication.  The authors are responsible for judging when the
      document is ready for the next step, most likely WG Last Call, but
      they can ask the WG Chairs or shepherd.

   WG Last Call:   A WG Last Call of at least two weeks is always
      performed for payload formats in the PAYLOAD WG (see Section 7.4
      of [RFC2418]).  The authors request WG Last Call for a draft when
      they think it is mature enough for publication.  The WG Chairs or
      shepherd perform a review to check if they agree with the authors'
      assessment.  If the WG Chairs or shepherd agree on the maturity,
      the WG Last Call is announced on the WG mailing list.  If there
      are issues raised, these need to be addressed with an updated
      draft version.  For any more substantial changes to the draft, a
      new WG Last Call is announced for the updated version.  Minor
      changes, like editorial fixes, can be progressed without an
      additional WG Last Call.

   Publication requested:   For WG documents, the WG Chairs or shepherd
      request publication of the draft after it has passed WG Last Call.
      After this, the approval and publication process described in BCP
      9 [BCP9] is performed.  The status after the publication has been
      requested can be tracked using the IETF Datatracker [TRACKER].
      Once publication has been requested, documents no longer expire as
      they normally would, so authors do not have to issue keep-alive
      updates.  In addition, any submission of document updates requires
      the approval of WG Chair(s).  The authors are commonly asked to
      address comments or issues raised by the IESG.  The authors also
      do one last review of the document immediately prior to its
      publication as an RFC to ensure that no errors or formatting
      problems have been introduced during the publication process.

4.1.2.  WG Meetings

   WG meetings are for discussing issues, not presentations.  This means
   that most RTP payload formats should never need to be discussed in a
   WG meeting.  RTP payload formats that would be discussed are either
   those with controversial issues that failed to be resolved on the
   mailing list or those including new design concepts worth a general
   discussion.

   There exists no requirement to present or discuss a draft at a WG
   meeting before it becomes published as an RFC.  Thus, even authors
   who lack the possibility to go to WG meetings should be able to
   successfully specify an RTP payload format in the IETF.  WG meetings
   may become necessary only if the draft gets stuck in a serious debate
   that cannot easily be resolved.

4.1.3.  Draft Naming

   To simplify the work of the PAYLOAD WG Chairs and WG members, a
   specific Internet-Draft file-naming convention shall be used for RTP
   payload formats.  Individual submissions shall be named using the
   template: draft-<lead author family name>-payload-rtp-<descriptive
   name>-<version>.  The WG documents shall be named according to this
   template: draft-ietf-payload-rtp-<descriptive name>-<version>.  The
   inclusion of "payload" in the draft file name ensures that the search
   for "payload-" will find all PAYLOAD-related drafts.  Inclusion of
   "rtp" tells us that it is an RTP payload format draft.  The
   descriptive name should be as short as possible while still
   describing what the payload format is for.  It is recommended to use
   the media format or codec abbreviation.  Please note that the version
   must start at 00 and is increased by one for each submission to the
   IETF secretary of the draft.  No version numbers may be skipped.  For
   more details on draft naming, please see Section 7 of [ID-GUIDE].
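
   The naming templates above can be sketched in a few lines of Python.
   The helper names, the regular expression, and the example author and
   codec names ("Doe", "examplecodec") are hypothetical and serve only
   to illustrate the convention; the check is deliberately loose and is
   not an official validator.

```python
import re

# Loose check for draft-<name or "ietf">-payload-rtp-<descriptive>-<NN>.
_DRAFT_NAME = re.compile(
    r"^draft-[a-z0-9]+-payload-rtp-[a-z0-9]+(?:-[a-z0-9]+)*-\d{2}$"
)


def individual_draft_name(family_name, descriptive_name, version=0):
    """Name for an individual submission; version starts at 00."""
    return (
        f"draft-{family_name.lower()}-payload-rtp-"
        f"{descriptive_name.lower()}-{version:02d}"
    )


def wg_draft_name(descriptive_name, version=0):
    """Name for a PAYLOAD WG document; version starts at 00."""
    return f"draft-ietf-payload-rtp-{descriptive_name.lower()}-{version:02d}"


def next_version(name):
    """Name of the next revision: the version increases by exactly one."""
    base, version = name.rsplit("-", 1)
    return f"{base}-{int(version) + 1:02d}"


def follows_convention(name):
    """True if the name matches the template sketched above."""
    return bool(_DRAFT_NAME.match(name))


first = individual_draft_name("Doe", "examplecodec")
second = next_version(first)
adopted = wg_draft_name("examplecodec")
```

   Note that the version counter restarts at 00 when the draft is
   renamed as a WG document, since that name is a new file name.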

4.1.4.  Writing Style

   When writing an Internet-Draft for an RTP payload format, one should
   observe a few considerations (which may diverge somewhat from the
   style of other IETF documents and/or the style that the media coding
   specification's author group may use):

   Include Motivations:  In the IETF, it is common to include the
      motivation for why a particular design or technical path was
      chosen.  These are not long statements: a sentence here and there
      explaining why suffices.

   Use the Defined Terminology:  There exists defined terminology both
      in RTP and in the media codec specification for which the RTP
      payload format is designed.  A payload format specification needs
      to use both to make clear the relation of features and their
      functions.  It is unwise to introduce or, worse, use without
      introduction, terminology that appears to be more accessible to
      average readers but may miss certain nuances that the defined
      terms imply.  An RTP payload format author can assume the reader
      to be reasonably familiar with the terminology in the media coding
      specification.

   Keeping It Simple:  The IETF has a history of specifications that are
      focused on their main usage.  Historically, some RTP payload
      formats have a lot of modes and features, while the actual
      deployments have only included the most basic features that had
      very clear requirements.  Time and effort can be saved by focusing
      on only the most important use cases and keeping the solution
      simple.  An extension mechanism should be provided to enable
      backward-compatible extensions, if that is an organic fit.

   Normative Requirements:  When writing specifications, there is
      commonly a need to make it clear when something is normative and
      at what level.  In the IETF, the most common method is to use "Key
      words for use in RFCs to Indicate Requirement Levels" [RFC2119],
      which defines the meaning of "MUST", "MUST NOT", "REQUIRED",
      "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
      "MAY", and "OPTIONAL".

4.1.5.  How to Speed Up the Process

   There are a number of ways to lose a lot of time in the above process.
   This section discusses what to do and what to avoid.

   o  Do not update the draft only for the meeting deadline.  An update
      for each meeting automatically limits the draft to three updates
      per year.  Instead, ignore the meeting schedule and publish new
      versions as soon as possible.

   o  Try to avoid requesting reviews when people are busy, like the few
      weeks before a meeting.  It is actually more likely that people
      have time for them directly after a meeting.

   o  Perform draft updates quickly.  A common mistake is that the
      authors let the draft slip.  By performing updates to the draft
      text directly after getting resolution on an issue, things speed
      up.  This minimizes the delay that the author has direct control
      over.  The time taken for reviews, responses from Area Directors
      and WG Chairs, etc., can be much harder to speed up.

   o  Do not fail to take human nature into account.  It happens that
      people forget or need to be reminded about tasks.  Send a kind
      reminder to the people you are waiting for if things take longer
      than expected.  Ask people to estimate when they expect to fulfill
      the requested task.

   o  Ensure there is enough review.  It is common that documents take a
      long time and many iterations because not enough review is
      performed in each iteration.  To improve the amount of review you
      get on your own document, trade review time with other document
      authors.  Make a deal with some other document author that you
      will review their draft if they review yours.  Even inexperienced
      reviewers can help with language, editorial, or clarity issues.
      Also, try approaching the more experienced people in the WG and
      getting them to commit to a review.  The WG Chairs cannot, even if
      desirable, be expected to review all versions.  Due to workload,
      the Chairs may need to concentrate on key points in a draft
      evolution like checking on initial submissions, a draft's
      readiness to become a WG document, or its readiness for WG Last
      Call.

4.2.  Other Standards Bodies

   Other standards bodies may define RTP payloads in their own
   specifications.  When they do this, they are strongly recommended to
   contact the PAYLOAD WG Chairs and request review of the work.  It is
   recommended that at least two review steps are performed.  The first
   should be early in the process when more fundamental issues can be
   easily resolved without abandoning a lot of effort.  Then, when
   nearing completion, but while it is still possible to update the
   specification, a second review should be scheduled.  In that pass,
   the quality can be assessed; hopefully, no updates will be needed.
   Using this procedure can avoid both conflicting definitions and
   serious mistakes, like breaking certain aspects of the RTP model.

   RTP payload media types may be registered in the standards tree by
   other standards bodies.  The requirements on the organization are
   outlined in the media types registration documents ([RFC4855] and
   [RFC6838]).  This registration requires a request to the IESG, which
   ensures that the filled-in registration template is acceptable.  To
   avoid last-minute problems with these registrations, the registration
   template must be sent for review both to the PAYLOAD WG and the media
   types list and is something that should be included in the IETF
   reviews of the payload format specification.

4.3.  Proprietary and Vendor Specific

   Proprietary RTP payload formats are commonly specified when the real-
   time media format is proprietary and not intended to be part of any
   standardized system.  However, there are reasons why proprietary
   formats should also be correctly documented and registered:

   o  Usage in a standardized signaling environment, such as SIP/SDP.
      RTP needs to be configured with the RTP profiles, payload formats,
      and their payload types being used.  To accomplish this, it is
      desirable to have registered media type names to ensure that the
      names do not collide with those of other formats.

   o  Sharing with business partners.  As RTP payload formats are used
      for communication, situations often arise where business partners
      would like to support a proprietary format.  Having a well-written
      specification of the format will save time and money for both
      parties, as interoperability will be much easier to accomplish.

   o  To ensure interoperability between different implementations on
      different platforms.

   To avoid name collisions, there is a central registry keeping track
   of the registered media type names used by different RTP payload
   formats.  When it comes to proprietary formats, they should be
   registered in the vendor's own tree.  All vendor-specific
   registrations use sub-type names that start with "vnd.<vendor-name>".
   Names in the vendor's own tree are not required to be registered with
   IANA.  However, registration [RFC6838] is recommended if the media
   type is used at all in public environments.
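
   As a small illustration of the vendor tree naming, the Python sketch
   below builds and recognizes sub-type names that start with "vnd.".
   The helper names, the dotted "vnd.<vendor>.<format>" layout, and the
   example vendor and format names are assumptions for this example and
   are not taken from any registry.

```python
def vendor_media_type(top_level, vendor, format_name):
    """Build a media type in the vendor tree (layout is illustrative)."""
    return f"{top_level}/vnd.{vendor}.{format_name}"


def is_vendor_tree(media_type):
    """True if the sub-type name starts with the "vnd." facet."""
    _, _, subtype = media_type.partition("/")
    # Media type names are compared case-insensitively.
    return subtype.lower().startswith("vnd.")


example = vendor_media_type("video", "example", "codec")
```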

   If interoperability at the RTP level is desired, a payload type
   specification should be standardized in the IETF following the
   process described above.  The IETF does not require full disclosure
   of the codec when defining an RTP payload format to carry that codec,
   but a description must be provided that is sufficient to allow the
   IETF to judge whether the payload format is well designed.  The media
   type identifier assigned to a standardized payload format of this
   sort will lie in the standards tree rather than the vendor tree.

4.4.  Joint Development of Media Coding Specification and RTP Payload
      Format

   In the last decade, there have been a few cases where the media codec
   and the associated RTP payload format have been developed
   concurrently and jointly.  Developing the two specs not only
   concurrently but also jointly, in close cooperation with the group
   developing the media codec, allows one to leverage the benefits joint
   source/channel coding can provide.  Doing so has historically
   resulted in well-performing payload formats and in success of both
   the media coding specification and associated RTP payload format.
   Therefore, whenever the opportunity presents itself, it may be useful
   to keep the media coding group closely in the loop (through appropriate
   liaison means, whatever those may be) and influence the media coding
   specification to be RTP friendly.  One example of such a media
   coding specification is H.264, where the RTP payload header co-serves
   as the H.264 NAL unit header and vice versa, and is documented in
   both specifications.
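
   To make the H.264 example concrete, the Python sketch below splits
   the shared one-octet header into its fields.  The field layout (a
   1-bit forbidden_zero_bit, a 2-bit nal_ref_idc, and a 5-bit
   nal_unit_type) comes from the H.264 specification and its RTP payload
   format rather than from this document; the function and class names
   are hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, expected to be 0
    nal_ref_idc: int         # 2 bits, relative importance of the unit
    nal_unit_type: int       # 5 bits, NAL unit or packetization type


def parse_nal_header(first_octet):
    """Split the first payload octet into the three header fields."""
    if not 0 <= first_octet <= 0xFF:
        raise ValueError("expected a single octet")
    return NalHeader(
        forbidden_zero_bit=(first_octet >> 7) & 0x01,
        nal_ref_idc=(first_octet >> 5) & 0x03,
        nal_unit_type=first_octet & 0x1F,
    )


header = parse_nal_header(0x67)
```

   Because the same octet serves both specifications, a receiver can
   read these fields directly from the start of the RTP payload.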
