RFC 7655

Proposed STD
Pages: 32

RTP Payload Format for G.711.0

Internet Engineering Task Force (IETF)                   M. Ramalho, Ed.
Request for Comments: 7655                                      P. Jones
Category: Standards Track                                  Cisco Systems
ISSN: 2070-1721                                                N. Harada
                                                                     NTT
                                                              M. Perumal
                                                                Ericsson
                                                                 L. Miao
                                                     Huawei Technologies
                                                           November 2015


                     RTP Payload Format for G.711.0

Abstract

   This document specifies the Real-time Transport Protocol (RTP)
   payload format for ITU-T Recommendation G.711.0.  ITU-T Rec. G.711.0
   defines a lossless and stateless compression for G.711 packet
   payloads typically used in IP networks.  This document also defines a
   storage mode format for G.711.0 and a media type registration for the
   G.711.0 RTP payload format.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7655.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  G.711.0 Codec Background
     3.1.  General Information and Use of the ITU-T G.711.0 Codec
     3.2.  Key Properties of G.711.0 Design
     3.3.  G.711 Input Frames to G.711.0 Output Frames
       3.3.1.  Multiple G.711.0 Output Frames per RTP Payload
               Considerations
   4.  RTP Header and Payload
     4.1.  G.711.0 RTP Header
     4.2.  G.711.0 RTP Payload
       4.2.1.  Single G.711.0 Frame per RTP Payload Example
       4.2.2.  G.711.0 RTP Payload Definition
         4.2.2.1.  G.711.0 RTP Payload Encoding Process
       4.2.3.  G.711.0 RTP Payload Decoding Process
       4.2.4.  G.711.0 RTP Payload for Multiple Channels
   5.  Payload Format Parameters
     5.1.  Media Type Registration
     5.2.  Mapping to SDP Parameters
     5.3.  Offer/Answer Considerations
     5.4.  SDP Examples
       5.4.1.  SDP Example 1
       5.4.2.  SDP Example 2
   6.  G.711.0 Storage Mode Conventions and Definition
     6.1.  G.711.0 PLC Frame
     6.2.  G.711.0 Erasure Frame
     6.3.  G.711.0 Storage Mode Definition
   7.  IANA Considerations
   8.  Security Considerations
   9.  Congestion Control
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

   The International Telecommunication Union (ITU-T) Recommendation
   G.711.0 [G.711.0] specifies a stateless and lossless compression for
   G.711 packet payloads typically used in Voice over IP (VoIP)
   networks.  This document specifies the Real-time Transport Protocol
   (RTP) RFC 3550 [RFC3550] payload format and storage modes for this
   compression.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  G.711.0 Codec Background

   ITU-T Recommendation G.711.0 [G.711.0] is a lossless and stateless
   compression mechanism for ITU-T Recommendation G.711 [G.711] and thus
   is not a "codec" in the sense of "lossy" codecs typically carried by
   RTP.  When negotiated end-to-end, ITU-T Rec. G.711.0 is negotiated as
   if it were a codec, with the understanding that ITU-T Rec. G.711.0
   losslessly encodes the underlying (lossy) G.711 Pulse Code Modulation
   (PCM) sample representation of an audio signal.  For this reason,
   ITU-T Rec. G.711.0 will be interchangeably referred to in this
   document as a "lossless data compression algorithm" or a "codec",
   depending on context.  Within this document, individual G.711 PCM
   samples will be referred to as "G.711 symbols" or just "symbols" for
   brevity.

   This section describes ITU-T Recommendation G.711.0 [G.711.0], its
   typical use cases, and its key design properties.

3.1.  General Information and Use of the ITU-T G.711.0 Codec

   ITU-T Recommendation G.711 is the benchmark standard for narrowband
   telephony.  It has been successful for many decades because of its
   proven voice quality, ubiquity, and utility.  A new ITU-T
   recommendation, G.711.0, has been established for defining a
   stateless and lossless compression for G.711 packet payloads
   typically used in VoIP networks.  ITU-T Rec. G.711.0 is also known as
   ITU-T Rec. G.711 Annex A [G.711-A1], as ITU-T Rec. G.711 Annex A is
   effectively a pointer to ITU-T Rec. G.711.0.  Henceforth in this
   document, ITU-T Rec. G.711.0 will simply be referred to as "G.711.0"
   and ITU-T Rec. G.711 simply as "G.711".

   G.711.0 may be employed end-to-end, in which case the RTP payload
   format specification and use is nearly identical to the G.711 RTP
   specification found in RFC 3551 [RFC3551].  The only significant
   differences for G.711.0 are the required use of a dynamic payload
   type (the static PT of 0 or 8 is presently almost always used with
   G.711, even though dynamic assignment of other payload types is
   allowed) and the recommendation not to use Voice Activity Detection
   (see Section 4.1).

   G.711.0, being both lossless and stateless, may also be employed as a
   lossless compression mechanism for G.711 payloads anywhere between
   end systems that have negotiated use of G.711.  Because the only
   significant difference between the G.711 RTP payload format header
   and the G.711.0 payload format header defined in this document is the
   payload type, a G.711 RTP packet can be losslessly converted to a
   G.711.0 RTP packet simply by compressing the G.711 payload (thus
   creating a G.711.0 payload), changing the payload type to the desired
   dynamic value, and copying all the remaining G.711 RTP header fields
   into the corresponding G.711.0 RTP header.  Similarly, the G.711.0
   RTP packet thus created can be converted back to the original source
   G.711 RTP packet by losslessly decompressing the G.711.0 payload back
   to the original source G.711 payload, restoring the payload type of
   the original G.711 RTP packet, and copying all the remaining G.711.0
   RTP header fields into the corresponding G.711 RTP header.  Because a
   packet produced by this compression and decompression is
   indistinguishable in every detail from the source G.711 packet, such
   compression can be made invisible to the end systems.  Specification
   of how systems on the path between the end
   systems discover each other and negotiate the use of G.711.0
   compression as described in this paragraph is outside the scope of
   this document.
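   The header rewrite described above can be sketched in Python as
   follows.  This is a minimal illustration, not part of this
   specification: the function name rewrite_rtp_payload is
   hypothetical, and no G.711.0 compressor is shown; the transform
   argument stands in for a G.711.0 encoder or decoder.  Header parsing
   follows the RTP fixed header layout of RFC 3550.

```python
from typing import Callable

RTP_FIXED_HEADER_LEN = 12  # RFC 3550 fixed header, before any CSRC list


def rewrite_rtp_payload(packet: bytes, new_pt: int,
                        transform: Callable[[bytes], bytes]) -> bytes:
    """Copy an RTP packet, replacing its 7-bit payload type with new_pt and
    its payload with transform(payload). All other header fields (version,
    P, X, CC, marker, sequence number, timestamp, SSRC, CSRCs, extension)
    are copied unchanged, as are any trailing RTP padding octets.
    `transform` is a hypothetical stand-in for a G.711.0 encoder/decoder."""
    if not 0 <= new_pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if len(packet) < RTP_FIXED_HEADER_LEN or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")

    header_len = RTP_FIXED_HEADER_LEN + 4 * (packet[0] & 0x0F)  # CSRC list
    if packet[0] & 0x10:  # X bit: header extension follows the CSRCs
        if len(packet) < header_len + 4:
            raise ValueError("truncated header extension")
        ext_words = int.from_bytes(packet[header_len + 2:header_len + 4], "big")
        header_len += 4 + 4 * ext_words
    pad_len = packet[-1] if packet[0] & 0x20 else 0  # P bit: count in last octet
    if header_len + pad_len > len(packet):
        raise ValueError("truncated RTP packet")

    header = bytearray(packet[:header_len])
    header[1] = (header[1] & 0x80) | new_pt  # keep the marker bit, swap the PT
    payload = packet[header_len:len(packet) - pad_len]
    padding = packet[len(packet) - pad_len:]
    return bytes(header) + transform(payload) + padding
```

   Under these assumptions, a compressing node would call the function
   with a G.711.0 encoder and a dynamic payload type (e.g., 96), and
   the decompressing node would call it with a G.711.0 decoder and the
   original static payload type (0 for PCMU or 8 for PCMA).  RTP
   padding octets are carried over unchanged, so the P bit and the
   padding count stay consistent even though the payload length
   changes.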

   It is informative to note that G.711.0, being both lossless and
   stateless, can be employed multiple times (e.g., on multiple
   individual hops or series of hops) along a given flow with no
   degradation of quality relative to end-to-end G.711.  Stated another
   way, multiple "lossless transcodes" from/to G.711.0/G.711 do not
   affect voice quality as typically occurs with lossy transcodes to/
   from dissimilar codecs.

   Lastly, it is expected that G.711.0 will be used as an archival
   format for recorded G.711 streams.  Therefore, a G.711.0 Storage Mode
   Format is also included in this document.

3.2.  Key Properties of G.711.0 Design

   The fundamental design of G.711.0 resulted from the desire to
   losslessly encode and compress frames of G.711 symbols independent of
   what types of signals those G.711 frames contained.  The primary
   G.711.0 use case is for G.711 encoded, zero-mean, acoustic signals
   (such as speech and music).

   The G.711.0 attributes are listed below:

   A1  Compression for zero-mean acoustic signals: G.711.0 was designed
         primarily for the compression of G.711 payloads that contain
         "speech" or other zero-mean acoustic signals.
         G.711.0 obtains greater than 50% average compression in service
         provider environments [ICASSP].

   A2  Lossless for any G.711 payload: G.711.0 was designed to be
         lossless for any valid G.711 payload - even if the payload
         consisted of apparently random G.711 symbols (e.g., a modem or
         FAX payload).  G.711.0 could be used for "aggregate 64 kbps
         G.711 channels" carried over IP without explicit concern about
         whether a subset of these channels happened to be carrying
         something other than voice or general audio.  To the extent
         that a particular channel carried something other than voice or
         general audio, G.711.0 ensured that it was carried losslessly,
         even if it was not significantly compressed.

   A3  Stateless: Compression of a frame of G.711 symbols was only to be
         dependent on that frame and not on any prior frame.  Although
         greater compression is usually available by observing a longer
         history of past G.711 symbols, it was decided that the
         compression design would be stateless to completely eliminate
         error propagation common in many lossy codec designs (e.g.,
         ITU-T Rec. G.729 [G.729] and ITU-T Rec. G.722 [G.722]).  That
         is, the decoding process need not be concerned about lost prior
         packets because the decompression of a given G.711.0 frame is
         not dependent on potentially lost prior G.711.0 frames.  Owing
         to this stateless property, the frames input to the G.711.0
         encoder may be changed "on-the-fly" (e.g., a 5 ms encoding
         could be followed by a 20 ms encoding).

   A4  Self-describing: This property is defined as the ability to
         determine how many source G.711 samples are contained within
         the G.711.0 frame solely by information contained within the
         G.711.0 frame.  Generally, the number of source G.711 symbols
         can be determined by decoding the initial octets of the
         compressed G.711.0 frame (these octets are called "prefix
         codes" in the standard).  A G.711.0 decoder need not know how
         many symbols are contained in the original G.711 frame (e.g.,
         parameter ptime in the Session Description Protocol (SDP)
         [RFC4566]), as it is able to decompress the G.711.0 frame
         presented to it without signaling knowledge.

   A5  Accommodate G.711 payload sizes typically used in IP: G.711 input
         frames of length typically found in VoIP applications represent
         SDP ptime values of 5 ms, 10 ms, 20 ms, 30 ms, or 40 ms.
         Because the dominant sampling frequency for G.711 is 8000
         samples per second, G.711.0 was designed to compress G.711
         input frames of 40, 80, 160, 240, or 320 samples.

   A6  Bounded expansion: Since attribute A2 above requires G.711.0 to
         be lossless for any payload (which could consist of any
         combination of octets with each octet spanning the entire space
         of 2^8 values), by definition there exists at least one
         potential G.711 payload that must be "uncompressible".  Since
         the quantum of compression is an octet, the expansion of such
         an uncompressible payload was designed to be bounded by the
         minimum possible amount: one octet.  Thus, G.711.0 "compressed"
         frames can be of length one octet to X+1 octets, where X is the
         size of the input G.711 frame in octets.  G.711.0 can therefore
         be viewed as a Variable Bit Rate (VBR) encoding in which the
         size of the G.711.0 output frame is a function of the G.711
         symbols input to it.

   A7  Algorithmic delay: G.711.0 was designed to have an algorithmic
         delay equal to the time represented by the number of samples in
         the G.711 input frame (i.e., no "look-ahead").

   A8  Low Complexity: Less than 1.0 Weighted Million Operations Per
         Second (WMOPS) average and low memory footprint (~5k octets
         RAM, ~5.7k octets ROM, and ~3.6k basic operations) [ICASSP]
         [G.711.0].

   A9  Both A-law and mu-law supported: G.711 has two operating laws,
         A-law and mu-law.  These two laws are also known as PCMA and
         PCMU in RTP applications [RFC3551].
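   The frame-size arithmetic behind attributes A5 and A6 can be made
   concrete with a short sketch.  The helper names below are
   hypothetical, and the bitrate figures describe G.711.0 frames alone,
   ignoring RTP, UDP, and IP header overhead.

```python
from __future__ import annotations

SAMPLE_RATE_HZ = 8000                         # dominant G.711 sampling rate
VALID_FRAME_SYMBOLS = (40, 80, 160, 240, 320)  # 5, 10, 20, 30, 40 ms (A5)


def symbols_per_frame(ptime_ms: int) -> int:
    """Number of G.711 symbols (one octet each) in a frame of ptime_ms."""
    symbols = ptime_ms * SAMPLE_RATE_HZ // 1000
    if symbols not in VALID_FRAME_SYMBOLS:
        raise ValueError(f"{ptime_ms} ms is not a G.711.0 input frame size")
    return symbols


def output_size_bounds(input_octets: int) -> tuple[int, int]:
    """Smallest and largest G.711.0 output frame (A6): 1 to X+1 octets."""
    if input_octets not in VALID_FRAME_SYMBOLS:
        raise ValueError("unsupported G.711 input frame size")
    return 1, input_octets + 1


def bitrate_range_bps(ptime_ms: int) -> tuple[float, float]:
    """VBR range of the G.711.0 frames only (no packet header overhead)."""
    lo, hi = output_size_bounds(symbols_per_frame(ptime_ms))
    frames_per_second = 1000 / ptime_ms
    return lo * 8 * frames_per_second, hi * 8 * frames_per_second
```

   For example, 20 ms frames (160 symbols) yield G.711.0 frames of 1 to
   161 octets, i.e., between 400 bit/s and 64.4 kbit/s before header
   overhead, compared with a constant 64 kbit/s for G.711.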

   These attributes generally make it trivial to compress a G.711 input
   frame consisting of 40, 80, 160, 240, or 320 samples.  After the
   input frame is presented to a G.711.0 encoder, a G.711.0 "self-
   describing" output frame is produced.  The number of samples
   contained within this frame is easily determined at the G.711.0
   decoder by virtue of attribute A4.  The G.711.0 decoder can decode
   the G.711.0 frame back to a G.711 frame by using only data within the
   G.711.0 frame.

   Lastly, we note that losing a G.711.0 encoded packet is identical in
   effect to losing a G.711 packet (when using RTP); this is because a
   G.711.0 payload, like the corresponding G.711 payload, is stateless.
   Thus, it is anticipated that existing G.711 Packet Loss Concealment
   (PLC) mechanisms will be employed when a G.711.0 packet is lost and
   that the resulting Mean Opinion Score (MOS) degradation will be
   identical to that of G.711 packet loss.

3.3.  G.711 Input Frames to G.711.0 Output Frames

   G.711.0 is a lossless and stateless compression of G.711 frames.
   Figure 1 depicts this where "A" is the process of G.711.0 encoding
   and "B" is the process of G.711.0 decoding.

    |--------------------------|  A   |------------------------------|
    |    G.711 Input Frame     |----->|     G.711.0 Output Frame     |
    |       of X Octets        |      |  containing 1 to X+1 Octets  |
    | (where X MUST be 40, 80, |      | (precise value dependent on  |
    | 160, 240, or 320 octets) |<-----| G.711.0 ability to compress) |
    |__________________________|  B   |______________________________|

   Figure 1: 1:1 Mapping from G.711 Input Frame to G.711.0 Output Frame

   Note that the mapping is 1:1 (lossless) in both directions, subject
   to two constraints.  The first constraint is that the input frame
   provided to the G.711.0 encoder (process "A") has a specific number
   of input G.711 symbols consistent with attribute A5 (40, 80, 160,
   240, or 320 octets).  The second constraint is that the companding
   law used to create the G.711 input frame (A-law or mu-law) must be
   known, consistent with attribute A9.

   Subject to these two constraints, the input G.711 frame is processed
   by the G.711.0 encoder (process "A"), which produces a
   "self-describing" G.711.0 output frame, consistent with attribute A4.
   Depending on the source G.711 symbols, the G.711.0 output frame can
   contain anywhere from 1 to X+1 octets, where X is the number of input
   G.711 symbols.  For virtually every zero-mean acoustic signal, the
   G.711.0 output frame is shorter than the X-octet input frame.

   Since the G.711.0 output frame is "self-describing", a G.711.0
   decoder (process "B") can losslessly reproduce the original G.711
   input frame with only the knowledge of which companding law was used
   (A-law or mu-law).  The first octet of a G.711.0 frame is called the
   "Prefix Code" octet; the information within this octet conveys how
   many G.711 symbols the decoder is to create from a given G.711.0
   input frame (i.e., 0, 40, 80, 160, 240, or 320).  The Prefix Code
   value of 0x00 is used to denote zero G.711 source symbols, which
   allows the use of 0x00 as a payload padding octet (described later in
   Section 3.3.1).
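   The two constraints and the 1:1 mapping of Figure 1 can be expressed
   as a check that a G.711.0 implementation would be expected to pass.
   In this sketch, encode and decode are hypothetical stand-ins for
   processes "A" and "B"; no real G.711.0 codec is included, and the
   RoundTripError exception type is illustrative.

```python
from typing import Callable

VALID_FRAME_OCTETS = (40, 80, 160, 240, 320)  # constraint 1 (attribute A5)
KNOWN_LAWS = ("A-law", "mu-law")               # constraint 2 (attribute A9)

# Hypothetical codec interface: (data, companding law) -> data.
Codec = Callable[[bytes, str], bytes]


class RoundTripError(Exception):
    """Raised when a codec violates the 1:1 mapping of Figure 1."""


def check_round_trip(frame: bytes, law: str,
                     encode: Codec, decode: Codec) -> bytes:
    """Run process "A" then process "B" on one G.711 frame and verify
    the output-size bound and lossless reconstruction."""
    if len(frame) not in VALID_FRAME_OCTETS:
        raise ValueError("input frame must be 40, 80, 160, 240, or 320 octets")
    if law not in KNOWN_LAWS:
        raise ValueError("companding law must be known")
    coded = encode(frame, law)                     # process "A"
    if not 1 <= len(coded) <= len(frame) + 1:
        raise RoundTripError("output frame outside 1..X+1 octets")
    if decode(coded, law) != frame:                # process "B"
        raise RoundTripError("decoded frame differs from the input")
    return coded
```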

   Since G.711.0 was designed with typical G.711 payload lengths as a
   design constraint (attribute A5), this lossless encoding requires no
   knowledge other than the companding law being used.  This
   information is anticipated to be signaled in SDP and is described
   later in this document.

   If the original inputs were known to be from a zero-mean acoustic
   signal coded by G.711, an intelligent G.711.0 encoder could infer the
   G.711 companding law in use (via G.711 input signal amplitude
   histogram statistics).  Likewise, an intelligent G.711.0 decoder
   producing G.711 from the G.711.0 frames could also infer which
   companding law is in use.  Thus, G.711.0 could be used in
   applications that have limited stream signaling between the G.711
   endpoints (i.e., they only know "G.711 at 8k sampling is being used",
   but nothing more).  Such usage is not further described in this
   document.  Additionally, if the original inputs were known to come
   from zero-mean acoustic signals, an intelligent G.711.0 encoder could
   tell whether the G.711 payload had been encrypted, as encrypted
   symbols would not have the distribution expected under either
   companding law and would appear random.  Such determination is also
   not further discussed in this document.
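   As an illustration of the companding-law inference mentioned above,
   the following sketch decodes a frame under both laws and picks the
   law that yields the smaller average magnitude.  This decision rule
   is an assumption made for this example; it is not defined by G.711.0
   or by this document, and it is meaningful only for zero-mean
   acoustic signals of modest amplitude.  The expansion routines follow
   the widely distributed public-domain g711.c reference code.

```python
def ulaw_to_linear(u: int) -> int:
    """G.711 mu-law octet -> linear sample (g711.c-style expansion)."""
    u = ~u & 0xFF
    t = (((u & 0x0F) << 3) + 0x84) << ((u & 0x70) >> 4)
    return (0x84 - t) if u & 0x80 else (t - 0x84)


def alaw_to_linear(a: int) -> int:
    """G.711 A-law octet -> linear sample (g711.c-style expansion)."""
    a ^= 0x55
    t = (a & 0x0F) << 4
    seg = (a & 0x70) >> 4
    if seg == 0:
        t += 8
    else:
        t = (t + 0x108) << (seg - 1)
    return t if a & 0x80 else -t


def guess_law(frame: bytes) -> str:
    """Assumed heuristic: decoded with the correct law, a zero-mean
    acoustic signal is dominated by small magnitudes; decoded with the
    wrong law, those same octets map to larger magnitudes."""
    if not frame:
        raise ValueError("empty frame")
    mean_abs_u = sum(abs(ulaw_to_linear(b)) for b in frame) / len(frame)
    mean_abs_a = sum(abs(alaw_to_linear(b)) for b in frame) / len(frame)
    return "mu-law" if mean_abs_u < mean_abs_a else "A-law"
```

   The heuristic relies on the asymmetry of the two code tables: the
   mu-law "zero" octet 0xFF decodes to +848 under A-law, and the A-law
   near-zero octet 0xD5 decodes to +716 under mu-law, so low-level
   signals look much louder under the wrong law.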

   It is easily seen that this process is 1:1 and that lossless
   compression based on G.711.0 can be employed multiple times, as the
   original G.711 input symbols are always reproduced with 100%
   fidelity.

3.3.1.  Multiple G.711.0 Output Frames per RTP Payload Considerations

   As a general rule, G.711.0 frames containing more source G.711
   symbols (from a given channel) will typically result in higher
   compression, but there are exceptions to this rule.  A G.711.0
   encoder may choose to encode 20 ms of input G.711 symbols as 1) a
   single 20 ms G.711.0 frame, 2) two 10 ms G.711.0 frames, or 3) any
   other combination of 5 ms or 10 ms G.711.0 frames, depending on
   which encoding results in fewer bits.  As an example, an intelligent
   encoder might encode 20 ms of G.711 symbols as two 10 ms G.711.0
   frames if the first 10 ms was "silence" and two G.711.0 frames took
   fewer bits than any other possible encoding combination of G.711.0
   frame sizes.
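   The search performed by such an intelligent encoder can be sketched
   as a dynamic program over frame boundaries.  The encoded_len
   callback is a hypothetical stand-in for a real G.711.0 encoder's
   output length; the function finds the cheapest sequence of 40-,
   80-, 160-, 240-, or 320-symbol frames that covers the input.

```python
from typing import Callable, List, Tuple

FRAME_SIZES = (40, 80, 160, 240, 320)  # G.711 symbols per G.711.0 frame (A5)


def best_split(symbols: bytes,
               encoded_len: Callable[[bytes], int]) -> Tuple[int, List[int]]:
    """Return (total octets, frame sizes in order) for the cheapest way to
    cover `symbols` with consecutive G.711.0 frames of the A5 sizes.
    `encoded_len` is a hypothetical stand-in for a G.711.0 encoder."""
    n = len(symbols)
    INF = float("inf")
    cost = [0] + [INF] * n   # cost[i]: fewest octets to encode symbols[:i]
    choice = [0] * (n + 1)   # size of the last frame in that best encoding
    for i in range(1, n + 1):
        for size in FRAME_SIZES:
            if size <= i and cost[i - size] != INF:
                c = cost[i - size] + encoded_len(symbols[i - size:i])
                if c < cost[i]:
                    cost[i], choice[i] = c, size
    if cost[n] == INF:
        raise ValueError("length is not a sum of supported frame sizes")
    sizes, i = [], n
    while i > 0:             # walk back through the chosen frame sizes
        sizes.append(choice[i])
        i -= choice[i]
    return int(cost[n]), sizes[::-1]
```

   Every candidate frame requires a separate encoder invocation, which
   is the added complexity weighed in this section against the
   typically small gain over encoding the whole ptime as one frame.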

   During the process of G.711.0 standardization, it was recognized that
   although it is sometimes advantageous to encode integer multiples of
   40 G.711 symbols in whatever frame-size combination yields the most
   compression (as per above), the simplest choice is to encode the
   entire ptime's worth of input G.711 symbols into one G.711.0 frame
   (if the ptime supported it).  This is especially so since the larger
   number of source G.711 symbols typically resulted in the highest
   compression anyway and there is added complexity in searching for
   other possibilities (involving more G.711.0 frames) that were
   unlikely to produce a more bit efficient result.

   The design of ITU-T Rec. G.711.0 [G.711.0] foresaw the possibility of
   multiple G.711.0 input frames in that the decoder was defined to
   decode what it refers to as an incoming "bit stream".  For this
   specification, the bit stream is the G.711.0 RTP payload itself.
   Thus, the decoder will take the G.711.0 RTP payload and will produce
   an output frame containing the original G.711 symbols independent of
   how many G.711.0 frames were present in it.  Additionally, any number
   of 0x00 padding octets placed between the G.711.0 frames will be
   silently (and safely) ignored by the G.711.0 decoding process (see
   Section 4.2.3).

   To recap, a G.711.0 encoder may choose to encode incoming G.711
   symbols into one or more G.711.0 frames and put the resultant
   frame(s) into the G.711.0 RTP payload.  Zero or more 0x00 padding
   octets may also be included in the G.711.0 RTP payload.  The G.711.0
   decoder, being insensitive to the number of G.711.0 frames contained
   within the payload, will decode the G.711.0 RTP payload into the
   source G.711 symbols.  Examples of single and multiple G.711.0 frame
   cases are illustrated in Section 4.2.  The multiple G.711.0 frame
   cases MUST be supported, and no negotiation (SDP or otherwise) is
   required for them.
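   The payload-level decoding behavior described in this section,
   multiple frames with optional 0x00 padding octets between them, can
   be sketched as a simple loop.  The decode_frame callback is a
   hypothetical stand-in for the ITU-T G.711.0 frame decoder and is
   assumed to return the decoded G.711 symbols together with the number
   of octets it consumed.

```python
from typing import Callable, Tuple

# Hypothetical interface: decode_frame(payload, offset) returns
# (G.711 symbols, octets consumed) for the frame starting at offset.
FrameDecoder = Callable[[bytes, int], Tuple[bytes, int]]


def decode_payload(payload: bytes, decode_frame: FrameDecoder) -> bytes:
    """Decode a G.711.0 RTP payload holding one or more G.711.0 frames,
    possibly separated by 0x00 padding octets, into G.711 symbols."""
    out = bytearray()
    offset = 0
    while offset < len(payload):
        if payload[offset] == 0x00:  # Prefix Code 0x00: zero symbols
            offset += 1
            continue
        symbols, consumed = decode_frame(payload, offset)
        if consumed <= 0:
            raise ValueError("frame decoder made no progress")
        out += symbols
        offset += consumed
    return bytes(out)
```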
