Network Working Group K. McKay Request for Comments: 2658 QUALCOMM Incorporated Category: Standards Track August 1999 RTP Payload Format for PureVoice(tm) Audio Status of this Memo This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited. Copyright Notice Copyright (C) The Internet Society (1999). All Rights Reserved. ABSTRACT This document describes the RTP payload format for PureVoice(tm) Audio. The packet format supports variable interleaving to reduce the effect of packet loss on audio quality. 1] may be formatted for use as an RTP payload type. A method is provided to interleave the output of the compressor to reduce quality degradation due to lost packets. Furthermore, the sender may choose various interleave settings based on the importance of low end-to-end delay versus greater tolerance for lost packets. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 . 1] defines an audio compression algorithm for use in CDMA applications. In addition to being the standard CODEC for all wireless CDMA terminals, the Qualcomm PureVoice CODEC (a.k.a. Qcelp) is used in several Internet applications most notably JFax(tm), Apple(r) QuickTime(tm), and Eudora(r).
The Qcelp CODEC  compresses each 20 milliseconds of 8000 Hz, 16- bit sampled input speech into one of four different size output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4 (54 bits) or Rate 1/8 (20 bits). The CODEC chooses the output frame rate based on analysis of the input speech and the current operating mode (either normal or reduced rate). For typical speech patterns, this results in an average output of 6.8 k bits/sec for normal mode and 4.7 k bits/sec for reduced rate mode. 2] | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |RR | LLL | NNN | | +-+-+-+-+-+-+-+-+ one or more codec data frames | | .... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The RTP header has the expected values as described in . The extension bit is not set and this payload type never sets the marker bit. The codec data frames are aligned on octet boundaries. When interleaving is in use and/or multiple codec data frames are present in a single RTP packet, the timestamp is, as always, that of the oldest data represented in the RTP packet. The other fields have the following meaning: Reserved (RR): 2 bits MUST be set to zero by sender, SHOULD be ignored by receiver. Interleave (LLL): 3 bits MUST have a value between 0 and 5 inclusive. The remaining two values (6 and 7) MUST not be used by senders. If this field is non-zero, interleaving is enabled. All receivers MUST support interleaving. Senders MAY support interleaving. Senders that do not support interleaving MUST set field LLL and NNN to zero. Interleave Index (NNN): 3 bits MUST have a value less than or equal to the value of LLL. Values of NNN greater than the value of LLL are invalid.
section 4. 1] from highest to lowest are packed into octets. The highest numbered bit (265 for Rate 1, 123 for Rate 1/2, 53 for Rate 1/4 and 19 for Rate 1/8) is placed in the most significant bit (Internet bit 0) of octet 1 of the CODEC data frame. The second highest numbered bit (264 for Rate 1, etc.) is placed in the second most significant bit (Internet bit 1) of octet 1 of the data frame. This continues so that bit 258 from the standard Rate 1 frame is placed in the least significant bit of octet 1. Bit 257 from the standard is placed in the most significant bit of octet 2 and so on until bit 0 from the standard Rate 1 frame is placed in Internet bit 1 of octet 34 of the CODEC data frame. The remaining unused bits of the last octet of the CODEC data frame MUST be set to zero.
Here is a detail of how a Rate 1/8 frame is converted into a CODEC data frame: CODEC data frame 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |1|1|1|1|1|1|1|1|1|1| | | | | | | | | | | | | | | | 1 (Rate 1/8) |9|8|7|6|5|4|3|2|1|0|9|8|7|6|5|4|3|2|1|0|Z|Z|Z|Z| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Octet 0 of the data frame has value 1 (see table above) indicating the total data frame length (including octet 0) is 4 octets. Bits 19 through 0 from the standard Rate 1/8 frame are placed as indicated with bits marked with "Z" being set to zero. The Rate 1, 1/4 and 1/2 standard frames are converted similarly. section 3, more than one CODEC data frame MAY be included in a single RTP packet by a sender. Receivers MUST handle bundles of up to 10 CODEC data frames in a single RTP packet. Furthermore, senders have the following additional restrictions: o MUST not bundle more CODEC data frames in a single RTP packet than will fit in the MTU of the RTP transport protocol. For the purpose of computing the maximum bundling value, all CODEC data frames should be assumed to have the Rate 1 size. o MUST never bundle more than 10 CODEC data frames in a single RTP packet. o Once beginning transmission with a given SSRC and given bundling value, MUST NOT increase the bundling value. If the bundling value needs to be increased, a new SSRC number MUST be used. o MAY decrease the bundling value only between interleave groups (see section 3.4). If the bundling value is decreased, it MUST NOT be increased (even to the original value), although it may be decreased again at a later time.
Additionally, senders have the following restrictions: o Once beginning transmission with a given SSRC and given interleave value, MUST NOT increase the interleave value. If the interleave value needs to be increased, a new SSRC number MUST be used. o MAY decrease the interleave value only between interleave groups. If the interleave value is decreased, it MUST NOT be increased (even to the original value), although it may be decreased again at a later time.
Second L+1 frames: Frame 1 from packet 0 of interleave group Frame 1 from packet 1 of interleave group And so on up to... Frame 1 from packet L of interleave group And so on up to... Bth L+1 frames: Frame B from packet 0 of interleave group Frame B from packet 1 of interleave group And so on up to... Frame B from packet L of interleave group section 4, an erasure frame will be sent to the Qcelp CODEC. Now, assume that packet n of the interleave group arrives before frame x+1 of that packet is needed. Receivers SHOULD use frame x+1 of the newly received packet n rather than substituting an erasure frame. In other words, just because packet n wasn't available the first time it was needed to reconstruct the interleaved audio, the receiver SHOULD NOT assume it's not available when it's subsequently needed for interleaved audio reconstruction.
Specifically when reconstructing interleaved audio, a missing RTP packet in the interleave group should be treated as containing B erasure CODEC data frames where B is the bundling value for that interleave group. sections 3.3 and 3.4 guarantee that after receipt of the first payload packet from the sender, the receiver can allocate a well-known amount of buffer space that will be sufficient for all future reception from the same SSRC value. Less buffer space may be required at some point in the future if the sender decreases the bundling value or interleave, but never more buffer space. This prevents the possibility of the receiver needing to allocate more buffer space (with the possible result that none is available) should the bundling value or interleave value be increased by the sender. Also, were the interleave or bundling value to increase, the receiver could be forced to pause playback while it receives the additional packets necessary for playback at an increased bundling value or increased interleave. 2], and any appropriate profile (for example ). This implies that confidentiality of the media streams is achieved by encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed after compression so there is no conflict between the two operations.
A potential denial-of-service threat exists for data encodings using compression techniques that have non-uniform receiver-end computational load. The attacker can inject pathological datagrams into the stream which are complex to decode and cause the receiver to be overloaded. However, this encoding does not exhibit any significant non-uniformity. As with any IP-based protocol, in some circumstances, a receiver may be overloaded simply by the receipt of too many packets, either desired or undesired. Network-layer authentication may be used to discard packets from undesired sources, but the processing cost of the authentication itself may be too high. In a multicast environment, pruning of specific sources may be implemented in future versions of IGMP  and in multicast routing protocols to allow a receiver to select which sources are allowed to reach it.  TIA/EIA/IS-733. TR45: High Rate Speech Service Option for Wideband Spread Spectrum Communications Systems. Available from Global Engineering +1 800 854 7179 or +1 303 792 2181. May also be ordered online at http://www.eia.org/eng/.  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996.  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.  Schulzrinne, H., "RTP Profile for Audio and Video Conferences with Minimal Control", RFC 1890, January 1996.  Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC 1112, August 1989.
Acknowledgement Funding for the RFC Editor function is currently provided by the Internet Society.