Content for TS 26.071 Word version: 17.0.0
The present document is an introduction to the speech processing parts of the narrowband telephony speech service employing the Adaptive Multi-Rate (AMR) speech coder. A general overview of the speech processing functions is given, with reference to the documents where each function is specified in detail.
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
GSM 03.50: "Digital cellular telecommunications system (Phase 2); Transmission planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN) system".
: "Adaptive Multi-Rate (AMR); ANSI C source code".
: "Adaptive Multi-Rate (AMR); Test sequences".
: "Source Controlled Rate operation".
: "AMR Speech Codec; Voice Activity Detector".
: "Mandatory Speech Codec speech processing functions; AMR Speech Codec; Comfort noise aspects".
: "Mandatory Speech Codec speech processing functions; AMR Speech Codec; Error concealment of lost frames".
: "AMR Speech Codec; Interface to Iu and Uu".
: "AMR wideband speech codec feasibility study report".
ITU-T Recommendation G.711: "Pulse code modulation (PCM) of voice frequencies".
ITU-T Recommendation H.324: "Terminal for low bit-rate multimedia communication".
For the purposes of this TS, the following abbreviations apply:
Algebraic Code Excited Linear Prediction
Bad Frame Indication
Global System for Mobile communications
International Telecommunication Union - Telecommunication standardisation sector (former CCITT)
Pulse Code Modulation
Public Land Mobile Network
Public Switched Telephone Network
Source Controlled Rate
User Equipment (terminal)
The AMR speech coder consists of the multi-rate speech coder, a source controlled rate scheme including a voice activity detector and a comfort noise generation system, and an error concealment mechanism to combat the effects of transmission errors and lost packets.
The multi-rate speech coder is a single integrated speech codec with eight source rates from 4.75 kbit/s to 12.2 kbit/s, and a low rate background noise encoding mode. The speech coder is capable of switching its bit-rate every 20 ms speech frame upon command.
A reference configuration where the various speech processing functions are identified is given in Figure 1. In this Figure, the relevant specifications for each function are also indicated.
In Figure 1, the audio parts including analogue to digital and digital to analogue conversion are included, to show the complete speech path between the audio input/output in the User Equipment (UE) and the digital interface of the network. The detailed specification of the audio parts is not within the scope of the present document. These aspects are only considered to the extent that the performance of the audio parts affects the performance of the speech transcoder.
8-bit A-law or μ-law PCM (ITU-T Recommendation G.711), 8 000 samples/s;
13-bit uniform PCM, 8 000 samples/s;
Voice Activity Detector (VAD) flag;
Encoded speech frame, 50 frames/s, number of bits/frame depending on the AMR codec mode;
SIlence Descriptor (SID) frame;
TX_TYPE, 2 bits, indicates whether information bits are available and if they are speech or SID information;
Information bits delivered to the 3G AN;
Information bits received from the 3G AN;
RX_TYPE, the type of frame received quantized into three bits.
The adaptive multi-rate speech codec is described in TS 26.090. The technical content is identical to that of TS 26.090.
As shown in Figure 1, the speech encoder takes its input as a 13-bit uniform Pulse Code Modulated (PCM) signal either from the audio part of the UE or, on the network side, from the Public Switched Telephone Network (PSTN) via an 8-bit A-law or μ-law to 13-bit uniform PCM conversion. The encoded speech at the output of the speech encoder is packetized and delivered to the network interface. In the receive direction, the inverse operations take place.
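The network-side conversion can be illustrated with a minimal sketch that expands one 8-bit A-law octet into a signed 13-bit uniform PCM value. It uses the commonly published segment-based expansion for ITU-T Recommendation G.711 A-law; the function name `alaw_to_pcm13` is hypothetical and the code is not taken from the AMR specifications. μ-law requires a different expansion and is not covered here.

```python
def alaw_to_pcm13(octet: int) -> int:
    """Expand one 8-bit A-law octet to a signed 13-bit uniform PCM value."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("A-law code must fit in one octet")
    code = octet ^ 0x55            # undo the even-bit inversion used on the line
    positive = bool(code & 0x80)   # in A-law a set sign bit marks a positive sample
    segment = (code >> 4) & 0x07   # 3-bit segment number
    mantissa = code & 0x0F         # 4-bit step inside the segment
    if segment == 0:
        magnitude = 2 * mantissa + 1
    else:
        magnitude = (2 * mantissa + 33) << (segment - 1)
    return magnitude if positive else -magnitude


if __name__ == "__main__":
    for octet in (0xD5, 0x55, 0xAA, 0x2A):
        print(hex(octet), alaw_to_pcm13(octet))
```

The largest magnitude produced is 4032, so every output fits in a signed 13-bit word.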
The detailed mapping between input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks (in which the number of bits depends on the presently used codec mode) and from these to output blocks of 160 reconstructed speech samples is described in TS 26.090. The coding scheme is Multi-Rate Algebraic Code Excited Linear Prediction. The bit-rates of the source codec are listed in Table 1.
An AMR speech codec capable UE shall support all source rates listed in Table 1.
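|Codec mode||Source codec bit-rate|
|AMR_12.20||12,20 kbit/s (GSM EFR)|
|AMR_10.20||10,20 kbit/s|
|AMR_7.95||7,95 kbit/s|
|AMR_7.40||7,40 kbit/s (IS-641)|
|AMR_6.70||6,70 kbit/s (PDC-EFR)|
|AMR_5.90||5,90 kbit/s|
|AMR_5.15||5,15 kbit/s|
|AMR_4.75||4,75 kbit/s|
|AMR_SID||1,80 kbit/s (see note 1)|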
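The relation between source rate, the 20 ms frame length and the number of bits per encoded frame can be sketched as follows. The eight speech mode rates are hard-coded; the helper names `bits_per_frame` and `average_rate` are hypothetical, and the per-frame mode schedule is only an illustration of the coder switching its bit-rate on a frame basis.

```python
FRAME_MS = 20                 # one speech frame lasts 20 ms
SAMPLE_RATE_HZ = 8000         # narrowband sampling rate
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples

# Source rates of the eight AMR speech modes, in bit/s.
AMR_SOURCE_RATES = {
    "AMR_12.20": 12200,
    "AMR_10.20": 10200,
    "AMR_7.95": 7950,
    "AMR_7.40": 7400,
    "AMR_6.70": 6700,
    "AMR_5.90": 5900,
    "AMR_5.15": 5150,
    "AMR_4.75": 4750,
}


def bits_per_frame(mode: str) -> int:
    """Number of speech bits produced for one 20 ms frame in the given mode."""
    return AMR_SOURCE_RATES[mode] * FRAME_MS // 1000


def average_rate(modes_per_frame) -> float:
    """Average source rate in bit/s for a schedule with one mode per frame."""
    total_bits = sum(bits_per_frame(m) for m in modes_per_frame)
    return total_bits * 1000 / (len(modes_per_frame) * FRAME_MS)


if __name__ == "__main__":
    for mode in AMR_SOURCE_RATES:
        print(mode, bits_per_frame(mode))
    print(average_rate(["AMR_12.20", "AMR_4.75"]))
```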
The ANSI C code of the speech codec, VAD and CNG system is described in TS 26.073. The ANSI C code is mandatory. The ANSI C code is identical to that of TS 26.073.
A set of digital test sequences is specified in TS 26.074, thus enabling the verification of compliance, i.e. bit-exactness, to a high degree of confidence. The test vectors are identical to those of TS 26.074.
The test sequences are defined separately for:
The adaptive multi-rate speech transcoder, VAD, SCR system and comfort noise parts of the audio processing functions (see Figure 1) are defined in bit exact arithmetic. Consequently, they shall always respond to a given input sequence with the corresponding bit exact output sequence, provided that the internal state variables are in exactly the same state at the beginning of the test.
The input test sequences provided shall force the corresponding output test sequences, provided that the tested modules are in their home state when starting.
The modules may be set into their home states by provoking the appropriate homing functions.
Special inband signalling frames (encoder homing frame and decoder homing frame) described in TS 26.090 have been defined to provoke these homing functions also in remotely placed modules.
At the end of the first received homing frame, the audio functions that are defined in a bit exact way shall go into their predefined home states. The output corresponding to the first homing frame is dependent on the codec state when the frame was received. Any consecutive homing frames shall produce corresponding homing frames at the output.
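The homing behaviour can be illustrated with a toy model. The sketch assumes the encoder homing frame pattern of 160 identical samples with value 0x0008 (a 13-bit value of 1, left-justified in a 16-bit word), which is an assumption recalled from TS 26.090 and is not stated in the present document. `ToyModule` is a hypothetical stand-in for a bit-exact module and does not model the codec itself.

```python
SAMPLES_PER_FRAME = 160
EHF_SAMPLE = 0x0008   # assumed encoder homing frame sample value


def is_encoder_homing_frame(frame) -> bool:
    """True if the frame consists of 160 samples that all equal EHF_SAMPLE."""
    return len(frame) == SAMPLES_PER_FRAME and all(s == EHF_SAMPLE for s in frame)


class ToyModule:
    """Stand-in for a bit-exact module whose output depends on internal state."""

    HOME_STATE = 0

    def __init__(self):
        self.state = self.HOME_STATE

    def process(self, frame) -> int:
        # The output depends on the state held when the frame arrives.
        out = (self.state + sum(frame)) & 0xFFFF
        # At the end of a homing frame the module returns to its home state.
        self.state = self.HOME_STATE if is_encoder_homing_frame(frame) else out
        return out


if __name__ == "__main__":
    ehf = [EHF_SAMPLE] * SAMPLES_PER_FRAME
    used, fresh = ToyModule(), ToyModule()
    used.process([5] * SAMPLES_PER_FRAME)          # gives 'used' some history
    print(used.process(ehf), fresh.process(ehf))   # first homing frame: outputs differ
    print(used.process(ehf), fresh.process(ehf))   # consecutive homing frame: identical
```

In this model the output for the first homing frame depends on the earlier history, while every later frame yields the same output in both modules, which is the property the test sequences rely on.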
The source controlled rate operation of the adaptive multi-rate speech codec is defined in TS 26.093.
During a normal telephone conversation, the participants alternate so that, on the average, each direction of transmission is occupied about 50 % of the time. Source controlled rate (SCR) is a mode of operation where the speech encoder encodes speech frames containing only background noise with a lower bit-rate than normally used for encoding speech. A network may adapt its transmission scheme to take advantage of the varying bit-rate. This may be done for the following two purposes:
In the UE, battery life will be prolonged or a smaller battery could be used for a given operational duration.
The average required bit-rate is reduced, leading to a more efficient transmission with decreased load and hence increased capacity.
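The bit-rate reduction can be estimated with a simple weighted average. The sketch below is a worked example under stated assumptions: the speech activity of 50 % comes from the paragraph above, the noise-period rate is the AMR_SID figure of 1,80 kbit/s from Table 1, and the function name `average_source_rate` is hypothetical. Actual savings depend on how often SID frames are transmitted.

```python
def average_source_rate(speech_rate_bps: float, noise_rate_bps: float,
                        activity: float) -> float:
    """Weighted average source rate for a given speech activity factor (0..1)."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must lie between 0 and 1")
    return activity * speech_rate_bps + (1.0 - activity) * noise_rate_bps


if __name__ == "__main__":
    # 12,20 kbit/s speech mode, 1,80 kbit/s during noise, 50 % activity.
    print(average_source_rate(12200, 1800, 0.5))
```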
The following functions are required for the source controlled rate operation:
a Voice Activity Detector (VAD) on the TX side;
evaluation of the background acoustic noise on the TX side, in order to transmit characteristic parameters to the RX side;
generation of comfort noise on the RX side during periods when no normal speech frames are received.
The transmission of comfort noise information to the RX side is achieved by means of a Silence Descriptor (SID) frame, which is sent at regular intervals.
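A simplified view of the TX side behaviour is sketched below: speech frames are sent while the VAD flag is set, and during pauses a SID frame is sent at a regular interval with no data in between. The labels `SPEECH`, `SID` and `NO_DATA`, the function name `tx_frame_types` and the default interval of 8 frames are assumptions made for illustration; the sketch omits hangover and any distinction between SID frame types, and it is not the procedure of TS 26.093.

```python
def tx_frame_types(vad_flags, sid_interval=8):
    """Map per-frame VAD flags to 'SPEECH', 'SID' or 'NO_DATA' labels."""
    types = []
    frames_since_sid = None   # None while speech is active
    for speech in vad_flags:
        if speech:
            types.append("SPEECH")
            frames_since_sid = None
        elif frames_since_sid is None or frames_since_sid + 1 >= sid_interval:
            # First frame of a pause, or the regular interval has elapsed.
            types.append("SID")
            frames_since_sid = 0
        else:
            types.append("NO_DATA")
            frames_since_sid += 1
    return types


if __name__ == "__main__":
    print(tx_frame_types([1, 0, 0, 0, 0, 0, 1, 0], sid_interval=3))
```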
The adaptive multi-rate VAD function is described in TS 26.094.
The input to the VAD is the input speech itself together with a set of parameters computed by the adaptive multi-rate speech encoder. The VAD uses this information to decide whether each 20 ms speech coder frame contains speech or not.
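The per-frame decision can be pictured with a deliberately simple energy detector. This is a swapped-in technique for illustration only: the AMR VAD also uses parameters computed by the speech encoder, and the fixed `threshold` value and the function names below are hypothetical.

```python
def frame_energy(frame) -> float:
    """Mean squared sample value of one frame."""
    return sum(s * s for s in frame) / len(frame)


def toy_vad(frames, threshold=1.0e4):
    """Return one boolean VAD flag per 20 ms frame (True means speech)."""
    return [frame_energy(f) > threshold for f in frames]


if __name__ == "__main__":
    quiet = [0] * 160
    loud = [1000, -1000] * 80
    print(toy_vad([quiet, loud, quiet]))
```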
The VAD algorithm is described in TS 26.094, and the corresponding C code is defined in TS 26.073. The verification of compliance to TS 26.094 is achieved by use of digital test sequences applied to the same interface as the test sequences for the speech codec.
The adaptive multi-rate comfort noise insertion function is described in TS 26.092.
When speech is absent, the synthesis in the speech decoder is different from the case when normal speech frames are received. The synthesis of an artificial noise based on the received non-speech parameters is termed comfort noise generation.
The comfort noise generation process is as follows:
the evaluation of the acoustic background noise in the transmitter;
the noise parameter encoding (SID frames) and decoding, and
the generation of comfort noise in the receiver.
The comfort noise processes and the algorithm for updating the noise parameters during speech pauses are defined in detail in TS 26.092, and the corresponding C code is defined in TS 26.073. The comfort noise mechanism is based on the adaptive multi-rate speech codec defined in TS 26.090.
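The three steps of comfort noise generation can be sketched in a heavily reduced form, where the only noise parameter is an averaged level. This is an assumption made for brevity: the parameters defined in TS 26.092 also describe the spectral shape of the noise, and the function names `noise_level` and `comfort_noise` are hypothetical.

```python
import math
import random


def noise_level(frames) -> float:
    """TX side: RMS level averaged over several background-noise frames."""
    energies = [sum(s * s for s in f) / len(f) for f in frames]
    return math.sqrt(sum(energies) / len(energies))


def comfort_noise(level: float, n: int = 160, seed: int = 0):
    """RX side: one frame of random noise scaled to the signalled level."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rms = math.sqrt(sum(x * x for x in raw) / n)
    return [round(level * x / rms) for x in raw]


if __name__ == "__main__":
    level = noise_level([[100, -100] * 80, [100, -100] * 80])
    frame = comfort_noise(level)
    print(level, len(frame))
```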
The adaptive multi-rate speech codec error concealment of lost frames is described in TS 26.091.
Frames may be lost due to transmission errors or frame stealing in a wireless environment. Actions which shall be taken in these cases, both for lost speech frames and for lost SID frames, are described in TS 26.091. Error concealment actions shall be used also in the case of lost speech packets in the transport network. The methods described in TS 26.091 may with some modifications be used as a basis for such actions.
In order to mask the effect of isolated lost frames, the speech decoder shall be informed and the error concealment actions shall be initiated, whereby a set of predicted parameters is used in the speech synthesis. Insertion of speech signal independent silence frames is not allowed. For several subsequent lost frames, a muting technique shall be used to indicate to the listener that transmission has been interrupted.
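The principle of substitution followed by muting can be illustrated with a toy model in which each frame is reduced to a single gain value. The attenuation factor, the `mute_after` limit and the function name `conceal_gains` are hypothetical; the actual concealment actions are those of TS 26.091.

```python
def conceal_gains(received, attenuation=0.5, mute_after=3):
    """received: list of (bad_frame_indication, gain); gain is None for lost frames."""
    out = []
    last_good = 0.0
    lost_run = 0
    for bfi, gain in received:
        if not bfi:
            last_good = gain
            lost_run = 0
            out.append(gain)
        else:
            lost_run += 1
            if lost_run > mute_after:
                out.append(0.0)   # long interruption: output is muted
            else:
                # Predicted value derived from the last good frame, not silence.
                out.append(last_good * attenuation ** lost_run)
    return out


if __name__ == "__main__":
    frames = [(False, 1.0), (True, None), (True, None),
              (True, None), (True, None), (False, 0.8)]
    print(conceal_gains(frames))
```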
The adaptive multi-rate speech frame structure is described in TS 26.101. The output interface format from the encoder and input interface format to the decoder is divided into two parts: the core speech data part, which carries the speech coded bits, and an additional data part with mode information.
The interface format described in TS 26.101 is termed AMR interface format 1 (AMR IF1).
TS 26.101 also describes an octet aligned frame format which shall be used in applications requiring octet alignment, such as ITU-T Recommendation H.324. This format is termed AMR interface format 2 (AMR IF2).
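Octet alignment means that each frame is padded up to a whole number of octets. The sketch below assumes that a 4-bit frame type field precedes the speech bits in AMR IF2; that field size is an assumption not stated in the present document, and the helper names are hypothetical.

```python
IF2_FRAME_TYPE_BITS = 4   # assumed size of the frame type field


def if2_octets(speech_bits: int) -> int:
    """Octets needed for the frame type field plus speech bits, rounded up."""
    return (IF2_FRAME_TYPE_BITS + speech_bits + 7) // 8


def if2_padding_bits(speech_bits: int) -> int:
    """Stuffing bits needed to reach the next octet boundary."""
    return if2_octets(speech_bits) * 8 - IF2_FRAME_TYPE_BITS - speech_bits


if __name__ == "__main__":
    for bits in (244, 95):   # 12,20 kbit/s and 4,75 kbit/s frames of 20 ms
        print(bits, if2_octets(bits), if2_padding_bits(bits))
```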
The adaptive multi-rate speech service interface to RAN is described in TS 26.102.
The adaptive multi-rate speech channel performance characterisation is described in