TS 26.110
Codec for Circuit Switched Multimedia Telephony Service –
General Description

3GPP‑Page ETSI‑search CONTENT_↓

V18.0.0 (PDF) 2024/03 17 p.

V17.0.0 2022/03 17 p.

V16.0.0 2020/06 17 p.

V15.0.0 2018/06 17 p.

V14.0.0 2017/03 17 p.

V13.0.0 2015/12 17 p.

V12.0.0 2014/09 17 p.

V11.0.0 2012/09 17 p.

V10.0.0 2011/04 17 p.

V9.0.0 2009/12 16 p.

V8.0.0 2008/12 17 p.

V7.0.0 2007/06 17 p.

V6.0.0 2005/01 17 p.

V5.0.0 2002/06 17 p.

V4.1.0 2001/04 16 p.

V3.1.0 2001/07 17 p.

Rapporteur:: Dr. Jung, Kyunghun
Samsung Electronics Co., Ltd

Content for TS 26.110 Word version: 18.0.0

0 Introduction p. 4

This document contains a specification for H.324 based multimedia codecs for circuit switched 3GPP networks. The term codec is usually associated with a single media type. However, many multimedia services require a close integration of disparate media types. In this sense, the representations of these media types (in the form of media streams) are at least logically bound into a single multimedia stream. As such, a H.324 based multimedia codec must handle multiplexing/de-multiplexing and skew. It will also have to provide codecs for each of the derived media streams. End-to-end, in-band control is also required for the purposes of configuration and establishing individual media streams. Finally, since 3GPP networks are inherently error prone, error detection and/or correction must also be provided by the multimedia codec since it has a comprehensive view of the bit stream it produces and therefore can apply the most efficient form of error detection and/or correction.

1 Scope p. 5

This specification introduces the set of specifications which apply to 3G-324M multimedia terminals.

2 References p. 5

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1]

ITU-T Recommendation H.223: "Multiplexing protocol for low bitrate multimedia communication"

[2]

ITU-T Recommendation H.223 - Annex A: "Multiplexing protocol for low bitrate multimedia communication over low error-prone channels"

[3]

ITU-T Recommendation H.223 - Annex B: "Multiplexing protocol for low bitrate multimedia communication over moderate error-prone channels"

[4]

ITU-T Recommendation H.223 - Annex C: "Multiplexing protocol for low bitrate multimedia communication over highly error-prone channels"

[5]

ITU-T Recommendation H.223 - Annex D: "Optional multiplexing protocol for low bitrate multimedia communication over highly error-prone channels"

[6]

ITU-T Recommendation H.245: "Control protocol for multimedia communication"

[7]

ITU-T Recommendation G.723.1: "Dual rate speech coder for multimedia communication transmitting at 5.3 & 6.3 kbit/s"

[8]

ITU-T Recommendation H.263: "Video coding for low bitrate communication"

[9]

ITU-T Recommendation H.261: "Video CODEC for audiovisual services at p X 64 kbit/s"

[10]

ITU-T Recommendation H.324: "Terminal for low bitrate multimedia communication"

[11]

TS 26.111: "Modifications to H.324"

[12]

TR 26.911: "Terminal Implementor's Guide"

[13]

ITU-T Recommendation X.691: "Information Technology - ASN.1 Encoding Rules - Specification of Packed Encoding Rules (PER)"

[14]

International Standard ISO/IEC 14494-2: "Information technology - Generic coding of audio-visual object - Part 2: Visual, 1999"

[15]

TS 26.071: "Mandatory Speech Codec; General Description"

[16]

TS 26.090: "Mandatory Speech Codec; Speech Transcoding Functions"

[17]

TS 26.073: "Mandatory Speech Codec; ANSI C-Code"

[18]

ITU-T T.140 (1998): Presentation protocol for text conversation application.

[19]

TS 22.226: "Global Text Telephony; Stage 1"

[20]

TS 24.008: "Mobile Radio Interface - Layer 3 MM/CC Specification".

[21]

TS 27.001: "General on Terminal Adaptation Functions (TAF) for Mobile Stations (MS)".

[22]

TS 29.007: "General requirements on interworking between the Public Land Mobile Network (PLMN) and the Integrated Services Digital Network (ISDN) or Public Switched Telephone Network (PSTN)".

[23]

TS 23.108: "Mobile radio interface layer 3 specification; Core Network protocols; stage 2".

[24]

ITU-T Recommendation G.712: "Transmission performance characteristics of pulse code modulation channels".

[25]

ITU-T Recommendation T.120: "Data protocols for multimedia conferencing".

3 Definitions and abbreviations p. 6

3.1 Definitions p. 6

For the purposes of the present document, the following terms and definitions apply.

H.324 terminal:

ITU-T H.324 recommendation, including Annex C

3G-324M terminal:

Based on ITU-T H.324 recommendation modified by 3GPP for purposes of 3GPP circuit switched network based video telephony

3.2 Abbreviations p. 6

For the purposes of the present document, the following abbreviations apply:

ACELP

Algebraic-Code-Excited Linear-Prediction

ADC

Analogue Digital Converter

AEC

Acoustic Echo Cancellation

Adaptation Layer

CCSRL

Control Channel Segmentation and Reassembly Layer

CELP

Code-Excited Linear-Prediction

Correlation Threshold

DAC

Digital Analogue Converter

DCT

Discrete Cosine Transformation

Error Indication

EOB

End Of Block

FEC

Forward Error Correction

GOB

Group Of Blocks

GQUANT

Group Quantizer information

GTT

Global Text Telephony

HDLC

High-Level Data Link Control

HEC

Header Error Control

ISDN

Integrated Services Digital Network

LAPM

Link Access Procedure for Modems

Logical Channel

Multiplex Code

MCU

Multipoint Communication Unit

MP-MLQ

Multipulse Maximum Likelihood Quantization

MPL

Multiplex Payload Length

MR-ACELP

Multi-rate ACELP

Personal Computer

MCU

Multipoint Conference Unit

MUX

H.223 Multiplex layer

PDU

Protocol Data Unit

Sequence Number

VLC

Variable Length Code

4 General p. 7

3G-324M terminals provide real-time video, audio, or data, in any combination, including none, over 3GPP circuit-switched, radio networks. They are based on ITU-T H.324 with Annex C, and Annex H when mobile multilink operation is supported. Communication may be either 1-way or 2-way. Such terminals may be part of a portable device or integrated into an automobile or other non fixed location device. They may also be fixed, stand-alone devices; for example, a video telephone or kiosk. 3G-324M terminals may also be integrated into PCs and workstations.

In addition to 3G-324M to 3G-324M communication, interoperation with other types of multimedia telephone terminals is possible, however a gateway may be required.

Multipoint communication between more than two 3G-324M terminals is possible using a Multipoint Communication Unit (MCU). MCU functionality is for further study.

3G-324M terminals are based on ITU-T H.324 with Annex C, and Annex H when mobile multilink operation is supported. For performance reasons and to reference the call set-up procedures, some modifications to H.324 were made. These are described in TS 26.111, except call set-up procedures are described in TS 24.008, TS 27.001, TS 29.007 and TS 23.108. 3G-324M terminals shall conform to these specifications. Because of the many options in H.324, an implementor's guide, TR 26.911, provides preferred options for 3G-324M implementations.

Figure 1 below shows the functional components of a generic 3GPP multimedia terminal. The video, speech, data and multilink components are optional. If a media type is supported, the standards indicated are mandatory except those enclosed in square brackets are optional.

Copy of original 3GPP image for 3GPP TS 26.110, Fig. 1: Scope of circuit switched multimedia 3GPP specification. Items in [brackets] are optional.

Figure 1: Scope of circuit switched multimedia 3GPP specification. Items in [brackets] are optional.
(⇒ copy of original 3GPP image)

Short descriptions of ITU-T H.324, TS 26.111, and TR 26.911 are given below.

5 ITU-T H.324 p. 8

ITU-T H.324 describes terminals for low bitrate multimedia communication. That ITU-T recommendation contains "ANNEX C, Multimedia Telephone Terminals Over Error Prone Channels" (sometimes referred to as H.324/M) and "ANNEX H, Mobile Multilink Operation". These annexes are considered an integral part of the recommendation. Therefore, herewith H.324 shall mean ITU-T H.324 with Annex C. When multilink operation is utilized, H.324 shall also mean to include H.324 Annex H.

Originally designed for V.34 modems, H.324 now supports ISDN and wireless networks. Therefore, it is well suited as a basis for 3GPP multimedia codecs. Relevant to wireless networks, H.324 describes the overall system architecture and introduces control (H.245), mux (H.223), video (H.261 and H.263), text (T.140), and audio (G.723.1).

Annex A provides a short overview of H.324 and multimedia codecs.

6 Modifications to H.324 (3GPP TS 26.111) p. 8

To enable cost-effective, high-quality H.324 terminals for 3GPP networks, some modifications were made to H.324. These modifications are described in TS 26.111. Terminals adhering to this specification are herewith known as 3G-324M terminals. 3G-324M terminals shall conform to TS 26.111.

7 Call set-up requirements p. 8

H.324 does not describe call set-up procedures for 3GPP networks. These are described in TS 24.008, TS 27.001, TS 29.007, TS 23.108 and shall be used for 3G-324M terminals.

8 Terminal implementor's guide (3GPP TR 26.911) p. 8

A successful 3G-324M terminal will have to function well at bandwidths as low as 32 KBPS and in potentially high error rate environments. 3G-324M contains many options that may be employed by an implementor. To help choose which options and combinations of options are useful, an implementor's guide is provided in TR 26.911.

A Background information p. 9

The section is intended for informational purposes only. This is not an integral part of this specification. Each section below relates to the functional components in Figure 1.

A.1 Video I/O Equipment p. 9

For a video telephone this would most likely consist of a video camera and display monitor. Other possible input sources could be a VCR or disk drive. While most applicable I/O equipment relies on a standard format for the video signal or bit stream, this format is likely to differ from that mandated by the video codec. In such cases, circuitry or software is used to transcode between the two formats.

A.2 Video Codec p. 9

ITU-R 601 (NTSC or PAL) is a typical video input signal and represents a bit stream of 20.7 Mbyte/s for the actual image (excluding blanking intervals). The first order of compression occurs by reducing the resolution of the input signal. For example, CIF resolution at 30 fps produces a bit stream 4.6 Mbyte/s. Additional savings occur by dropping frames. In a videoconference, where motion is relatively slow, 10 fps is considered adequate. Thus, the original signal of 20.7 Mbytes/s could be reduced to 1.5 Mbyte/s with just these techniques. However, this is still 188 times greater than can be transmitted on, for example, a 64 KBPS channel. Substantial compression is still required, especially considering that framing, control, and audio would as well require a portion of the available bandwidth.

To achieve the degree of compression required for video telephony, all of the video codecs that can currently be employed in a 3GPP multimedia codec use a combination of spatial and temporal redundancy reduction to reduce the bandwidth required by the video media stream. Spatial redundancy can be reduced by converting the input signal from the time domain to the frequency domain using a DCT. This produces a DC value and other coefficients, where most of the scene energy is concentrated in the coefficients corresponding to the lower frequencies. Next, a coarse quantizer is applied (which, in this domain, has little effect on image quality). This results in many of the coefficients being encoded to 0. The significant coefficients are encoded to a much smaller range of values. The coefficients are then reordered so that, typically, the larger magnitude values will occur first followed by 0 value coefficients. Finally, the coefficients are replaced with a count of the number of zero value coefficients followed by the value of a nonzero coefficient. This combination is translated into a VLC. Applying this type of compression to the entire video frame produces an intra frame.

Despite the efficiency of intra coding, significantly more compression is required. In addition to removing spatial redundancy, all video codecs apply temporal reduction as well. This is achieved by comparing the current frame to the previous and estimating the set of vectors which when applied to their respective areas of the scene would create the new, current frame based on the old, previous frame. The match is usually not perfect, so an error component is transmitted as well. The error component is also transformed to the frequency domain, so the same compression efficiency achieved in the intra frame is achieved here as well - enhanced by the fact the range of error coefficients is less than intra coefficients. Since generally only a few areas of a scene change from frame to frame, high compression can be achieved by sending a series of inter frames. If the error component for a particular block is too large, it can be encoded as an intra block.

Since, by their nature, VLCs are not fixed length, a single bit error can make it impossible to decode an entire frame. Unfortunately, each inter coded frame relies on its previous frame to be decoded. Thus, a single bit error can destroy the entire remaining bit stream. Video codecs have various ways of handling errors. The simplest is to use error detection to determine if a frame contains an error. The transmitter is then signalled that an error occurred. It then sends an intra coded frame, which does not depend on any previous frames. This approach consumes considerable bandwidth and is only practical for very low error networks. Other, more sophisticated schemes are available using the video codecs available to 3G-324M terminals.

A.2.1 H.261 p. 10

H.261 supports CIF and QCIF images as input. It provides good video quality at 64 kbit/s or higher. It uses BCH codes for Forward Error Correction (FEC). However, this is not recommended for H.324.

A.2.2 H.263 p. 10

H.263 is an extension of H.261. It allows sub-QCIF, 4CIF and 16CIF as additional input formats. H.263, in its original version, provides four annexes that describe optional modes for enhanced coding.

Advanced prediction mode (Annex F) provides half-pel motion estimation, median-based motion vector prediction, 4 motion vectors per macroblock (one per block), and overlapped block motion compensation
Unrestricted motion vectors (Annex D) work in conjunction with advanced prediction mode and allow motion vectors to point outside the picture area
Arithmetic coding (Annex E) can be used instead of variable length coding
PB-frames (Annex G) allow bi-directional prediction similar to MPEG

Other significant differences exist, but require a level of detail to explain that renders them outside the scope of this document.

A second version of H.263 (known as H.263+) adds annexes I through T, some of which address error prone environments and are therefore of special interest to 3GPP multimedia codecs.

A.2.3 MPEG-4 p. 10

MPEG-4 Visual (ISO/IEC 14496-2) is a generic video codec. One of its target areas is mobile communications. Error resiliency and high efficiency make this codec particularly well suited for 3G-324M.

MPEG-4 Visual is organised into Profiles. Within a Profile, various Levels are defined. Profiles define subsets of tool sets. Levels are related to computational complexity. Among these Profiles, Simple Visual Profile provides error resilience (through data partitioning, RVLC, resynchronization marker and header extension code) and low complexity.

MPEG-4 allows various input formats, including general formats such as QCIF and CIF. It is also baseline compatible with H.263.

A.3 Audio I/O Codec p. 10

Generally, a video telephone would require a handset, headset, or microphone and speaker. Often, integrated circuits are employed that convert the typically analogue input signal to a PCM format bit stream (ADC) and convert PCM to an analogue signal for acoustic output (DAC). This is helpful since many speech codecs use PCM for input and output. Video telephones often use a separate microphone and speaker. This allows the user to be seen without a handset or headset. However, if this is so, AEC will be required.

A.4 Speech Codec p. 10

A.4.1 3GPP AMR p. 10

The AMR codec uses eight source codecs with bit-rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s. The coder operates on speech frames of 20 ms corresponding to 160 samples at the sampling frequency of 8000 sample/s. It performs the mapping from input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 95, 103, 118, 134, 148, 159, 204, and 244 bits and from encoded blocks of 95, 103, 118, 134, 148, 159, 204, and 244 bits to output blocks of 160 reconstructed speech samples. The coding scheme for the multi-rate coding modes is the so-called Algebraic Code Excited Linear Prediction Coder (ACELP). The multi-rate ACELP coder is referred to as MR-ACELP. At each 160 speech samples, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebooks' indices and gains). These parameters are encoded and transmitted. At the decoder, these parameters are decoded and speech is synthesised by filtering the reconstructed excitation signal through the LP synthesis filter.

The adaptive multi-rate speech codec is described in a bit-exact arithmetic in form of a fixed-point ANSI-C code to allow for easy type approval as well as general testing purposes of the adaptive multi-rate speech codec.

The DTX mechanism includes a Voice Activity Detector (VAD) on the TX side; evaluation of the background acoustic noise on the TX side, in order to transmit characteristic parameters to the RX side; and generation of comfort noise on the RX side during periods where the radio transmission is turned off.

The AMR specification contains error concealment. The purpose of frame substitution is to conceal the effect of lost AMR speech frames. The purpose of muting the output in the case of several lost frames is to indicate the breakdown of the channel to the user and to avoid generating possible annoying sounds as a result from the frame substitution procedure.

A.4.2 G.723.1 p. 11

G.723.1 can be used for compressing the speech or other audio signal component of multimedia services at a very low bitrate as part of H.324. This coder has two bit-rates associated with it, 5.3 and 6.3 kbit/s. The higher bitrate has greater quality. The lower bit-rate gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the encoder and decoder. It is possible to switch between the two rates at any frame boundary. An option for variable rate operation using discontinuous transmission and noise fill during non-speech intervals is also possible using a series of silence frames or a single silence frame followed by no frames until speech is detected.

G.723.1 encodes speech or other audio signals in frames using linear predictive analysis-by-synthesis coding. The excitation signal for the high rate coder is Multipulse Maximum Likelihood Quantization (MP-MLQ) and for the low rate coder is Algebraic-Code-Excited Linear-Prediction (ACELP). The frame size is 30 ms and there is an additional look ahead of 7.5 msec,. This coder is designed to operate with a digital signal obtained by first performing telephone bandwidth filtering (ITU-T Recommendation G.712) of the analogue input, then sampling at 8000 Hz and then converting to 16-bit linear PCM for the input to the encoder. The output of the decoder is converted back to analogue by similar means.

G.723.1 has been designed to be robust for indicated frame erasures. An error concealment strategy for frame erasures has been included in the decoder. However, this strategy must be triggered by an external indication that the bit stream for the current frame has been erased. This can be achieved in H.324 using the AL2 Error Indication (EI) flag and the optional AL2 Sequence Number (SN). Because the coder was designed for burst errors, there is no error correction mechanism provided for random bit errors. If a frame erasure has occurred, the decoder switches from regular decoding to frame erasure concealment mode.

G.723.1 contains three annexes. Annex A describes the silence compression system designed for the G.723.1 speech coder (mentioned above). Annex B describes an alternative implementation of G.723.1 contained in floating point C source code. Annex C specifies a channel coding scheme which can be used with the triple rate speech codec G.723.1. The channel codec is scalable in bit-rate and is designed for mobile multimedia applications as a part of the overall H.324 family of standards.

A.5 User Data Applications p. 11

A.5.1 Data conferencing - T.120 p. 11

An example of a User Data Application is T.120. This protocol allows multipoint data conferencing that includes data and image transferral. Other functions, such as shared whiteboards and applications, are possible.

A.5.2 Text conversation - T.140 p. 12

The real time text conversation application, is supported by the presentation protocol ITU-T T.140 [19]. The Global Text Telephony feature is implemented in the CS Multimedia environment by applying T.140, as specified in H.324. The text stream may be opened simultaneously with voice, video and other data applications. Text-only sessions are also possible. Further requirements applicable to the Global Text Telephony feature are specified in TS 22.226.

The data protocol for T.140 is specified in H.324 to be AL1.

A.6 Data Protocols p. 12

Various data protocols can be supported. These always support data applications (see A.5 User Data Applications). A specific protocol or set of protocols is often stipulated by the data application. Each protocol provides varying degrees of error detection and/or correction.

A.7 System Control p. 12

In general, system control constitutes the overall state machine for the terminal. It usually has to be aware of when a connection has been established. At that point it can begin H.245 procedures such as master/slave determination, capabilities exchange, and opening logical channels. Upon call termination, either initiated at the near or far ends, system control generally initiates H.245 end session procedures.

A.8 Call Set-up p. 12

All out-of-band network signalling for the purpose of call control is handled by call set-up, which is usually implemented as a state machine. This includes initiating, answering, and tearing down calls.

A.9 H.245 p. 12

H.245 specifies the syntax and semantics for in-band, terminal-to-terminal control messages and the procedures for their use. Most importantly, H.245 is used for master/slave determination, capabilities exchange, H.223 mux table transmission, and opening and closing logical channels. There is also a large array of general control and indication messages. H.245 addresses a wide range of terminals and applications. Therefore, only a subset of the messages listed in H.245 pertain to 3G-324M terminals. Messages fall into one of four categories: Request (requires a Response), Response (in response to a Request), Command (requires an action), and Indication (informative only).

H.245 messages are carried on a single logical channel within the H.223 mux. This channel is labelled LC 0 and is considered to be open upon establishing digital communications end-to-end and survives until digital communication is terminated. Due to the characteristics of the H.223 mux, bandwidth for H.245 messages is allocated on an as-needed basis. Since most H.245 traffic occurs and the beginning and end of the session, this conserves much needed bandwidth for video and audio. Error control is not provided for within H.245 and is specified elsewhere.

A.10 H.223 p. 12

H.223 describes the multiplexing protocol used between H.324 terminals. It is packet oriented and each packet can contain a subset of a maximum of 65536 LCs. Each LC represents a single media, information, or control channel. The H.223 protocol is split into two layers, the lowest being the Multiplex Layer.

The Multiplex Layer exchanges data with the end terminal via MUX-PDUs. Multiplex table entries, of which there are 16 (and can be changed during a session), describe which octets from within the PDU are allocated to which logical channels. The multiplex table entry employed for a particular PDU is indicated by the 4 bit MC field in the MUX-PDU header. MUX-PDUs contain an integer number of octets. Errors within the MUX-PDU header are controlled using the HEC field in the MUX-PDU header. H.324 terminals utilising the V.34 transmission protocol frame MUX-PDUs with HDLC. Bit stuffing is used for data transparency in this case.

Above the Multiplex Layer is the Adaptation Layer, of which there are three different types.

AL1 is designed primarily for control information and data protocols. It can be either framed or unframed and does not provide any error control.
AL2 is designed primarily for the transfer of digital audio. AL2 PDUs contain 1 octet for an 8-bit CRC and an optional octet for a sequence number.
AL3 is designed primarily for the transfer of digital video. AL3 PDUs contain 2 octets for a 16-bit CRC. There is also optionally 1 or 2 octets for control. AL3 also allows limited retransmission.

For purposes of video telecommunications over wireless networks, four annexes to H.223 have been created. These create four levels of error detection and error correction.

A.10.1 Level 0 p. 13

Level 0 applies to H.223 as described above.

A.10.2 Level 1 p. 13

Level 1, described in Annex A, replaces HDLC framing with 1 or 2 16 bit flags. Unlike HDLC, Level 1 does not guarantee data transparency. However, if the MUX-PDU header is constructed in such a way as to make emulating the Level 1 framing flags impossible, data transparency can be achieved by correctly decoding the MUX-PDU. Should there be and error in the MUX-PDU header, resynchronization techniques will have to be applied.

A.10.3 Level 2 p. 13

Level 2, described in Annex B, uses the same framing as Level 1, but utilises a 3 octet header. This header starts with a 4 bit MC, which is the same as in Level 0. This is followed by an 8-bit MPL-field, with a range of values 0 - 254. Lastly, a 12 bit extended Golay code is used for parity bits. The PM in Level 2 is signalled through the polarity of the MUX-PDU flag. If the output of the correlator is greater than or equal to CT, the PM is 0. If it is less than or equal to -CT, the PM equals 1. After the parity bits, there can be an optional MUX-PDU header for the previous (corrupted) MUX-PDU. This 1 octet field uses the format described in Level 0. Level 2 also offers enhanced packet resynchronization.

A.10.4 Level 3 p. 13

Level 3, described in Annexes C and D, provides error correction capabilities at the mux level.

B Bibliography p. 13

(void)