TS 26.254
Codec for Immersive Voice and Audio Services (IVAS)
Rendering


V18.1.0 (PDF) 2024/06 … p.

Rapporteur: Dr. Szczerba, Marek
Philips International B.V.

Content for TS 26.254 Word version: 18.0.0

1 Scope

The present document provides a comprehensive description of the rendering functions of the decoder/renderer of the Immersive Voice and Audio Services (IVAS) codec.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] TR 21.905: "Vocabulary for 3GPP Specifications".

[2] TS 26.250: "Codec for Immersive Voice and Audio Services (IVAS); General overview".
→ to date, still a draft

[3] TS 26.251: "Codec for Immersive Voice and Audio Services (IVAS); C code (fixed-point)".
→ to date, still a draft

[4] TS 26.253: "Codec for Immersive Voice and Audio Services (IVAS); Detailed Algorithmic Description incl. RTP payload format and SDP parameter definitions".

[5] TS 26.258: "Codec for Immersive Voice and Audio Services (IVAS); C code (floating point)".

3 Definitions of terms, symbols and abbreviations

3.1 Terms

For the purposes of the present document, the terms given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.

rendering: a process of generating digital audio output from the decoded digital audio signal.

3.2 Symbols

Void.

3.3 Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.

BRIR	Binaural Room Impulse Response
CPE	Channel Pair Element
EVS	Enhanced Voice Services
HRIR	Head-Related Impulse Response
HRTF	Head-Related Transfer Function
ISM	Individual Stream with Metadata
IVAS	Immersive Voice and Audio Services
MASA	Metadata Assisted Spatial Audio
MCT	Multi-channel Coding Tool
RTP	Real-time Transport Protocol
SCE	Single Channel Element
UE	User Equipment

4 General

4.1 IVAS receiver side processing

The codec for Immersive Voice and Audio Services is part of a framework comprising an encoder, a decoder, and a renderer. An overview of the audio processing functions on the receiver side of the codec is shown in Figure 4.1-1. This diagram is based on [2], with the rendering features highlighted.


Figure 4.1-1: Overview of IVAS audio processing functions - receiver side

Interfaces:

Encoded audio frames (50 frames/s), number of bits depending on IVAS codec mode

Encoded Silence Insertion Descriptor (SID) frames

RTP Payload packets

Lost Frame Indicator (BFI)

Renderer config data

Head-tracker pose information and scene orientation control data

Audio output channels (16-bit linear PCM, sampled at 8 (only EVS), 16, 32, or 48 kHz)

10: Metadata associated with output audio

Please note that the interface numbering is consistent with the IVAS General Overview [2].
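The interface list fixes the frame rate at 50 frames/s, which is a 20 ms frame, and fixes the output as 16-bit linear PCM at 8, 16, 32 or 48 kHz. The worked example below derives the per-frame output size from those figures. It assumes the output is delivered in units of one 20 ms frame. The function names are illustrative only and are not part of the IVAS API.

```python
# Frame rate taken from the interface list: 50 encoded frames per second (20 ms frames).
FRAMES_PER_SECOND = 50

# Output sampling rates listed for the audio output interface (8 kHz applies to EVS only).
OUTPUT_RATES_HZ = (8000, 16000, 32000, 48000)

BYTES_PER_SAMPLE = 2  # 16-bit linear PCM


def samples_per_frame(sample_rate_hz: int) -> int:
    """Samples per channel in one 20 ms frame."""
    if sample_rate_hz % FRAMES_PER_SECOND:
        raise ValueError("sampling rate is not a multiple of the frame rate")
    return sample_rate_hz // FRAMES_PER_SECOND


def pcm_bytes_per_frame(sample_rate_hz: int, num_channels: int) -> int:
    """Bytes of 16-bit PCM produced per frame across all output channels."""
    return samples_per_frame(sample_rate_hz) * num_channels * BYTES_PER_SAMPLE


if __name__ == "__main__":
    for fs in OUTPUT_RATES_HZ:
        print(f"{fs} Hz: {samples_per_frame(fs)} samples/frame/channel, "
              f"{pcm_bytes_per_frame(fs, 2)} bytes/frame for 2 channels")
```

At 48 kHz, for example, one frame holds 960 samples per channel. A two-channel binaural frame is therefore 3840 bytes.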

4.2 IVAS rendering

Rendering is the process of generating digital audio output from the decoded digital audio signal. Rendering is used when the output format differs from the input format. If the output format is the same as the input format, the decoded audio channels are simply passed through to the output channels. Binaural rendering is a special case in which binaural output channels are prepared for headphone reproduction. This process includes head-tracking and scene orientation control, head-related transfer function processing, and room acoustic synthesis. Rendering for loudspeaker reproduction is also supported, for both preset and custom loudspeaker configurations.

IVAS rendering is available either as an integral component of the IVAS decoder (internal renderer) or as a standalone external renderer. The external renderer can be used, e.g., to render outputs originating from multiple sources, such as decoders or audio streams.

IVAS rendering features reflect related design constraints, including:

support for provisioning of HRIR/BRIR filter sets as control data for binaural rendering; the format of HRIR/BRIR data is provided in clause 5.10 of TS 26.258 [5],
support for default HRIR/BRIR sets for binaural rendering,
support for head-tracking data, in quaternion and in Euler notation, as control data for binaural audio rendering; the format of head-tracking data is provided in clause 5.11 of TS 26.258 [5],
support for binaural reverb and early reflections controlled by reverb parameters; the format of reverb parameters is provided in clause 5.14.1 and in Annex B of TS 26.258 [5].
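Head-tracking data may arrive either as a quaternion or as Euler angles. The actual data format is defined in clause 5.11 of TS 26.258 and is not reproduced here. The sketch below assumes a common convention: intrinsic Z-Y-X rotation (yaw, then pitch, then roll), angles in degrees, and quaternion component order (w, x, y, z). It converts between the two notations under that assumption. The class and function names are hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Quaternion(NamedTuple):
    w: float
    x: float
    y: float
    z: float


def euler_to_quaternion(yaw_deg: float, pitch_deg: float, roll_deg: float) -> Quaternion:
    """Intrinsic Z-Y-X (yaw, pitch, roll) angles in degrees to a unit quaternion.

    The rotation convention is an assumption of this sketch, not taken from TS 26.258.
    """
    cy, sy = math.cos(math.radians(yaw_deg) / 2), math.sin(math.radians(yaw_deg) / 2)
    cp, sp = math.cos(math.radians(pitch_deg) / 2), math.sin(math.radians(pitch_deg) / 2)
    cr, sr = math.cos(math.radians(roll_deg) / 2), math.sin(math.radians(roll_deg) / 2)
    return Quaternion(
        w=cr * cp * cy + sr * sp * sy,
        x=sr * cp * cy - cr * sp * sy,
        y=cr * sp * cy + sr * cp * sy,
        z=cr * cp * sy - sr * sp * cy,
    )


def quaternion_to_euler(q: Quaternion) -> Tuple[float, float, float]:
    """Inverse of euler_to_quaternion; returns (yaw, pitch, roll) in degrees."""
    # Normalise first so small numerical drift from a tracker does not bias the angles.
    n = math.sqrt(q.w ** 2 + q.x ** 2 + q.y ** 2 + q.z ** 2)
    w, x, y, z = q.w / n, q.x / n, q.y / n, q.z / n
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    # Clamp the asin argument to stay in its domain near +/-90 degrees pitch.
    pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return math.degrees(yaw), math.degrees(pitch), math.degrees(roll)
```

Under this convention, a pure 90° yaw maps to q = (cos 45°, 0, 0, sin 45°). Away from ±90° pitch, a round trip through both functions returns the original angles.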

This document provides a high-level specification of the internal renderer (clause 5) and the external renderer (clause 6). Furthermore, the rendering library interface is described in clause 7. Specific rendering algorithms and processing paths are out of the scope of this specification and are provided in TS 26.253 [4].

5 Internal renderer

5.1 Overview

The internal IVAS renderer is integrated into the IVAS decoder. For specific operating points, this integration allows the decoding and rendering processes to be combined, resulting in efficient processing. The internal renderer supports rendering for loudspeaker and headphone reproduction. For loudspeaker reproduction, the audio output is mapped to the loudspeaker positions of the loudspeaker setup. For headphone reproduction, binaural rendering is applied. The following binaural output modes are supported:

Binaural output without room acoustic synthesis (no room), command line option BINAURAL,
Binaural output with room acoustics synthesized using impulse responses (room with IR), command line option BINAURAL_ROOM_IR,
Binaural output with room acoustics synthesized using parametric reverb, with or without early-reflections (room with reverb), command line option BINAURAL_ROOM_REVERB.

There are four binaural renderer implementations available in the IVAS codebase: the parametric binaural renderer, the FastConv renderer, the Crend convolution renderer, and the time-domain object renderer. Which renderer is applied depends on the IVAS input format, the bitrate, the IVAS encoding mode, and the binaural rendering output mode. These dependencies are summarized in Table 5.1-1.

Table 5.1-1: Input format to renderer mapping

IVAS Input Format	Bitrate Range [kbps]	IVAS Mode (if applicable)	Binaural rendering output mode (if applicable)	Renderer Used
SBA	13.2 - 80	-	-	Parametric Binaural Renderer
SBA	96 - 512	-	-	FastConv Binaural Renderer
MASA	13.2 - 512	-	-	Parametric Binaural Renderer
ISM (3 or 4 objects)	24.4 - 32	ParamISM	-	Parametric Binaural Renderer
ISM	13.2 - 512	DiscISM	No room or room with reverb	Time Domain Object Renderer
ISM	13.2 - 512	DiscISM	Room with IR	Crend Binaural Renderer
MC	See Table 5.1-2	McMASA	-	Parametric Binaural Renderer
MC	See Table 5.1-2	ParamMC	-	FastConv Binaural Renderer
MC	See Table 5.1-2	ParamUpmix	-	FastConv Binaural Renderer
MC	See Table 5.1-2	DiscMC	All except below	Crend Binaural Renderer
MC Planar Layouts (5.1 and 7.1)	See Table 5.1-2	DiscMC	Head tracking enabled for either no room or room with reverb	Time Domain Object Renderer
OMASA	See text below	-	-	Same as non-combined format
OSBA	See text below	-	-	Same as non-combined format
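Table 5.1-1 can be read as a decision procedure. The sketch below encodes its rows as plain conditionals. The format and mode strings follow the table, and the output-mode strings follow the command line options listed above. The enum, the function name and the parameter names are hypothetical. The OMASA and OSBA rows are not modelled, because they defer to the non-combined formats.

```python
from enum import Enum
from typing import Optional


class Renderer(Enum):
    PARAMETRIC = "Parametric Binaural Renderer"
    FASTCONV = "FastConv Binaural Renderer"
    CREND = "Crend Binaural Renderer"
    TD_OBJECT = "Time Domain Object Renderer"


# Binaural output modes, named after the command line options listed in clause 5.1.
NO_ROOM_OR_REVERB = {"BINAURAL", "BINAURAL_ROOM_REVERB"}
ROOM_WITH_IR = "BINAURAL_ROOM_IR"


def select_binaural_renderer(fmt: str, bitrate_kbps: float,
                             ivas_mode: Optional[str] = None,
                             output_mode: str = "BINAURAL",
                             mc_layout: Optional[str] = None,
                             head_tracking: bool = False) -> Renderer:
    """Return the internal binaural renderer for one row of Table 5.1-1.

    fmt is "SBA", "MASA", "ISM" or "MC"; ivas_mode is "ParamISM", "DiscISM",
    "McMASA", "ParamMC", "ParamUpmix" or "DiscMC" where the table needs it.
    """
    if output_mode not in NO_ROOM_OR_REVERB and output_mode != ROOM_WITH_IR:
        raise ValueError(f"unknown binaural output mode: {output_mode}")
    if fmt == "SBA":
        # 13.2-80 kbps -> parametric, 96-512 kbps -> FastConv.
        return Renderer.PARAMETRIC if bitrate_kbps <= 80 else Renderer.FASTCONV
    if fmt == "MASA":
        return Renderer.PARAMETRIC
    if fmt == "ISM":
        if ivas_mode == "ParamISM":
            return Renderer.PARAMETRIC
        if ivas_mode == "DiscISM":
            return Renderer.CREND if output_mode == ROOM_WITH_IR else Renderer.TD_OBJECT
    if fmt == "MC":
        if ivas_mode == "McMASA":
            return Renderer.PARAMETRIC
        if ivas_mode in ("ParamMC", "ParamUpmix"):
            return Renderer.FASTCONV
        if ivas_mode == "DiscMC":
            planar = mc_layout in ("5.1", "7.1")
            if planar and head_tracking and output_mode in NO_ROOM_OR_REVERB:
                return Renderer.TD_OBJECT
            return Renderer.CREND
    raise ValueError(f"combination not covered by Table 5.1-1: {fmt}, {ivas_mode}")
```

For example, 7.1 input coded with DiscMC and rendered with head tracking to BINAURAL_ROOM_IR still uses Crend. The time-domain row of the table only covers the no-room and room-with-reverb modes.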

The IVAS modes applicable for multi-channel input formats are summarized in Table 5.1-2. More details regarding multi-channel operation are provided in clause 5.7 of TS 26.253.

Table 5.1-2: Multi-channel format and bitrate mapping to IVAS coding modes

Bitrate [kbps] \ MC layout	5.1	7.1	5.1.2	5.1.4	7.1.4
13.2 - 32	McMASA	McMASA	McMASA	McMASA	McMASA
48 - 80	ParamMC	ParamMC	ParamMC	McMASA	McMASA
96	DiscMC	ParamMC	ParamMC	ParamMC	McMASA
128	DiscMC	DiscMC	DiscMC	ParamMC	ParamMC
160	DiscMC	DiscMC	DiscMC	DiscMC	ParamUpmix
192 - 512	DiscMC	DiscMC	DiscMC	DiscMC	DiscMC
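Table 5.1-2 is a two-dimensional lookup from bitrate row and MC layout column to an IVAS coding mode. The sketch below reproduces the table contents as data and looks up a mode. The ranges are taken from the row labels. The names are hypothetical, and bitrates that fall between rows are rejected rather than guessed.

```python
# Column order of Table 5.1-2.
LAYOUTS = ("5.1", "7.1", "5.1.2", "5.1.4", "7.1.4")

# Rows of Table 5.1-2: (min kbps, max kbps) -> coding mode per layout column.
MC_MODE_TABLE = [
    ((13.2, 32), ("McMASA", "McMASA", "McMASA", "McMASA", "McMASA")),
    ((48, 80), ("ParamMC", "ParamMC", "ParamMC", "McMASA", "McMASA")),
    ((96, 96), ("DiscMC", "ParamMC", "ParamMC", "ParamMC", "McMASA")),
    ((128, 128), ("DiscMC", "DiscMC", "DiscMC", "ParamMC", "ParamMC")),
    ((160, 160), ("DiscMC", "DiscMC", "DiscMC", "DiscMC", "ParamUpmix")),
    ((192, 512), ("DiscMC", "DiscMC", "DiscMC", "DiscMC", "DiscMC")),
]


def mc_coding_mode(bitrate_kbps: float, layout: str) -> str:
    """Return the IVAS coding mode for a multi-channel layout at a given bitrate."""
    column = LAYOUTS.index(layout)  # raises ValueError for layouts not in the table
    for (lo, hi), modes in MC_MODE_TABLE:
        if lo <= bitrate_kbps <= hi:
            return modes[column]
    raise ValueError(f"{bitrate_kbps} kbps is not covered by Table 5.1-2")
```

Combining this lookup with Table 5.1-1 yields the binaural renderer. For example, 7.1.4 at 160 kbps is coded with ParamUpmix and is therefore rendered by the FastConv binaural renderer.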

For the OMASA and OSBA cases, the IVAS coding modes depend on the number of objects and the total IVAS bitrate. For details, refer to clauses 5.9.2 and 6.9.7 of TS 26.253 for OMASA, and clauses 5.8.1 and 6.8 of TS 26.253 for OSBA.

The details on binaural rendering algorithms are provided in clause 7.2.2 of TS 26.253.

5.2 Time-Domain Renderer

The time-domain (TD) renderer operates on signals in the time domain. In the IVAS internal renderer, it is used for binaural rendering of discrete ISM, where each audio signal is encoded and decoded with a dedicated SCE module. This covers all ISM bitrates except for 3 or 4 objects at 24.4 kbps and 32 kbps. It is also used in the decoder for binaural rendering of 5.1 and 7.1 signals when head tracking is enabled. An overview of the TD binaural renderer is given in Figure 7-2.1 in TS 26.253. An HRIR model accepts the object position metadata along with the head-tracking data and generates an HRIR filter pair. The interaural time difference (ITD) may be modelled as part of the HRIR, or it may be modelled as a separate parameter. If an ITD parameter is output, the ITD is synthesized in the ITD synthesis stage. The time-aligned signals are then convolved with the HRIR filter pair to form a binauralized signal. Details are described in clause 7.2.2.2 in TS 26.253.
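The data flow just described can be illustrated for one block of one object: ITD synthesis as a separate stage, followed by convolution with the HRIR pair. This is a minimal sketch, not the TD renderer of TS 26.253. The HRIR taps and the integer-sample ITD are hypothetical inputs that stand in for the HRIR model output. Filter state across frames, HRIR interpolation on head movement and fractional delays are all omitted. The ITD sign convention is an assumption.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form linear convolution; output length is len(signal) + len(taps) - 1."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += s * h
    return out


def delay(signal: Sequence[float], samples: int) -> List[float]:
    """Integer-sample delay implemented by prepending zeros."""
    return [0.0] * samples + list(signal)


def render_object(signal: Sequence[float], hrir_left: Sequence[float],
                  hrir_right: Sequence[float],
                  itd_samples: int) -> Tuple[List[float], List[float]]:
    """Binauralize one object block: ITD synthesis first, then per-ear HRIR filtering.

    Assumed sign convention: itd_samples > 0 means the left ear leads, so the
    right-ear signal is delayed; itd_samples < 0 delays the left ear instead.
    """
    left = convolve(delay(signal, max(0, -itd_samples)), hrir_left)
    right = convolve(delay(signal, max(0, itd_samples)), hrir_right)
    # Zero-pad the shorter ear so both outputs have the same length.
    length = max(len(left), len(right))
    left += [0.0] * (length - len(left))
    right += [0.0] * (length - len(right))
    return left, right
```

If the ITD is instead modelled inside the HRIR, itd_samples is simply 0 and the delay is carried by the filter taps themselves.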

5.3 Parametric Binauralizer and Parametric Stereo Renderer

The parametric binauralizer and stereo renderer operates on the following IVAS formats and operating modes: MASA, OMASA, multi-channel (in McMASA mode), SBA, OSBA, and ISM. That is, the encoder input consisted of audio signals (and potentially spatial metadata) in one of these formats, which are now rendered to binaural or stereo output. Details are described in clause 7.2.2.3 in TS 26.253.

5.4 Fast Convolution Binaural Renderer

The fast convolution binaural renderer operates on signals in the complex low-delay filter bank (CLDFB) domain. It is used for binaural rendering of the following IVAS formats and operating points (cf. Table 5.1-1): SBA (96 kbps and above), OSBA, and multi-channel (ParamMC and ParamUpmix modes). Details are described in clause 7.2.2.4 in TS 26.253.

5.5 Crend Binaural Renderer

The Crend binaural renderer operates on signals in the time domain. In the IVAS decoder, it is used for binaural rendering of multi-channel signals in which each audio signal is encoded and decoded using the discrete multi-channel mode. It is also used for discrete ISM with binaural output with room acoustics synthesized using impulse responses. The convolver uses a zero-delay block DFT implementation. The DFT/IDFT is implemented using the MDFT/IMDFT, which allows the buffer size to be equal to the decoder frame size. Details are described in clause 7.2.2.5 in TS 26.253.
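The key property of the Crend convolver is that its block size equals the decoder frame size. Each decoded frame therefore produces one rendered frame without extra buffering. The sketch below illustrates that idea with uniformly partitioned convolution. The impulse response is split into frame-length partitions, and each partition filters the matching past input frame. Each partial product has a one-frame tail, which is carried into the next output frame. As a deliberate substitution for brevity, each partition product is computed by direct time-domain convolution instead of the MDFT-domain multiplication used in IVAS. The class and method names are hypothetical.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form linear convolution (stands in for MDFT-domain multiplication)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += s * h
    return out


class PartitionedConvolver:
    """Uniformly partitioned convolution with partition length equal to the frame length."""

    def __init__(self, ir: Sequence[float], frame_len: int):
        self.frame_len = frame_len
        # Zero-pad the impulse response to a whole number of frame-length partitions.
        padded = list(ir) + [0.0] * (-len(ir) % frame_len)
        self.partitions = [padded[i:i + frame_len] for i in range(0, len(padded), frame_len)]
        # Past input frames, newest first; history[p] is filtered by partitions[p].
        self.history = [[0.0] * frame_len for _ in self.partitions]
        # Tail of the previous frame's partial convolutions, added to the next output.
        self.overlap = [0.0] * frame_len

    def process(self, frame: Sequence[float]) -> List[float]:
        """Consume one input frame and return one output frame (no added latency)."""
        if len(frame) != self.frame_len:
            raise ValueError("frame length must equal the partition length")
        self.history = [list(frame)] + self.history[:-1]
        acc = [0.0] * (2 * self.frame_len - 1)
        for x, h in zip(self.history, self.partitions):
            for n, v in enumerate(convolve(x, h)):
                acc[n] += v
        out = [a + o for a, o in zip(acc[:self.frame_len], self.overlap)]
        self.overlap = acc[self.frame_len:] + [0.0]
        return out
```

Feeding a signal frame by frame produces the same samples as a single full-length convolution truncated to the input length. Only the work is split per frame; the output is not delayed.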

6 External renderer

6.1 Overview

The external IVAS renderer offers a standalone rendering capability employing the same rendering algorithms as the internal IVAS renderer. It is intended to receive the outputs of the IVAS decoder and further render them to other output formats. In addition, the IVAS external renderer is able to receive multiple different input streams that are rendered into a single output format. This provides a mixing functionality to use with multiple IVAS decoder outputs and a pre-renderer functionality for use before IVAS encoding. More details on pre-rendering algorithms are provided in clause 7.5 of TS 26.253.
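The mixing functionality can be pictured as two steps. First, each input stream is rendered to the common output format. Then the rendered signals are summed channel by channel. This document does not describe the external renderer's actual gain handling or limiting. The sketch below therefore assumes that all streams have already been rendered to the same channel count and frame length, works on 16-bit PCM sample values, and simply saturates the sum to the 16-bit range. The function name is hypothetical.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_rendered_streams(streams: Sequence[Sequence[Sequence[int]]]) -> List[List[int]]:
    """Sum rendered streams (stream -> channels -> samples) into one output.

    All streams must share the same channel count and frame length. Sums outside
    the 16-bit range are clipped rather than wrapped.
    """
    if not streams:
        raise ValueError("at least one stream is required")
    num_ch, frame_len = len(streams[0]), len(streams[0][0])
    for s in streams:
        if len(s) != num_ch or any(len(ch) != frame_len for ch in s):
            raise ValueError("streams must be rendered to the same output format first")
    mixed = []
    for c in range(num_ch):
        channel = []
        for n in range(frame_len):
            total = sum(s[c][n] for s in streams)
            channel.append(max(INT16_MIN, min(INT16_MAX, total)))
        mixed.append(channel)
    return mixed
```

Hard clipping is the simplest way to keep the mix in range. A production mixer would more likely apply headroom gains or a limiter.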

The external IVAS renderer supports inputs of Ambisonics, ISM, multi-channel, and MASA format streams. The available output formats are binaural (with head-tracking and room effect options), Ambisonics, multi-channel, and MASA format (limited to pre-renderer mixing).

Supported input and output format mapping is provided in Table 6.1-1.

Table 6.1-1: Supported pre-rendering input/output mapping

Input format \ Output format	Channel based	SBA	MASA	Binaural
Channel based	●	●	●	●
SBA	●	● (mixing)	●	●
MASA	●	●	● (mixing)	●
ISM	●	●	●	●

When rendering to binaural formats, the renderer implementations discussed in clause 5.1 are used. As with internal rendering, binaural output modes with and without room acoustics are supported.

6.2 Time-Domain Renderer

In the external renderer the TD renderer is used for all ISM configurations, custom loudspeaker configurations, and multichannel formats 5.1 and 7.1 with headtracking enabled. Details are described in clause 7.2.2.2 in TS 26.253.

6.3 Parametric Binauralizer and Parametric Stereo Renderer

In the external renderer, the parametric binauralizer and stereo renderer is used for the MASA input format: whenever MASA input is rendered to any binaural or stereo output, this renderer is used. Details of the renderer are described in clause 7.2.2.3 in TS 26.253.

6.4 Fast Convolution Binaural Renderer

Fast convolution rendering is currently not applicable for external rendering.

6.5 Crend Binaural Renderer

The Crend binaural renderer (clause 7.2.2.5 of TS 26.253) is used when the input format is SBA or MC, except for rendering of the MC formats 5.1 and 7.1 with head tracking enabled (see clause 6.2).
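Clauses 6.2 to 6.5 together define how the external renderer chooses a binaural renderer. The sketch below restates those rules as one function. It assumes that a custom loudspeaker configuration arrives as MC input with a layout tag "CUSTOM". That tag, the returned strings and the function name are all hypothetical. FastConv is never returned, following clause 6.4.

```python
from typing import Optional


def select_external_binaural_renderer(fmt: str, mc_layout: Optional[str] = None,
                                      head_tracking: bool = False) -> str:
    """Binaural renderer choice in the external renderer per clauses 6.2 to 6.5.

    fmt is "ISM", "MASA", "SBA" or "MC"; mc_layout is e.g. "5.1", "7.1.4", or the
    hypothetical tag "CUSTOM" for a custom loudspeaker configuration.
    """
    if fmt == "ISM":
        return "TD"            # clause 6.2: all ISM configurations
    if fmt == "MASA":
        return "PARAMETRIC"    # clause 6.3
    if fmt == "SBA":
        return "CREND"         # clause 6.5
    if fmt == "MC":
        if mc_layout == "CUSTOM":
            return "TD"        # clause 6.2: custom loudspeaker configurations
        if mc_layout in ("5.1", "7.1") and head_tracking:
            return "TD"        # clause 6.2: 5.1/7.1 with head tracking
        return "CREND"         # clause 6.5: remaining MC cases
    raise ValueError(f"unsupported input format: {fmt}")
```

Unlike the internal renderer in Table 5.1-1, these rules do not depend on bitrate or coding mode. The external renderer sees decoded output signals, not coded streams.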

7 Rendering interface

7.1 High-level rendering interface description

The IVAS renderer and its interface support the IVAS codec design constraints. The rendering modes and rendering control mechanisms are discussed in clause of TS 26.253.

The details of the rendering library API are provided in TS 26.251 [3] for the fixed-point code and in TS 26.258 [5] for the floating-point code. The API functions of the IVAS rendering library provide access to the following groups of functionality:

Initialization,
Configuration (input/output),
Metadata (input/output),
Audio (input/output),
Head tracking and orientation tracking (input/output).

Annex A (normative): Renderer control metadata processing tools

[A placeholder for the renderer control metadata processing scripts, including custom HRIR/BRIR conversion to binary format, etc. The actual scripts to be provided as Tdoc attachment.]