
TS 26.254
Codec for Immersive Voice and Audio Services
Rendering

V18.0.0 (PDF), 2024/03, 15 p.
Rapporteur:
Dr. Szczerba, Marek
Philips International B.V.

Content for  TS 26.254  Word version:  18.0.0


1  Scope

The present document provides a comprehensive description of the rendering functions of the decoder/renderer for Immersive Voice and Audio Services (IVAS codec).

2  References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
  • References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
  • For a specific reference, subsequent revisions do not apply.
  • For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
[1] TR 21.905: "Vocabulary for 3GPP Specifications".
[2] TS 26.250: "Codec for Immersive Voice and Audio Services (IVAS); General overview".
    → to date, still a draft
[3] TS 26.251: "Codec for Immersive Voice and Audio Services (IVAS); C code (fixed-point)".
    → to date, still a draft
[4] TS 26.253: "Codec for Immersive Voice and Audio Services (IVAS); Detailed Algorithmic Description incl. RTP payload format and SDP parameter definitions".
[5] TS 26.258: "Codec for Immersive Voice and Audio Services (IVAS); C code (floating point)".

3  Definitions of terms, symbols and abbreviations

3.1  Terms

For the purposes of the present document, the terms given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.
rendering:
a process of generating digital audio output from the decoded digital audio signal.

3.2  Symbols

Void.

3.3  Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.
BRIR   Binaural Room Impulse Response
CPE    Channel Pair Element
EVS    Enhanced Voice Services
HRIR   Head-Related Impulse Response
HRTF   Head-Related Transfer Function
ISM    Individual Stream with Metadata
IVAS   Immersive Voice and Audio Services
MASA   Metadata Assisted Spatial Audio
MCT    Multi-channel Coding Tool
RTP    Real-time Transport Protocol
SCE    Single Channel Element
UE     User Equipment

4  General

4.1  IVAS receiver side processing

The codec for Immersive Voice and Audio Services is part of a framework comprising an encoder, a decoder, and a renderer. An overview of the audio processing functions of the receive side of the codec is shown in Figure 4.1-1. This diagram is based on [2], with rendering features highlighted.
Figure 4.1-1: Overview of IVAS audio processing functions - receiver side
Interfaces:
  3: Encoded audio frames (50 frames/s), number of bits depending on IVAS codec mode
  4: Encoded Silence Insertion Descriptor (SID) frames
  5: RTP Payload packets
  6: Lost Frame Indicator (BFI)
  7: Renderer config data
  8: Head-tracker pose information and scene orientation control data
  9: Audio output channels (16-bit linear PCM, sampled at 8 (only EVS), 16, 32, or 48 kHz)
  10: Metadata associated with output audio
Please note that the interface numbering is consistent with IVAS General Overview [2].
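The frame rate of interface 3 and the PCM format of interface 9 imply simple frame-size arithmetic. The following is a minimal sketch, assuming the renderer output is delivered in frames of the same 20 ms duration as the encoded frames; the function names are hypothetical and are not part of any IVAS API.

```python
# Values taken from the interface list above.
FRAMES_PER_SECOND = 50                          # interface 3
OUTPUT_RATES_HZ = (8000, 16000, 32000, 48000)   # interface 9 (8 kHz: EVS only)
BYTES_PER_SAMPLE = 2                            # 16-bit linear PCM


def samples_per_frame(sample_rate_hz: int) -> int:
    """Number of PCM samples per channel in one 20 ms frame."""
    if sample_rate_hz not in OUTPUT_RATES_HZ:
        raise ValueError(f"unsupported output rate: {sample_rate_hz}")
    return sample_rate_hz // FRAMES_PER_SECOND


def frame_bytes(sample_rate_hz: int, num_channels: int) -> int:
    """Size in bytes of one output frame across all channels."""
    return samples_per_frame(sample_rate_hz) * num_channels * BYTES_PER_SAMPLE
```

Under these assumptions, a binaural (two-channel) output at 48 kHz carries 960 samples per channel and 3840 bytes per frame.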

4.2  IVAS rendering

Rendering is the process of generating digital audio output from the decoded digital audio signal. Rendering is used when the output format differs from the input format. If the output format is the same as the input format, the decoded audio channels are simply passed through to the output channels. Binaural rendering is a special case, in which binaural output channels are prepared for headphone reproduction. This process includes head-tracking and scene orientation control, head-related transfer function processing, and room acoustic synthesis. Rendering for loudspeaker reproduction is also supported for preset or custom loudspeaker configurations.
IVAS rendering is available as an integral component of the IVAS decoder (internal renderer) or can be operated standalone as external rendering. The external renderer can be applied, for example, when rendering outputs that originate from multiple sources, such as decoders or audio streams.
IVAS rendering features reflect related design constraints, including:
  • support for provisioning of HRIR/BRIR filter sets as control data for binaural rendering. The format of HRIR/BRIR data is provided in clause 5.10 of TS 26.258,
  • support for default HRIR/BRIR sets for binaural rendering,
  • support for head-tracking data as control data for binaural audio rendering in quaternions and in Euler notation. The format of head-tracking data is provided in clause 5.11 of TS 26.258,
  • support for binaural reverb and early reflections controlled by reverb parameters. The format of reverb parameters is provided in clause 5.14.1 and in Annex B of TS 26.258.
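Head-tracking data may be supplied either as quaternions or in Euler notation, so a consumer of this control data typically converts between the two. The sketch below shows one common conversion. It assumes unit quaternions ordered (w, x, y, z) and a yaw-pitch-roll (Z-Y-X) rotation order in radians; these conventions are assumptions for illustration only, and the normative format is the one given in clause 5.11 of TS 26.258.

```python
import math


def euler_to_quaternion(yaw: float, pitch: float, roll: float):
    """Convert Z-Y-X Euler angles (radians) to a unit quaternion (w, x, y, z)."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return w, x, y, z


def quaternion_to_euler(w: float, x: float, y: float, z: float):
    """Convert a unit quaternion (w, x, y, z) to (yaw, pitch, roll) in radians."""
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    # Clamp to guard against rounding slightly outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, 2 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    return yaw, pitch, roll
```

Quaternions avoid the singularity that Euler angles have at a pitch of ±90 degrees, which is one reason a head-tracking interface may offer both notations.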
This document provides a high-level specification of the internal renderer (clause 5) and the external renderer (clause 6). Furthermore, the rendering library interface is described (clause 7). Specific rendering algorithms and processing paths are out of scope of this specification and are provided in TS 26.253.

5  Internal renderer

5.1  Overview

The internal IVAS renderer is integrated into the IVAS decoder. In case of specific operating points, this integration allows for combining decoding and rendering processes, resulting in efficient processing. The internal renderer supports rendering for loudspeaker and headphone reproduction. In the case of loudspeaker rendering, the audio output is mapped to the loudspeaker positions of the loudspeaker setup. In the case of headphone reproduction, binaural rendering is applied. The following binaural output modes are supported:
  • Binaural output without room acoustic synthesis (no room), command line option BINAURAL,
  • Binaural output with room acoustics synthesized using impulse responses (room with IR), command line option BINAURAL_ROOM_IR,
  • Binaural output with room acoustics synthesized using parametric reverb, with or without early-reflections (room with reverb), command line option BINAURAL_ROOM_REVERB.
There are four binaural renderer implementations available in the IVAS codebase: parametric binaural renderer, FastConv renderer, Crend convolution renderer, and time-domain object renderer. The application of these renderers depends on IVAS input format, bitrate, IVAS encoding mode, and binaural rendering output mode. These dependencies are summarized in Table 5.1-1.
Table 5.1-1
IVAS Input Format | Bitrate Range [kbps] | IVAS Mode (if applicable) | Binaural rendering output mode (if applicable) | Renderer Used
SBA | 13.2 - 80 | - | - | Parametric Binaural Renderer
SBA | 96 - 512 | - | - | FastConv Binaural Renderer
MASA | 13.2 - 512 | - | - | Parametric Binaural Renderer
ISM (3 or 4 objects) | 24.4 - 32 | ParamISM | - | Parametric Binaural Renderer
ISM | 13.2 - 512 | DiscISM | No room or room with reverb | Time Domain Object Renderer
ISM | 13.2 - 512 | DiscISM | Room with IR | Crend Binaural Renderer
MC | See Table 5.1-2 | McMASA | - | Parametric Binaural Renderer
MC | See Table 5.1-2 | ParamMC | - | FastConv Binaural Renderer
MC | See Table 5.1-2 | ParamUpmix | - | FastConv Binaural Renderer
MC | See Table 5.1-2 | DiscMC | All except below | Crend Binaural Renderer
MC Planar Layouts (5.1 and 7.1) | See Table 5.1-2 | DiscMC | Head tracking enabled for either no room or room with reverb | Time Domain Object Renderer
OMASA | See text below | - | - | Same as non-combined format
OSBA | See text below | - | - | Same as non-combined format
The IVAS modes applicable for multi-channel input formats are summarized in Table 5.1-2. More details regarding multi-channel operation are provided in clause 5.7 of TS 26.253.
Table 5.1-2
Bitrate [kbps] | MC layout 5.1 | 7.1 | 5.1.2 | 5.1.4 | 7.1.4
13.2 - 32 | McMASA | McMASA | McMASA | McMASA | McMASA
48 - 80 | ParamMC | ParamMC | ParamMC | McMASA | McMASA
96 | DiscMC | ParamMC | ParamMC | ParamMC | McMASA
128 | DiscMC | DiscMC | DiscMC | ParamMC | ParamMC
160 | DiscMC | DiscMC | DiscMC | DiscMC | ParamUpmix
192 - 512 | DiscMC | DiscMC | DiscMC | DiscMC | DiscMC
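The mapping in Table 5.1-2 can be expressed as a small lookup. The following is an illustrative sketch that transcribes the table only; the function and constant names are hypothetical and do not correspond to identifiers in the IVAS code.

```python
MC_LAYOUTS = ("5.1", "7.1", "5.1.2", "5.1.4", "7.1.4")

# Each row: (lowest bitrate, highest bitrate, mode per layout in MC_LAYOUTS order).
# Bitrates are in kbps, transcribed from Table 5.1-2.
_MC_MODE_ROWS = (
    (13.2, 32, ("McMASA", "McMASA", "McMASA", "McMASA", "McMASA")),
    (48, 80, ("ParamMC", "ParamMC", "ParamMC", "McMASA", "McMASA")),
    (96, 96, ("DiscMC", "ParamMC", "ParamMC", "ParamMC", "McMASA")),
    (128, 128, ("DiscMC", "DiscMC", "DiscMC", "ParamMC", "ParamMC")),
    (160, 160, ("DiscMC", "DiscMC", "DiscMC", "DiscMC", "ParamUpmix")),
    (192, 512, ("DiscMC", "DiscMC", "DiscMC", "DiscMC", "DiscMC")),
)


def mc_mode(bitrate_kbps: float, layout: str) -> str:
    """Return the IVAS multi-channel mode for a bitrate and loudspeaker layout."""
    if layout not in MC_LAYOUTS:
        raise ValueError(f"unknown MC layout: {layout}")
    column = MC_LAYOUTS.index(layout)
    for low, high, modes in _MC_MODE_ROWS:
        if low <= bitrate_kbps <= high:
            return modes[column]
    raise ValueError(f"bitrate not covered by Table 5.1-2: {bitrate_kbps}")
```

For example, `mc_mode(96, "5.1")` yields `DiscMC`, whereas `mc_mode(96, "7.1.4")` yields `McMASA`. Bitrates that fall between the table rows are rejected rather than rounded.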
For the OMASA and OSBA cases, the IVAS coding modes depend on the number of objects and the total IVAS bitrate. For details, refer to clauses 5.9.2 and 6.9.7 of TS 26.253 for OMASA, and to clauses 5.8.1 and 6.8 for OSBA.
The details on binaural rendering algorithms are provided in clause 7.2.2 of TS 26.253.
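The renderer dependencies of Table 5.1-1 can be sketched as a selection function. This is an illustrative transcription of the table, not the decoder's actual dispatch logic: the function name and parameters are hypothetical, the output mode strings reuse the command line options listed above, and the combined formats (OMASA, OSBA) are left out because the table defers them to the non-combined formats.

```python
PARAMETRIC = "Parametric Binaural Renderer"
FASTCONV = "FastConv Binaural Renderer"
CREND = "Crend Binaural Renderer"
TD = "Time Domain Object Renderer"


def select_internal_binaural_renderer(input_format, bitrate_kbps=None,
                                      ivas_mode=None, output_mode="BINAURAL",
                                      layout=None, head_tracking=False):
    """Pick a binaural renderer following the rows of Table 5.1-1."""
    if input_format == "SBA":
        if bitrate_kbps is None:
            raise ValueError("SBA selection needs a bitrate")
        return PARAMETRIC if bitrate_kbps <= 80 else FASTCONV
    if input_format == "MASA":
        return PARAMETRIC
    if input_format == "ISM":
        if ivas_mode == "ParamISM":
            return PARAMETRIC
        if ivas_mode == "DiscISM":
            return CREND if output_mode == "BINAURAL_ROOM_IR" else TD
    if input_format == "MC":
        if ivas_mode == "McMASA":
            return PARAMETRIC
        if ivas_mode in ("ParamMC", "ParamUpmix"):
            return FASTCONV
        if ivas_mode == "DiscMC":
            planar = layout in ("5.1", "7.1")
            no_ir = output_mode in ("BINAURAL", "BINAURAL_ROOM_REVERB")
            return TD if (planar and head_tracking and no_ir) else CREND
    raise ValueError(f"combination not covered: {input_format}, {ivas_mode}")
```

The IVAS mode for multi-channel input would itself be derived from bitrate and layout according to Table 5.1-2 before calling such a function.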

5.2  Time-Domain Renderer

The time-domain (TD) renderer operates on signals in the time domain. In the IVAS internal renderer it is used for binaural rendering of discrete ISM, where each audio signal is encoded and decoded with a dedicated SCE module. This covers all ISM bit rates, except for 3 or 4 objects at 24.4 kbps and 32 kbps. It is also used in the decoder for binaural rendering of 5.1 and 7.1 signals when head-tracking is enabled. An overview of the TD binaural renderer is given in Figure 7-2.1 in TS 26.253. An HRIR model accepts the object position metadata along with the head-tracking data and generates an HRIR filter pair. The inter-aural time difference (ITD) may be modelled as part of the HRIR, or it may be modelled as a separate parameter. If an ITD parameter is output, ITD synthesis is performed in the ITD synthesis stage. The time-aligned signals are then convolved with the HRIR filter pair to form a binauralized signal. Details are described in clause 7.2.2.2 in TS 26.253.
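The last two stages of this chain (ITD synthesis followed by HRIR convolution) can be illustrated with a toy example. The sketch below assumes an integer-sample ITD, a sign convention in which a positive ITD delays the right ear, and an HRIR filter pair that is supplied directly; the HRIR model that derives the filters from object position and head-tracking data is not modelled. All names are hypothetical, and the normative algorithm is the one in clause 7.2.2.2 of TS 26.253.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(kernel):
            out[n + k] += s * h
    return out


def apply_itd(signal, itd_samples):
    """Return (left, right) copies of the signal with one ear delayed.

    Assumed convention: a positive ITD delays the right ear, a negative
    ITD delays the left ear. Both outputs have the same length.
    """
    delay = abs(itd_samples)
    delayed = [0.0] * delay + list(signal)
    direct = list(signal) + [0.0] * delay
    return (direct, delayed) if itd_samples >= 0 else (delayed, direct)


def binauralize(signal, hrir_left, hrir_right, itd_samples=0):
    """Apply ITD synthesis, then convolve each ear with its HRIR."""
    left_in, right_in = apply_itd(signal, itd_samples)
    return convolve(left_in, hrir_left), convolve(right_in, hrir_right)
```

When the ITD is already contained in the HRIR pair, `itd_samples` stays at zero and only the convolution stage has an effect.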

5.3  Parametric Binauralizer and Parametric Stereo Renderer

The parametric binauralizer and stereo renderer operates on the following IVAS formats and modes: MASA, OMASA, multi-channel (in McMASA mode), SBA, OSBA, and ISM. That is, the encoder input consisted of audio signals (and potentially spatial metadata) in one of these formats, and the decoded signal is rendered to binaural or stereo output. Details are described in clause 7.2.2.3 in TS 26.253.

5.4  Fast Convolution Binaural Renderer

The fast convolution binaural renderer operates on signals in the CLDFB domain. It is used for binaural rendering for the following IVAS formats and operating points (cf. Table 5.1-1): SBA (96 kbps upwards), OSBA and Multi-channel (ParamMC and ParamUpmix modes). Details are described in clause 7.2.2.4 in TS 26.253.

5.5  Crend Binaural Renderer

The Crend binaural renderer operates on signals in the time domain. In the IVAS decoder, it is used for binaural rendering of multi-channel signals, where each audio signal is encoded and decoded using discrete multi-channel mode, and for discrete ISM with binaural output with room acoustics synthesized using impulse responses. The convolver uses a zero-delay block DFT implementation. The DFT/IDFT is implemented using MDFT/IMDFT, which allows the buffer size to be equal to the decoder frame size. Details are described in clause 7.2.2.5 in TS 26.253.
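The general idea of block-wise convolution in the DFT domain can be illustrated with a textbook overlap-add scheme. The sketch below is not the Crend design: it uses a naive DFT instead of MDFT/IMDFT, zero-pads each block beyond the frame size, and makes no attempt at the zero-delay structure. It only shows that filtering one block at a time in the frequency domain reproduces ordinary linear convolution. All names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(x):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def block_convolve(signal, impulse_response, block_size):
    """Overlap-add convolution, processing block_size samples at a time."""
    fft_size = block_size + len(impulse_response) - 1
    padding = [0.0] * (fft_size - len(impulse_response))
    ir_spectrum = dft(list(impulse_response) + padding)
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for start in range(0, len(signal), block_size):
        block = list(signal[start:start + block_size])
        block += [0.0] * (fft_size - len(block))
        spectrum = dft(block)
        product = [a * b for a, b in zip(spectrum, ir_spectrum)]
        segment = dft(product, inverse=True)
        # Add the filtered block, including its tail, into the output.
        for i, value in enumerate(segment):
            if start + i < len(out):
                out[start + i] += value.real
    return out
```

In this simplified scheme the transform length exceeds the block length; the MDFT-based approach mentioned above is what lets the buffer size match the decoder frame size.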

6  External renderer

6.1  Overview

The external IVAS renderer offers a standalone rendering capability employing the same rendering algorithms as the internal IVAS renderer. It is intended to receive the outputs of the IVAS decoder and further render them to other output formats. In addition, the IVAS external renderer is able to receive multiple different input streams that are rendered into a single output format. This provides a mixing functionality to use with multiple IVAS decoder outputs and a pre-renderer functionality for use before IVAS encoding. More details on pre-rendering algorithms are provided in clause 7.5 of TS 26.253.
The external IVAS renderer supports inputs of Ambisonics, ISM, multi-channel, and MASA format streams. The available output formats are binaural (with head-tracking and room effect options), Ambisonics, multi-channel, and MASA format (limited to pre-renderer mixing).
Supported input and output format mapping is provided in Table 6.1-1.
Table 6.1-1
Output formats (columns): Channel based, SBA, MASA, Binaural
Input formats (rows):
  Channel based
  SBA: ● (mixing)
  MASA: ● (mixing)
  ISM
In the case of rendering to binaural formats, the renderer implementations as discussed in clause 5.1 are used. Similarly to the case of internal rendering, binaural output modes with and without room acoustics are supported.

6.2  Time-Domain Renderer

In the external renderer, the TD renderer is used for all ISM configurations, custom loudspeaker configurations, and multi-channel formats 5.1 and 7.1 with head-tracking enabled. Details are described in clause 7.2.2.2 in TS 26.253.

6.3  Parametric Binauralizer and Parametric Stereo Renderer

The parametric binauralizer and stereo renderer is used for the MASA input format in the external renderer. That is, if the output format is any form of binaural output or stereo output, then this renderer is used. Details of the renderer are described in clause 7.2.2.3 in TS 26.253.

6.4  Fast Convolution Binaural Renderer

Fast convolution rendering is currently not applicable for external rendering.

6.5  Crend Binaural Renderer

The Crend binaural renderer (clause 7.2.2.5 of TS 26.253) is used when the input format is SBA or MC, except for rendering of MC formats 5.1 and 7.1 with head-tracking enabled (see clause 6.2).
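Clauses 6.2 to 6.5 together define which binaural renderer the external renderer applies to each input format. A minimal sketch of that selection follows; the function name and parameters are hypothetical, and treating a custom loudspeaker configuration as a flag on MC input is an assumption of this sketch.

```python
PARAMETRIC = "Parametric Binauralizer"
CREND = "Crend Binaural Renderer"
TD = "Time-Domain Renderer"


def select_external_binaural_renderer(input_format, layout=None,
                                      head_tracking=False,
                                      custom_layout=False):
    """Pick the binaural renderer of the external renderer (clauses 6.2-6.5)."""
    if input_format == "ISM":
        return TD                      # clause 6.2: all ISM configurations
    if input_format == "MASA":
        return PARAMETRIC              # clause 6.3
    if input_format == "SBA":
        return CREND                   # clause 6.5
    if input_format == "MC":
        if custom_layout:
            return TD                  # clause 6.2: custom loudspeaker setups
        if layout in ("5.1", "7.1") and head_tracking:
            return TD                  # clause 6.2
        return CREND                   # clause 6.5
    raise ValueError(f"unsupported input format: {input_format}")
```

The fast convolution renderer does not appear in this selection, consistent with clause 6.4.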

7  Rendering interface

7.1  High-level rendering interface description

IVAS renderer and its interface provide support to IVAS codec design constraints. The rendering modes and rendering control mechanisms are discussed in clause of TS 26.253.
The details of the rendering library API are provided in TS 26.251 [3] for the fixed-point code and TS 26.258 for the floating-point code. The API functions of the IVAS rendering library provide access to the following groups of functionalities:
  • Initialization,
  • Configuration (input/output),
  • Metadata (input/output),
  • Audio (input/output),
  • Head tracking and orientation tracking (input/output).

A (Normative)  Renderer control metadata processing tools

[A placeholder for the renderer control metadata processing scripts, including custom HRIR/BRIR conversion to binary format, etc. The actual scripts to be provided as Tdoc attachment.]

$  Change history

