
Content for TR 26.996, Word version 18.1.0


 

4  General

4.1  Project History

The work item "Immersive Audio for Split Rendering Scenarios (ISAR)" started in early 2023 with the target of providing split immersive audio rendering solutions within the Rel-18 timeframe. Essential milestones of this work were:
  • Provisioning of requirements underlying the work in TR 26.865
  • Provisioning of TS 26.249 describing split immersive audio rendering solutions meeting the requirements
  • Provisioning of CRs to IVAS codec specifications [10]-[12] adding the ISAR split rendering feature to IVAS
  • Provisioning of this TR documenting quality performance and technical properties of the specified split immersive audio rendering solutions.
An important structuring of the work was to develop IVAS-specific ISAR baseline solutions under work track A and to develop and specify codec/renderer-agnostic solutions under work track B. This split is reflected in both the ISAR TS 26.249 and this TR, where baseline solutions in the context of the IVAS codec and additional ISAR features are described separately.
The selection of IVAS specific solutions to be standardized as ISAR baseline followed a rigorous selection process within 3GPP SA4. The selection process applied the following selection rules:
Rule 1: Provision of full set of selection phase deliverables
The proponents of a candidate solution shall provide, in due time, all required items listed in the PD on ISAR/IVAS selection deliverables in order to be considered further in the selection process for the split rendering solution of the IVAS codec standard.
Rule 2: Compliance with design constraints
The proponents of a candidate solution shall report on compliance of the candidate solution with the IVAS related ISAR design constraints in TR 26.865.
Rule 3: Performance
The performance of the candidate solution(s) will be analysed against the IVAS related ISAR performance requirements in TR 26.865.
The selection procedure consisted of the following pre-selection steps for a given candidate solution:
  1. The selection deliverables associated with the candidate solution according to rule 1 are reviewed by SA4, and a determination will be made whether they meet the requirements.
  2. The compliance of the candidate solution with the design constraints is evaluated based on the report provided according to rule 2, and a determination will be made whether the design constraints are met.
  3. The performance of the candidate solution will be analysed against the IVAS related ISAR performance requirements in TR 26.865, and a determination will be made whether the performance requirements are met.
  4. Based on the outcome of steps 1-3, SA4 will discuss and determine whether the candidate solution is eligible to be adopted as the split rendering solution of the IVAS codec standard. This ends the pre-selection steps for the candidate.
The selection procedure will further consist of the following main selection steps for the pre-selected candidate solution(s):
  1. If there is only a single pre-selected candidate solution, its status will be elevated to selected candidate solution.
  2. If there is more than one pre-selected candidate solution, SA4 will discuss which of the pre-selected candidate solutions has the highest merit in terms of meeting or exceeding the ISAR WID objectives, particularly the relevant IVAS-specific performance requirements and design constraints. SA4 will then seek agreement on the most meritorious solution and elevate its status to selected candidate solution.
  3. Agreement will be declared on the selection.
  4. SA will be requested to approve the selection and the relevant associated deliverables such as CRs to IVAS specifications.
The actual selection considered only a single candidate solution.

4.2  Overview of the ISAR Work Item

4.2.1  Work item justification

TS 26.119 assumes a common XR Baseline Client architecture. An essential characteristic is that a functional split is envisioned between a Presentation Engine, comprising a set of composite renderers controlled by a Scene Manager, and an XR Runtime performing a set of functions that interface with a platform to perform commonly required operations, e.g. post-rendering, prior to final output. The relevant interface between Presentation Engine and end device may be a 5G physical interface, e.g. between a smartphone or 5G edge server and a lightweight device (AR glasses) like those considered in 5G EDGe-Dependent AR (EDGAR) and 5G Wireless Tethered AR UEs as described in TR 26.998, or those considered in TR 26.806.
The functional split assumed in split renderer architectures results from stringent implementation and operational requirements applicable to rendering of XR media on XR devices. For head-tracked immersive audio, the need to rely on a split renderer architecture may depend on various factors, among which the round-trip latency between the renderer in the presentation engine and the lightweight device is a decisive parameter. There are scenarios where this latency may be substantial, which may favour a split rendering approach with pose correction in the end device for binaural audio, similar to video, unless decoding and head-tracked binaural audio rendering on the lightweight device can be performed within its strict complexity constraints. In other scenarios, that latency may be sufficiently low, in which case the head-tracked binaural rendering can be done exclusively in the presentation engine. It is notable that the transmission over the interface may generally be bit rate constrained and dependent on the specific physical interface.
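The pose correction mentioned above can be illustrated with a minimal sketch (illustrative only, not the normative ISAR/IVAS processing, and with assumed function names): the presentation engine pre-renders for the head pose known at render time, and the end device compensates for the residual rotation measured at playback time. Restricting to yaw for simplicity:

```python
import numpy as np

def residual_yaw(yaw_at_render: float, yaw_at_playback: float) -> float:
    """Signed residual angle (radians) the end device must still
    compensate after the presentation engine pre-rendered for
    yaw_at_render; wrapped into (-pi, pi] so the correction takes
    the shorter way around the circle."""
    delta = yaw_at_playback - yaw_at_render
    return (delta + np.pi) % (2 * np.pi) - np.pi

# Pre-rendered for a 10 deg head yaw; head-tracker now reports 350 deg:
correction = residual_yaw(np.deg2rad(10), np.deg2rad(350))  # about -20 deg
```

In a real split renderer the correction would be applied to a rotatable intermediate representation (e.g. a sound-field format) before binauralization; the decisive point is that only the small residual rotation, not the full rendering, runs on the lightweight device.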
Binaural audio rendering comprises signal processing functionalities that may include:
  • Binauralization of audio input based on head rotation (3DoF),
  • Binauralization of audio input based on listener position and head rotation (6DoF),
  • Room acoustics synthesis.
Audio input to be rendered may be a combination of diegetic immersive (3D audio) and non-diegetic sounds. The diegetic immersive sounds need to be binauralized using up-to-date head rotation data, which typically originates from the head-tracker of the lightweight end device. Room acoustics synthesis can be performed using room impulse response data or a parametric representation thereof, typically supplied to the Presentation Engine.
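As a rough sketch of 3DoF binauralization of diegetic content (assumed names; a crude cardioid decode stands in for the HRTF-based binauralization an actual renderer would use), a first-order ambisonics frame can be rotated by the tracked head yaw and then decoded to two ears:

```python
import numpy as np

def rotate_foa_yaw(w, y, z, x, yaw):
    """Rotate a first-order ambisonics frame (ACN channel order
    W, Y, Z, X) about the vertical axis by `yaw` radians."""
    xr = x * np.cos(yaw) - y * np.sin(yaw)
    yr = y * np.cos(yaw) + x * np.sin(yaw)
    return w, yr, z, xr

def foa_to_stereo(w, y, z, x):
    """Decode to left/right with two cardioid virtual microphones
    at +/-90 degrees -- a stand-in for HRTF binauralization."""
    left = 0.5 * (w + y)    # cardioid aimed at +90 deg (listener's left)
    right = 0.5 * (w - y)   # cardioid aimed at -90 deg (listener's right)
    return left, right

# Plane wave from straight ahead (W=1, X=1); rotating the field by +90 deg
# places the source at the listener's left ear:
w, y, z, x = rotate_foa_yaw(1.0, 0.0, 0.0, 1.0, np.pi / 2)
left, right = foa_to_stereo(w, y, z, x)
```

Non-diegetic sounds would bypass the rotation step and be mixed in directly, which is why simultaneous rendering of both sound categories is called out as a functional requirement.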
Depending on constraints and design preferences of the lightweight device (AR glasses, earbuds, etc.) and the properties of the interface between Presentation Engine and end device, solutions are needed that, among other things, are compliant with TR 26.928 and TR 26.998. The solutions shall address the given interface characteristics and not impose any new requirements on them.
Another aspect is the currently ongoing standardization of the EVS Codec Extension for Immersive Voice and Audio Services (IVAS) codec. While low-complexity rendering for lightweight devices is not a specific design objective, the IVAS codec work item should ideally provide solutions that would enable using IVAS services over head-tracked lightweight clients meeting the relevant requirements.
Bearing in mind the evolution of AR/XR technologies, it would be desirable to design low-complexity solutions for head-tracked binaural audio rendering on lightweight devices that, under certain limitations, are agnostic in the sense that the pre-renderer component in the presentation engine could be connected with any immersive binaural audio framework through suitable APIs.
The solutions to be specified are intended to add to the number of rendering options to enable immersive audio services on a broad range of devices, including lightweight AR glasses or earbuds. The pre-rendering part of the solutions is expected to become non-mandatory but shall fulfill the relevant requirements set out under this work item. It should interface through a fully specified intermediate bitstream with a fully specified split rendering decoder. For end device implementations claiming support of a specific solution, a fully compliant implementation of at least the split rendering decoder shall be required. Other end device implementations not claiming support of a specific solution remain at the discretion of the implementor.

4.2.2  Work item objectives

The overall objective of this work item is to develop solutions for immersive binaural audio on head-tracked devices that are compatible with the envisaged split architectures (TS 26.119, TR 26.998). The solutions should consider low-complexity, lightweight devices and demonstrate operational benefits over solutions with full decoding and rendering in the end device. The following objectives should be achieved with the work item:
  • Provide format specification for intermediate representation(s).
    • Provide functional requirements for (pre-)renderer operations to be carried out by Presentation Engine.
    • Define suitable APIs.
  • Provide encoder, bitstream and decoder specification for intermediate representations including audio with and without post-rendering control metadata.
  • Provide a specification for decoded intermediate representations to provide binaural audio output with and without head-tracker input and post-rendering control metadata.
  • Consider potential solutions offered by the IVAS work item, and specify the necessary interfaces.
The work item shall, in a first phase, identify and agree on relevant requirements to be documented in a TR. This shall cover:
  • Design constraints related to complexity and memory as well as constraints related to relevant interfaces between presentation engine and end device such as bit rate, latency, down- and upstream traffic characteristics.
  • Design constraints related to functional capability requirements such as rendering of non-diegetic sounds, 3DoF rendering of diegetic immersive sounds, 6DoF rendering of diegetic immersive sounds, including simultaneous rendering of different sound categories, and room acoustics synthesis.
  • Performance requirements.
The solution(s) are characterized for the range of relevant interface characteristics between presentation engine and lightweight device. The case where the immersive audio is decoded and rendered within the end device should be considered as a reference.
The requirements will be documented in a first technical report. The developments under this work item shall lead to a new specification defining, among other things, textual descriptions of the involved renderers and the codec (incl. frame loss concealment) of the intermediate representation(s). The performance of the developed solutions in relation to the requirements will be documented in a second technical report. Solutions meeting the ISAR split rendering requirements may be added to the set of IVAS codec specifications (by means of CRs) if they are found suitable for IVAS. The developed solutions should also be referenced in TS 26.119.
Specific split rendering solutions for IVAS should comprise a non-mandatory default split rendering encoder for the specified internal and stand-alone IVAS renderers. In addition, for a given specific solution there should be specified interfaces offering the possibility either to connect a given (proprietary) renderer for IVAS to the intermediate representation encoder or to use proprietary pre-renderers/intermediate encoders to produce compliant intermediate bitstreams. Such proprietary solutions shall be compliant with the relevant requirements documented in the first technical report. ISAR end device implementations for IVAS claiming support of a specific solution shall be required to have at least a fully compliant split rendering decoder. Other decoder/post-renderer implementations not claiming support of a specific ISAR solution for IVAS remain at the discretion of the implementor.

5  Terms of Reference

The design constraints and performance requirements defined in the requirements TR 26.865 constitute the Terms of Reference for the ISAR work. Notably, specific design constraints and performance requirements could only be defined given IVAS as target codec/renderer of the baseline ISAR solutions. For codec/renderer agnostic solutions, the TR provides guidelines that can be turned into specific design constraints and performance requirements once a specific target system is defined.
