After a series of feasibility studies and normative work on Virtual Reality (VR), the feasibility study on eXtended Reality (XR) in 5G (FS_XR5G), documented in TR 26.928, analysed the frameworks for eXtended Reality (XR) as a conceptual umbrella for representing virtual, augmented, and mixed realities in a consistent fashion. That study classified XR devices into different form factors, including AR glasses, also referred to as optical head-mounted displays (OHMDs), and pointed out their power consumption and sidelink tethering issues. A key aspect of the present study on 5G glass-type Augmented Reality (AR) / Mixed Reality (MR) devices is to identify the details of AR glasses, including their capabilities for communication, processing, and media handling, and the offloading of power-consuming processing.
Augmented reality composites virtual objects with the real world. The compositing combines light from the real world with light presented on a display so that both are visible together to the user's eyes. Challenges in AR include predicting the real-world image visible to the human eye, and accurately positioning and presenting the virtual image on the display of the AR glasses. To display a virtual image in front of the user's eyes, an optical see-through display is placed between the real world and the user's eyes, which is typically and most conveniently realized with AR glasses.
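For a single pixel, this compositing can be approximated by a simple additive model: the eye receives the real-world light attenuated by the optical combiner plus the light emitted by the display. The sketch below is a simplification under stated assumptions: one scalar luminance per pixel and a hypothetical fixed combiner transmittance of 0.8. Real optical see-through displays also involve color, focus, and spatially varying optics. The function names are illustrative only.

```python
def perceived_luminance(real: float, display: float, transmittance: float = 0.8) -> float:
    """Simplified additive model of one pixel on an optical see-through display.

    real:          luminance of the real-world light in front of the combiner
    display:       luminance emitted by the display toward the eye
    transmittance: assumed fraction of real-world light passing the combiner
    """
    if not 0.0 <= transmittance <= 1.0:
        raise ValueError("transmittance must be in [0, 1]")
    if real < 0.0 or display < 0.0:
        raise ValueError("luminance cannot be negative")
    # Light from both paths adds up: the display can brighten, never darken.
    return transmittance * real + display


def composite_row(real_row: list[float], virtual_row: list[float],
                  transmittance: float = 0.8) -> list[float]:
    """Apply the additive model to one row of pixels."""
    if len(real_row) != len(virtual_row):
        raise ValueError("rows must have the same length")
    return [perceived_luminance(r, v, transmittance) for r, v in zip(real_row, virtual_row)]
```

Because the model only adds light, a virtual pixel with zero luminance leaves the (attenuated) real-world view unchanged. This is why dark or black virtual content is hard to show on optical see-through glasses.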
To track the real-world space in which virtual objects are placed, sensors, and in particular cameras, are required to capture a live-action image of what is visible to the human eye. Typically, multiple sensors and cameras are needed to construct a three-dimensional geometry around the user, called the user's geometry. Perceiving this geometry while localizing the AR glasses within it is referred to as Simultaneous Localization and Mapping (SLAM), which is also introduced in some detail in TR 26.928.
When AR objects are placed in the user's geometry, they are anchored to a part of the real-world geometry. As users move, maintaining the augmentation and keeping reality and the user's geometry consistent is challenging and requires continuous data flows and processing. To support devices with low power consumption, split processing techniques such as split rendering or split perception offload processing to powerful network servers. Such split processing approaches are considered beneficial or even essential for satisfactory AR experiences, but they introduce new processes and challenges: the encoding, transmission, decoding, and correction of rendering data and sensor/camera data over 5G networks together pose challenges in terms of bitrates, reliability requirements, and latencies.
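Anchoring can be illustrated with rigid-body poses. Suppose the anchor has a fixed pose in a world frame and tracking (for example SLAM) reports the pose of the glasses in the same frame. The object's pose relative to the glasses for the current frame is then T_device_anchor = inverse(T_world_device) · T_world_anchor. The sketch below assumes 4x4 homogeneous matrices, rotation only about the vertical axis, and a convention where -z points forward. All function names are hypothetical.

```python
import math

Matrix = list[list[float]]


def pose(yaw_rad: float, x: float, y: float, z: float) -> Matrix:
    """4x4 rigid transform: rotation about the vertical (y) axis plus translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [
        [c, 0.0, s, x],
        [0.0, 1.0, 0.0, y],
        [-s, 0.0, c, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Inverse of a rigid transform [R | p], which is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of the rotation
    p = [t[i][3] for i in range(3)]
    q = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [q[0]], r_t[1] + [q[1]], r_t[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


def anchor_in_view(world_from_device: Matrix, world_from_anchor: Matrix) -> Matrix:
    """Re-express a world-anchored object in the device frame for the current frame."""
    return matmul(rigid_inverse(world_from_device), world_from_anchor)
```

Recomputing this product every frame as the device pose changes is what keeps the object visually fixed to the real world. Any error or delay in the tracked pose shows up directly as misplacement of the virtual object.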
Based on the findings in clause 8 of TR 26.928, this clause follows up on some of the conclusions and proposed short-term actions:
Develop a flexible XR-centric device reference architecture as well as a collection of device requirements and recommendations for XR device classes based on the considerations in clause 7.2 of TR 26.928.
Study detailed functionalities and requirements for glass-type AR/MR UEs with standalone capabilities according to clause 7.6 of TR 26.928, and address exchange formats for AR-centric media, taking into account the different processing capabilities of AR devices.
Three different types of device reference architectures are identified in this clause. One major distinction among these types is whether the device can perform the required AR media processing stand-alone (clause 4.2.2.2), or depends on another entity to offload power-consuming processes, where that entity may be a cloud/edge service (clause 4.2.2.3) or a tethered device providing 5G wireless connectivity (clause 4.2.2.4).
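The distinction between the three device types can be captured in a small data model. The sketch below is illustrative only. The type labels, the OffloadTarget enumeration, and the classify() rule are hypothetical simplifications, not definitions from this report. In particular, classify() assumes that an edge-dependent device has its own 5G modem, while a tethered device reaches 5G through another device.

```python
from dataclasses import dataclass
from enum import Enum


class OffloadTarget(Enum):
    """Where power-consuming AR processing runs when it is not done on the glasses."""
    NONE = "none"  # stand-alone: all AR media processing on the glasses
    CLOUD_EDGE = "cloud/edge service"
    TETHERED_DEVICE = "tethered device with 5G connectivity"


@dataclass(frozen=True)
class ArDeviceType:
    label: str
    offload_target: OffloadTarget

    @property
    def standalone(self) -> bool:
        return self.offload_target is OffloadTarget.NONE


STANDALONE = ArDeviceType("stand-alone", OffloadTarget.NONE)
EDGE_DEPENDENT = ArDeviceType("edge-dependent", OffloadTarget.CLOUD_EDGE)
WIRELESS_TETHERED = ArDeviceType("wireless-tethered", OffloadTarget.TETHERED_DEVICE)


def classify(can_process_locally: bool, has_own_5g_modem: bool) -> ArDeviceType:
    """Assumed, simplified decision rule; not taken from the specification."""
    if can_process_locally:
        return STANDALONE
    if has_own_5g_modem:
        # Offloads over its own 5G link to a cloud/edge service
        return EDGE_DEPENDENT
    # Relies on a tethered device (e.g. a phone) for 5G connectivity
    return WIRELESS_TETHERED
```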
Regarding the detailed functionalities of the device reference architecture for AR glasses, the AR runtime (clause 4.2.3) is identified for AR/MR system capability discovery, AR/MR session management, tracking of the surrounding area, and rendering of AR/MR content in a scene graph. The scene manager (clause 4.2.4) processes a scene graph and renders the corresponding 3D scene. The 5G Media Access Function (clause 4.2.4) is identified to support the AR UE and the scene manager in accessing and streaming components of AR content (clause 4.4).
AR content consists of one or more AR objects in terms of primitives (clause 4.4.4) and their spatial and temporal compositions, described by a scene description (clause 4.4.2). Processing of AR/MR functions may require additional metadata (clause 4.4.3) to properly recognize the user's pose and the surrounding area.
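The relation between primitives and a scene description can be sketched as a tiny scene graph. Each node carries a placement relative to its parent (spatial composition) and an activity interval (temporal composition). This is a hypothetical, translation-only model with made-up class and field names. Actual scene description formats are considerably richer, adding rotations, scaling, animations, and media references.

```python
from dataclasses import dataclass, field
from typing import Optional

Vec3 = tuple[float, float, float]


@dataclass
class Primitive:
    """A renderable media component such as a mesh (hypothetical)."""
    uri: str


@dataclass
class Node:
    name: str
    translation: Vec3 = (0.0, 0.0, 0.0)  # spatial composition: offset from the parent
    start_s: float = 0.0                 # temporal composition: active interval [start, end)
    end_s: float = float("inf")
    primitive: Optional[Primitive] = None
    children: list["Node"] = field(default_factory=list)


def active_primitives(node: Node, t: float,
                      parent_pos: Vec3 = (0.0, 0.0, 0.0)) -> list[tuple[str, Vec3]]:
    """Depth-first traversal returning (uri, world position) of primitives active at time t."""
    if not (node.start_s <= t < node.end_s):
        return []  # an inactive node hides its whole subtree
    pos = tuple(p + d for p, d in zip(parent_pos, node.translation))
    found = []
    if node.primitive is not None:
        found.append((node.primitive.uri, pos))
    for child in node.children:
        found.extend(active_primitives(child, t, pos))
    return found
```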
Key performance indicators and metrics for AR/MR based on TR 26.928 are provided in clause 4.5. Related work on AR/MR in 3GPP, MPEG, and ETSI is identified in clause 4.6 as input for collaborative work on the device functional architecture and on AR content formats and codecs.
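Several AR/MR KPIs are expressed as latency budgets. As an illustration, the sketch below adds up the stages of one hypothetical split-rendering loop and compares the total with a budget. The stage names and millisecond values are made up. The budget is a parameter, not a figure taken from TR 26.928. The check is deliberately naive: it ignores pipelining and on-device correction of the rendered frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One sequential step of a split-rendering loop (hypothetical)."""
    name: str
    latency_ms: float


def round_trip_ms(stages: list[Stage]) -> float:
    """Total latency when the stages run strictly one after another."""
    return sum(stage.latency_ms for stage in stages)


def meets_budget(stages: list[Stage], budget_ms: float) -> bool:
    """Naive KPI check: ignores pipelining and on-device pose correction."""
    return round_trip_ms(stages) <= budget_ms


# Made-up example values, for illustration only
SPLIT_RENDERING = [
    Stage("pose capture and uplink", 4.0),
    Stage("rendering on the edge server", 6.0),
    Stage("video encoding", 3.0),
    Stage("downlink", 5.0),
    Stage("decoding and display", 4.0),
]

if __name__ == "__main__":
    total = round_trip_ms(SPLIT_RENDERING)
    print(f"round trip: {total} ms, within 20 ms: {meets_budget(SPLIT_RENDERING, 20.0)}")
```

Keeping each stage as a separate entry makes it easy to see which part of the loop (radio, server rendering, or codec) dominates when a budget is exceeded.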