
Content for  TR 26.928  Word version:  17.0.0



A.14  Use Case 13: 3D shared experience

Use Case Description: 3D shared experience
In this shared 3D use case, two friends (Eileen and Bob) share a virtual experience. The experience is built around a crime investigation: it shows the interrogation of two murder suspects and lets the users discuss and identify who committed the murder. Both Eileen and Bob join from home, each wearing a VR HMD and being captured by an RGB+depth camera. In VR they experience a 3-dimensional room (6DoF, a police station) in which they are represented in 3D, including a self-representation that allows them to point at items in the room and at each other. This representation can be based on the same RGB+depth capture that is used for communication. Further, in the virtual police station each of them has a window through which to follow a different interrogation (windowed 6DoF / 3DoF+), allowing them to collect information and solve the murder together (see Figure A.14-1).
Figure A.14-1: Example image of a virtual 3D experience with photo-realistic user representations
Type: AR, MR, VR
Degrees of Freedom: 3DoF+ / 6DoF
Delivery: Conversational
Device: Mobile / Laptop
The above use case results in the following hardware requirements:
  • Each user needs a VR HMD (mobile, stand-alone, or wired/wireless VR HMD).
  • Each user needs a depth camera for capture (connected via Bluetooth, integrated into a mobile phone, or wired).
  • Each user needs a microphone and an audio headset for audio upload and spatial audio playback.
  • Each user needs to be connected and registered to a network that is able to facilitate the end-to-end audio/video call.
    Requirements and QoS/QoE Considerations
    The following QoS requirements are considered:
  • Bandwidth: a minimum of 6 Mbit/s is expected (for a single 2D+ user stream with RGB + depth video); this requirement can increase with more complex and higher-resolution streams.
  • Delay: suitable for real-time communication
  • Delay (self-view): suitable for feeling of embodiment
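
To make the bandwidth figure more concrete, the following minimal sketch estimates the per-user load when more than two users join. It assumes a full-mesh topology in which every participant sends one RGB+depth stream to every other participant; the use case does not prescribe a topology, and the function name and parameters are hypothetical.

```python
def mesh_bandwidth_mbps(participants: int, per_stream_mbps: float = 6.0) -> dict:
    """Estimate per-user uplink and downlink for a full-mesh call.

    Assumption: each participant sends one RGB+depth stream to every
    other participant and receives one stream from each of them.
    The 6 Mbit/s default is the minimum stated in the QoS list above.
    """
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    peers = participants - 1
    return {
        "uplink_mbps": peers * per_stream_mbps,
        "downlink_mbps": peers * per_stream_mbps,
    }


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, mesh_bandwidth_mbps(n))
```

With a central media server instead of a mesh, the uplink would stay at one stream per user, which is one reason the choice of topology matters for this requirement.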
    The main goal of this use case is to create shared presence and immersion in a 3DoF+/6DoF experience. Thus, the following QoE considerations are relevant:
  • Capture & Processing:
      • The resolution of the RGB+depth camera needs to be sufficient.
      • The foreground/background extraction needs to result in an accurate cut-out of the user.
  • Transmission:
      • The compression of audio and video data should follow similar constraints as traditional video conferencing.
  • Rendering:
      • Users need to be scaled and positioned in the AR/VR environment in a natural way.
      • Audio playback needs to match the spatial orientation of the user.
      • A self-view needs to be properly aligned with the actual body movement so that the proprioceptive and visual experiences match. The delay of the self-view also needs to be kept to a minimum.
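
As a rough illustration of the scaling consideration above: if a remote user is rendered as a flat billboard (an assumption; the use case does not prescribe a rendering method), the physical size of that billboard can be derived from the capture distance and the camera's field of view under a simple pinhole model. The function name and parameters below are hypothetical.

```python
import math


def billboard_size_m(distance_m: float, hfov_deg: float, vfov_deg: float):
    """Return (width, height) in metres covered by the camera image
    at the given distance, assuming an ideal pinhole camera.

    Rendering the captured frame on a billboard of this size places the
    user at life size in the virtual room.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    width = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    height = 2.0 * distance_m * math.tan(math.radians(vfov_deg) / 2.0)
    return width, height


if __name__ == "__main__":
    # Hypothetical example values: user 1.5 m from a 70 x 60 degree camera.
    print(billboard_size_m(1.5, 70.0, 60.0))
```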
    Demos & Technology overview:
  • M. J. Prins, S. N. B. Gunkel, H. M. Stokking, and O. A. Niamut. TogetherVR: A Framework for photorealistic shared media experiences in 360-degree VR. SMPTE Motion Imaging Journal 127.7:39-44, August 2018.
  • S. N. B. Gunkel, H. M. Stokking, M. J. Prins, N. van der Stap, F.B.T. Haar, and O.A. Niamut, 2018, June. Virtual Reality Conferencing: Multi-user immersive VR experiences on the web. In Proceedings of the 9th ACM Multimedia Systems Conference (pp. 498-501). ACM.
  • 2018, IBC Demo:
    In summary:
  • Users are captured with an RGB+depth device, e.g. a Microsoft Kinect or an Intel RealSense camera.
  • This capture is processed locally for foreground/background segmentation and optionally for creation of a self-view.
  • WebRTC is used for setting up streams to the other call participants.
  • A-Frame / WebVR is used for rendering the virtual environment.
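
The local foreground/background segmentation step can be sketched as a plain depth threshold. This is a simplified illustration, not the method used in the cited demos: it assumes depth values in millimetres, that the user stands within a fixed distance band, and that a depth of 0 marks a missing measurement (a common convention for depth cameras). All names and default values are hypothetical.

```python
def cut_out_user(rgb, depth_mm, near_mm=500, far_mm=1500):
    """Return an RGBA frame in which background pixels are transparent.

    rgb      -- rows of (r, g, b) tuples
    depth_mm -- rows of depth values in millimetres, same shape as rgb
    A pixel is treated as foreground when near_mm <= depth <= far_mm.
    Depth 0 (no measurement) falls below near_mm and becomes background.
    """
    rgba = []
    for rgb_row, depth_row in zip(rgb, depth_mm):
        out_row = []
        for (r, g, b), d in zip(rgb_row, depth_row):
            is_user = near_mm <= d <= far_mm
            out_row.append((r, g, b, 255 if is_user else 0))
        rgba.append(out_row)
    return rgba


if __name__ == "__main__":
    frame = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)]]
    depth = [[0, 1000, 3000]]
    print(cut_out_user(frame, depth))
```

A hard threshold like this tends to leave ragged edges and holes, which is why the QoE considerations call for an accurate cut-out and the standardization needs mention hole filling.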
    Existing Service:
    Summary of steps:
  • Figure A.14-2: Functional blocks of end-to-end communication
    Furthermore, to realize this use case, it is mapped onto the following functional blocks:
  • Capture & Processing: The data from the RGB+depth camera needs to be acquired and further processed (to separate the user from the background). In particular, the depth information might need further processing before transmission.
  • Transmission: There needs to be a two-way end-to-end link between individual participants to transmit audio and video data. The video data should include both the RGB colour and the depth information.
  • Rendering: The transferred user representation has to be blended into the VR environment (according to its geometrical properties, based on the RGB + depth data), and any audio needs to be played according to its spatial origin within the environment. Further, the self-representation of the user has to be displayed so that the user's view and physical position match.
    Please note that all three functional blocks can be executed on one device, on multiple devices, or in the network.
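
The requirement that audio be played according to its spatial origin can be sketched with a very simple stereo model. The code below computes the azimuth of a remote user relative to the listener's heading and derives constant-power left/right gains. This is an illustrative assumption only: real spatial audio renderers typically use HRTFs or similar techniques, and this model cannot distinguish front from back. The coordinate convention and function names are hypothetical.

```python
import math


def relative_azimuth(listener_xy, heading_rad, source_xy):
    """Angle of the source relative to the listener's heading.

    Positions are (x, y) on the floor plane; heading is measured
    counter-clockwise from the +x axis. The result lies in [-pi, pi),
    with positive values meaning the source is to the listener's left.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    azimuth = math.atan2(dy, dx) - heading_rad
    return (azimuth + math.pi) % (2.0 * math.pi) - math.pi


def stereo_gains(azimuth_rad):
    """Constant-power pan: returns (left_gain, right_gain)."""
    pan = -math.sin(azimuth_rad)          # -1 = full left, +1 = full right
    theta = (pan + 1.0) * math.pi / 4.0   # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    # Listener at the origin facing +x, remote user straight to the left.
    az = relative_azimuth((0.0, 0.0), 0.0, (0.0, 1.0))
    print(az, stereo_gains(az))
```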
    Potential Standardization Status and Needs
    The following aspects may require standardization work:
  • System
  • Architecture
  • Communication interfaces (signalling)
  • Media Orchestration (i.e. metadata)
  • Position and scaling of people
  • Spatial Audio (e.g. including audio directionality of users)
  • Background audio
  • Shared content, i.e. multi-device media synchronization
  • Allow network-based processing (e.g. cloud rendering, foreground/background removal of user capture, image enhancements like hole filling, replacing the HMD of a user with a photo-realistic representation of their face, etc.)
  • Transmission
  • The end-to-end system (including the network) needs to support the RGB+Depth video data.
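
Until RGB+depth is supported natively end to end, one conceivable workaround is to carry depth inside an ordinary video frame. The sketch below quantises depth to 8-bit grey values and places them next to the colour image, so the result can pass through a conventional 2D video pipeline. This is an assumption for illustration, not something the report specifies; the depth range, the reserved value 0 for missing measurements, and all names are hypothetical.

```python
def depth_to_grey(d_mm, near_mm=500, far_mm=4500):
    """Quantise a depth value in millimetres to 0..255.

    0 is reserved for "no measurement"; valid depths are clamped to
    [near_mm, far_mm] and mapped linearly onto 1..255.
    """
    if d_mm <= 0:
        return 0
    clamped = min(max(d_mm, near_mm), far_mm)
    return 1 + round((clamped - near_mm) * 254 / (far_mm - near_mm))


def pack_side_by_side(rgb, depth_mm, near_mm=500, far_mm=4500):
    """Return a frame twice as wide: colour on the left, grey depth on the right."""
    packed = []
    for rgb_row, depth_row in zip(rgb, depth_mm):
        grey_row = []
        for d in depth_row:
            g = depth_to_grey(d, near_mm, far_mm)
            grey_row.append((g, g, g))
        packed.append(list(rgb_row) + grey_row)
    return packed


if __name__ == "__main__":
    print(pack_side_by_side([[(200, 150, 100)]], [[1200]]))
```

Lossy video compression distorts the packed depth values and 8 bits give only coarse depth resolution, which illustrates why native support for RGB+depth data is listed as a standardization need.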