TR 26.928, version 17.0.0

A.16  Use Case 15: XR Meeting

Use Case Name
XR Meeting
Description
This use case is a mix of a physical and a virtual meeting. It is an XR extension of the virtual meeting place use case described in TR 26.918. The use case is exemplified as follows:
Company X organizes a workshop with discussions in a couple of smaller subgroups in a conference room, as for instance shown in the figure below. Each subgroup gathers around a dedicated spot or table and discusses a certain topic, and participants are free to move to the subgroup of their interest. Remote participation is enabled.
The main idea for the remote participants is to create a virtual, 3D-rendered space where they can meet and interact through their avatars with other people. This 3D-rendered virtual space is a simplified representation of the real conference room, with tables at the same positions as in the real world. Remote participants are equipped with HMDs supporting binaural playback. A remote participant can move freely in the virtual conference room and interact with the different subgroups of people depending, for example, on the discussion they are having. A remote participant can speak to other participants in their immediate proximity and obtain a spatial audio rendering of what the other participants are saying. They can hear the real participants from their relative positions in the virtual world, and they can freely walk from one subgroup to another to seamlessly join different conversations that may happen concurrently in the meeting space. Consistent with the auditory scene, the remote participant will be able to see on the HMD a rendered "Scene view" of the complete virtual meeting space from their viewpoint, i.e. relative to their position and viewing direction. As options, the remote participant may also select to see a "Top view" of the complete meeting space with all participants (or their avatars) or a "Table view". The latter is generated from a 360-degree video capture at the relevant table. The audio experience remains in any case as during "Scene view".
The physical participants see and hear avatars representing the remote participants through AR Glasses supporting binaural playback. They interact with the avatars in the discussions as if these were physically present participants. For physical participants, the interactions with other physical and virtual participants happen in an augmented reality. In addition, at each subgroup meeting spot, a video screen displays a 360-degree panoramic "Table view" taken from the middle of the respective table, including the overlaid avatars of the remote participants taking part in the subgroup discussion. Also displayed is the complete meeting space with all participants (or their avatars) in a top view.
A schematic of the configuration at the physical meeting space is shown in the following figure. In that figure, P1 through P4 represent the physical participants, while V1 through V3 are the remote participants. Also shown are two subgroup meeting spots (tables), each with a 360-degree camera mounted at its center. Further, each table has two video screens, one for the 360-degree panoramic "Table view" and one for the "Top view".
Categorization
Type: AR, VR, XR
Degrees of Freedom: 6DoF
Delivery: Interactive, Conversational
Device: Phone, HMD with binaural playback support, AR Glasses with binaural playback support
Preconditions
On a general level, the assumption is that all physical attendees (inside the meeting facilities) wear a device capable of binaural playback and, preferably, AR glasses. Remote participants are equipped with HMDs supporting binaural playback. The meeting facility is a large conference room with a number of spatially separated spots (tables) for subgroup discussions. Each of these spots is equipped with at least one video screen. At each of the spots a 360-degree camera system is installed.
Specific minimum preconditions
Remote participants:
  • UE with render capability through connected HMD supporting binaural playback.
  • Mono audio capture.
  • 6DOF Position tracking.
Physical participants:
  • UE with render capability through a non-occluded binaural playback system and preferably, but not necessarily, AR Glasses.
  • Mono audio capture of each individual participant e.g. using attached mic or detached mic with suitable directivity and/or acoustic scene capture at dedicated subgroup spots (tables).
  • 6DOF Position tracking.
Meeting facilities:
  • Acoustic scene capture at dedicated subgroup spots (tables) and/or mono audio capture of each individual participant.
  • 360-degree video capture at dedicated subgroup spots (tables).
  • Video screens (connected to driving UE/PC-client) at dedicated subgroup meeting spots visualizing participants including remote participants at a subgroup spot ("Table view") and/or positions of participants in shared meeting space in "Top view".
Conference call server:
  • Maintenance of participant position data in shared virtual meeting space.
  • (Optional) synthesis of graphics visualizing positions of participants in shared meeting space in "Top view".
  • (Optional) generation of overlay/merge of synthesized avatars with 360-degree video to "Table view".
Media preconditions:
Audio:
  • The capability of simultaneous spatial render of multiple received audio streams according to their associated 6DOF attributes.
  • Adequate adjustments of the rendered scene upon rotational and translational movements of the listener's head.
Video/Graphics:
  • 360-degree video capture at subgroup meeting spots.
  • Support of simultaneous graphics render of multiple avatars according to their associated 6DOF attributes, including position, orientation, directivity:
      - Render on AR glasses.
      - Render on HMDs.
  • Overlay/merge synthesized avatars with 360-degree video to "Table view":
      - Render as panoramic view on video screen.
      - VR Render on HMD excluding a segment containing the remote participant itself.
  • Synthesis of "Top view" graphics visualizing positions of participants in shared meeting space.
Media synchronization and presentation format control:
  • Required for controlling the flow and proper render of the various used media types.
System preconditions:
  • A metadata framework for the representation and transmission of positional information of an audio sending endpoint, including 6DOF attributes such as position, orientation, and directivity.
  • Maintenance of a shared virtual meeting space that intersects consistently with the physical meeting space:
    Real and virtual participant positions are merged into a combined shared virtual meeting space that is consistent with the positions of the real participants in the physical meeting space.
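Neither the 6DOF metadata framework nor the shared virtual meeting space is specified in this report. As a rough illustration only, the following Python sketch shows one possible shape of the positional metadata of an audio sending endpoint (position, orientation, directivity) and how a conference call server might merge physical and remote participant poses into a single coordinate system. All class and field names are assumptions made for this sketch, not part of any 3GPP specification.

```python
# Illustrative sketch only: one possible representation of 6DOF endpoint metadata
# and of the server-side shared meeting space. Names are assumptions, not 3GPP-defined.
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Pose6DoF:
    position: Tuple[float, float, float]     # x, y, z in metres, meeting-space coordinates
    orientation: Tuple[float, float, float]  # yaw, pitch, roll in degrees
    directivity: str = "omni"                # e.g. "omni" or "cardioid" facing the yaw direction

@dataclass
class SharedMeetingSpace:
    """Registry merging physical and remote participants into one coordinate
    system anchored to the physical conference room."""
    participants: Dict[str, Pose6DoF] = field(default_factory=dict)

    def update(self, participant_id: str, pose: Pose6DoF) -> None:
        # Physical participants report tracked poses; remote participants report
        # the pose of their avatar in the virtual replica of the room.
        self.participants[participant_id] = pose

    def poses_for(self, listener_id: str) -> Dict[str, Pose6DoF]:
        # Everything a renderer needs to place the other participants around one listener.
        return {pid: p for pid, p in self.participants.items() if pid != listener_id}
```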
Requirements and QoS/QoE Considerations
QoS: conversational requirements as for MTSI, using RTP for Audio and Video transport.
  • Audio: Relatively low bit rate requirements; conversational latency requirements will be met.
  • 360-degree video: Specified in TS 26.118; conversational latency requirements will be met. It is assumed that remote participants will, at any given time, receive only the 360-degree video stream of a single subgroup meeting spot (typically the closest); a simple selection sketch follows this list.
  • Graphics for representing participants in shared meeting space may rely on a vector-graphics media format, see e.g. TS 26.140. The associated bit rates are low. Graphics synthesis may also be done locally in render devices, based on positional information of participants in shared meeting space.
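The nearest-spot assumption above can be illustrated with a minimal sketch. The table positions, coordinates and selection rule below are illustrative assumptions, not requirements of this report.

```python
import math

# Hypothetical table positions in the shared meeting space (metres).
TABLE_POSITIONS = {"table_1": (2.0, 0.0), "table_2": (6.0, 3.0)}

def select_360_stream(participant_xy, tables=TABLE_POSITIONS):
    """Return the subgroup spot whose 360-degree camera feed should be
    forwarded to this remote participant (the closest one)."""
    return min(tables, key=lambda t: math.dist(participant_xy, tables[t]))

# Example: a remote participant standing at (5.0, 2.0) receives table_2's stream.
assert select_360_stream((5.0, 2.0)) == "table_2"
```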
QoE: Immersive voice/audio and visual experience; quality of the mixing of virtual objects into real scenes.
The described scenario provides the remote users with a 6DOF VR meeting experience and the auditory experience of being physically present in the physical meeting space. Quality of Experience for the audio aspect can further be enhanced if the users' UEs not only share their position but also their orientation. This will allow render of the other virtual users not only at their positions in the virtual conference space but additionally with proper rotational orientation. This is of use if the audio subsystem and the avatars associated with the virtual users support directivity, such as specific audio characteristics towards the face and the back.
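As a rough illustration of why shared orientation matters, the sketch below derives the direction in which a talker should be rendered relative to the listener's head and applies a simple front/back directivity gain when the talker faces away. The gain model and function are assumptions for illustration, not specified renderer behaviour.

```python
import math

def render_params(listener_pos, listener_yaw_deg, talker_pos, talker_yaw_deg):
    """Relative azimuth for binaural rendering plus a toy front/back directivity gain.
    Angles in degrees, positions as (x, y); a simplified 2D view of the 6DOF case."""
    dx, dy = talker_pos[0] - listener_pos[0], talker_pos[1] - listener_pos[1]
    bearing = math.degrees(math.atan2(dy, dx))
    # Azimuth of the talker as perceived by the listener (0 deg = straight ahead).
    azimuth = (bearing - listener_yaw_deg + 180.0) % 360.0 - 180.0
    # Angle between the talker's facing direction and the direction towards the listener.
    towards_listener = math.degrees(math.atan2(-dy, -dx))
    off_axis = abs((talker_yaw_deg - towards_listener + 180.0) % 360.0 - 180.0)
    # Toy directivity: full level when facing the listener, attenuated when facing away.
    gain = 1.0 - 0.5 * (off_axis / 180.0)
    return azimuth, gain

# Talker 1 m in front of the listener and facing them: rendered straight ahead, full gain.
print(render_params((0.0, 0.0), 0.0, (1.0, 0.0), 180.0))
```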
    The "Scene view" for the remote participants allows consistent rendering of the audio with the 3D-rendered graphics video of the meeting space. However, that view obviously compromises naturalness and "being-there" experience through the mere visual presentation of the participants through avatars. The optional "Table view" may improve the naturalness as it relies on a real 360-degree video capture. However, QoE of that view is compromised since the 360-degree camera position does not coincide with virtual position of remote user. Viewpoint correction techniques may be used to mitigate this problem.
The physical meeting users experience the remote participants audio-visually at virtual positions as if these were physically present and could come closer or move around like physical users. The AR glasses display the avatars of the remote participants at positions and in orientations matching the auditory perception. Physical participants without AR glasses get a visual impression of where the remote participants are located in relation to their own position through the video screens at the subgroup meeting spots with the offered "Table view" and/or the "Top view".
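Overlaying a remote participant's avatar onto the 360-degree "Table view" essentially maps the avatar's position in the shared meeting space onto the equirectangular panorama captured by the table camera. A minimal sketch of that mapping follows; the camera placement, image size and function name are illustrative assumptions.

```python
import math

def avatar_to_equirect(avatar_xyz, camera_xyz, image_w=3840, image_h=1920):
    """Map an avatar position in the shared meeting space (metres) to pixel
    coordinates of the equirectangular 'Table view' captured at camera_xyz."""
    dx = avatar_xyz[0] - camera_xyz[0]
    dy = avatar_xyz[1] - camera_xyz[1]
    dz = avatar_xyz[2] - camera_xyz[2]
    yaw = math.atan2(dy, dx)                    # horizontal angle around the camera
    pitch = math.atan2(dz, math.hypot(dx, dy))  # elevation above the camera horizon
    u = (yaw / (2 * math.pi) + 0.5) * image_w   # longitude -> horizontal pixel
    v = (0.5 - pitch / math.pi) * image_h       # latitude  -> vertical pixel
    return int(u) % image_w, int(v)

# Example: an avatar 1.5 m away, slightly above the table-mounted camera.
print(avatar_to_equirect((1.5, 0.0, 0.3), (0.0, 0.0, 0.0)))
```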
Feasibility
Under "Preconditions" the minimum preconditions are detailed and broken down by all involved nodes of the service, such as remote participants, physical participants, meeting facilities and conference call server. In summary, the following capabilities and technologies are required:
  • UE with render capability through connected HMD supporting binaural playback.
  • UE with render capability through a non-occluded binaural playback system and preferably, but not necessarily, AR Glasses.
  • Mono audio capture and/or acoustic scene capture.
  • 6DOF position tracking.
  • 360-degree video capture at dedicated subgroup spots.
  • Video screens (connected to driving UE/PC-client) at dedicated subgroup meeting spots visualizing participants including remote participants at a subgroup spot ("Table view") and/or positions of participants in shared meeting space in "Top view".
  • Maintenance of participant position data in shared virtual meeting space.
  • (Optional) synthesis of graphics visualizing positions of participants in shared meeting space in "Top view".
  • (Optional) generation of overlay/merge of synthesized avatars with 360-degree video to "Table view".
While the suggested AR glasses for the physical meeting participants are very desirable for high QoE, the use case is fully feasible without glasses. Immersion is in that case provided only through the audio media component. Thus, none of the preconditions constitute a feasibility barrier, given that the required technologies are widely available and affordable today.
Potential Standardization Status and Needs
  • Requires standardization of at least a 6DOF metadata framework and a 6DOF capable renderer for immersive voice and audio.
  • The presently ongoing IVAS codec work item may provide an immersive voice and audio codec/renderer and a metadata framework that may meet these requirements.
  • Other media (non-audio) may rely on existing video/graphics coding standards available to 3GPP.
  • Also required are suitable session protocols coordinating the distribution and proper rendering of the media flows.

A.17  Use Case 16: Convention / Poster Session

Use Case Name
Convention / Poster Session
Description
This use case is exemplified with a conference featuring a poster session that offers virtual participation from a remote location.
It is assumed that the poster session may be real; however, to contribute to meeting climate goals, the conference organizers offer a green participation option. That is, a virtual attendance option is offered to participants and presenters as an ecological alternative that avoids travelling.
The conference space is organized in a few poster booths, possibly separated by some shields. In some of the booths, posters are presented by real presenters; in other booths, posters are presented by remote presenters. The audience of the poster presentations may be a mix of physically present and remote participants. Each booth is equipped with one video screen for the poster display and one or two additional video screens for the display of a "Top view" and/or a panoramic "Poster presentation view". Each booth is further equipped with a 360-degree camera system capturing the scene next to the poster. The conference space is visualized in the following figure, which essentially corresponds to the "Top view". In this figure, P1-P6 represent physical attendees, V1-V4 are remote attendees, and PX and VY are the real and remote presenters, respectively. There are two poster presentations, of posters X and Y. Participants V4, P5 and P6 are standing together for a chat.
Physical attendees and presenters have the experience of an almost conventional poster conference, with the difference that they see remote persons through their AR glasses, represented as avatars. They hear the remote persons through their binaural playback systems. They can also interact in discussions with remote persons as if they were physically present. Physical presenters use a digital pointing device to highlight the parts of their poster that they want to explain. The physical audience attends the poster presentation of a remote presenter at a dedicated physical spot of the conference area that is very similar to the poster booth of a physical presenter. The participants see and hear the virtual presenter through their AR glasses supporting binaural playback. They also see and hear the other audience members, who may be physically present or represented through avatars.
Remote participants are equipped with HMDs supporting binaural playback. They are virtually present and can walk from poster to poster. They can listen to ongoing poster presentations and move closer to a presentation if they think the topic or the ongoing discussion is interesting. A remote participant can speak to other participants in his/her immediate proximity and obtain a spatial rendering of what the other participants in his/her immediate proximity are saying. He/she can hear them from the relative positions they have to him/her in the virtual world. Consistent with the auditory scene, the remote participant will be able to see on the HMD a synthesized "Scene view" of the complete conference space (including the posters) from his/her viewpoint, i.e. relative to position and viewing direction. The remote participant may also select to see a "Top view", which is an overview of the complete conference space with all participants (or their avatars) and posters, or to see a "Poster presentation view". The latter is a VR view generated from the 360-degree video capture at the relevant poster but excluding a segment containing the remote participant itself. The audio experience remains in any case as during "Scene view". To allow the remote participants to interact in the poster discussions, they can also use their VR controller as a pointing device to highlight certain parts of the poster, for instance when they have a specific question.
Remote presenters are equipped with HMDs supporting binaural playback and a VR controller. Most relevant for them is the "Scene view", in which they see (in their proximity) their audience represented by avatars. This view is overlaid with their own poster. They use their VR controller as a pointing device to highlight a part of the poster that they want to explain to the audience. It may happen that a remote presenter sees a colleague passing by and, to attract her/him to the poster, takes some steps towards that colleague and calls out to her/him.
The remote participants are represented at the real event through their avatars, which the real participants and presenters see and hear through their AR glasses supporting binaural playback. The real and virtual participants and the presenter interact in discussions as if everybody was physically present.
Categorization
Type: AR, VR, XR
Degrees of Freedom: 6DoF
Delivery: Interactive, Conversational
Device: Phone, HMD with binaural playback support, AR Glasses with binaural playback support, VR controller/pointing device
Preconditions
On a general level, the assumption is that all physical attendees (inside the conference facilities) wear a device capable of binaural playback. Remote participants are equipped with HMDs supporting binaural playback. The meeting facility is a large conference room with a number of spatially separated booths for the different poster presentations. Each of these spots is equipped with a video screen for the poster and at least one other video screen. At each of the poster spots a 360-degree camera system is installed.
Specific minimum preconditions
Remote participant:
  • UE with connected VR controller.
  • UE with render capability through connected HMD supporting binaural playback.
  • Mono audio capture.
  • 6DOF Position tracking.
Remote presenter:
  • UE with connected VR controller.
  • UE with render capability through connected HMD supporting binaural playback.
  • UE has document sharing enabled for sharing of the poster.
  • Mono audio capture.
  • 6DOF Position tracking.
Physical attendees/presenters:
  • UE with render capability through a non-occluded binaural playback system and AR Glasses.
  • Mono audio capture of each individual participant e.g. using attached mic or detached mic with suitable directivity and/or acoustic scene capture at dedicated subgroup spots (poster booths).
  • 6DOF Position tracking.
  • UE has a connected pointing device.
  • UE of presenter has document sharing enabled for display of the poster on video screen and for sharing it with remote participants.
Conference facilities:
  • Acoustic scene capture at dedicated subgroup spots (poster booths) and/or mono audio capture of each individual participant.
  • 360-degree video capture at dedicated spots, at the posters.
  • Video screens at dedicated spots (next to the posters), for poster display and for visualizing participants including remote participants at a poster ("Poster presentation view") and/or positions of participants in shared meeting space in "Top view".
  • Video screens are connected to driving UE/PC-client.
Conference call server:
  • Maintenance of participant position data in shared meeting space.
  • Synthesis of graphics visualizing positions of participants in conference space in "Top view".
  • Generation of overlay/merge of synthesized avatars with 360-degree video to "Poster presentation view".
Media preconditions:
Audio:
  • The capability of simultaneous spatial render of multiple received audio streams according to their associated 6DOF attributes.
  • Adequate adjustments of the rendered scene upon rotational and translational movements of the listener's head.
Video/Graphics:
  • 360-degree video capture at subgroup meeting spots.
  • Support of simultaneous graphics render of multiple avatars according to their associated 6DOF attributes, including position, orientation, directivity:
      - Render on AR glasses.
      - Render on HMDs.
  • Overlay/merge synthesized avatars with 360-degree video to "Poster presentation view":
      - Render as panoramic view on video screen.
      - VR Render on HMD excluding a segment containing the remote participant itself.
  • Synthesis of "Top view" graphics visualizing positions of participants in shared meeting space.
Document sharing:
  • Support of sharing of the poster from UE/PC-client as bitmap/vector graphics or as non-conversational (screenshare) video.
  • Support of sharing of pointing device data and VR controller data, potentially as real-time text.
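Pointing device and VR controller data consist of very small, frequent updates; one way to carry them, as the item above suggests, is as compact text messages over a real-time channel. The message layout and field names below are purely illustrative assumptions.

```python
import json
import time

def encode_pointer_event(user_id, poster_id, u, v, pressed):
    """Encode one pointer update as a small JSON text message (illustrative format).
    (u, v) are normalized poster coordinates in [0, 1]."""
    return json.dumps({
        "t": round(time.time(), 3),   # capture timestamp, seconds
        "user": user_id,
        "poster": poster_id,
        "u": round(u, 4), "v": round(v, 4),
        "pressed": pressed,           # e.g. highlight button on the VR controller
    }, separators=(",", ":"))

# A 20 Hz stream of messages like this stays in the low kbit/s range.
print(encode_pointer_event("V2", "posterY", 0.41, 0.73, True))
```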
Media synchronization and presentation format control:
  • Required for controlling the flow and proper render of the various used media types.
System preconditions:
  • A metadata framework for the representation and transmission of positional information of an audio sending endpoint, including 6DOF attributes such as position, orientation, and directivity.
  • Maintenance of a shared virtual meeting space that intersects consistently with the physical meeting space:
    Real and virtual participant positions are merged into a combined shared virtual meeting space that is consistent with the positions of the real participants in the physical meeting space.
Requirements and QoS/QoE Considerations
QoS: conversational requirements as for MTSI, using RTP for Audio and Video transport.
  • Audio: Relatively low bit rate requirements; conversational latency requirements will be met.
  • 360-degree video: Specified in TS 26.118; conversational latency requirements will be met. It is assumed that remote participants will, at any given time, receive only the 360-degree video stream of a single poster spot (typically the closest).
  • Graphics for representing participants in shared meeting space may rely on a vector-graphics media format, see e.g. TS 26.140. The associated bit rates are low. Graphics synthesis may also be done locally in render devices, based on positional information of participants in shared meeting space.
  • Document sharing: Relatively low bit rate. No real-time requirements.
  • Pointing device/VR controller data: Very low bit rate. Real-time requirements.
  • Media synchronization and presentation format: Low bit rate. Real-time requirements.
QoE: Immersive voice/audio and visual experience; quality of the mixing of virtual objects into real scenes.
The described scenario provides the remote users in "Scene view" with a 6DOF VR conferencing experience and the feeling of being physically present at the conference. The remote participants and the real poster session / conference audience are able to hear the remote attendees' verbalized questions and the presenters' answers in a way that their audio impression matches their visual experience, which provides a high degree of realism. Quality of Experience can further be enhanced if the users' UEs not only share their position but also their orientation. This will allow render of the other virtual users not only at their positions in the virtual conference space but additionally with proper rotational orientation. This is of use if the audio and the avatars associated with the virtual users support directivity, such as specific audio characteristics towards the face and the back. The experience is further augmented through the virtual sharing of the posters and the interactions enabled by the pointing devices.
    However, the "Scene view" compromises naturalness and "being-there" experience through the mere visual presentation of the participants through avatars. The optional "Poster presentation view" may improve the naturalness as it relies on a real 360-degree video capture. However, QoE of that view is compromised since the 360-degree camera position does not coincide with virtual position of remote user. Viewpoint correction techniques may be used to mitigate this problem.
The physical meeting users experience the remote participants audio-visually at virtual positions as if these were physically present and could come closer or move around like physical users. The AR glasses display the avatars of the remote participants at positions and in orientations matching the auditory perception. Physical participants without AR glasses receive a visual impression of where the remote participants are located in relation to their own position through the video screens at the poster booths.
Feasibility
Under "Preconditions" the minimum preconditions are detailed and broken down by all involved nodes of the service, such as remote participants, physical participants, meeting facilities and conference call server. In summary, the following capabilities and technologies are required:
  • UE with connected VR controller/pointing device.
  • UE with render capability through connected HMD supporting binaural playback.
  • UE with render capability through a non-occluded binaural playback system and AR Glasses.
  • Mono audio capture and/or acoustic scene capture.
  • 6DOF Position tracking.
  • UE supporting document sharing (for sharing of the poster).
  • 360-degree video capture at dedicated subgroup spots, at the posters.
  • Video screens (connected to driving UE/PC-client) at dedicated spots (next to the posters), for poster display and for visualizing participants including remote participants at a poster ("Poster presentation view") and/or positions of participants in shared meeting space in "Top view".
  • Maintenance of participant position data in shared virtual meeting space.
  • Synthesis of graphics visualizing positions of participants in conference space in "Top view".
  • Generation of overlay/merge of synthesized avatars with 360-degree video to "Poster presentation view".
  • Poster sharing and sharing of pointing device data.
While the suggested AR glasses for the physical meeting participants are very desirable for high QoE, the use case is fully feasible even without glasses. Immersion is in that case provided only through the audio media component. Thus, none of the preconditions constitute a feasibility barrier, given that the required technologies are widely available and affordable today.
Potential Standardization Status and Needs
  • Requires standardization of at least a 6DOF metadata framework and a 6DOF capable renderer for immersive voice and audio.
  • The presently ongoing IVAS codec work item may provide an immersive voice and audio codec/renderer and a metadata framework that may meet these requirements.
  • Other media (non-audio) may rely on existing image/video/graphics coding standards available to 3GPP.
  • Also required are suitable session protocols coordinating the distribution and proper rendering of the media flows.

A.18  Use Case 17: AR animated avatar calls

Use Case Name
AR animated avatar call
Description
This use case is about a call scenario between one user wearing AR glasses and another user using a phone in handset mode. The AR glasses user sees an animated avatar of the phone user. Movements of the phone user are used to control the animation of their avatar, which improves the call experience of the AR glasses user.
A potential user experience is described as a user story:
Tina is wearing AR glasses while walking around in the city. She receives an incoming call from Alice, who is using her phone and who is displayed as an overlay ("head-up display") on Tina's AR glasses. Alice doesn't have a camera facing her; therefore, a recorded 3D image of her is sent to Tina as the call is initiated. The 3D image Alice sent can be animated, following Alice's actions. As Alice holds her phone in handset mode, her head movements result in corresponding animations of her transmitted 3D image, giving Tina the impression that Alice is attentive.
As Tina's AR glasses also include a pair of headphones, Alice's mono audio is rendered binaurally at the position where she is displayed on Tina's AR glasses. Tina also has interactivity settings that allow her to lock Alice's position on her AR screen; in that case, Alice's visual and auditory appearance moves when Tina rotates her head. When Tina disables the position lock, the visual and auditory appearance of Alice is placed within Tina's real world, and Tina's head rotation therefore has to be compensated on the screen and in the audio, requiring visual and binaural audio rendering with scene displacement.
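The two interactivity modes in the story can be sketched minimally: with the position lock enabled, the rendering direction is fixed to the display position; with it disabled, the renderer subtracts Tina's head rotation so that Alice stays anchored in the real world (scene displacement). The function below is an illustrative assumption, not a specified renderer interface.

```python
def rendering_azimuth(anchor_azimuth_deg, head_yaw_deg, position_locked):
    """Azimuth (degrees, 0 = straight ahead) at which Alice's image and binaural
    audio are rendered on Tina's AR glasses.

    position_locked=True : Alice stays at a fixed spot on the display/headphones,
                           so head rotation does not change the rendering angle.
    position_locked=False: Alice is anchored in the real world, so the renderer
                           compensates Tina's head yaw (scene displacement).
    """
    if position_locked:
        return anchor_azimuth_deg
    return (anchor_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

# Alice anchored 30 deg to the right; Tina turns her head 30 deg to the right.
assert rendering_azimuth(30.0, 30.0, position_locked=False) == 0.0   # now straight ahead
assert rendering_azimuth(30.0, 30.0, position_locked=True) == 30.0   # unchanged on screen
```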
Categorization
Type: AR
Degrees of Freedom: 2D, 3DoF
Delivery: Conversational
Device: Phone, HMD, Glasses, headphones
Preconditions
AR participants: Phone with tethered AR glasses and headphones (with acoustic transparency).
Phone participant: Phone with motion sensor and potentially a proximity sensor to detect handset mode.
Requirements and QoS/QoE Considerations
QoS: QoS requirements similar to MTSI requirements (conversational, RTP), e.g. 5QI 1.
QoE: Immersive voice/audio and visual experience; quality of the mixing of virtual objects (avatars) into real scenes and of rendering audio overlaid on the real acoustic environment.
Feasibility
AR glasses in various form factors exist, including motion sensing and inside-out tracking. This allows locking of avatars and audio objects to the real world.
Smartphones typically come with built-in motion sensing, using a combination of gyroscopes, magnetometers and accelerometers. This allows extraction of the head's rotation when the phone is used in handset mode; this rotation could be sent as motion data to the other endpoint to animate/rotate the avatar/3D image.
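As a rough illustration of this feasibility argument, the sketch below integrates gyroscope rates into a yaw/pitch estimate (a deliberately simplified sensor-fusion step) and packages it as motion data for the far end. The class and message format are assumptions; real devices would use the platform's fused orientation APIs instead.

```python
import json

class HandsetMotionTracker:
    """Toy tracker: integrates phone gyroscope rates into yaw/pitch while the phone
    is held in handset mode, to drive the far-end avatar/3D image animation.
    Drift correction via magnetometer/accelerometer fusion is omitted for brevity."""
    def __init__(self):
        self.yaw_deg = 0.0
        self.pitch_deg = 0.0

    def update(self, gyro_yaw_dps, gyro_pitch_dps, dt_s):
        # Simple integration of angular rates over one sensor interval.
        self.yaw_deg += gyro_yaw_dps * dt_s
        self.pitch_deg += gyro_pitch_dps * dt_s

    def motion_payload(self):
        # Compact motion-data message for the other endpoint (illustrative format).
        return json.dumps({"yaw": round(self.yaw_deg, 1),
                           "pitch": round(self.pitch_deg, 1)})

tracker = HandsetMotionTracker()
tracker.update(gyro_yaw_dps=20.0, gyro_pitch_dps=-5.0, dt_s=0.5)  # half a second of motion
print(tracker.motion_payload())  # {"yaw": 10.0, "pitch": -2.5}
```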
Potential Standardization Status and Needs
  • Visual coding and transmission of avatars or cut-out heads, alpha channel coding.
  • Transmission and potentially coding of motion data to show attentiveness.
