In this first use case, three users use the 5GS to join an immersive mobile metaverse activity (which may be an IMS multimedia telephony call using AR/MR/VR). The users Bob, Lukas, and Yong are located in the USA, Germany, and China, respectively. Each user can be served by a local mobile metaverse service edge computing server (MECS) hosted in the 5GS, and each MECS is located close to the user it serves. In the case of IMS, such a MECS could be an AR Media Function that provides network-assisted AR media processing. When a user joins a mobile metaverse activity, such as a joint game or teleconference, the avatar of the user is loaded in the MECS of each of the other users. For instance, the MECS close to Bob hosts the avatars of Yong and Lukas.
The distance between the users determines the minimum communication latency. For example, the distance between the USA and China is around 11640 km, which gives a minimum one-way latency of 11640 km / c ≈ 39 ms, where c is the speed of light in vacuum. The actual latency might be higher due to other causes, e.g., hardware processing. It might also vary for multiple reasons, e.g., congestion or delays introduced by the variable processing time of hardware components such as sensors or rendering devices. Since this latency may be too high and too variable for a truly immersive, joint, location-agnostic metaverse service experience, each of the deployed avatars includes one or more predictive models of the person it represents. These models allow the local edge server to render a synchronized, predicted (current) digital representation (i.e. avatar) of the remote users. Similar techniques have been proposed for example in .
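The latency bound above can be reproduced with a short calculation. The following is a minimal sketch, not part of the use case itself: the function name and the `velocity_factor` parameter are illustrative assumptions, and the value 0.67 is only a commonly quoted approximation for propagation in optical fibre.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum


def min_one_way_latency_ms(distance_km: float, velocity_factor: float = 1.0) -> float:
    """Return a lower bound on the one-way propagation delay in milliseconds.

    velocity_factor is the fraction of c at which the signal travels
    (1.0 for vacuum; roughly 0.67 is an assumed value for optical fibre).
    """
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    propagation_speed = SPEED_OF_LIGHT_KM_PER_S * velocity_factor
    return distance_km / propagation_speed * 1000.0


usa_china_km = 11640  # distance quoted in the text
vacuum_ms = min_one_way_latency_ms(usa_china_km)
fibre_ms = min_one_way_latency_ms(usa_china_km, velocity_factor=0.67)
print(f"vacuum: {vacuum_ms:.1f} ms, fibre (assumed 0.67c): {fibre_ms:.1f} ms")
```

Under these assumptions the vacuum bound is just under 39 ms, and the fibre figure is noticeably higher, which illustrates why the propagation delay alone is already significant before processing delays and congestion are added.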
Figure 5.9.1-1 shows an exemplary scenario in which a MECS at location 3 (USA) runs the predictive models of the remote users (Yong and Lukas). The MECS takes as input the sensed data received from all users (Yong, Lukas, and Bob) as well as the current end-to-end communication parameters (e.g., latency), and generates a synchronized, predicted (current) digital representation (i.e. avatar) of the users to be rendered in the local rendering devices of Bob. A particular example of such a scenario is gaming: Yong, Lukas, and Bob are playing baseball in an immersive mobile metaverse activity, and it is Yong's turn to hit the ball that is going to be thrown by Lukas. If Yong hits the ball, then Bob can continue running, since Yong and Bob are playing in the same team. In this example, the digital representation (e.g. avatar) predictive models of Lukas and Yong (deployed at the MECS close to Bob) allow creating, at location 3, a combined synchronized prediction of Lukas throwing the ball and Yong reacting to and hitting the ball, so that Bob can start running without delay and can enjoy a great immersive mobile metaverse experience.
This example aims at illustrating how predictive models can improve the location-agnostic service experience in a similar way as in . Synchronized predictive digital representations (e.g. avatars) are, however, not limited to the gaming industry and can play a relevant role in other metaverse services, e.g., immersive healthcare or teleconferencing use cases. This scenario involving synchronized predictive digital representations (e.g. avatars) is assumed to require synchronization of user experiences to a single clock.
The following service flows need to be provided for each of the users:
Each of the users, e.g., Bob, decides to join the immersive mobile metaverse service activity and gives consent to the deployment of their avatar.
Sensors at each user sample the current representation of each of the users where sampling is done as required by the sensing modalities. The sampled representation of each of the users is distributed to the metaverse edge computing servers of the other users (which may be an AR Media Function in case of IMS) in the metaverse activity.
Each of the edge computing servers applies the incoming data stream representing each of the remote users to the corresponding digital representation (e.g. avatar) predictive models, taking into account the current communication parameters/performance (e.g., latency), to create a combined, synchronized, and current digital representation of the remote users that is provided as input to rendering devices in the local environment. The predictive model also synchronizes with the actual state of the remote users, so that it can make the necessary corrections to the digital representation when the predicted state and the actual state differ.
The service flows for the other users (i.e., Yong in China and Lukas in Germany) are the mirrored equivalent. For instance, even if not shown in Figure 5.9.1-1, the local edge computing server associated with Lukas will run the digital representation (e.g. avatar) predictive models of Yong and Bob and consume the data streams coming from those users.
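The prediction and correction step of the service flow can be illustrated with a deliberately simple stand-in. The sketch below uses constant-velocity extrapolation (dead reckoning) on a one-dimensional position; it is not the predictive model assumed by this use case, which is left unspecified. All names (`Sample`, `DeadReckoningPredictor`) are hypothetical, and timestamps are assumed to come from a clock shared by all users.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    timestamp_s: float  # capture time on the (assumed) shared clock
    position: float     # 1-D position, kept scalar for simplicity


class DeadReckoningPredictor:
    """Hypothetical stand-in for an avatar predictive model."""

    def __init__(self) -> None:
        self.last: Optional[Sample] = None
        self.velocity = 0.0

    def predict(self, now_s: float) -> float:
        """Extrapolate the last known state to the current time."""
        if self.last is None:
            raise RuntimeError("no sample received yet")
        return self.last.position + self.velocity * (now_s - self.last.timestamp_s)

    def update(self, sample: Sample) -> float:
        """Re-anchor on the actual state; return actual minus predicted."""
        error = 0.0
        if self.last is not None:
            dt = sample.timestamp_s - self.last.timestamp_s
            if dt <= 0:
                return error  # ignore stale or duplicate samples
            error = sample.position - self.predict(sample.timestamp_s)
            self.velocity = (sample.position - self.last.position) / dt
        self.last = sample
        return error


predictor = DeadReckoningPredictor()
predictor.update(Sample(0.0, 0.0))
predictor.update(Sample(0.1, 1.0))           # estimated velocity: 10 units/s
latency_s = 0.04                             # assumed one-way latency
rendered = predictor.predict(0.1 + latency_s)  # state shown to the local user
error = predictor.update(Sample(0.2, 1.8))   # remote user slowed down
```

In this sketch the local server renders the extrapolated position rather than the stale received one, and every newly received sample replaces the extrapolated state, which is the simplest possible form of the correction described in the service flow. A real deployment would presumably smooth the correction to avoid visible jumps.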
TS 22.261 includes in clause 6.40.2 the following requirement related to AI/ML model transfer in 5GS:
"Based on operator policy, 5G system shall be able to provide means to predict and expose predicted network condition changes (i.e. bitrate, latency, reliability) per UE, to an authorized third party."
This requirement is related to requirement [PR 184.108.40.206], but not exactly the same since the usage of predictive digital representation (e.g. avatar) models requires the knowledge of the end-to-end network conditions, in particular, latency.
The 5G system (including IMS) shall provide a means to synchronize the incoming data streams of multiple (sensor and rendering) devices associated with different users at different locations.
The 5G system (including IMS) shall provide a means to expose predicted network conditions, in particular latency, between remote users.
The 5G system (including IMS) shall provide a means to support the distribution, configuration, and execution in a local Service Hosting Environment of a predictive digital representation model associated with a remote user involved in multimedia conversational communication.
The 5G system (including IMS) shall provide a means to predict the rendering of a digital representation of a user (e.g. an avatar) and/or of an object based on the latency of a multimedia conversational communication, and to render the predicted digital representation.
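As an illustration of what synchronizing incoming data streams to a single clock could mean in practice, the following minimal sketch resamples each user's stream at one common render time. It assumes that every sample already carries a timestamp from a shared clock and that each stream is sorted by timestamp; the function names and the use of linear interpolation are illustrative choices, not something defined by the requirements above.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Stream = List[Tuple[float, float]]  # (timestamp_s, value), sorted by timestamp


def sample_at(stream: Stream, t: float) -> float:
    """Linearly interpolate a stream at time t, clamping at both ends."""
    if not stream:
        raise ValueError("stream must not be empty")
    times = [ts for ts, _ in stream]
    i = bisect_right(times, t)
    if i == 0:
        return stream[0][1]        # t is before the first sample
    if i == len(stream):
        return stream[-1][1]       # t is at or after the last sample
    (t0, v0), (t1, v1) = stream[i - 1], stream[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align_streams(streams: Dict[str, Stream], t: float) -> Dict[str, float]:
    """Return one value per user, all referring to the same instant t."""
    return {user: sample_at(stream, t) for user, stream in streams.items()}


streams = {
    "Yong": [(0.0, 0.0), (1.0, 10.0)],
    "Lukas": [(0.5, 5.0), (1.5, 7.0)],
}
aligned = align_streams(streams, 1.0)
early = align_streams(streams, 0.25)
```

Clamping at the end of a stream simply holds the last received value; combining this alignment with a predictive model would replace that hold with an extrapolated state.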