Content for TR 22.874 Word version: 18.2.0

5.5 Session-specific model transfer split computation operations

5.5.1 Description

A UE employs split computation to achieve results for the user, that is, computation-intensive tasks are offloaded between the UE and the network. Computation-intensive tasks (machine learning, complex computation using input data and the model, etc.) can be fully or partially offloaded. This use case considers one particular application: rendering augmented reality in a headset with modest computational resources. The decision on how to split the computation task between the UE and other computation resources can depend on the conditions of the communication network and on the computational resources available in the UE.

5.5.2 Pre-conditions

Abigail has Augmented Reality glasses, a UE with limited computational power. She gets off a bus and stands at the bus stop, where a gNB is installed behind a large advertisement display. Abigail's glasses get access through this access point. She seeks to augment her view of the city with directions and annotations (opening hours, local history, description of businesses, etc.). Augmenting the visual scene of the city in real time is a computationally intensive task, accomplished by a model developed through ML. The model has two candidate split points, each with a different workload and communication requirement, as shown in Table 5.5.2-1. This strategy for splitting computation has been installed in the UE and the Application Server so that the split point can be adjusted dynamically based on changes in communication performance and/or the UE's capabilities. The Application Server can be either in the MNO domain (i.e. a trusted application) or external to the MNO domain (i.e. an authorized third-party application). This use case does not consider how the splitting strategy is determined, whether this can be done autonomously, or what form the strategy has.

Table 5.5.2-1: Workload and communication requirement for split points

Split point	Approximate output UL data rate (Mbit/s)	Computation load in UE
Candidate split point 1	120	Low
Candidate split point 2	24	High

The glasses have limited computational capacity, and this capacity varies over time. A means for identifying the current status (AI-ML information) of the UE is available to the network, i.e. via the application layer. Initially, it is determined that the UE has the capability to support either candidate split point 1 or 2.

Because the network communication resources are abundant, the augmented reality service decides to apply candidate split point 1, so that computation is executed mainly in the network, which receives large quantities of data from her glasses. This reduces computation in the UE. The large quantity of data is transmitted via a QoS flow with a guaranteed bit rate (GBR) of 200 Mbit/s.
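The initial selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `SplitPoint`, `CANDIDATES` and `select_split_point` are hypothetical, the rates and loads are taken from Table 5.5.2-1, and the rule of preferring the feasible candidate with the lowest UE load is an assumption made for this sketch, since the use case leaves the form of the splitting strategy open.

```python
from dataclasses import dataclass
from typing import Optional

# Ordering of the qualitative UE load levels used in Table 5.5.2-1.
LOAD_RANK = {"Low": 0, "High": 1}


@dataclass(frozen=True)
class SplitPoint:
    name: str
    ul_rate_mbps: float  # approximate UL data rate the split point needs
    ue_load: str         # "Low" or "High"


# Values copied from Table 5.5.2-1.
CANDIDATES = [
    SplitPoint("Candidate split point 1", 120, "Low"),
    SplitPoint("Candidate split point 2", 24, "High"),
]


def select_split_point(gbr_mbps: float, max_ue_load: str) -> Optional[SplitPoint]:
    """Return the feasible candidate with the lowest UE load, or None.

    A candidate is feasible when its UL rate fits within the GBR and its
    computation load does not exceed what the UE can currently support.
    The "lowest UE load" preference is an assumption of this sketch.
    """
    feasible = [
        sp for sp in CANDIDATES
        if sp.ul_rate_mbps <= gbr_mbps
        and LOAD_RANK[sp.ue_load] <= LOAD_RANK[max_ue_load]
    ]
    return min(feasible, key=lambda sp: LOAD_RANK[sp.ue_load], default=None)
```

With a GBR of 200 Mbit/s and a UE able to support either candidate, `select_split_point(200, "High")` returns candidate split point 1, which matches the choice made in this clause.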

5.5.3 Service Flows

Case-a (split point adjusted based on communication performance):

Abigail walks away from the bus stop and the vicinity of the hot spot.

Once Abigail is a few meters away from the gNB hotspot, communication resources become insufficient and the serving gNB can no longer maintain the QoS flow with GBR 200 Mbit/s. Thus the policy decision point (which could be anywhere; we leave out what takes the decision and how) determines that the GBR will be downgraded from 200 Mbit/s to 30 Mbit/s at time point-a, and immediately notifies the UE and the Application Server of this downgrade. The strategy for splitting computation for the AR application must now be adjusted, i.e. changed to candidate split point 2, for which more computation needs to be done locally but the required bit rate for UL transmission is reduced to 24 Mbit/s.

The strategy and constraints for the partition of work are out of scope of this use case. (These could include, e.g., whether partial results could be sent to the UE, which could perform sub-optimally with reduced resources, or whether model information can be sent in a lossy or compressed form that is still useful.) In any case, one of the crucial inputs to the decision of how to split the work is the current set of communication resources available.

The network provides current information on the UE-to-network communication performance, such as the new QoS parameters (GBR = 30 Mbit/s), the condition information (time point-a) for the update to the new QoS, as well as the end-to-end performance between the UE and the computation resources (e.g. in the Service Hosting Environment). This information is made available (exposed) to the split computation 'policy decision point' (which could be anywhere: in the UE, the edge, the cloud, etc.; this is not relevant to the use case).

Case-b (split point adjusted based on UE's AI-ML information):

Originally, candidate split point 2 is selected while the UE's capability can support a high workload.

Abigail's UE's computational information is monitored at the application layer. When the communication resources are sufficient to support candidate split point 1, a sufficient degradation of the UE's condition (e.g. due to depleted battery, lack of storage, reduced computation capacity) would be a reason to select candidate split point 1.

The split computation decision point then adjusts the split computation strategy:

For case-a:

To avoid service interruption, the split computation decision point selects the new split point 2 before time point-a arrives. How this is communicated or 'enforced' is out of scope of this use case and it is not suggested that this would be standardized.

For case-b:

To guarantee the user experience, the split computation decision point selects split point 1, as the UE's status can no longer support a high workload.

The UE status information is assumed to be collected via application layer.
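The two adjustments above can be illustrated with a minimal Python sketch. It is not a proposed mechanism: `PlannedQosChange`, `plan_case_a`, `decide_case_b` and the `switch_latency_s` parameter (the time the application would need to move to a new split point) are hypothetical names introduced here, and only the UL data rates come from Table 5.5.2-1.

```python
from dataclasses import dataclass
from typing import Optional

# UL data rates from Table 5.5.2-1 (Mbit/s).
RATE_SPLIT_1 = 120
RATE_SPLIT_2 = 24


@dataclass(frozen=True)
class PlannedQosChange:
    """Hypothetical record of the exposed QoS update."""
    new_gbr_mbps: float
    time_point_a: float  # seconds, on the clock used by the decision point


def plan_case_a(change: PlannedQosChange,
                current_rate_mbps: float,
                switch_latency_s: float) -> Optional[float]:
    """Case-a: return the latest time to start switching to split point 2.

    Returns None when the current split point still fits the new GBR.
    """
    if current_rate_mbps <= change.new_gbr_mbps:
        return None  # no switch needed
    if RATE_SPLIT_2 > change.new_gbr_mbps:
        raise ValueError("no candidate split point fits the new GBR")
    # The switch has to be complete before time point-a arrives.
    return change.time_point_a - switch_latency_s


def decide_case_b(ue_supports_high_load: bool, gbr_mbps: float) -> int:
    """Case-b: return the number of the split point to use (1 or 2)."""
    if ue_supports_high_load:
        return 2  # keep the originally selected split point
    if gbr_mbps >= RATE_SPLIT_1:
        return 1  # UE degraded and the network can carry split point 1
    raise ValueError("UE degraded and GBR too low for split point 1")
```

For example, with a downgrade to 30 Mbit/s announced for time point-a = 100.0 s and an assumed switching latency of 2.0 s, `plan_case_a(PlannedQosChange(30, 100.0), RATE_SPLIT_1, 2.0)` gives 98.0 s as the latest start time for the switch.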

5.5.4 Post-conditions

Abigail has no awareness of the change of model split point and continues to enjoy acceptable performance as she ventures into the city, even if perhaps it is not as good as when she stood at the bus stop. Note that this use case doesn't conclude as long as Abigail continues to use the service - as the UE to network communication performance can change at any time.

5.5.5 Existing features partly or fully covering the use case functionality

From clause 6.6.2 of TS 22.261 v17.1.0:

Based on operator policy, the 5G system shall support an efficient mechanism for selection of a content caching application (e.g. minimize utilization of radio, backhaul resources and/or application resource) for delivery of the cached content to the UE.

From clause 6.7.2 of TS 22.261 v17.1.0:

The 5G system shall be able to provide the required QoS (e.g. reliability, end-to-end latency, and bandwidth) for a service and support prioritization of resources when necessary for that service.

The 5G system shall be able to support E2E (e.g. UE to UE) QoS for a service.

The 5G system shall be able to support QoS for applications in a Service Hosting Environment.

From clause 6.8 of TS 22.261 v17.1.0:

Based on operator policy, the 5G system shall support a real-time, dynamic, secure and efficient means for authorized entities (e.g. users, context aware network functionality) to modify the QoS and policy framework. Such modifications may have a variable duration.

Based on operator policy, the 5G system shall maintain a session when prioritization of that session changes in real time, provided that the new priority is above the threshold for maintaining the session.

From clause 6.10.2 of TS 22.261 v17.1.0:

Based on operator policy, the 5G network shall provide suitable APIs to allow a trusted third-party application to request appropriate QoE from the network.

Based on operator policy, the 5G network shall expose a suitable API to an authorized third-party to provide the information regarding the availability status of a geographic location that is associated with that third-party.

Based on operator policy, the 5G network shall expose a suitable API to allow an authorized third-party to monitor the resource utilisation of the network service (radio access point and the transport network (front, backhaul)) that are associated with the third-party.

5.5.6 Potential New Requirements needed to support the use case

[P.R.5.5-001]

Based on operator policy, the 5G network shall be able to provide the means to allow an authorized third-party to monitor the resource utilisation of the network service that is associated with the third-party.

[P.R.5.5-002]

Based on operator policy, the 5G system shall be able to expose QoS information to an authorized 3rd party. The QoS information can include e.g. UE UL/DL bitrate, latency, reliability per location.

[P.R.5.5-003]

The 5G system shall be able to provide the means to predict and expose network condition changes (e.g. bitrate, latency, reliability) to the authorized third party.