
Content for  TR 22.876  Word version:  19.1.0


5  Split AI/ML operation between AI/ML endpoints for AI inference by leveraging direct device connection

5.1  Proximity based work task offloading for AI/ML inference

5.1.1  Description

Model splitting is a key feature of AI inference. As some Release 18 use cases in TR 22.874 show, the number of layers computed on the terminal and the amount of data to be transmitted depend on the chosen model splitting point. For example, as Figure 5.1-1 shows, the general trend is that the more layers the UE computes, the less intermediate data needs to be transmitted to the application server. In other words, when the UE has low computation capacity (e.g. due to low battery), the application can move the splitting point so that the UE computes fewer layers, while increasing the Uu data rate to carry the larger volume of intermediate data to the network.
However, sometimes the data rate cannot be increased due to radio resource limitations. In such circumstances, a UE with low computation capacity needs to offload the computation task to a UE in proximity (likely a relay UE) while keeping the computation service running, and let the proximity UE send the computed data to the network. By offloading the work task over a direct device connection, the original UE's computation load is relieved without necessarily increasing the data rate on the Uu interface, which leads to better overall performance.
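To make the trade-off concrete, the selection logic an application might use can be sketched as below. This is not from the TR: the per-split uplink rates follow the splitting-point table in clause 5.1.2, while the per-split UE layer counts are assumptions (layers 1-4 up to pool1 and 1-15 up to pool5 follow the service flow in clause 5.1.3; the other counts are illustrative).

```python
# Sketch (not from the TR): pick an AlexNet split point that minimizes the
# UE's workload subject to the available Uu uplink rate.
# (split point, layers computed on the UE, required UL data rate in Mbit/s)
# Rates follow the table in clause 5.1.2; layer counts are assumptions.
SPLIT_POINTS = [
    (0, 0, 36.0),   # cloud-based inference: the raw image is uplinked
    (1, 4, 65.0),   # after pool1 layer
    (2, 8, 41.0),   # after pool2 layer (layer count assumed)
    (3, 15, 4.8),   # after pool5 layer
    (4, 22, 0.0),   # device-based inference: nothing uplinked (count assumed)
]

def choose_split(max_ue_layers, max_ul_rate_mbps):
    """Return the split point that offloads the most layers from the UE
    while fitting both the UE compute budget and the Uu rate budget,
    or None if no candidate fits."""
    feasible = [(point, layers) for point, layers, rate in SPLIT_POINTS
                if layers <= max_ue_layers and rate <= max_ul_rate_mbps]
    if not feasible:
        return None
    return min(feasible, key=lambda t: t[1])[0]  # fewest UE layers
```

For example, a UE able to compute 15 layers with only 10 Mbit/s of uplink lands on split point 3; when no candidate fits the radio budget, the application must fall back to offloading over a direct device connection as described above.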
[Figure 5.1-1 of TR 22.876: Layer-level computation/communication resource evaluation for an AlexNet model (abstracted from subclause 5.1.1 in TR 22.874)]

5.1.2  Pre-conditions

A UE uses the AlexNet AI model for image recognition. As predetermined by the application, there are five alternative splitting points, each corresponding to a different intermediate data size and required data rate (see references [13] and [14] in TR 22.874); the fewer layers the UE computes, the smaller the workload it performs. The specific values are shown in the table below (abstracted from clause 5.1 "Split AI/ML image recognition" in TR 22.874).
Split point | Approximate output data size (MByte) | Required UL data rate (Mbit/s)
Candidate split point 0 (Cloud-based inference) | 0.15 | 36
Candidate split point 1 (after pool1 layer) | 0.27 | 65
Candidate split point 2 (after pool2 layer) | 0.17 | 41
Candidate split point 3 (after pool5 layer) | 0.02 | 4.8
Candidate split point 4 (Device-based inference) | N/A | N/A
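The data-size and data-rate columns are linked by the 30 FPS frame rate used in clause 5.1.6.2: rate (Mbit/s) = size (MByte) × 30 frames/s × 8 bits/byte. A quick sanity check:

```python
def required_ul_rate_mbps(output_size_mbyte, fps=30):
    """Uplink rate (Mbit/s) needed to stream per-frame intermediate data
    of `output_size_mbyte` MByte at `fps` frames per second."""
    return output_size_mbyte * fps * 8

# Matches the table (to its rounding):
# 0.15 MByte -> 36 Mbit/s, 0.27 -> 64.8 ~ 65, 0.17 -> 40.8 ~ 41, 0.02 -> 4.8
```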

5.1.3  Service Flows

[Figure 5.1-2 of TR 22.876: Using direct device connection (sidelink) to realize proximity-based work task offloading. In this case, the data rate on Uu need not be increased while the original UE's computation load is offloaded]
  1. As shown in Figure 5.1-2(a), UE-A is performing image recognition using the AlexNet model described in clause 5.1.2. It selects splitting point 3 for the AI inference.
    The E2E service latency (including the image recognition latency and the intermediate data transmission latency) is 1 second.
  2. When UE-A's battery becomes low, it can no longer afford the heavy work task for the AlexNet model (i.e. computing layers 1-15 of the AlexNet model locally).
  3. Managed by the 5G network, UE-A discovers UE-B (a Customer Premises Equipment, CPE) which has the same model installed and is willing to take the offloaded task from UE-A.
    UE-A then establishes a sidelink (direct device connection) to UE-B. During sidelink establishment, UE-B also obtains the total service latency (including the image recognition latency and the intermediate data transmission latency) and the processing time consumed by UE-A for computing layers 1-4.
    Since UE-B has acquired the E2E service latency and UE-A's processing time, and knows its own processing time for computing layers 5-15, UE-B can determine the QoS parameters applied to both Uu and sidelink while keeping the E2E service latency the same as in step 1.
  4. UE-A sends the intermediate data (the output after computing layers 1-4) to UE-B via sidelink; UE-B processes it further and transmits the resulting intermediate data (the output after computing layers 5-15) to the application server via Uu. The specific model layers computed by UE-A and UE-B are shown in Figure 5.1-2(b).
  5. UE-A continues to perform image recognition by leveraging the sidelink and UE-B's computation capacity, while the source and destination IP addresses and the E2E service latency for the image recognition service remain unchanged.
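The QoS determination in step 3 amounts to a latency-budget split: UE-B subtracts the known processing times from the fixed E2E budget and spends what remains on the sidelink and Uu transfers. A minimal sketch, assuming illustrative processing times (only the 1 s E2E budget and the 0.27/0.02 MByte intermediate data sizes come from the use case):

```python
def transmission_budget_ms(e2e_budget_ms, ue_a_proc_ms, ue_b_proc_ms):
    """Latency left for sidelink + Uu transfers once UE-A's (layers 1-4)
    and UE-B's (layers 5-15) processing times are accounted for."""
    budget = e2e_budget_ms - ue_a_proc_ms - ue_b_proc_ms
    if budget <= 0:
        raise ValueError("processing alone exceeds the E2E latency budget")
    return budget

def min_link_rate_mbps(data_mbyte, link_budget_ms):
    """Minimum link rate (Mbit/s) to move `data_mbyte` in `link_budget_ms`."""
    return data_mbyte * 8 * 1000 / link_budget_ms

# 1 s E2E budget from step 1; the processing times are assumptions.
budget = transmission_budget_ms(1000, ue_a_proc_ms=200, ue_b_proc_ms=500)
# Split the remaining budget evenly: sidelink carries the 0.27 MByte
# pool1 output, Uu carries the 0.02 MByte pool5 output.
sidelink_rate = min_link_rate_mbps(0.27, budget / 2)
uu_rate = min_link_rate_mbps(0.02, budget / 2)
```

An even split is only one possible policy; UE-B could equally weight the two budgets by data size or by the achievable rate on each link when requesting QoS.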

5.1.4  Post-conditions

With UE-B's help, proximity-based work task offloading is performed. As a result:
  • UE-A's work task is reduced by letting it compute fewer layers of the AlexNet model, which helps UE-A cope with its low-battery condition;
  • UE-B computes the remaining layers, which were originally part of UE-A's work task;
  • the mobile network does not need to increase QoS parameters such as the guaranteed data rate, because the rate of the intermediate data transmitted by UE-B is unchanged.

5.1.5  Existing features partly or fully covering the use case functionality

Clause 6.9 of TS 22.261 describes the direct network connection mode, the indirect network connection mode, and the service continuity for switching between the two modes. These are summarized below:
The UE (remote UE) can connect to the network directly (direct network connection), connect using another UE as a relay UE (indirect network connection), or connect using both direct and indirect connections.
The 5G system shall support different traffic flows of a remote UE to be relayed via different indirect network connection paths.
The 5G system shall be able to maintain service continuity of indirect network connection for a remote UE when the communication path to the network changes (i.e. change of one or more of the relay UEs, change of the gNB).
However, there is no support for proximity-based work task offloading, in which the "relay UE" not only performs the indirect network communication but also performs task computation for the "remote UE". This may impact the current discovery mechanism, the QoS determination on Uu and PC5, and the charging aspects.

5.1.6  Potential New Requirements needed to support the use case

5.1.6.1  Potential Functionality Requirements

[P.R.5.1.6-001]
The 5G system shall be able to support the means to modify the communication QoS to ensure that the end-to-end latency can be satisfied when a relay UE is involved in proximity-based work task offloading.
[P.R.5.1.6-002]
The 5G system shall be able to collect charging information for proximity-based work task offloading.
[P.R.5.1.6-003]
The 5G system shall support service continuity when a UE's communication path changes between a direct network connection and an indirect network connection, including the case when the data size transmitted over the two connections is different (e.g. for proximity-based work task offloading).

5.1.6.2  Potential KPI Requirements

Considering the widely used AlexNet and VGG-16 models for proximity-based work task offloading, the following KPIs need to be supported:
Model | UL data size (for sidelink) | UL data rate (for sidelink) | Intermediate data uploading latency (sidelink + Uu) | Image recognition latency
AlexNet model with 30 FPS (NOTE 1) | 0.02 - 0.27 MByte for each frame | 4.8 - 65 Mbit/s | 2 ms for remote driving, AR displaying/gaming, and remote-controlled robotics; 10 ms for video recognition; 100 ms for one-shot object recognition, person identification, or photo enhancement in a smart phone | 1 s
VGG-16 model with 30 FPS | 0.1 - 1.5 MByte for each frame | 24 - 720 Mbit/s |  | 1 s
NOTE 1: FPS stands for frames per second.