The ongoing IEEE project P1918.1, "Tactile Internet: Application Scenarios, Definitions and Terminology, Architecture, Functions, and Technical Assumptions", facilitates the rapid realization of the Tactile Internet as a 5G-and-beyond application across a range of user groups. The standard defines a framework for the Tactile Internet, including descriptions of various application scenarios, definitions and terminology, functions, and technical assumptions. The framework also includes a reference model and architecture, which defines common architectural entities, the interfaces between those entities, and the mapping of functions to those entities. The Tactile Internet encompasses low-latency, high-reliability applications (e.g., manufacturing, transportation, healthcare and mobility) as well as non-critical applications (e.g., edutainment and events).
The Tactile Internet provides a medium for remote physical interaction, including the exchange of haptic information. This interaction takes place among humans and/or machines (e.g., robots, networked functions, software, or any other connected entity). There are two broad categories of haptic information: tactile and kinaesthetic. Tactile information refers to information perceived by the various mechanoreceptors of the human skin, such as surface texture, friction, and temperature. Kinaesthetic information refers to information perceived by the skeleton, muscles, and tendons of the human body, such as force, torque, position, and velocity. The goal of the Tactile Internet in human-in-the-loop scenarios is that humans should not be able to distinguish between executing a manipulative task locally and performing the same task remotely across the Tactile Internet.
In the functional architecture for the Tactile Internet proposed by IEEE P1918.1, each tactile edge consists of one or multiple tactile devices (TDs), where TDs in tactile edge A communicate tactile/haptic information with TDs in tactile edge B through a network domain, to meet the requirements of a given Tactile Internet use case. The gateway node (GN) is an entity with enhanced networking capabilities that resides at the interface between the tactile edge and the network domain and is mainly responsible for user plane data forwarding. The GN is accompanied by a network controller (NC) that is responsible for control plane processing, including intelligence for admission and congestion control, service provisioning, resource management and optimization, and connection management, in order to achieve the required QoS for the Tactile Internet session. The network domain is shown to be composed of a radio access point or base station connected logically to control plane entities (CPEs) and user plane entities in the network core. A 5G radio access and core network can serve as the network domain to meet the quality requirements of tactile use cases.
The tactile service manager (TSM) plays a critical role in defining the characteristics and requirements of the service between the two tactile edges and in disseminating this information to key nodes in the tactile edge and network domain. The control information between the TSM and the GNC is carried over the Service (S) Interface. The SE, a computing and storage entity, provides both computing and storage resources for improving the performance of the tactile edges and meeting the delay and reliability requirements of the end-to-end (E2E) communications. The Open (O) Interface carries the information exchanged between any architectural entity and the SE.
We demonstrate how a 5G network is used as the "network domain" in the IEEE P1918.1 architecture to support a typical multi-modal use case, namely teleoperation.
Teleoperation allows human users to immerse themselves in an otherwise inaccessible environment to perform complex tasks. A typical teleoperation system includes a controller (i.e., the user) and a device (i.e., the tele-manipulator), which exchange haptic signals (forces, torques, position, velocity, vibration, etc.), video signals, and audio signals over a communication network. In particular, the communication of haptic information imposes strong demands on the communication network, as it closes a global control loop between the user and the tele-operator. Four communication streams are involved in a typical multi-modal teleoperation use case:
Haptic Control stream. It carries command queries from the user to the remote haptic equipment.
Haptic Feedback stream. It carries sensor data and response queries from the remote haptic equipment back to the user.
Video stream. It carries an encoded video stream from the remote environment back to the user. Depending on the resolution of the video, this stream usually occupies the largest share of the bandwidth of the communication channel.
Audio stream. It carries audio data from the remote environment back to the user.
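The four streams listed above can be captured in a small data model. The following is a minimal sketch only: the class names, field names and direction labels are illustrative assumptions and are not defined by IEEE P1918.1 or 3GPP.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    # Hypothetical labels for the two directions described in the text.
    USER_TO_REMOTE = "user -> remote"
    REMOTE_TO_USER = "remote -> user"


@dataclass(frozen=True)
class Stream:
    name: str             # illustrative identifier, not a standardized name
    direction: Direction  # who sends to whom
    payload: str          # what the stream carries, as described in the text


# The four streams of the teleoperation use case.
TELEOPERATION_STREAMS = (
    Stream("haptic_control", Direction.USER_TO_REMOTE, "command queries"),
    Stream("haptic_feedback", Direction.REMOTE_TO_USER,
           "sensor data and response queries"),
    Stream("video", Direction.REMOTE_TO_USER, "encoded video"),
    Stream("audio", Direction.REMOTE_TO_USER, "audio data"),
)


def streams_towards_user(streams):
    """Return the names of all streams flowing back to the user."""
    return [s.name for s in streams if s.direction is Direction.REMOTE_TO_USER]
```

Calling `streams_towards_user(TELEOPERATION_STREAMS)` returns the three feedback-direction streams, which makes the asymmetry of the session visible: one stream carries commands towards the remote side and three streams return to the user.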
Note that multiple devices may be involved at either or both sides of the multi-modal communication, and a single multi-modal communication session may comprise multiple streams between multiple devices. It is assumed that all the above streams are carried over the 5G network, which acts as the "network domain" defined in the IEEE P1918.1 functional architecture.
Both the controller ("Tactile Edge A") and the tele-manipulator ("Tactile Edge B") are equipped with multiple devices to capture, transmit and receive audio, video and haptic information.
When a session starts, multiple streams carrying data of different modalities are established over the 5G network ("Network Domain" in the IEEE P1918.1 architecture) between the corresponding devices at the controller and the tele-manipulator. Table 5.7.3-1 depicts the typical QoS requirements that have to be fulfilled in order for the users' QoE to be satisfactory.
The controller starts to capture the first Media Units (MUs) of haptic information, video and voice at the same time, so the three MUs have the same timestamp, which represents the generation time. Assume that the sampling intervals for haptics, video and audio are 1 ms, 30 ms and 20 ms, respectively. The source then transmits the first haptic MU about 20-30 ms earlier than the MUs of audio and video, which may result in a difference of more than 20 ms between the arrival times of the MUs of different modalities. If the destination is to output the MUs at the same time, it has to delay the output of the haptic MU until the voice and video MUs arrive.
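The timing in this step can be worked through numerically. The sketch below uses the sampling intervals given in the text and makes two labelled assumptions: each first MU is transmitted as soon as its sampling interval has elapsed, and all streams see the same one-way network delay (a hypothetical 5 ms, chosen only for illustration).

```python
# Sampling intervals from the text, in milliseconds.
SAMPLING_INTERVAL_MS = {"haptic": 1, "audio": 20, "video": 30}

# Assumption (not from the source): identical one-way delay for every stream.
NETWORK_DELAY_MS = 5


def first_mu_ready_times(intervals_ms, capture_start_ms=0):
    """Time at which the first MU of each modality is complete and can be sent."""
    return {m: capture_start_ms + t for m, t in intervals_ms.items()}


def output_hold_times(arrival_ms):
    """How long each MU waits at the destination if all are output together."""
    release = max(arrival_ms.values())  # output once the last MU has arrived
    return {m: release - t for m, t in arrival_ms.items()}


ready = first_mu_ready_times(SAMPLING_INTERVAL_MS)
arrival = {m: t + NETWORK_DELAY_MS for m, t in ready.items()}
hold = output_hold_times(arrival)
print(hold)  # {'haptic': 29, 'audio': 10, 'video': 0}
```

Under these assumptions the haptic MU leaves the source 19 ms before the audio MU and 29 ms before the video MU, and it must be held for 29 ms at the destination. Jitter or unequal delays in the network domain would shift these values further.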
A Synchronization Unit is assumed to keep the time relations of the original signals as steady as possible and to synchronize the three media streams with each other. This unit can be part of the TSM or can be located in the Tactile Edge. Synchronization becomes increasingly challenging as the demands of the application grow (for example, immersive XR experiences) and because of the inevitable jitter and delay in the network domain (especially given the nature of wireless communication). The TSM (including the Synchronization Unit) and the 5G network exchange the information needed to assist the synchronization between the different streams of a multi-modal communication session. Audio, video and haptic MUs arrive at the Synchronization Unit and are re-synchronized before being delivered to the destination.
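One way to picture the re-synchronization step is a buffer that groups MUs by their generation timestamp and releases a group only when every modality is present. This is a minimal sketch under that assumption; the class and method names are hypothetical, and neither IEEE P1918.1 nor 3GPP specifies this algorithm.

```python
from collections import defaultdict


class SynchronizationUnit:
    """Holds MUs until all expected modalities with the same timestamp arrive."""

    def __init__(self, modalities=("haptic", "audio", "video")):
        self.modalities = frozenset(modalities)
        # timestamp_ms -> {modality: payload} for groups still incomplete
        self._pending = defaultdict(dict)

    def on_arrival(self, timestamp_ms, modality, payload):
        """Buffer one MU.

        Returns the complete {modality: payload} group once all modalities
        for this timestamp are present, otherwise None.
        """
        if modality not in self.modalities:
            raise ValueError(f"unexpected modality: {modality}")
        group = self._pending[timestamp_ms]
        group[modality] = payload
        if set(group) == self.modalities:
            return self._pending.pop(timestamp_ms)
        return None


unit = SynchronizationUnit()
first = unit.on_arrival(0, "haptic", b"h")   # haptic MU arrives first, is held
second = unit.on_arrival(0, "audio", b"a")   # still waiting for video
third = unit.on_arrival(0, "video", b"v")    # group complete, released together
```

Here `first` and `second` are `None`, and `third` contains all three MUs. A practical unit would additionally need a timeout so that a lost or late MU does not block the others indefinitely, and a policy for modalities whose sampling intervals do not align on common timestamps.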
The user enjoys a good teleoperation experience enabled by the 5G network and the Tactile Internet, in which human users are unable to distinguish between executing a manipulative task locally and performing the same task remotely across the Tactile Internet.
Existing specifications, including TS 22.263 and TS 22.104, have captured the KPIs for high data rate and low latency interactive services, including Cloud/Edge/Split Rendering, Gaming or Interactive Data Exchanging, Consumption of VR content via tethered VR headset, and audio-video synchronization thresholds.
The 5G system shall support a mechanism to ensure users' QoE of the multi-modal communication service involving one or multiple devices at either end of the communication. QoE refers to the difference between the physical interaction across the 5G network and the same manipulation carried out locally.
The 5G system shall support a mechanism for a 3rd party application server to provide real-time feedback on the traffic characteristics and service requirements of the multiple streams of a multi-modal communication session.
The 5G system shall support a mechanism to assist the synchronisation between the multiple streams (e.g., haptic, audio and video) of a multi-modal communication session in order to avoid a negative impact on the user experience.