Content for TR 22.847 Word version: 18.2.0

1 Scope

The present document provides stage 1 use cases and potential 5G requirements for supporting tactile and multi-modal communication services. In the context of the present document, the aspects addressed include:

Study new scenarios and identify use cases and potential requirements for immersive real time experience involving tactile and multi-modal interactions, including:
1. Network assistance for coordinated transmission of multiple modal representations associated with the same session,
2. aspects of charging, security and privacy, and
3. KPIs (including network reliability and availability).
Gap analysis with existing requirements and functionalities on supporting tactile and multi-modal communication services.

2 References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.

References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] TR 21.905: "Vocabulary for 3GPP Specifications".

[2] ITU-T, "Technology Watch Report: The Tactile Internet", August 2014.

[3] O. Holland et al., "The IEEE 1918.1 "Tactile Internet" Standards Working Group and its Standards," Proceedings of the IEEE, vol. 107, no. 2, Feb. 2019.

[4] TS 22.263: "Service requirements for Video, Imaging and Audio for Professional Applications".

[5] S. K. Sharma, I. Woungang, A. Anpalagan and S. Chatzinotas, "Toward Tactile Internet in Beyond 5G Era: Recent Advances, Current Issues, and Future Directions," in IEEE Access, vol. 8, pp. 56948-56991, 2020.

[6] TS 22.261: "Service requirements for the 5G system".

[7] Kwang Soon Kim, et al., "Ultrareliable and Low-Latency Communication Techniques for Tactile Internet Services", Proceedings of the IEEE, vol. 107, no. 2, February 2019.

[8] SAE Manoeuver Sharing and Coordinating Service Task Force, https://www.sae.org/servlets/works/committeeHome.do?comtID=TEVCSC3A.

[9] SAE Sensor-Sharing Task Force, https://www.sae.org/servlets/works/committeeHome.do?comtID=TEVCSC3B.

[10] M. During and K. Lemmer, "Cooperative manoeuver planning for cooperative driving," IEEE Intell. Transp. Syst. Mag., vol. 8, no. 3, pp. 8-22, Jul. 2016.

[11] D. Soldani, Y. Guo, B. Barani, P. Mogensen, I. Chih-Lin, S. Das, "5G for ultra-reliable low-latency communications". IEEE Network. 2018 Apr 2; 32(2):6-7.

[12] Void.

[13] IEEE SA, "P1918.1 - Tactile Internet: Application Scenarios, Definitions and Terminology, Architecture, Functions, and Technical Assumptions", https://standards.ieee.org/project/1918_1.html

[14] M. Eid, J. Cha, and A. El Saddik, "Admux: An adaptive multiplexer for haptic-audio-visual data communication", IEEE Tran. Instrument. and Measurement, vol. 60, pp. 21-31, Jan 2011.

[15] K. Iwata, Y. Ishibashi, N. Fukushima, and S. Sugawara, "QoE assessment in haptic media, sound, and video transmission: Effect of playout buffering control", Comput. Entertain., vol. 8, pp. 12:1-12:14, Dec 2010.

[16] N. Suzuki and S. Katsura, "Evaluation of QoS in haptic communication based on bilateral control", in IEEE Int. Conf. on Mechatronics (ICM), Feb 2013, pp. 886-891.

[17] E. Isomura, S. Tasaka, and T. Nunome, "A multidimensional QoE monitoring system for audiovisual and haptic interactive IP communications", in IEEE Consumer Communications and Networking Conference (CCNC), Jan 2013, pp. 196-202.

[18] A. Hamam and A. El Saddik, "Toward a mathematical model for quality of experience evaluation of haptic applications", IEEE Tran. Instrument. and Measurement, vol. 62, pp. 3315-3322, Dec 2013.

[19] M. Back et al., "The virtual factory: Exploring 3D worlds as industrial collaboration and control environments," 2010 IEEE Virtual Reality Conference (VR), 2010, pp. 257-258.

[20] S. Schulte, D. Schuller, R. Steinmetz and S. Abels, "Plug-and-Play Virtual Factories," in IEEE Internet Computing, vol. 16, no. 5, pp. 78-82, Sept.-Oct. 2012.

[21] TS 22.104: "Service requirements for cyber-physical control applications in vertical domains".

[22] Altinsoy, M. E., Blauert, J., & Treier, C., "Inter-Modal Effects of Non-Simultaneous Stimulus Presentation," A. Alippi (Ed.), Proceedings of the 7th International Congress on Acoustics, Rome, Italy, 2001.

[23] Hirsh I.J. and Sherrick C.E., 1961. J. Exp. Psychol. 62, 423-432.

[24] Altinsoy, M.E. (2012). "The Quality of Auditory-Tactile Virtual Environments," Journal of the Audio Engineering Society, Vol. 60, No. 1/2, pp. 38-46, Jan.-Feb. 2012.

[25] M. Di Luca and A. Mahnan, "Perceptual Limits of Visual-Haptic Simultaneity in Virtual Reality Interactions," 2019 IEEE World Haptics Conference (WHC), 2019, pp. 67-72, doi: 10.1109/WHC.2019.8816173.

[26] Arnon, Shlomi, et al. "A comparative study of wireless communication network configurations for medical applications." IEEE Wireless Communications 10.1 (2003): pp. 56-61.

[27] K. Antonakoglou et al., "Toward Haptic Communications Over the 5G Tactile Internet", IEEE Communications Surveys & Tutorials, 20 (4), 2018.

3 Definitions, symbols and abbreviations

3.1 Definitions

For the purposes of the present document, the terms and definitions given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.

end-to-end latency:

the time it takes to transfer a given piece of information from a source to a destination, measured at the communication interface, from the moment it is transmitted by the source to the moment it is successfully received at the destination.

Multi-modal Data:

Multi-modal Data describes the input data from different kinds of devices/sensors, or the output data to different kinds of destinations (e.g. one or more UEs), required for the same task or application. Multi-modal Data consists of more than one Single-modal Data, and there is a strong dependency among the Single-modal Data. Single-modal Data can be seen as one type of data.

reliability:

in the context of network layer packet transmissions, percentage value of the amount of sent network layer packets successfully delivered to a given system entity within the time constraint required by the targeted service, divided by the total number of sent network layer packets.
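As a minimal sketch of how this definition could be evaluated over a measurement log, the Python function below counts the packets delivered within the service's time constraint and divides by all packets sent. The function name, the list-of-latencies input format and the use of `None` for lost packets are illustrative assumptions, not part of the present document.

```python
from typing import Optional, Sequence


def reliability_percent(latencies_s: Sequence[Optional[float]],
                        time_constraint_s: float) -> float:
    """Percentage of sent packets delivered within the time constraint.

    latencies_s holds one entry per sent network layer packet: the measured
    delivery latency in seconds, or None if the packet was never delivered
    (a hypothetical log format chosen for this sketch).
    """
    if not latencies_s:
        raise ValueError("at least one sent packet is required")
    on_time = sum(
        1 for latency in latencies_s
        if latency is not None and latency <= time_constraint_s
    )
    return 100.0 * on_time / len(latencies_s)


# Four packets sent: two arrive in time, one is lost, one arrives too late.
example = reliability_percent([0.001, 0.002, None, 0.010],
                              time_constraint_s=0.005)
```

In this sketch a packet that arrives after the time constraint lowers reliability in the same way as a lost packet, which matches the wording "within the time constraint required by the targeted service".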

service area:

geographic region where a 3GPP communication service is accessible.

synchronisation threshold:

A multi-modal synchronisation threshold can be defined as the maximum tolerable temporal separation of the onset of two stimuli, one of which is presented to one sense and the other to another sense, such that the accompanying sensory objects are perceived as being synchronous.
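The definition can be read as a simple check on two onset times. The sketch below assumes a single symmetric threshold, although real thresholds may differ depending on which stimulus leads. The function name and the numeric values are hypothetical and are not taken from the present document.

```python
def perceived_synchronous(onset_a_ms: float, onset_b_ms: float,
                          threshold_ms: float) -> bool:
    """Return True if two stimulus onsets fall within the threshold.

    onset_a_ms and onset_b_ms are the onset times of the stimuli presented
    to two different senses; threshold_ms is the maximum tolerable temporal
    separation (assumed symmetric in this sketch).
    """
    return abs(onset_a_ms - onset_b_ms) <= threshold_ms


# Hypothetical values: a 25 ms threshold, with onsets 20 ms and 40 ms apart.
within = perceived_synchronous(100.0, 120.0, threshold_ms=25.0)
outside = perceived_synchronous(100.0, 140.0, threshold_ms=25.0)
```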

Tactile Internet:

A network (or network of networks) for remotely accessing, perceiving, manipulating, or controlling real or virtual objects or processes in perceived real time by humans or machines.

user experienced data rate:

the minimum data rate required to achieve a sufficient quality experience, with the exception of scenarios for broadcast-like services, where the given value is the maximum that is needed.

3.2 Symbols

3.3 Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.

DoF

Degrees of Freedom

4 Overview

4.1 Multi-modal service

Tactile and multi-modal communication services enable multi-modal interactions, combining ultra-low latency with extremely high availability, reliability and security. The Tactile Internet can be applied in many fields, including industry, robotics and telepresence, virtual reality, augmented reality, healthcare, road traffic, serious gaming, education and culture, and smart grid [2]. Multiple modalities can be combined in a service to provide complementary methods that may carry redundant information but convey it more effectively. By combining input from more than one source and/or output to more than one destination, interpretation in communication services becomes faster and more accurate, responses become quicker, and the communication service becomes smoother and more natural.

For a typical tactile and multi-modal communication service/application, there can be different modalities affecting the user experience, e.g.:

Video/Audio media;
Information perceived by sensors about the environment, e.g. brightness, temperature, humidity, etc.;
Haptic data: can be feelings when touching a surface (e.g., pressure, texture, vibration, temperature), or kinaesthetic senses (e.g. gravity, pull forces, sense of position awareness).

The ambient information may be further processed to generate IoT control instructions as feedback. Haptic data has specific characteristics that follow from physiological perception, e.g. frequency and latency, and may require a suitably periodic, deterministic and reliable communication path. For example, the sampling rate of a haptic device in a teleoperation system may reach 1000 samples per second, and samples are typically transmitted individually, giving 1000 packets per second, whereas video runs at 60 or 90 frames per second. Transmitting small packets at such a high frequency over a long distance would be a great challenge for the 5G system.
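The arithmetic behind this comparison can be sketched as follows. The rates (1000 haptic samples per second, 60 and 90 video frames per second) come from the text above. The haptic payload size and the choice of IPv4/UDP/RTP minimum headers are assumptions made only to illustrate why small, frequent packets are costly, and the helper names are hypothetical.

```python
def interval_ms(rate_per_second: float) -> float:
    """Time between consecutive units (packets or frames) in milliseconds."""
    return 1000.0 / rate_per_second


def header_share(payload_bytes: int, header_bytes: int) -> float:
    """Fraction of each packet that is protocol header rather than payload."""
    return header_bytes / (payload_bytes + header_bytes)


HAPTIC_RATE_HZ = 1000        # from the text: one packet per haptic sample
VIDEO_RATES_FPS = (60, 90)   # from the text

# Assumptions for illustration only, not values from the present document.
ASSUMED_HAPTIC_PAYLOAD_BYTES = 48
ASSUMED_HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP minimum headers

haptic_interval = interval_ms(HAPTIC_RATE_HZ)
video_intervals = [interval_ms(fps) for fps in VIDEO_RATES_FPS]
overhead = header_share(ASSUMED_HAPTIC_PAYLOAD_BYTES, ASSUMED_HEADER_BYTES)

print(f"haptic packet every {haptic_interval:.2f} ms")
for fps, gap in zip(VIDEO_RATES_FPS, video_intervals):
    print(f"video frame at {fps} fps every {gap:.2f} ms")
print(f"header share of each haptic packet: {overhead:.0%}")
```

Under these assumptions a haptic packet is due every millisecond, more than ten times as often as a video frame, and a large part of every packet is header overhead.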

Multiple modalities can be transmitted at the same time to multiple application servers for further processing in a coordinated manner, in terms of QoS coordination, traffic synchronization, power saving, etc.

Multiple outcomes may be generated as feedback. In a real-time remote virtual reality service, a VR user may use several independent devices to separately collect video, audio, ambient and haptic data from the user and to receive video, audio, ambient and haptic feedback from one or more application servers for the same VR application. In this case, an end user could wear VR glasses to receive images and sounds and a touch glove to receive touch sensations, with a camera collecting video input, a microphone collecting audio input, and multiple wearable sensors providing haptic and environmental information associated with the user. The real-time remote virtual reality service can also be conducted between two users.
Multiple outcomes may need to reach the distributed UEs at the very same time. In the scenario of sound field reproduction, different sound channels are sent to distributed loudspeakers to simulate sound arriving from a particular direction. A small time difference may cause a large direction error that degrades the user experience. In some cases, a time difference of 1 ms may cause an angle error of more than 30°.
Multi-modal applications may involve a large number of UEs over long distances. In the scenario of multi-modal telepresence, tens of UEs may need synchronization of time, control signals and visual signals.
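The present document does not state the model behind the 30° figure in the sound field scenario above. One simple way to see the order of magnitude is a far-field, two-element delay model, in which a timing offset Δt between two sound sources spaced d apart shifts the apparent direction by arcsin(c·Δt/d), where c is the speed of sound. The sketch below uses that model. The spacing values and the function name are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at roughly 20 °C


def apparent_angle_error_deg(time_offset_s: float, spacing_m: float) -> float:
    """Direction shift caused by a timing offset between two sound sources.

    Uses the far-field relation sin(theta) = c * dt / d. The ratio is
    clamped to [-1, 1], so offsets larger than the model can represent
    saturate at +/-90 degrees.
    """
    ratio = SPEED_OF_SOUND_M_PER_S * time_offset_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# A 1 ms offset with hypothetical spacings of 0.5 m and 1.0 m.
error_half_metre = apparent_angle_error_deg(0.001, 0.5)
error_one_metre = apparent_angle_error_deg(0.001, 1.0)
```

In this simplified model a 1 ms offset exceeds 30° whenever the spacing is below about 0.69 m, which is consistent with the statement that such errors occur in some cases rather than always.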

In another scenario, the devices associated with the same tactile and multi-modal communication service may be triggered to wake up when a tactile and multi-modality capable user/UE is discovered in proximity, and a different group of tactile and multi-modality capable devices can serve the user as the user moves.

Other scenarios that can be investigated include industrial manufacturing and real-time drone applications, which require synchronous control of visual-haptic feedback.

4.2 Multi-modal interactive system

Figure 4.2-1: Multi-modal interactive system

As shown in Figure 4.2-1, multi-modal outputs are generated based on the inputs from multiple sources. In the multi-modal interactive system, a modality is a type or representation of information in a specific interactive system. Multi-modal interaction is the process during which information of multiple modalities is exchanged. Modal types consist of motion, sentiment, gesture, etc. Modal representations consist of video, audio, tactition (vibrations or other movements which provide haptic or tactile feelings to a person), etc.