Content for TS 24.103 Word version: 17.0.0



1  Scope

The present document provides the protocol details for telepresence using the IP Multimedia (IM) Core Network (CN) subsystem based on the Session Initiation Protocol (SIP), the Session Description Protocol (SDP), the Binary Floor Control Protocol (BFCP) and the ControLling mUltiple streams for tElepresence (CLUE) protocol, based on service requirements.
The present document addresses the description and negotiation of IM sessions with multiple media streams based on the IM CN subsystem, including point-to-point calls as specified in TS 24.229 and multiparty conferences as specified in TS 24.147, to facilitate the support of telepresence.
The functionalities for conference policy control and the signalling between a MRFC and a MRFP are not specified in this document.
Where possible, the present document specifies the requirements for this protocol by reference to specifications produced by the IETF within the scope of SIP, SDP, CLUE and BFCP, either directly, or as modified by TS 24.229.
The present document is applicable to Application Servers (ASs), Multimedia Resource Function Controllers (MRFCs), Multimedia Resource Function Processors (MRFPs) and User Equipment (UE) providing IM sessions supporting telepresence capabilities.

2  References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
  • References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
  • For a specific reference, subsequent revisions do not apply.
  • For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
TR 21.905: "Vocabulary for 3GPP Specifications".
TS 24.229: "Internet Protocol (IP) multimedia call control protocol based on Session Initiation Protocol (SIP) and Session Description Protocol (SDP); Stage 3".
TS 24.147: "Conferencing using the IP Multimedia (IM) Core Network (CN) subsystem; Stage 3".
RFC 8845 (January 2021): "Framework for Telepresence Multi-Streams".
RFC 8850 (January 2021): "Controlling Multiple Streams for Telepresence (CLUE) Protocol Data Channel".
RFC 8848 (January 2021): "Session Signaling for Controlling Multiple Streams for Telepresence (CLUE)".
RFC 8841 (January 2021): "Session Description Protocol (SDP) Offer/Answer Procedures for Stream Control Transmission Protocol (SCTP) over Datagram Transport Layer Security (DTLS) Transport".
RFC 3264 (June 2002): "An Offer/Answer Model with Session Description Protocol (SDP)".
RFC 8846 (January 2021): "An XML Schema for the Controlling Multiple Streams for Telepresence (CLUE) Data Model".
TS 23.218: "IP Multimedia (IM) session handling; IM call model; Stage 2".
RFC 8842 (January 2021): "Session Description Protocol (SDP) Offer/Answer Considerations for Datagram Transport Layer Security (DTLS) and Transport Layer Security (TLS)".

3  Definitions, symbols and abbreviations

3.1  Definitions

For the purposes of the present document, the terms and definitions given in TR 21.905 and the following apply. A term defined in the present document takes precedence over the definition of the same term, if any, in TR 21.905.
IM session:
An IP multimedia (IM) session is a set of multimedia senders and receivers and the data streams flowing from senders to receivers. IP multimedia sessions are supported by the IP multimedia CN Subsystem and are enabled by IP connectivity bearers (e.g. GPRS as a bearer). A user may invoke concurrent IP multimedia sessions.
Telepresence:
A conference with interactive audio-visual communications experience between remote locations, where the users enjoy a strong sense of realism and presence between all participants by optimizing a variety of attributes such as audio and video quality, eye contact, body language, spatial audio, coordinated environments and natural image size.

3.2  Symbols

3.3  Abbreviations

For the purposes of the present document, the abbreviations given in TR 21.905 and the following apply. An abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in TR 21.905.
AS  Application Server
BFCP  Binary Floor Control Protocol
CLUE  ControLling mUltiple streams for tElepresence
DTLS  Datagram Transport Layer Security
IMS  IP Multimedia CN subsystem
MCC  Multiple Content Capture
MRF  Multimedia Resource Function
MRFC  Multimedia Resource Function Controller
MRFP  Multimedia Resource Function Processor
SCTP  Stream Control Transmission Protocol
SDP  Session Description Protocol
SIP  Session Initiation Protocol
TP UE  TelePresence User Equipment
UE  User Equipment

4  Telepresence overview

4.1  General

As an architectural framework for the provision of IP multimedia services, IMS is capable of delivering various service functionalities and is easy to integrate with new kinds of applications, such as telepresence. Compared to traditional video conferencing, telepresence is a communication system with multiple cameras, microphones and screens, characterized by gaze direction, eye contact, spatial audio and images scaled to true size, all of which combine to achieve an immersive "being there" experience for the participants.
IMS uses the IETF-defined session control mechanisms, which inherently support negotiating multiple media streams within a single session. This capability can serve as a basis for supporting telepresence in IMS, which requires producing and rendering various high-quality media streams among the involved parties, even in the point-to-point case.
Based on the existing procedures as specified in TS 24.229 and TS 24.147, this specification introduces updates and enhancements for IMS by incorporating CLUE with SIP, SDP and BFCP to facilitate controlling multiple spatially related media streams in an IM session supporting telepresence.
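By way of illustration, an SDP offer combining a CLUE data channel with CLUE-controlled media lines, along the lines of the examples in RFC 8848 and RFC 8841, might look roughly as follows. This is a simplified, non-normative sketch: addresses, ports, identifiers and payload types are illustrative, and the DTLS fingerprint value is elided.

```
v=0
o=alice 2890844526 2890844526 IN IP4 tp.example.com
s=-
c=IN IP4 198.51.100.1
t=0 0
a=group:CLUE 3 4
m=audio 49200 RTP/AVP 0
a=mid:2
m=application 49300 UDP/DTLS/SCTP webrtc-datachannel
a=mid:3
a=sctp-port:5000
a=dcmap:2 subprotocol="CLUE";ordered=true
a=setup:actpass
a=fingerprint:sha-256 <fingerprint value>
m=video 49400 RTP/AVP 96
a=mid:4
a=rtpmap:96 H264/90000
```

Here the "CLUE" group ties the data channel "m=" line (mid 3) to the CLUE-controlled video "m=" line (mid 4), while the audio "m=" line outside the group is negotiated by ordinary SDP offer/answer as specified in RFC 3264.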
To provide a "being there" experience for conversational audio and video session between remote locations, a variety of information needs to be coordinated, such as:
  • audio and video spatial information;
  • information to enable eye contact, gaze awareness, body language and natural image size; and
  • information to coordinate the environments.

4.2  Spatial information

A spatial relationship represents the arrangement in space of two or more objects in the same capture scene, as opposed to their relationship in time or any other relationship. In a telepresence conferencing system, it mainly involves both video and audio sources.
For video, a spatial description of the source video images sent in video streams, which includes the order of the images in the actual captured scene and may be in two or three dimensions, enables a reproduction of the original scene at the receiver side. For audio, a spatial description of the source audio, including the point of capture, line of capture and capture sensitivity pattern (e.g. omni, shotgun, cardioid, hyper-cardioid), enables a reproduction at the receiver side in a spatially correct manner. Spatial matching may also be needed between audio and video streams coming from a specific party.
When advertising video and audio media captures in an IM session supporting telepresence, a TP UE, as well as a TP-enabled conference focus, sends spatial information, e.g. the physical dimensions of the capture area for each video capture and the spatial information of the associated audio captures. This allows the receiving party to coordinate the capture scenarios and perform proper rendering. Consider, as an example, a TP UE with a typical triple-screen/camera system, in which each camera provides one video capture covering one third of the room. Each capture carries spatial information indicating its scope of view; a capture showing a zoomed-out view of the whole room has spatial information indicating a full global view.
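As a sketch of how such spatial information can be conveyed, the capture area of one video capture could be advertised in the CLUE data model along the lines of RFC 8846. The fragment below is illustrative and simplified, not a normative schema instance: the identifiers and coordinate values are invented for this example, and namespace declarations are omitted.

```xml
<mediaCapture xsi:type="videoCaptureType" captureID="VC0" mediaType="video">
  <captureSceneIDREF>CS1</captureSceneIDREF>
  <spatialInformation>
    <captureArea>
      <!-- left third of the room; coordinates are illustrative -->
      <bottomLeft>  <x>0</x>    <y>0</y> <z>0</z>    </bottomLeft>
      <bottomRight> <x>2000</x> <y>0</y> <z>0</z>    </bottomRight>
      <topLeft>     <x>0</x>    <y>0</y> <z>1000</z> </topLeft>
      <topRight>    <x>2000</x> <y>0</y> <z>1000</z> </topRight>
    </captureArea>
  </spatialInformation>
</mediaCapture>
```

The two other cameras of the triple-camera example would advertise adjacent capture areas, and a zoomed-out capture of the whole room would advertise an area spanning all three, allowing the receiver to reconstruct the spatial layout.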

4.3  Media information

The media information is enhanced in an IM session supporting telepresence by introducing the source components of a media capture, e.g. an original media capture produced directly by a device such as a camera, a composed media capture indicating a mix of audio streams, or a composed or switched media capture indicating a dynamic or the most appropriate subset of a "whole".
The enhanced media information enables a sender to describe the available sources and a receiver to choose which of them it wants to see. Continuing the example in subclause 4.2, the TP UE can further provide a single capture representing the active speaker based on voice activity detection, and a single capture representing the active speaker with the other two captures composed as a picture-in-picture. The media information can be used to distinguish the media captures from each other.
Further media information may also be needed, such as simultaneity constraints. For example, a room camera may offer two options, a zoomed-in view and a zoomed-out view, but there is no way to obtain both simultaneously.

4.4  Meeting description

Meeting description includes view information, language information, person information and person type, as described below, which enable the receivers to choose and render different captures:
  • View information: indicates a physical or logical region as captured;
  • Language information: used in case of multi-lingual and/or accessible conferences;
  • Person information: provides specific information about people participating within a multi-media conference; and
  • Person type: indicates the type of people participating within a multi-media conference with respect to the meeting agenda.
In addition, there may be some descriptive information which contains a relative priority between different captures, embedded textual information, or additional complementary information.

4.5  Presentation

Presentation indicates resource sharing from one or more specific devices, including slides, video and data. Presentations may come from non-fixed sources, whose placement can vary, and can be seen by all the involved parties.

4.6  Information usage

The information detailed above may be used to obtain a better experience during an IM session between involved parties with different capabilities, such as different numbers of devices, different picture aspect ratios, or different numbers of media streams for sending and receiving.
The usage of the information depends on the application scenario. The TP UE described in the example of subclause 4.2 can provide at least six video captures; the negotiation message therefore needs to contain enough parameters to describe the characteristics of each capture, e.g. spatial view, media composition, and person information and type, so that the receiving party can clearly differentiate the captures and render them properly.
The protocols adopted for an IM session supporting telepresence, which exchange the above information among the involved parties, enable interoperability by handling multiple streams in a standardized way.
