


 

Y (Normative)  Immersive Teleconferencing and Telepresence for Remote Terminals (ITT4RT) |R17|

Y.1  General

The MTSI terminal may support the Immersive Teleconferencing and Telepresence for Remote Terminals (ITT4RT) feature as defined in this clause. MTSI clients supporting the ITT4RT feature shall be referred to as ITT4RT clients.
ITT4RT functionality for MTSI enables support of an immersive experience for remote terminals joining teleconferencing and telepresence sessions. It addresses scenarios with two-way audio and one-way immersive 360-degree video, e.g., a remote single user wearing an HMD and participating in a conference sends audio and optionally 2D video (e.g., of a presentation, screen sharing and/or a capture of the user), but receives stereo or immersive voice/audio and immersive 360-degree video captured by an omnidirectional camera in a conference room connected to a fixed network.
Since immersive 360-degree video support for ITT4RT is unidirectional, ITT4RT clients supporting immersive 360-degree video are further classified into two types to distinguish between the capabilities for sending or receiving immersive video: (i) ITT4RT-Tx client, which is an ITT4RT client only capable of sending immersive 360-degree video, and (ii) ITT4RT-Rx client, which is an ITT4RT client only capable of receiving immersive 360-degree video. Such a classification does not apply to ITT4RT clients supporting immersive speech/audio, since the support for immersive speech/audio is expected to be bi-directional. It should also be noted that a terminal containing ITT4RT-Tx or ITT4RT-Rx client capabilities may also contain further MTSI client capabilities to support bi-directional 2D video.
MTSI gateways supporting ITT4RT functionality are referred to as ITT4RT MRF, which is an ITT4RT client implemented by functionality included in the MRF. An ITT4RT MRF supporting immersive 360-degree video contains both ITT4RT-Tx and ITT4RT-Rx clients.

Y.2  Architecture and Interfaces

Definitions, reference and coordinate systems, video signal representation and audio signal representation as described in clause 4.1 of TS 26.118 are applicable.
Figure Y.1 provides a possible sender architecture that produces the RTP streams containing 360-degree video and immersive speech/audio, as applicable to an ITT4RT client in terminal. VR content acquisition includes capture of 360-degree video and immersive speech/audio, as well as other relevant content such as overlays. Following VR content pre-processing and encoding of the 360-degree video and immersive speech/audio components, the corresponding elementary streams are generated. For 360-degree projected video, pre-processing may include video stitching, rotation or other translations, and the pre-processed 360-degree video is then passed into the projection functionality in order to map the 360-degree video onto 2D textures using a mathematically specified projection format. Optionally, the resulting projected video may be further mapped region-wise onto a packed video. For 360-degree fisheye video, the circular videos captured by fisheye lenses are not stitched but directly mapped onto a 2D texture, without the use of the projection and region-wise packing functionalities (as described in clause 4.3 of ISO/IEC 23090-2 [179]). In this case, pre-processing may include arranging the circular images captured by fisheye lenses onto 2D textures, and the functionality for projection and mapping is not needed. For audio, no stitching process is needed, since the captured signals are inherently immersive and omnidirectional. Following HEVC/AVC encoding of the 2D textures and EVS encoding of the immersive speech/audio, along with the relevant immersive media metadata (e.g., SEI messages), the resulting video and audio elementary streams are encapsulated into respective RTP streams and transmitted.
Reproduction of 3GPP TS 26.114, Fig. Y.1: Reference sender architecture for ITT4RT client in terminal
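As an informative illustration of the projection functionality for equirectangular content, the following sketch maps a unit-sphere direction to equirectangular (ERP) texture coordinates. The function and variable names are illustrative, and the sketch ignores region-wise packing and guard bands; the normative sample locations are defined in ISO/IEC 23090-2 [179].
```python
def sphere_to_erp(azimuth_deg: float, elevation_deg: float,
                  pic_width: int, pic_height: int) -> tuple:
    """Map a sphere direction (azimuth in [-180, 180] degrees, elevation in
    [-90, 90] degrees) to ERP texture coordinates in luma samples.

    Illustrative only: azimuth decreases left-to-right and elevation
    decreases top-to-bottom, with no guard bands or region-wise packing.
    """
    u = (0.5 - azimuth_deg / 360.0) * pic_width    # horizontal sample position
    v = (0.5 - elevation_deg / 180.0) * pic_height  # vertical sample position
    return u, v

# Example: the sphere point straight ahead (azimuth 0, elevation 0) maps to
# the centre of a 4096x2048 ERP picture.
print(sphere_to_erp(0.0, 0.0, 4096, 2048))  # -> (2048.0, 1024.0)
```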
Figure Y.2 provides an overview of a possible receiver architecture that reconstructs the 360-degree video and immersive speech/audio in an ITT4RT client in terminal. Note that this Figure does not represent an actual implementation, but a logical set of receiver functions. Based on one or more received RTP media streams, the UE parses, possibly decrypts and feeds the elementary video stream into the HEVC/AVC decoder and the speech/audio stream into the EVS decoder. The HEVC/AVC decoder obtains the decoder output signal, referred to as the "2D texture", as well as the decoder metadata. Likewise, the EVS decoder output signal contains the immersive speech/audio. The decoder metadata for video contains the Supplemental Enhancement Information (SEI) messages, i.e., information carried in the omnidirectional video specific SEI messages, to be used in the rendering phase. In particular, the decoder metadata may be used by the Texture-to-Sphere Mapping function to generate a 360-degree video (or part thereof) based on the decoded output signal, i.e., the texture. The viewport is then generated from the 360-degree video signal (or part thereof) by taking into account the pose information from sensors, display characteristics as well as possibly other metadata.
For 360-degree video, the following components are applicable:
  • The RTP stream contains an HEVC or an AVC bitstream with omnidirectional video specific SEI messages. In particular, the omnidirectional video specific SEI messages as defined in ISO/IEC 23008-2 [119] and ISO/IEC 14496-10 [24] may be present.
  • The video elementary stream(s) are encoded following the requirements in clause Y.3.
Reproduction of 3GPP TS 26.114, Fig. Y.2: Reference receiver architecture for ITT4RT client in terminal
The output signal, i.e., the decoded picture or "texture", is then rendered using the Decoder Metadata information in the relevant SEI messages contained in the video elementary streams, as well as the relevant information signalled at the RTP/RTCP level (in the viewport-dependent case). The Decoder Metadata is used when performing rendering operations (such as region-wise unpacking, projection de-mapping and rotation for 360-degree projected video, or remapping based on the fisheye video information for 360-degree fisheye video) toward creating spherical content for each eye. Details of such sample location remapping operations are described in clause D.3.41.7 of ISO/IEC 23008-2 [119].
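As an informative illustration of the texture-to-sphere mapping and viewport generation described above, the sketch below maps one viewport pixel to ERP texture coordinates while taking a simplified pose (yaw and pitch only) into account. It assumes a pinhole viewport model and monoscopic ERP content without region-wise packing; it is not the normative remapping process of ISO/IEC 23008-2 [119].
```python
import math

def viewport_pixel_to_erp(px, py, vp_w, vp_h, hfov_deg,
                          yaw_deg, pitch_deg, erp_w, erp_h):
    """Map one viewport pixel to ERP texture coordinates (illustrative only).

    Assumes a pinhole viewport with horizontal field of view hfov_deg and a
    viewer pose given as yaw (azimuth) and pitch (elevation), with no roll
    and no region-wise unpacking.
    """
    # Viewing direction of the pixel in the viewport's local frame
    # (x right, y up, z forward).
    f = (vp_w / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    x = px + 0.5 - vp_w / 2.0
    y = vp_h / 2.0 - (py + 0.5)
    z = f

    # Apply pitch (rotation about the x-axis), then yaw (about the y-axis).
    p = math.radians(pitch_deg)
    y, z = y * math.cos(p) + z * math.sin(p), z * math.cos(p) - y * math.sin(p)
    w = math.radians(yaw_deg)
    x, z = x * math.cos(w) - z * math.sin(w), x * math.sin(w) + z * math.cos(w)

    # Convert the direction to azimuth/elevation, then to ERP coordinates.
    azimuth = math.degrees(math.atan2(-x, z))
    elevation = math.degrees(math.asin(y / math.sqrt(x * x + y * y + z * z)))
    u = (0.5 - azimuth / 360.0) * erp_w
    v = (0.5 - elevation / 180.0) * erp_h
    return u, v

# The centre pixel of a 1920x1080, 90-degree viewport with the viewer looking
# straight ahead maps close to the centre of a 4096x2048 ERP picture.
print(viewport_pixel_to_erp(960, 540, 1920, 1080, 90.0, 0.0, 0.0, 4096, 2048))
```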
Viewport-dependent 360-degree video processing may be supported for both point-to-point conversational sessions and multiparty conferencing scenarios. It is achieved by the ITT4RT-Rx client sending RTCP feedback messages with viewport information, and the ITT4RT-Tx client or the ITT4RT-MRF then encoding and sending the corresponding viewport. This is expected to deliver higher resolutions for the desired viewport than the viewport-independent approach. The RTP stream transmitted by the ITT4RT-Tx client or ITT4RT-MRF may also include information on the region of the 360-degree video encoded in higher quality, since the video generated, encoded and streamed by the ITT4RT-Tx client may cover a larger area than the desired viewport. Viewport-dependent processing is realized via RTP/RTCP based protocols that are supported by ITT4RT clients. The use of RTP/RTCP based protocols for viewport-dependent processing is further described in clause Y.7.2.
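As an informative illustration only, the sketch below packs a viewport description into a binary payload of the kind an ITT4RT-Rx client could report. The field set and byte layout are assumptions made for this example and do not represent the RTCP feedback message format defined in clause Y.7.2.
```python
import struct

def pack_viewport_feedback(azimuth_cdeg: int, elevation_cdeg: int,
                           tilt_cdeg: int, azimuth_range_cdeg: int,
                           elevation_range_cdeg: int) -> bytes:
    """Pack a viewport description into a binary payload (illustrative only).

    The fields (sphere-region centre and ranges, here in hundredths of a
    degree) mirror the kind of viewport information an ITT4RT-Rx client
    reports; the byte layout and field names are assumptions for this sketch
    and NOT the normative RTCP feedback format of clause Y.7.2.
    """
    return struct.pack("!iiiII", azimuth_cdeg, elevation_cdeg, tilt_cdeg,
                       azimuth_range_cdeg, elevation_range_cdeg)

# Viewer looking 30 degrees left and 10 degrees up, 90x60 degree viewport.
payload = pack_viewport_feedback(3000, 1000, 0, 9000, 6000)
print(len(payload), payload.hex())
```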

Y.3  Immersive 360-Degree Video Support

ITT4RT-Rx clients in terminals offering video communication shall support decoding capabilities based on:
  • H.264 (AVC) [24] Constrained High Profile, Level 5.1 with the following additional restrictions and requirements on the bitstream:
    • the maximum VCL Bit Rate is constrained to be 120 Mbps with cpbBrVclFactor and cpbBrNalFactor being fixed to be 1250 and 1500, respectively.
    • the bitstream does not contain more than 10 slices per picture.
  • H.265 (HEVC) [119] Main 10 Profile, Main Tier, Level 5.1.
In addition, ITT4RT-Rx clients in terminals may support:
ITT4RT-Tx clients in terminals offering video communication shall support encoding up to the maximum capabilities (e.g., color bit-depth, luma samples per second, luma picture size, frames per second) compatible with decoders compliant with the following:
  • H.264 (AVC) [24] Constrained High Profile, Level 5.1 with the following additional restrictions and requirements on the bitstream:
    • the maximum VCL Bit Rate is constrained to be 120 Mbps with cpbBrVclFactor and cpbBrNalFactor being fixed to be 1250 and 1500, respectively.
    • the bitstream does not contain more than 10 slices per picture.
  • H.265 (HEVC) [119] Main 10 Profile, Main Tier, Level 5.1.
In addition, ITT4RT-Tx clients in terminals may support:
Hence, for a Bitstream conforming to the H.264 (AVC) [24] Constrained High Profile, Level 5.1 delivered from an ITT4RT-Tx client to the ITT4RT-Rx client, the following restrictions apply:
  • The profile_idc shall be set to 100 indicating the High profile.
  • The constraint_set0_flag, constraint_set1_flag, constraint_set2_flag and constraint_set3_flag shall all be set to 0, and constraint_set4_flag and constraint_set5_flag shall be set to 1.
  • The value of level_idc shall not be greater than 51 (corresponding to the level 5.1) and should indicate the lowest level to which the Bitstream conforms.
Furthermore, for a Bitstream conforming to the H.265 (HEVC) [119] Main 10 Profile, Main Tier, Level 5.1 delivered from an ITT4RT-Tx client to the ITT4RT-Rx client, the following restrictions apply:
  • The general_profile_idc shall be set to 2 indicating the Main10 profile.
  • The general_tier_flag shall be set to 0 indicating the Main tier.
  • The value of general_level_idc shall not be greater than 153 (corresponding to Level 5.1) and should indicate the lowest level to which the Bitstream conforms.
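As an informative illustration, the following sketch checks parsed parameter-set fields against the restrictions above. The dictionaries of syntax-element values are assumed to come from an SPS or profile_tier_level parser that is not shown.
```python
def avc_bitstream_ok(sps: dict) -> bool:
    """Check AVC SPS fields against the Constrained High Profile, Level 5.1
    restrictions listed above (illustrative; SPS parsing not shown)."""
    return (sps["profile_idc"] == 100
            and all(sps[f"constraint_set{i}_flag"] == 0 for i in range(4))
            and sps["constraint_set4_flag"] == 1
            and sps["constraint_set5_flag"] == 1
            and sps["level_idc"] <= 51)

def hevc_bitstream_ok(ptl: dict) -> bool:
    """Check HEVC profile_tier_level fields against the Main 10 Profile,
    Main tier, Level 5.1 restrictions listed above."""
    return (ptl["general_profile_idc"] == 2
            and ptl["general_tier_flag"] == 0
            and ptl["general_level_idc"] <= 153)

# Example: an HEVC bitstream signalling Main 10 profile, Main tier, Level 5.1.
print(hevc_bitstream_ok({"general_profile_idc": 2,
                         "general_tier_flag": 0,
                         "general_level_idc": 153}))  # True
```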
For 360-degree video delivery across ITT4RT clients, the following components are applicable:
  • The RTP stream shall contain an HEVC or an AVC bitstream with possible presence of omnidirectional video specific SEI messages. In particular, the omnidirectional video specific SEI messages as defined in clause D.2.41 of ISO/IEC 23008-2 [119] or ISO/IEC 14496-10 [24] may be present for the respective HEVC or AVC bitstreams.
  • The video elementary stream(s) shall be encoded following the requirements in the Omnidirectional Media Format (OMAF) specification ISO/IEC 23090-2 [179], clauses 10.1.2.2 (viewport-independent case) or 10.1.3.2 (viewport-dependent case) for HEVC bitstreams and clause 10.1.4.2 for AVC bitstreams. Furthermore, the general video codec requirements for AVC and HEVC in clause 5.2.2 of TS 26.114 also apply.
ITT4RT-Rx clients are expected to be able to process the VR metadata carried in SEI messages for rendering 360-degree video according to the relevant processes. Relevant SEI messages contained in the elementary stream(s) with decoder rendering metadata may include the following information for the relevant processes as per clause D.3.41 of ISO/IEC 23008-2 [119] and ISO/IEC 14496-10 [24]:
  • Projection mapping information (indicating the projection format in use, e.g., Equirectangular projection (ERP) or Cubemap projection (CMP)), for the projection sample location remapping process as specified in clauses 7.5.1.3 and 5.2 of ISO/IEC 23090-2 [179].
  • Region-wise packing information (carrying region-wise packing format indication, any coverage restrictions or padding/guard region information in the packed picture), for the inverse processes of the region-wise packing as specified in clauses 7.5.1.2 and 5.4 of ISO/IEC 23090-2 [179].
  • Sphere rotation information (indicating the amount of sphere rotation, if any, applied to the sphere signal before projection and region-wise packing at the encoder side), for the coordinate axes conversion process as specified in clause 5.3 of ISO/IEC 23090-2 [179].
  • Frame packing arrangement (indicating the frame packing format for stereoscopic content), for the processes as specified in D.3.16 of ISO/IEC 23008-2 [119].
  • Fisheye video information (indicating that the picture is a fisheye video picture containing a number of active areas captured by fisheye camera lens), for the fisheye sample location remapping process as specified in clause D.3.41.7.5 of ISO/IEC 23008-2 [119].
The exchange of SEI messages carrying VR metadata for rendering 360-degree video or fisheye video shall be performed using bitstream-level signalling as follows.
SEI messages shall be present in the respective video elementary streams corresponding to the HEVC or AVC bitstreams carrying 360-degree video or fisheye video from the ITT4RT-Tx client to the ITT4RT-Rx client, as per ISO/IEC 23008-2 [119] or ISO/IEC 14496-10 [24]. As specified below, the mandatory inclusion of the specific SEI messages in the bitstream by the ITT4RT-Tx client, and their decoding and rendering processing by the ITT4RT-Rx client, is conditional upon successful SDP-based negotiation of the corresponding 360-degree video or fisheye video capabilities.
In particular, the ITT4RT-Tx client supporting 360-degree video for viewport-independent processing shall signal in the bitstream the equirectangular projection SEI message (payloadType equal to 150) to the ITT4RT-Rx client, with the erp_guard_band_flag set to 0.
If viewport-dependent processing (VDP) capability is successfully negotiated by the ITT4RT-Tx client and ITT4RT-Rx client for the exchange of 360-degree video, then, the ITT4RT-Tx client shall signal in the bitstream to the ITT4RT-Rx client either:
  • the equirectangular projection SEI message (payloadType equal to 150) with the erp_guard_band_flag set to 0, or
  • the cubemap projection SEI message (payloadType equal to 151).
In order to optimize the spatial resolution of specific viewports, the ITT4RT-Tx client and ITT4RT-Rx client may negotiate the use of region-wise packing as part of the exchange of 360-degree video. If this is the case, the region-wise packing SEI message (payloadType equal to 155) shall also be signalled by the ITT4RT-Tx client to the ITT4RT-Rx client in the bitstream.
If stereoscopic video capability is successfully negotiated by the ITT4RT-Tx client and ITT4RT-Rx client as part of the exchange of 360-degree video, then the frame packing arrangement SEI message (payloadType equal to 45) shall also be signalled by the ITT4RT-Tx client to the ITT4RT-Rx client in the bitstream, with the following restrictions:
  • The value of frame_packing_arrangement_cancel_flag is equal to 0.
  • The value of frame_packing_arrangement_type is equal to 4.
  • The value of quincunx_sampling_flag is equal to 0.
  • The value of spatial_flipping_flag is equal to 0.
  • The value of field_views_flag is equal to 0.
  • The value of frame0_grid_position_x is equal to 0.
  • The value of frame0_grid_position_y is equal to 0.
  • The value of frame1_grid_position_x is equal to 0.
  • The value of frame1_grid_position_y is equal to 0.
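As an informative illustration, the sketch below checks a parsed frame packing arrangement SEI message against the restrictions above; the SEI parsing itself is not shown, and the fields are assumed to be keyed by the syntax element names used above.
```python
# Field values required above for the frame packing arrangement SEI message
# (payloadType 45) when stereoscopic 360-degree video has been negotiated.
FPA_REQUIRED_VALUES = {
    "frame_packing_arrangement_cancel_flag": 0,
    "frame_packing_arrangement_type": 4,   # 4 indicates top-bottom packing
    "quincunx_sampling_flag": 0,
    "spatial_flipping_flag": 0,
    "field_views_flag": 0,
    "frame0_grid_position_x": 0,
    "frame0_grid_position_y": 0,
    "frame1_grid_position_x": 0,
    "frame1_grid_position_y": 0,
}

def fpa_sei_ok(sei_fields: dict) -> bool:
    """Return True if a parsed frame packing arrangement SEI message obeys
    the restrictions listed above (illustrative check only)."""
    return all(sei_fields.get(name) == value
               for name, value in FPA_REQUIRED_VALUES.items())

print(fpa_sei_ok(dict(FPA_REQUIRED_VALUES)))  # True
```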
Furthermore, ITT4RT-Tx clients supporting 360-degree fisheye video shall signal the fisheye video information SEI message (payloadType equal to 152) to the ITT4RT-Rx clients in the bitstream.
The bitstream delivered from an ITT4RT-Tx client to the ITT4RT-Rx client shall contain the corresponding SEI message(s), and the ITT4RT-Rx client shall process the VR metadata carried in the signalled SEI message(s) for rendering 360-degree video, provided that the corresponding 360-degree video or fisheye video capabilities associated with the SEI messages have been successfully negotiated via SDP.
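As an informative illustration, the conditional signalling rules above can be summarized as follows. The boolean capability flags stand for the outcome of the SDP negotiation and are not SDP attribute names; cubemap projection (payloadType 151) is only available when viewport-dependent processing has been negotiated.
```python
def required_sei_payload_types(fisheye: bool = False, cubemap: bool = False,
                               region_wise_packing: bool = False,
                               stereoscopic: bool = False) -> set:
    """Derive the SEI messages (by payloadType) that the ITT4RT-Tx client
    signals, based on the negotiated capabilities (illustrative summary of
    the rules above; the flag names are not SDP attribute names)."""
    if fisheye:
        return {152}                        # fisheye video information
    sei = {151 if cubemap else 150}         # CMP, or ERP with guard bands off
    if region_wise_packing:
        sei.add(155)                        # region-wise packing
    if stereoscopic:
        sei.add(45)                         # frame packing arrangement
    return sei

# Example: monoscopic ERP video with region-wise packing negotiated; prints a
# set containing 150 (ERP) and 155 (region-wise packing).
print(required_sei_payload_types(region_wise_packing=True))
```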

Y.4  Immersive Voice/Audio Support

ITT4RT-Rx and ITT4RT-Tx clients in terminals offering speech communication shall support super-wideband communication as specified in clause 5.2.1.

Y.5  Overlay Support

An overlay is defined as a piece of visual media, rendered over omnidirectional video or on the viewport. An ITT4RT-Tx client supporting the 'Overlay' feature shall add real-time overlays on top of a 360-degree background and offer this capability in the SDP as part of the initial offer-answer negotiation. An overlay in ITT4RT is characterized by the overlay source and its rendering configuration. ITT4RT supports both 2D and spherical overlays.
An overlay source specifies the image or video to be used as the content for the overlay. The overlay source is a bitstream that may be encoded as an overlay video and delivered as an RTP stream.
ITT4RT clients supporting the 'Overlay' feature have the following requirements:
  • an ITT4RT-Tx client may be capable of providing a bitstream consisting of an overlay with a video source that conforms to the requirements of clause Y.3 or the video bitstream requirements of MTSI.
  • an ITT4RT-Tx client may be capable of providing HEVC encoded images/image sequences that conform to the HEVC bitstream requirements of clause Y.3. The signalling for such a stream shall comply with clause 6.2.11 and clause 7.4.8.
  • an ITT4RT-Rx client may be capable of decoding and rendering a 360-degree video bitstream and further decoding and rendering one or more overlay streams on top of the 360-degree video.
The rendering configuration defines how the overlay is to be rendered in the spherical scene. It is possible for an overlay to be delivered without any rendering configuration, in which case, the ITT4RT-Rx client should render the content as per the default configuration described in clause Y.6.4.3. Alternatively, the rendering configuration for one or more overlays may be negotiated during session establishment to ensure uniform operation for different receivers.
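As an informative illustration, the sketch below shows one possible data structure for an overlay rendering configuration, with a fallback when no configuration was negotiated. The field names and default values are hypothetical examples only; they are not the overlay parameters defined in clause Y.6.4 nor the default configuration of clause Y.6.4.3.
```python
from dataclasses import dataclass

@dataclass
class OverlayRenderingConfig:
    """Hypothetical rendering configuration for one overlay.

    The field set (sphere-relative position and angular size) illustrates
    the kind of parameters a rendering configuration carries; the names and
    defaults are placeholders, not the attributes of clause Y.6.4 or the
    defaults of clause Y.6.4.3.
    """
    overlay_id: int
    azimuth_deg: float = 0.0          # centre of the overlay on the sphere
    elevation_deg: float = 0.0
    azimuth_range_deg: float = 60.0   # angular width of the overlay
    elevation_range_deg: float = 40.0
    depth: float = 1.0                # distance from the sphere centre

def config_for(overlay_id: int, negotiated: dict) -> OverlayRenderingConfig:
    """Use the configuration negotiated for this overlay if one exists,
    otherwise fall back to the (placeholder) default configuration."""
    return negotiated.get(overlay_id, OverlayRenderingConfig(overlay_id))

print(config_for(1, {}))  # overlay 1 rendered with the fallback configuration
```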
