Content for TS 26.114 Word version: 18.6.0



Y.6.5 Fisheye Video

Y.6.5.1 Identifying the 360-degree fisheye video stream

The SDP attribute 3gpp_fisheye is used to indicate a 360-degree fisheye video stream.

The semantics of the above attribute and parameters are provided below.

ITT4RT clients supporting 360-degree fisheye video shall support the 3gpp_fisheye attribute and shall support the following procedures:

- when sending an SDP offer, the ITT4RT-Tx client includes the 3gpp_fisheye attribute in the media description for video in the SDP offer;
- when sending an SDP answer, the ITT4RT-Rx client includes the 3gpp_fisheye attribute in the media description for video in the SDP answer if the 3gpp_fisheye attribute was received in an SDP offer;
- after successful negotiation of the 3gpp_fisheye attribute in the SDP, the MTSI clients exchange an RTP-based video stream containing an HEVC or AVC bitstream with fisheye omnidirectional video specific SEI messages as defined in clause Y.3.

ITT4RT-Tx clients that support both 360-degree projected video and 360-degree fisheye video may include both 3gpp_360video and 3gpp_fisheye attributes as alternatives in the SDP offer, but an ITT4RT-Rx client shall include only one attribute (either 3gpp_360video or 3gpp_fisheye, based on support or selection) in the SDP answer.
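The answer-side rule above can be illustrated with a small, non-normative Python sketch. The preference order between the two attributes is a hypothetical receiver policy (the clause only says "based on support or selection"), and attribute names are handled as plain strings rather than as parsed SDP.

```python
from typing import Optional, Set, Tuple

# Hypothetical receiver preference order; the clause leaves the choice to the
# ITT4RT-Rx client ("based on support or selection").
DEFAULT_PREFERENCE: Tuple[str, ...] = ("3gpp_fisheye", "3gpp_360video")


def select_360_attribute(offered: Set[str], supported: Set[str],
                         preference: Tuple[str, ...] = DEFAULT_PREFERENCE) -> Optional[str]:
    """Return the single 360-degree attribute to include in the SDP answer, or None."""
    # A candidate must have been offered by the ITT4RT-Tx client and be
    # supported by this ITT4RT-Rx client.
    for attribute in preference:
        if attribute in offered and attribute in supported:
            return attribute  # the answer carries only this one attribute
    return None
```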

Y.6.5.2 360-degree fisheye video SDP attribute parameters

Media-line level parameters are defined in order to aid session establishment between the ITT4RT-Tx and ITT4RT-Rx clients for 360-degree fisheye video, as well as to describe the fisheye video stream as identified by the 3gpp_fisheye attribute.

The syntax for the SDP attribute is:

a=3gpp_fisheye: <fisheye> <fisheye-img> <maxpack>

Total number of fisheye circular videos at the capturing terminal.
Depending on the camera configuration of the sending terminal, the 360-degree fisheye video may consist of multiple different fisheye circular videos, each captured through a different fisheye lens.
- <fisheye>: this parameter inside an SDP offer sent by an ITT4RT-Tx client indicates the total number of fisheye circular videos output by the camera configuration at the terminal.
Fisheye circular video static parameters.
In order to enable the quick selection of desired fisheye circular videos by the ITT4RT-Rx client during SDP negotiation, the following static parameters are defined for each fisheye circular video. These parameters are derived from the fisheye video information SEI message of the video bitstream, as defined in ISO/IEC 23008-2 [119] and ISO/IEC 23090-2 [179].
- <fisheye-img> = <fisheye-img-1> … <fisheye-img-N>
- <fisheye-img-X> = [<id-X> <azi> <ele> <til> <fov>] for 1 ≤ X ≤ N where:
  - <id>: an identifier for the fisheye video.
  - <azi>, <ele>: azimuth and elevation indicating the spherical coordinates that correspond to the centre of the circular region that contains the fisheye video, in units of 2⁻¹⁶ degrees. The values for azimuth shall be in the range of −180 * 2¹⁶ (i.e., −11 796 480) to 180 * 2¹⁶ − 1 (i.e., 11 796 479), inclusive, and the values for elevation shall be in the range of −90 * 2¹⁶ (i.e., −5 898 240) to 90 * 2¹⁶ (i.e., 5 898 240), inclusive.
  - <til>: tilt indicating the tilt angle of the sphere region that corresponds to the fisheye video, in units of 2⁻¹⁶ degrees. The values for tilt shall be in the range of −180 * 2¹⁶ (i.e., −11 796 480) to 180 * 2¹⁶ − 1 (i.e., 11 796 479), inclusive.
  - <fov>: specifies the field of view of the lens that corresponds to the fisheye video in the coded picture, in units of 2⁻¹⁶ degrees. The field of view shall be in the range of 0 to 360 * 2¹⁶ (i.e., 23 592 960), inclusive.
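As a non-normative illustration of the 2⁻¹⁶-degree units and the inclusive ranges listed above, the following Python sketch converts between degrees and signalled integer values and checks the range of each parameter. The helper names are hypothetical.

```python
UNITS_PER_DEGREE = 1 << 16  # signalled values are in units of 2^-16 degrees

# Inclusive ranges from clause Y.6.5.2, expressed in 2^-16-degree units.
RANGES = {
    "azi": (-180 * UNITS_PER_DEGREE, 180 * UNITS_PER_DEGREE - 1),
    "ele": (-90 * UNITS_PER_DEGREE, 90 * UNITS_PER_DEGREE),
    "til": (-180 * UNITS_PER_DEGREE, 180 * UNITS_PER_DEGREE - 1),
    "fov": (0, 360 * UNITS_PER_DEGREE),
}


def to_units(degrees: float) -> int:
    """Convert an angle in degrees to the nearest 2^-16-degree integer."""
    return round(degrees * UNITS_PER_DEGREE)


def to_degrees(units: int) -> float:
    """Convert a signalled integer back to degrees."""
    return units / UNITS_PER_DEGREE


def check_range(name: str, units: int) -> None:
    """Raise ValueError if a signalled value lies outside its inclusive range."""
    low, high = RANGES[name]
    if not low <= units <= high:
        raise ValueError(f"{name}={units} is outside [{low}, {high}]")
```

Because the azimuth and tilt ranges stop at 180 * 2¹⁶ − 1, a direction of exactly +180 degrees has to be signalled as −180 * 2¹⁶.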
Stream packing of fisheye circular videos
Depending on the terminal device capabilities and bandwidth availability, the packing of fisheye circular videos within the stream can be negotiated between the sending and receiving terminals.
- <maxpack>: this parameter inside an SDP offer indicates the maximum supported number of fisheye videos which can be packed into the video stream by the ITT4RT-Tx client. The value of this parameter inside an SDP answer indicates the number of fisheye videos to be packed, as selected by the ITT4RT-Rx client.

The ABNF syntax for this attribute is the following:

att-field = "3gpp_fisheye" 
att-value = [SP fisheye] SP fisheye-img SP maxpack
fisheye = pos-integer
fisheye-img = 1*fisheye-img-X
fisheye-img-X = "[" "id=" idvalue "," "azi=" azivalue "," "ele=" elevalue "," "til=" tilvalue
"," "fov=" fovvalue "]"
;sub-rules for fisheye-img-X
idvalue = byte-string ; byte-string defined by RFC 4566
azivalue = degminus180to180
elevalue = degminus90to90
tilvalue = degminus180to180
fovvalue = deg0to360
maxpack = pos-integer
;pos-integer, degminus180to180, degminus90to90 and deg0to360 are defined in clause Y.6.2.1
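The following non-normative Python sketch parses an a=3gpp_fisheye line according to the ABNF above. It assumes that the five parameters appear in ABNF order, restricts idvalue to characters other than "," and "]" (a simplification of byte-string), and does not check the value ranges; the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One fisheye-img-X element in ABNF order, e.g. "[id=1,azi=0,ele=0,til=0,fov=11796480]".
# Simplification: idvalue may not contain "," or "]".
FISHEYE_IMG_RE = re.compile(
    r"\[id=([^,\]]+),azi=(-?\d+),ele=(-?\d+),til=(-?\d+),fov=(\d+)\]")


@dataclass
class FisheyeImg:
    fisheye_id: str
    azi: int  # all angles in units of 2^-16 degrees
    ele: int
    til: int
    fov: int


@dataclass
class FisheyeAttr:
    fisheye: Optional[int]  # optional total number of fisheye circular videos
    images: List[FisheyeImg]
    maxpack: int


def parse_3gpp_fisheye(line: str) -> FisheyeAttr:
    """Parse an "a=3gpp_fisheye:" line (value ranges are not checked)."""
    prefix = "a=3gpp_fisheye:"
    if not line.startswith(prefix):
        raise ValueError("not an a=3gpp_fisheye line")
    value = line[len(prefix):].strip()
    images = [FisheyeImg(m.group(1), *map(int, m.groups()[1:]))
              for m in FISHEYE_IMG_RE.finditer(value)]
    if not images:
        raise ValueError("at least one fisheye-img-X element is required")
    head = value[:value.index("[")].strip()       # optional <fisheye>
    tail = value[value.rindex("]") + 1:].strip()  # mandatory <maxpack>
    return FisheyeAttr(int(head) if head else None, images, int(tail))
```

Applied to the attribute line of Table Y.6.5.2-1, this yields fisheye = 2, two fisheye-img entries with ids "1" and "2", and maxpack = 2.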

An example SDP offer is shown in Table Y.6.5.2-1.

Table Y.6.5.2-1: Example SDP offer with 360-degree fisheye video attribute parameters


SDP offer

m=video 49154 RTP/AVP 99
a=tcap:1 RTP/AVPF
a=pcfg:1 t=1
b=AS:10000
b=RS:0
b=RR:2500
a=rtpmap:99 H265/90000
a=fmtp:99 profile-id=1; level-id=93;
a=3gpp_fisheye: 2 [id=1,azi=0,ele=0,til=0,fov=11796480] [id=2,azi=11796479,ele=0,til=0,fov=11796480] 2
a=sendonly

As an example, a receiving terminal that only receives 360-degree fisheye video (and possibly sends a 2D video to the sender) replies with an SDP answer in which the corresponding m-line is set to recvonly and the 3gpp_fisheye attribute contains only the selected fisheye videos, their number being equal to the value of maxpack selected in the answer.
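An answer of the kind described above could be assembled as in this non-normative Python sketch. It assumes the receiver has already chosen the fisheye ids, reuses the bracketed fisheye-img-X text from the offer unchanged, and omits the optional <fisheye> count in the answer; the omission is an assumption, since the clause defines that parameter only for the offer. The m-line direction (a=recvonly) is not produced here.

```python
from typing import Dict, Sequence


def build_fisheye_answer(offered: Dict[str, str], offered_maxpack: int,
                         selected_ids: Sequence[str]) -> str:
    """Build the a=3gpp_fisheye line of an SDP answer.

    offered maps each fisheye id to its bracketed fisheye-img-X text from the offer.
    """
    if not selected_ids:
        raise ValueError("at least one fisheye video must be selected")
    if len(set(selected_ids)) != len(selected_ids):
        raise ValueError("duplicate fisheye ids")
    if len(selected_ids) > offered_maxpack:
        raise ValueError("more fisheye videos selected than the offered maxpack")
    unknown = [fid for fid in selected_ids if fid not in offered]
    if unknown:
        raise ValueError(f"ids not present in the offer: {unknown}")
    images = " ".join(offered[fid] for fid in selected_ids)
    # In the answer, maxpack is the number of fisheye videos to be packed,
    # i.e. the number selected; the optional <fisheye> count is omitted here.
    return f"a=3gpp_fisheye: {images} {len(selected_ids)}"
```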

Y.6.5.3 Viewport dependent delivery of fisheye video

By exposing the coverage information of each fisheye circular video through the parameters in clause Y.6.5.2 (the fisheye circular videos together make up the whole 360-degree video), an ITT4RT-Rx client can opt to select only the fisheye circular videos needed to render the user's current viewport.

Through the parameters defined in clause Y.6.5.2, an ITT4RT-Rx client can select the desired fisheye packing configuration of the video stream during SDP negotiation, as well as the initially desired fisheye videos using the id parameter.

Once a session is established, dynamic delivery of the desired fisheye videos depending on the viewport of the ITT4RT-Rx client's user can be enabled using RTCP-based signalling, specifically the RTCP feedback message with type "Viewport" as defined in clause Y.7.2.
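How an ITT4RT-Rx client picks fisheye circular videos for a viewport is left to the implementation. The following Python sketch shows one possible, hypothetical policy: a fisheye video is kept when the great-circle distance between its centre (azi, ele) and the viewport centre does not exceed half its fov plus an assumed circular viewport radius. Tilt is ignored and the viewport is approximated by a circle, so the result is a conservative approximation rather than an exact coverage test.

```python
import math
from typing import List, Tuple

UNITS_PER_DEGREE = 1 << 16


def central_angle_deg(azi1: float, ele1: float, azi2: float, ele2: float) -> float:
    """Great-circle angle in degrees between two (azimuth, elevation) directions in degrees."""
    a1, e1, a2, e2 = map(math.radians, (azi1, ele1, azi2, ele2))
    c = (math.sin(e1) * math.sin(e2)
         + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding errors


def select_for_viewport(fisheyes: List[Tuple[str, int, int, int]],
                        vp_azi: float, vp_ele: float, vp_radius: float) -> List[str]:
    """fisheyes holds (id, azi, ele, fov) in 2^-16-degree units; the viewport centre and
    radius are in degrees. Returns ids whose coverage may overlap the viewport."""
    selected = []
    for fisheye_id, azi, ele, fov in fisheyes:
        distance = central_angle_deg(azi / UNITS_PER_DEGREE, ele / UNITS_PER_DEGREE,
                                     vp_azi, vp_ele)
        if distance <= fov / UNITS_PER_DEGREE / 2 + vp_radius:
            selected.append(fisheye_id)
    return selected
```

With the two lenses of Table Y.6.5.2-1 (centred at azimuth 0 and close to 180 degrees, each with a 180-degree fov), a 45-degree viewport radius centred at azimuth 0 selects only id "1", while the same viewport centred at azimuth 90 selects both.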

Y.6.6 Camera Calibration for Network-based Stitching

Network-based stitching in the context of ITT4RT refers to generation of 360-degree videos in the ITT4RT MRF based on 2D video captures received from MTSI clients. This clause describes SDP-based signalling of camera calibration parameters for this purpose using the "a=3gpp-camera-calibration" attribute and SDP-based grouping of the corresponding 2D video captures using the "a=stitch_group" attribute.

The SDP syntax for "a=3gpp-camera-calibration" is defined with the following semantics (detailed ABNF presented at the end of the clause):

3gpp-camera-calibration = "a=3gpp-camera-calibration:"
 [SP "Param 1" SP "Param 2" SP ……. SP "Param K"]

where "Param 1", …, "Param K" express the set of intrinsic and extrinsic camera parameters as specified below.

If the ITT4RT-Tx client in the ITT4RT MRF intends to perform network-based stitching to generate 360-degree video from a particular set of 2D video captures received from an MTSI sender, it shall use the SDP session-level attribute "a=stitch_group" before any media lines that correspond to the particular 2D video captures during the SDP negotiation of the corresponding media. Likewise, an MTSI sender capable of capturing 2D videos for 360-degree video generation shall use the session-level "a=stitch_group" attribute in the SDP before any media lines that correspond to the particular 2D video captures. The "a=stitch_group" attribute is used to group the corresponding to-be-stitched 2D video captures using the mid attribute as defined according to the ABNF below:

 att-field = "stitch_group" 
 att-value = mid *[SP mid]
 mid =   token
    ; token is defined in RFC 4566

The mid attribute with the appropriate value as defined in the other parts of the SDP shall be included in the media description for the relevant 2D video captures when the "a=stitch_group" attribute is used. Furthermore, for each of these 2D video captures, the MTSI sender shall also include the SDP attribute 3gpp-camera-calibration in the SDP under the relevant m= line for that particular video to signal the relevant camera calibration information. The order of the media in the "a=stitch_group" attribute indicates the synchronization source: when synchronization is required, the first media is always the synchronization anchor.
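The grouping rules above can be checked with a simple, non-normative Python sketch that scans an SDP body line by line. It assumes a single a=stitch_group line at session level, performs no general SDP validation, and only checks that each grouped mid has a media description carrying both a=mid and a=3gpp-camera-calibration. The returned list keeps the signalled order, so its first entry is the synchronization anchor.

```python
from typing import Dict, List, Optional


def check_stitch_group(sdp: str) -> List[str]:
    """Return the mids listed in the session-level a=stitch_group line, in signalled
    order (first = synchronization anchor), after checking each grouped media."""
    group: List[str] = []
    calibrated: Dict[str, bool] = {}  # mid -> media has a=3gpp-camera-calibration
    in_media = False
    mid: Optional[str] = None
    has_calibration = False

    def close_media() -> None:
        # Record the media description that has just ended, if it had a mid.
        if in_media and mid is not None:
            calibrated[mid] = has_calibration

    for raw in sdp.strip().splitlines():
        line = raw.strip()
        if line.startswith("m="):
            close_media()
            in_media, mid, has_calibration = True, None, False
        elif not in_media and line.startswith("a=stitch_group:"):
            group = line.split(":", 1)[1].split()
        elif in_media and line.startswith("a=mid:"):
            mid = line.split(":", 1)[1].strip()
        elif in_media and line.startswith("a=3gpp-camera-calibration:"):
            has_calibration = True
    close_media()

    for grouped in group:
        if grouped not in calibrated:
            raise ValueError(f"mid {grouped} has no media description")
        if not calibrated[grouped]:
            raise ValueError(f"mid {grouped} lacks a=3gpp-camera-calibration")
    return group
```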

More specifically, detailed camera calibration parameters based on ISO/IEC 23008-2 [3] are provided as follows, considering the multi-view acquisition information SEI message for HEVC. With these specifications, a 3-dimensional world point, wP = [ x y z ] is mapped to a 2-dimensional camera point, cP[ i ] = [ u v 1 ], for the i-th camera according to:

$$ s \cdot cP[i] = A[i] \cdot R^{-1}[i] \cdot \left( wP - T[i] \right) \qquad \text{(eqn. Y.6.6.1)} $$

where A[ i ] denotes the intrinsic camera parameter matrix, R⁻¹[ i ] denotes the inverse of the rotation matrix R[ i ], T[ i ] denotes the translation vector, and s (a scalar value) is an arbitrary scale factor chosen to make the third coordinate of cP[ i ] equal to 1.
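Eqn. Y.6.6.1 can be evaluated directly. This non-normative Python sketch uses plain nested lists instead of a matrix library and assumes that R[ i ] is an orthonormal rotation matrix, so that R⁻¹[ i ] equals its transpose; the scale factor s is taken as the third homogeneous coordinate and divided out.

```python
from typing import List, Sequence

Vector = List[float]


def mat_vec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(m: Sequence[Sequence[float]]) -> List[List[float]]:
    """Transpose a 3x3 matrix (equal to the inverse for a rotation matrix)."""
    return [[m[c][r] for c in range(3)] for r in range(3)]


def project(A: Sequence[Sequence[float]], R: Sequence[Sequence[float]],
            T: Sequence[float], wP: Sequence[float]) -> Vector:
    """Map world point wP to camera point cP = [u, v, 1] following eqn. Y.6.6.1."""
    relative = [wP[k] - T[k] for k in range(3)]  # wP - T[i]
    camera = mat_vec(transpose(R), relative)     # R^-1[i] * (wP - T[i]), R orthonormal
    scaled = mat_vec(A, camera)                  # s * cP[i]
    s = scaled[2]                                # scale that makes the third coordinate 1
    if s == 0:
        raise ValueError("point lies in the camera's principal plane")
    return [scaled[0] / s, scaled[1] / s, 1.0]
```

For example, with focal lengths of 1000, principal point (640, 360), zero skew, R = I and T = 0, the world point [0.5, −0.25, 2] maps to cP = [890, 235, 1].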

Equation Y.6.6.1 can be extended to incorporate the entrance pupil variation to correct the incidence ray of cP[ i ] = [ u v 1 ] such that it always passes through the camera optical center, thereby removing distortion. The resulting entrance pupil coefficients E[i] may be incorporated into Equation Y.6.6.1 as

$$ s \cdot cP[i] = A[i] \cdot R^{-1}[i] \cdot \left( (wP + E) - T[i] \right) \qquad \text{(eqn. Y.6.6.2)} $$

where wP + E = [ x y z+E ], E = e1·Θ³ + e2·Θ⁵ + e3·Θ⁷ + e4·Θ⁹, Θ is the incidence angle of the ray formed by the pixel cP[ i ] = [ u v 1 ], and [e1, e2, e3, e4] are the entrance pupil coefficients. In addition, the accuracy of these entrance pupil parameters influences the accuracy of the estimated extrinsic parameters and thus of subsequent imaging tasks. If these parameters are not available, E is considered to be 0 and a fallback to eqn. Y.6.6.1 is expected.
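The entrance pupil term can be sketched in Python as follows (non-normative). The clause does not state the unit of Θ; radians are assumed here, and how Θ is obtained for a given pixel is outside the scope of the sketch.

```python
from typing import List, Sequence

PUPIL_POWERS = (3, 5, 7, 9)  # odd powers of theta used in E


def entrance_pupil_offset(theta: float, coefficients: Sequence[float]) -> float:
    """E = e1*theta^3 + e2*theta^5 + e3*theta^7 + e4*theta^9 (theta assumed in radians)."""
    if len(coefficients) != 4:
        raise ValueError("expected the four coefficients e1..e4")
    return sum(e * theta ** p for e, p in zip(coefficients, PUPIL_POWERS))


def shifted_world_point(wP: Sequence[float], theta: float,
                        coefficients: Sequence[float] = (0.0, 0.0, 0.0, 0.0)) -> List[float]:
    """Return wP + E = [x, y, z + E]; with all-zero coefficients (parameters not
    available) this is wP itself, i.e. the fallback to eqn. Y.6.6.1."""
    x, y, z = wP
    return [x, y, z + entrance_pupil_offset(theta, coefficients)]
```

The shifted point then takes the place of wP in eqn. Y.6.6.1, which gives eqn. Y.6.6.2.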

Accordingly, the following intrinsic camera parameters can be signalled in the SDP for each 2D video capture using the "a=3gpp-camera-calibration" attribute:

focalLengthX[ i ] specifies the focal length of the i-th camera in the horizontal direction as a signed floating-point number.

focalLengthY[ i ] specifies the focal length of the i-th camera in the vertical direction as a signed floating-point number.

principalPointX[ i ] specifies the principal point of the i-th camera in the horizontal direction as a signed floating-point number.

principalPointY[ i ] specifies the principal point of the i-th camera in the vertical direction as a signed floating-point number.

skewFactor[ i ] specifies the skew factor of the i-th camera as a signed floating-point number.

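The intrinsic matrix A[ i ] for the i-th camera is represented by:

$$ A[i] = \begin{bmatrix} \mathrm{focalLengthX}[i] & \mathrm{skewFactor}[i] & \mathrm{principalPointX}[i] \\ 0 & \mathrm{focalLengthY}[i] & \mathrm{principalPointY}[i] \\ 0 & 0 & 1 \end{bmatrix} $$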

It is possible that the intrinsic camera parameters are equal for all of the cameras. In that case, only one set of values based on the above parameters would need to be signalled, e.g., via SDP signalling at the session level.

Furthermore, the following extrinsic camera parameters can be signalled in the SDP for each camera as per ISO/IEC 23008-2 [3]:

rE[ i ][ j ][ k ] specifies the ( j, k ) component of the rotation matrix for the i-th camera as a signed floating-point number.

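The rotation matrix R[ i ] for the i-th camera is represented as follows:

$$ R[i] = \begin{bmatrix} rE[i][0][0] & rE[i][0][1] & rE[i][0][2] \\ rE[i][1][0] & rE[i][1][1] & rE[i][1][2] \\ rE[i][2][0] & rE[i][2][1] & rE[i][2][2] \end{bmatrix} $$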

tE[ i ][ j ] specifies the j-th component of the translation vector for the i-th camera as a signed floating-point number.

The translation vector T[ i ] for the i-th camera is represented by:

$$ T[i] = \begin{bmatrix} tE[i][0] \\ tE[i][1] \\ tE[i][2] \end{bmatrix} $$

For the i-th camera, E[ i ][ j ] specifies the j-th component of the entrance pupil coefficient vector [e1, e2, e3, e4], where j = 1, …, 4. Each component is represented as a signed floating-point number, as per eqn. Y.6.6.2 above.

The syntax for the "a=3gpp-camera-calibration" attribute shall conform to the following ABNF:

att-field =  "3gpp-camera-calibration"
att-value = PT 1*WSP attr-list 
PT =   1*DIGIT / "*"
attr-list = ( set *(1*WSP set) ) / "*"
  ;  WSP and DIGIT defined in [RFC5234]
;sub-rules for set
set = "[" "focalLengthX=" sfloatvalue ",focalLengthY=" sfloatvalue
 ",skewFactor=" sfloatvalue ",principalPointX=" sfloatvalue 
 ",principalPointY=" sfloatvalue ",rotation00=" sfloatvalue 
 ",rotation01=" sfloatvalue ",rotation02=" sfloatvalue ",rotation10=" 
 sfloatvalue ",rotation11=" sfloatvalue ",rotation12=" sfloatvalue 
 ",rotation20=" sfloatvalue ",rotation21=" sfloatvalue ",rotation22=" 
 sfloatvalue ",translation0=" sfloatvalue ",translation1="
 sfloatvalue ",translation2=" sfloatvalue ",epupil1=" sfloatvalue 
 ",epupil2=" sfloatvalue ",epupil3=" sfloatvalue ",epupil4=" 
 sfloatvalue "]"
sfloatvalue = [sign] sizevalue ["." 6*DIGIT] 
sign =  "-" 
sizevalue = POS-DIGIT *5DIGIT
  ; POS-DIGIT is defined in Y.6.2.1
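The following non-normative Python sketch parses an a=3gpp-camera-calibration line into one dictionary per set and builds A[ i ], R[ i ] and T[ i ] from it. It checks that all 21 parameters are present in ABNF order but reads each value with Python's float(), so it is more lenient than the sfloatvalue rule; the layout of A[ i ] assumes the usual pinhole form with skewFactor in the first row. Function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Parameter names in the order required by the "set" rule of the ABNF.
KEYS = (["focalLengthX", "focalLengthY", "skewFactor",
         "principalPointX", "principalPointY"]
        + [f"rotation{j}{k}" for j in range(3) for k in range(3)]
        + [f"translation{j}" for j in range(3)]
        + [f"epupil{j}" for j in range(1, 5)])

SET_RE = re.compile(r"\[([^\]]*)\]")  # contents of one "[...]" set

Params = Dict[str, float]


def parse_calibration(line: str) -> Tuple[str, List[Params]]:
    """Return (PT, list of parameter sets); an attr-list of "*" gives no sets."""
    prefix = "a=3gpp-camera-calibration:"
    if not line.startswith(prefix):
        raise ValueError("not an a=3gpp-camera-calibration line")
    parts = line[len(prefix):].strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError("expected PT followed by an attr-list")
    pt, attr_list = parts[0], parts[1].strip()
    if attr_list == "*":
        return pt, []
    sets = []
    for body in SET_RE.findall(attr_list):
        pairs = [item.split("=", 1) for item in body.split(",")]
        if [pair[0] for pair in pairs] != KEYS or any(len(p) != 2 for p in pairs):
            raise ValueError("parameters missing, misnamed or out of ABNF order")
        sets.append({name: float(value) for name, value in pairs})
    if not sets:
        raise ValueError("no parameter set found")
    return pt, sets


def matrices(p: Params) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Build A[i], R[i] and T[i] from one parameter set."""
    A = [[p["focalLengthX"], p["skewFactor"], p["principalPointX"]],
         [0.0, p["focalLengthY"], p["principalPointY"]],
         [0.0, 0.0, 1.0]]
    R = [[p[f"rotation{j}{k}"] for k in range(3)] for j in range(3)]
    T = [p[f"translation{j}"] for j in range(3)]
    return A, R, T
```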

Y.6.7 Support for Stream Pausing/Resuming

An ITT4RT-Tx client shall use the a=rtcp-fb ccm pause attribute and parameter values as specified in RFC 5104 and RFC 7728 to indicate the capability to support receiving and acting on PAUSE and RESUME requests targeted for RTP streams it sends. The optional parameter setting of a=rtcp-fb ccm pause config=3 could be used by the ITT4RT-Tx client to indicate that it will only receive and react to PAUSE and RESUME requests but will not send them.

An ITT4RT-Rx client shall use the a=rtcp-fb ccm pause attribute and parameter values as specified in RFC 5104 and RFC 7728 to indicate the capability to support sending PAUSE and RESUME requests targeted for RTP streams it receives. The optional parameter setting of a=rtcp-fb ccm pause config=2 could be used by the ITT4RT-Rx client to indicate that it will only send PAUSE and RESUME requests but does not support receiving these requests.
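A peer can derive the advertised PAUSE/RESUME role from these lines. This non-normative Python sketch handles only the config values mentioned in this clause; treating an absent config parameter as full support (config=1) is an assumption based on RFC 7728, and any other value is rejected rather than guessed.

```python
from typing import Optional, Tuple


def pause_capabilities(line: str) -> Optional[Tuple[bool, bool]]:
    """Return (sends_requests, receives_requests) advertised by an
    "a=rtcp-fb:<pt> ccm pause ..." line, or None for other lines."""
    if not line.startswith("a=rtcp-fb:"):
        return None
    tokens = line[len("a=rtcp-fb:"):].split()  # [pt or "*", "ccm", "pause", params...]
    if len(tokens) < 3 or tokens[1] != "ccm" or tokens[2] != "pause":
        return None
    config = "1"  # assumed default when no config parameter is present: full support
    for token in tokens[3:]:
        if token.startswith("config="):
            config = token.split("=", 1)[1]
    if config == "1":
        return True, True
    if config == "2":
        return True, False   # only sends PAUSE/RESUME (ITT4RT-Rx use in this clause)
    if config == "3":
        return False, True   # only receives and reacts (ITT4RT-Tx use in this clause)
    raise ValueError(f"config={config} is not handled by this sketch")
```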

Y.6.8 Multiple 360-degree videos

An ITT4RT conference may contain multiple 360-degree videos which originate from multiple conference rooms at the conference location, or from remote participants. When multiple 360-degree videos are present in an ITT4RT conference, an ITT4RT MRF shall negotiate an SDP session with every remote participant.

In the SDP offer from the ITT4RT MRF, 360-degree video is identified by either the a=3gpp_360video or a=3gpp_fisheye media line attributes. When multiple 360-degree videos are present in the SDP offer, the ITT4RT MRF shall include the a=content attribute under the media lines for 2D or 360-degree video originating from the conference location. For media streams originating from the main default conference room, the content attribute is set to a=content:main. For media streams originating from other conference rooms, the content attribute is set to a=content:alt. 2D and 360-degree video from remote participants shall not include the a=content attribute under their corresponding media lines.

When there are multiple 360-degree videos from multiple sources available to the ITT4RT MRF, the ITT4RT MRF may include the 'itt4rt_group' attribute (as defined in clause Y.6.2.6) to define one or more restricting groups, each group containing at least one mid associated with a 360-degree video media line, and at least one mid associated with an overlay.

On receipt of an SDP offer containing multiple 360-degree videos from the ITT4RT MRF, an ITT4RT-Rx client shall select to receive only one 360-degree video media together with possible 2D video media from other sources, rejecting the other 360-degree video media.
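The selection rule can be sketched in Python as below (non-normative). The MediaOffer type is hypothetical, and preferring the 360-degree video marked a=content:main, otherwise the first one offered, is an assumed client policy; rejected media would be answered with port 0 in the usual SDP offer/answer manner.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaOffer:
    mid: str
    is_360: bool            # m-line carries a=3gpp_360video or a=3gpp_fisheye
    content: Optional[str]  # a=content value ("main", "alt") or None


def accepted_mids(media: List[MediaOffer]) -> List[str]:
    """Keep all non-360 media plus exactly one 360-degree video; the other
    360-degree videos are to be rejected in the answer."""
    videos_360 = [m for m in media if m.is_360]
    # Assumed policy: prefer the main conference room, else the first 360 video.
    chosen = next((m for m in videos_360 if m.content == "main"),
                  videos_360[0] if videos_360 else None)
    return [m.mid for m in media if not m.is_360 or m is chosen]
```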

Example SDP offers for multiple 360-degree video with and without group restrictions are shown in clause Y.8.

Y.6.8.2 Excluding other participants' overlays

When an ITT4RT-Tx client in a terminal sends a 360-degree video media stream to the MRF, it may include the attribute "a=no_other_overlays", which indicates that the MRF shall not group the 360-degree media stream from that ITT4RT-Tx client with overlay media streams from other ITT4RT clients. In this case, the MRF shall group the 360-degree video media stream and one or more overlays of that ITT4RT-Tx client in a separate <rest-group> in the itt4rt_group attribute when describing them to any ITT4RT-Rx client.

The ABNF syntax for this attribute is the following:

att-field	= "no_other_overlays"
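The MRF-side grouping can be sketched in Python as follows (non-normative). The Stream type is hypothetical, the encoding of groups into the itt4rt_group attribute of clause Y.6.2.6 is not shown, grouping an unflagged 360-degree video with all overlays is an assumed policy, and no group is emitted when a flagged 360-degree video has no overlays of its own, since each group needs at least one overlay.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stream:
    mid: str
    owner: str                       # hypothetical id of the sending ITT4RT client
    no_other_overlays: bool = False  # a=no_other_overlays present (360 video only)


def build_rest_groups(videos_360: List[Stream], overlays: List[Stream]) -> List[List[str]]:
    """Return one list of mids per restricting group: the 360-degree video mid
    first, followed by the overlay mids it may be combined with."""
    groups = []
    for video in videos_360:
        if video.no_other_overlays:
            # Only the sender's own overlays, in a separate group.
            members = [o.mid for o in overlays if o.owner == video.owner]
        else:
            # Assumed policy: any overlay may be combined with this video.
            members = [o.mid for o in overlays]
        if members:  # each group needs at least one overlay mid
            groups.append([video.mid] + members)
    return groups
```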

Y.6.9 Scene Description-Based Overlays

Y.6.9.1 General

ITT4RT clients that support the "Overlay" feature may support the scene description as defined in [183] for signaling the overlay configuration.

If scene description-based overlays are supported, the following subset of the MPEG-I scene description extensions and features shall be supported:

- The MPEG_media extension: used to reference the media streams.
- The MPEG_accessor_timed and the MPEG_buffer_circular: used to bind timed media.
- The MPEG_texture_video: used to define video textures for the overlay and the 360 video.
- The scene description update mechanism as defined in clause 5.2.4 of [183].

If scene description-based overlays are used in an ITT4RT session with multiple participants, then the ITT4RT MRF shall be used for the session and shall own the scene description.

If scene description-based overlays are used, then the ITT4RT-Tx client in the ITT4RT MRF shall:

- Create a sphere or cubemap mesh node (depending on the selected projection) in the scene description for each 360 video stream in the ITT4RT session. The source of the node's texture shall reference the ITT4RT media stream of the corresponding 360 video as signaled by the SDP.
- Create a rectangular or spherical mesh node in the scene description for each overlay stream in the ITT4RT session. The source of the node's texture shall reference the media stream of the corresponding overlay stream as signaled by the SDP.
- Indicate the location of each overlay by the transformation of the corresponding overlay node in the scene description.

The URL format as specified in ISO/IEC 23090-14 [183], Annex C, shall be used to reference media streams in the ITT4RT session.

For participants that support scene description, the overlay information and positioning that is provided as part of the scene description shall take precedence over any information provided as part of the 3gpp_overlay attribute.

An ITT4RT-Tx client in a terminal that offers overlays may select to signal the overlay either through the 3gpp_overlay attribute or through a scene update that adds the overlay node. The scene update mechanism is described in [183]. If the ITT4RT-Tx client uses the 3gpp_overlay attribute to describe its overlays, the ITT4RT-Tx client in the ITT4RT MRF shall generate the scene description or scene description update document that signals the presence and position of that overlay.

Y.6.9.2 Offer/Answer Negotiation

An ITT4RT-Tx client that desires to use scene description-based overlays shall offer a data channel indicating the "mpeg-sd" sub-protocol. The ITT4RT-Rx client in the MRF that supports scene description-based overlays may answer by accepting the scene description data channel.

If the offer is accepted, the ITT4RT MRF shall generate and send the scene description to the offerer upon establishment of the data channel.

If the ITT4RT MRF receives an offer that does not contain a data channel with the "mpeg-sd" sub-protocol, it shall assume that the offering ITT4RT client does not support scene description-based overlays. In such case, the answering ITT4RT MRF shall not add a data channel with the "mpeg-sd" sub-protocol and may describe any overlays using the 3gpp_overlay attribute.

Y.6.9.3 SDP Signaling

An ITT4RT-Tx client in the ITT4RT MRF that supports scene description-based overlays shall support MTSI data channel media and act as a DCMTSI client. The stream id of the data channel with the sub-protocol "mpeg-sd" shall be in the range allocated for bootstrap data channels, i.e., below 1000, excluding the values in Table 6.2.10.1-2. A single data channel with sub-protocol "mpeg-sd" shall be present in the offer/answer SDP. If multiple data channels with the "mpeg-sd" sub-protocol are detected, the one with the lowest stream ID shall be used. The scene description data channel shall be configured as ordered and reliable, with normal SCTP multiplexing priority.
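Selection of the scene description data channel might look like this non-normative Python sketch, which parses a=dcmap lines of the form "a=dcmap:<stream-id> <options>" in a simplified way (options separated by ";", surrounding quotes stripped, no handling of ";" inside quoted labels). The stream ids listed in Table 6.2.10.1-2 are not reproduced here and must be passed in by the caller as `excluded`.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_dcmap(line: str) -> Tuple[int, Dict[str, str]]:
    """Parse 'a=dcmap:<stream-id> opt=value;opt="value"' (simplified)."""
    body = line.split(":", 1)[1].strip()
    stream_id, _, opts = body.partition(" ")
    options: Dict[str, str] = {}
    for opt in (o.strip() for o in opts.split(";")):
        if opt:
            key, _, value = opt.partition("=")
            options[key] = value.strip('"')
    return int(stream_id), options


def select_scene_channel(sdp_lines: Iterable[str],
                         excluded: Iterable[int] = ()) -> Optional[int]:
    """Return the lowest stream id among "mpeg-sd" channels in the bootstrap range
    (below 1000, not in `excluded`), or None if there is none."""
    excluded_ids = set(excluded)  # caller supplies the Table 6.2.10.1-2 values
    candidates = []
    for line in sdp_lines:
        if not line.startswith("a=dcmap:"):
            continue
        stream_id, options = parse_dcmap(line)
        if (options.get("subprotocol") == "mpeg-sd" and stream_id < 1000
                and stream_id not in excluded_ids):
            candidates.append(stream_id)
    return min(candidates) if candidates else None
```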

When scene description-based overlays are offered, the ITT4RT-Tx client in the ITT4RT MRF shall offer a data channel with a stream id that indicates the "mpeg-sd" sub-protocol in the dcmap attribute. The "mpeg-sd" messages shall be JSON formatted and UTF-8 encoded without a byte order mark (BOM).
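On both the sending and the receiving side, the encoding rule can be met with Python's standard json module, as in this non-normative sketch. Python's "utf-8" codec never writes a BOM, and the decoder here rejects one explicitly; the function names and the scene content used in the test are hypothetical.

```python
import json
from typing import Any

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def encode_sd_message(document: Any) -> bytes:
    """Serialize a scene description or scene update as UTF-8 JSON without a BOM."""
    # ensure_ascii=False emits non-ASCII characters as UTF-8 rather than escape sequences.
    return json.dumps(document, ensure_ascii=False).encode("utf-8")


def decode_sd_message(data: bytes) -> Any:
    """Parse a received mpeg-sd message, rejecting a leading BOM."""
    if data.startswith(BOM):
        raise ValueError("mpeg-sd messages must not start with a BOM")
    return json.loads(data.decode("utf-8"))
```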

Scene description-based overlay descriptions, including complete scene descriptions and scene updates, shall be delivered through the same data channel.