Based on the architecture described in clause Y.2, an SDP framework for immersive video and immersive voice/audio exchange for ITT4RT is presented to negotiate codec support, SEI messages for decoder rendering metadata, as well as RTP/RTCP signaling necessary for viewport dependent processing.
The SDP attributes 3gpp_360video, 3gpp_fisheye and 3gpp_overlay shall be used to indicate, respectively, a 360-degree projected video stream, a 360-degree fisheye video stream, and a spherical overlay. ITT4RT-Tx clients that support both 360-degree projected video and 360-degree fisheye video may include both 3gpp_360video and 3gpp_fisheye attributes as alternatives in the SDP offer, but an ITT4RT-Rx client willing to receive 360-degree video shall include only one attribute (either 3gpp_360video or 3gpp_fisheye, based on support or selection) in the SDP answer. The 3gpp_overlay attributes may be included in the SDP answer independent of whether projected or fisheye video is selected, since spherical overlays are applicable to both types of 360-degree video streams. The detailed definition and usage of these SDP attributes are presented in the clauses below.
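As an informative illustration of the answer rule above, the following minimal Python sketch picks the single 360-degree video attribute to place in an SDP answer. The function name, the `supported` set, and the use of the offer order as the preference order are assumptions made for illustration; they are not defined by this specification.

```python
from typing import Iterable, Optional, Set

VIDEO_TYPE_ATTRIBUTES = ("3gpp_360video", "3gpp_fisheye")


def select_video_type(offered: Iterable[str], supported: Set[str]) -> Optional[str]:
    """Return the one attribute to include in the SDP answer, or None.

    Assumption: the first offered attribute that the answerer supports wins.
    """
    for attr in offered:
        if attr in VIDEO_TYPE_ATTRIBUTES and attr in supported:
            return attr
    return None


# The offer lists both alternatives; the answerer only supports fisheye video.
answer_attr = select_video_type(["3gpp_360video", "3gpp_fisheye"], {"3gpp_fisheye"})
```

Whatever selection policy is used, the answer carries at most one of the two attributes; 3gpp_overlay is handled independently of this choice.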
The semantics of the above attribute and parameters is provided below. Unsupported parameters of the 3gpp_360video attribute may be ignored. The payload type is the RTP payload type number of the media stream associated with the 3gpp_360video attribute.
An ITT4RT client supporting the 3gpp_360video attribute shall support the following procedures:
when sending an SDP offer, the ITT4RT client includes the 3gpp_360video attribute in the media description for video in the SDP offer,
when sending an SDP answer, the ITT4RT client includes the 3gpp_360video attribute in the media description for video in the SDP answer if the 3gpp_360video attribute was received in an SDP offer,
after successful negotiation of the 3gpp_360video attribute in the SDP, for the video streams based on the HEVC or AVC codec, the ITT4RT clients exchange an RTP-based video stream containing an HEVC or AVC bitstream with omnidirectional video specific SEI messages as defined in Annex Y.3.
An ITT4RT client that supports the 3gpp_360video attribute and the use of viewport-dependent processing (VDP) shall include the VDP parameter in the SDP offer and answer. Depending on the value indicated by the VDP parameter, the ITT4RT client shall further support the following procedures:
the RTCP feedback (FB) message described in Annex Y.7.2 of type 'Viewport' to carry requested viewport information during the RTP streaming of media (signalled from the ITT4RT-Rx client to the ITT4RT-Tx client).
An ITT4RT client shall not include the VDP parameter in the SDP answer if the SDP offer contains the 3gpp_360video attribute without the VDP parameter.
An ITT4RT-Tx client that supports VDP may use viewport margins to maintain consistent quality during small head motion and also to reduce the need for frequent viewport updates. Viewport margins can be extended on all or some sides of the viewport and may be at the same quality (or resolution) as the viewport or at a quality (or resolution) lower than the viewport but higher than the background. Viewport margins may be extended around the viewport evenly or unevenly depending on head motion or network quality.
An ITT4RT client supporting the 3gpp_360video attribute with VDP supporting projection may include the Projection parameter indicating the types of projection (e.g. ERP, CMP) it prefers (in the order of preference) in the SDP. An ITT4RT client may respond to an SDP offer with multiple options indicated in the Projection parameter with the agreed option. An ITT4RT-Tx client is not required to provide the preferred form of projection indicated by an ITT4RT-Rx client but may do so when possible.
An ITT4RT-Tx client may support sending a limited 360-degree video.
An ITT4RT-Tx client supporting the 3gpp_360video attribute and capable of sending a limited 360-degree video shall include the parameter FOV in its SDP offer to indicate the cfov (Capture FoV) as the extent (range) of the 360-degree video with respect to the unit sphere. The range is expressed in units of 2^-16 degrees with an x parameter for azimuth range and a y parameter for elevation range, sent as a comma-separated tuple. The values for azimuth range shall be in the range of 0 to 360 * 2^16 (i.e., 23 592 960), inclusive, and the values for elevation range shall be in the range of 0 to 180 * 2^16 (i.e., 11 796 480), inclusive. In the absence of cfov, the default values of x and y are 360 and 180 degrees, respectively.
An ITT4RT-Rx client supporting the 3gpp_360video attribute that wants to receive a limited 360-degree video shall include the parameter FOV in its SDP offer/answer to indicate the pfov (Preferred FoV), where pfov <= cfov in one or both the x and y dimensions when cfov is known. The pfov range is expressed in units of 2^-16 degrees with an x parameter for azimuth range and a y parameter for elevation range, sent as a comma-separated tuple. The values for azimuth range shall be in the range of 0 to 360 * 2^16 (i.e., 23 592 960), inclusive, and the values for elevation range shall be in the range of 0 to 180 * 2^16 (i.e., 11 796 480), inclusive. In the absence of pfov, the default values of x and y are 360 and 180 degrees, respectively.
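As an informative illustration of the unit arithmetic, the following minimal Python sketch converts an azimuth/elevation range given in degrees into the integer tuple used for cfov or pfov and checks the permitted ranges. The function and constant names are hypothetical, and only the comma-separated tuple is produced; the surrounding parameter syntax is the one defined in Annex Y.6.2.1 and is not reproduced here.

```python
UNITS_PER_DEGREE = 1 << 16  # one unit is 2^-16 degrees

MAX_AZIMUTH_RANGE = 360 * UNITS_PER_DEGREE    # 23 592 960
MAX_ELEVATION_RANGE = 180 * UNITS_PER_DEGREE  # 11 796 480


def encode_fov(azimuth_deg: float, elevation_deg: float) -> str:
    """Return the 'x,y' tuple for an FoV range, in units of 2^-16 degrees."""
    x = round(azimuth_deg * UNITS_PER_DEGREE)
    y = round(elevation_deg * UNITS_PER_DEGREE)
    if not 0 <= x <= MAX_AZIMUTH_RANGE:
        raise ValueError("azimuth range out of bounds")
    if not 0 <= y <= MAX_ELEVATION_RANGE:
        raise ValueError("elevation range out of bounds")
    return f"{x},{y}"


# A 180 x 90 degree capture field of view.
cfov = encode_fov(180, 90)  # "11796480,5898240"
```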
An ITT4RT-Tx client that has received an SDP offer from an ITT4RT-Rx client with the parameter FOV shall include in its SDP answer the parameter FOV to indicate the range of the 360-degree video it will provide. The value is the same as the FOV in the SDP offer or different based on the ITT4RT-Tx client capabilities.
An ITT4RT client supporting the 3gpp_360video attribute with the FOV parameter may include the parameter FOV_CENTER in the SDP. FOV_CENTER is expressed as a comma-separated tuple (x,y), where x is the azimuth (in units of 2^-16 degrees) and y is the elevation (in units of 2^-16 degrees) with respect to the global coordinates such that the range defined by FOV passes through the coordinates defined by FOV_CENTER. The values for azimuth shall be in the range of −180 * 2^16 (i.e., −11 796 480) to 180 * 2^16 − 1 (i.e., 11 796 479), inclusive, and the values for elevation shall be in the range of −90 * 2^16 (i.e., −5 898 240) to 90 * 2^16 (i.e., 5 898 240), inclusive. The imageattr attribute indicates the resolution of the delivered content based on the cfov and pfov options.
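As an informative illustration, the following minimal Python sketch encodes a FOV_CENTER position given in degrees and enforces the asymmetric azimuth bound (the upper limit is 180 * 2^16 − 1, so +180 degrees itself is not representable). The function name is hypothetical and only the tuple value is produced.

```python
UNITS_PER_DEGREE = 1 << 16  # one unit is 2^-16 degrees

MIN_AZIMUTH = -180 * UNITS_PER_DEGREE      # -11 796 480
MAX_AZIMUTH = 180 * UNITS_PER_DEGREE - 1   #  11 796 479
MIN_ELEVATION = -90 * UNITS_PER_DEGREE     #  -5 898 240
MAX_ELEVATION = 90 * UNITS_PER_DEGREE      #   5 898 240


def encode_fov_center(azimuth_deg: float, elevation_deg: float) -> str:
    """Return the 'x,y' tuple for FOV_CENTER, in units of 2^-16 degrees."""
    x = round(azimuth_deg * UNITS_PER_DEGREE)
    y = round(elevation_deg * UNITS_PER_DEGREE)
    if not MIN_AZIMUTH <= x <= MAX_AZIMUTH:
        raise ValueError("azimuth out of bounds")
    if not MIN_ELEVATION <= y <= MAX_ELEVATION:
        raise ValueError("elevation out of bounds")
    return f"{x},{y}"


center = encode_fov_center(-180, 90)  # "-11796480,5898240"
```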
An ITT4RT client supporting mixed-quality tiled encoding, mixed-resolution tiled encoding and/or a 360-degree low-quality background frame-packed with an overlapping high-quality viewport shall include the PPM parameter in the 3gpp_360video attribute of the SDP offer.
A list of all supported options as defined by ppm-list in Annex Y.6.2.1 shall be included in the SDP offer, where:
A ppm-value of 1 indicates mixed-quality tiled encoding
A ppm-value of 2 indicates mixed-resolution tiled encoding
A ppm-value set to the comma-separated list 'packing' indicates low-quality viewport-independent background 360-degree video frame-packed with a high-quality viewport (possibly with margins) such that the two regions have overlapping content
An ITT4RT client that receives an SDP offer with a ppm-list of more than one ppm-value shall include only one preferred/supported ppm-value in the SDP answer. An ITT4RT-Rx client that includes the PPM parameter in its SDP offer with the ppm-value set to 'packing' (as defined above in the PPM syntax) shall set all values of the 'packing' to zero. An ITT4RT-Tx client that receives the PPM parameter in an SDP offer with the ppm-value set to 'packing' (as defined above in the PPM syntax) shall set all values of the 'packing' appropriately in the response.
Tiled encoding may be used to deliver the full 360-degree video or a high-quality video which includes the viewport and may include viewport margins.
The ppm-value 'packing' consists of the following six fields:
PPWHQ defines packed_picture_width of the high-quality region in pixels
PPHHQ defines packed_picture_height of the high-quality region in pixels
TRHQ defines transform operations applied on the high-quality region.
PPWLQ defines packed_picture_width of the low-quality region in pixels
PPHLQ defines packed_picture_height of the low-quality region in pixels
TRLQ defines transform operations applied on the low-quality region
The transform operations have a value of 0-7 as defined in Table Y.6.1:
rotation by 180 degrees (counter-clockwise) before mirroring horizontally
rotation by 90 degrees (counter-clockwise) before mirroring horizontally
rotation by 90 degrees (counter-clockwise)
rotation by 270 degrees (counter-clockwise) before mirroring horizontally
rotation by 270 degrees (counter-clockwise)
An ITT4RT-Rx client shall render the high-quality viewport region where these two regions are overlapping. The PPM parameter for defining the HQ and LQ regions should be used when the information remains constant during the session. When the packed regions are not overlapping, the high-quality and low-quality regions do not need to be explicitly defined and SEI messages for region-wise packing may be used instead of the SDP PPM parameter.
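As an informative illustration, the six fields of the 'packing' ppm-value can be modelled as a small record. The following Python sketch uses a hypothetical class name and field spellings; it checks only what the text above states (transform values in the range 0-7, and the all-zero form used by an ITT4RT-Rx client in its offer) and does not reproduce the SDP syntax of Annex Y.6.2.1.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Packing:
    ppwhq: int  # packed_picture_width of the high-quality region, in pixels
    pphhq: int  # packed_picture_height of the high-quality region, in pixels
    trhq: int   # transform operation applied on the high-quality region
    ppwlq: int  # packed_picture_width of the low-quality region, in pixels
    pphlq: int  # packed_picture_height of the low-quality region, in pixels
    trlq: int   # transform operation applied on the low-quality region

    def __post_init__(self) -> None:
        for name in ("ppwhq", "pphhq", "ppwlq", "pphlq"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must not be negative")
        for name in ("trhq", "trlq"):
            if not 0 <= getattr(self, name) <= 7:
                raise ValueError(f"{name} must be in the range 0-7")

    def is_all_zero(self) -> bool:
        """True for the form an ITT4RT-Rx client places in its offer."""
        return all(value == 0 for value in astuple(self))


# An ITT4RT-Rx offer carries zeros; the ITT4RT-Tx response fills in real values.
rx_offer = Packing(0, 0, 0, 0, 0, 0)
tx_response = Packing(1920, 1080, 0, 960, 540, 0)  # example values only
```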
Multiple options are provided as a comma-separated list. An ITT4RT client that receives an SDP offer with multiple viewport_ctrl options may include its preferred viewport_ctrl option in the SDP answer. If no options are given in the answer, the sender shall use the first option in the list. If the recommended_viewport is successfully negotiated as viewport_ctrl, the ITT4RT-Rx client should not use viewport prediction when sending the RTCP feedback (FB) message type 'Viewport' to avoid any conflicts with the prediction engine of the ITT4RT-Tx client.
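As an informative illustration of the fallback rule, the following minimal Python sketch resolves the viewport_ctrl option in effect after offer/answer. The function name is hypothetical, and "other_option" is a placeholder rather than a value defined by this specification.

```python
from typing import Optional


def effective_viewport_ctrl(offered: str, answered: Optional[str]) -> str:
    """Return the viewport_ctrl option the sender uses after negotiation."""
    options = [opt.strip() for opt in offered.split(",") if opt.strip()]
    if not options:
        raise ValueError("the offer lists no viewport_ctrl option")
    if answered is None or not answered.strip():
        # No option given in the answer: the first offered option applies.
        return options[0]
    choice = answered.strip()
    if choice not in options:
        raise ValueError("the answer selects an option that was not offered")
    return choice


offer = "recommended_viewport,other_option"  # "other_option" is a placeholder
in_effect = effective_viewport_ctrl(offer, None)
```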
An ITT4RT client that sends an SDP message with at least one 360-degree video/audio and at least one overlay shall include in SDP the attribute itt4rt_group before any media lines. The itt4rt_group attribute is used to group 360-degree media and overlay media using the mid attribute and the syntax for the SDP attribute is:
a=itt4rt_group: <group-1> / … / <group-N>
where <group-X> shall include at least one mid associated with 360-degree media and at least one mid associated with an overlay as defined by the mid attribute in the corresponding media description.
The ABNF syntax for this attribute is the following:
att-field = "itt4rt_group"
att-value = rest-group *[" /" rest-group]
rest-group = 2*(SP identification-tag)
; identification-tag is defined in RFC 5888
An ITT4RT-Tx client and an ITT4RT-Rx client may negotiate the overlays that can be associated with the 360-degree video offered by the ITT4RT-Tx client using the itt4rt_group attribute. An ITT4RT client shall indicate in an offer the overlays to be grouped with the 360-degree video using the itt4rt_group attribute. The overlays that are acceptable shall be retained in the answer and the ones that are not acceptable shall be removed. An ITT4RT-Tx client may offer overlay configuration options using the 3gpp_overlay attribute based on the list of media lines (i.e., potential overlay sources) provided in the itt4rt_group attribute in an SDP offer initiated by an ITT4RT-Rx client. The 3gpp_overlay attribute is offered in an SDP renegotiation.
The order of the media included in the itt4rt_group indicates the synchronization source with the first media always being the synchronization anchor when synchronization is required.
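As an informative illustration, the following minimal Python sketch parses an itt4rt_group attribute line into its groups of mid values and returns the synchronization anchor of a group. It is deliberately lenient about whitespace, enforces only the "at least two identification-tags per group" rule of the ABNF above, and does not check which mid refers to 360-degree media and which to an overlay. The function names and the example mid values are hypothetical.

```python
from typing import List

PREFIX = "a=itt4rt_group:"


def parse_itt4rt_group(line: str) -> List[List[str]]:
    """Split an itt4rt_group attribute line into lists of mid values."""
    if not line.startswith(PREFIX):
        raise ValueError("not an itt4rt_group attribute line")
    groups = []
    for part in line[len(PREFIX):].split("/"):
        mids = part.split()
        if len(mids) < 2:
            raise ValueError("each group needs at least two mid values")
        groups.append(mids)
    return groups


def sync_anchor(group: List[str]) -> str:
    """The first media in a group is the synchronization anchor."""
    return group[0]


groups = parse_itt4rt_group("a=itt4rt_group: 1 2 / 3 4 5")
anchors = [sync_anchor(g) for g in groups]
```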
An ITT4RT client that includes the 3gpp_360video with the VDP parameter shall also include in SDP the parameter viewport_size to indicate the size of the device viewport using the azimuth and elevation ranges expressed in degrees. An ITT4RT-Tx client may include the viewport_size of the ITT4RT-Rx client when this is known (e.g., in response to an SDP offer from an ITT4RT-Rx client) or include "viewport=0x0" and the value can be ignored by the ITT4RT-Rx client.
An ITT4RT-Tx client supporting VDP may deliver only the viewport or viewport with viewport margins and not the full captured/preferred field-of-view of the 360-degree video. If the viewport region (with or without a viewport margin) is extracted from a projected picture (e.g., ERP), the resolution would change depending on where the viewport is located on the picture. To avoid this, the ITT4RT-Tx client may rotate the desired viewport region to the centre of the ERP before cropping it to the desired size as indicated by imageattr. The delivered bitstream shall contain the rotation SEI and the region-wise packing SEI message if the ITT4RT-Rx client is expected to do sphere-locked rendering by reversing the rotation of the received image before rendering.
An ITT4RT client may support viewport-locked VDP for delivering the viewport region only. A viewport-locked VDP bitstream should include only the viewport region and should not include rotation SEI messages. An ITT4RT client that supports viewport-locked VDP shall include in its SDP offer the parameter SLVL as defined in Annex Y.6.2.1. The value "SL" refers to sphere-locked rendering, which requires the receiver to render the received picture according to the global coordinate axes. The value "VL" refers to viewport-locked rendering, which requires the receiver to render the received picture such that the center of the received picture is aligned to the center of the current viewport.
An ITT4RT client that supports only viewport-locked VDP shall include "VL" in its SDP offer. An ITT4RT client that supports both viewport-locked VDP and a sphere-locked type of VDP shall include "VL,SL". An ITT4RT client that receives an SDP offer with "VL" shall include it in its response if it supports viewport-locked VDP and chooses to use it. An ITT4RT client that receives an SDP offer with "VL" shall remove the VDP parameter in its response if it does not support viewport-locked VDP or does not wish to use it; the 360-degree video is then delivered using viewport-independent processing. An ITT4RT client that receives an SDP offer with "SL" shall include it in its response if it supports sphere-locked VDP and chooses to use it. An ITT4RT client that receives an SDP offer with "SL" shall remove the VDP parameter in its response if it does not support sphere-locked VDP or does not choose to use it; the 360-degree video is then delivered using viewport-independent processing. An ITT4RT client that receives an SDP offer with "VL,SL" shall include either "SL" or "VL" in the SDP response based on its preferred mode. Alternatively, if it neither supports nor chooses VL or SL, it shall remove the VDP parameter from the 3gpp_360video attribute in the response.
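As an informative illustration of the rules above, the following minimal Python sketch derives the SLVL value for an SDP response. Returning None stands for removing the VDP parameter, in which case the 360-degree video is delivered using viewport-independent processing. The function name, the `supported` set and the `preferred` argument are assumptions made for illustration.

```python
from typing import Optional, Set


def answer_slvl(offered: str, supported: Set[str], preferred: str = "SL") -> Optional[str]:
    """Return "SL" or "VL" for the response, or None to remove VDP."""
    modes = [m.strip() for m in offered.split(",")]
    usable = [m for m in modes if m in ("VL", "SL") and m in supported]
    if not usable:
        return None  # VDP is removed; viewport-independent delivery applies
    return preferred if preferred in usable else usable[0]


both = answer_slvl("VL,SL", {"VL", "SL"}, preferred="VL")
only_sl = answer_slvl("VL,SL", {"SL"})
unsupported = answer_slvl("VL", {"SL"})
```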
An ITT4RT client may include the parameter viewportfb_trigger in the 3gpp_360video attribute to define the minimum viewport change to initiate an early or event-based RTCP feedback. The syntax for viewportfb_trigger is defined in Annex Y.6.2.1. D_azimuth and D_elevation are the minimum number of degrees that the viewport may change in the horizontal or vertical direction, respectively. The value D_spherical is the minimum spherical distance in degrees between the center of the old and the new viewport. The values for D_spherical shall be in the range 0 to 180 * 2^16 − 1 (i.e., 11 796 479). Spherical distance between the centre of a first viewport (x1,y1) and second viewport (x2,y2), is calculated as:
where x1 and x2 are the azimuths in radians and y1 and y2 are the elevations in radians. The value c is in radians and must be converted to degrees for use with D.
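As an informative illustration, the following minimal Python sketch computes the spherical distance c between two viewport centres and converts it to degrees. It assumes the standard great-circle distance in its haversine form, which is consistent with the variable definitions above but is not quoted from this specification; the function name is hypothetical.

```python
import math


def spherical_distance_deg(x1: float, y1: float, x2: float, y2: float) -> float:
    """Great-circle distance between (x1, y1) and (x2, y2), in degrees.

    x1, x2 are azimuths and y1, y2 are elevations, all in radians.
    """
    a = (math.sin((y2 - y1) / 2) ** 2
         + math.cos(y1) * math.cos(y2) * math.sin((x2 - x1) / 2) ** 2)
    # Clamp against rounding so that asin never sees a value above 1.
    c = 2 * math.asin(min(1.0, math.sqrt(a)))
    return math.degrees(c)


# Two viewport centres on the equator, a quarter turn apart in azimuth.
quarter_turn = spherical_distance_deg(0.0, 0.0, math.pi / 2, 0.0)
```

The haversine form is chosen here because it stays numerically stable for the small viewport movements that typically matter for a feedback trigger.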
The viewport feedback trigger value is estimated by the ITT4RT-Tx client based on the viewport margin configuration it intends to use. An ITT4RT-Rx client supporting RTCP viewport feedback shall use periodic RTCP viewport feedback. The frequency of the periodic feedback should be such that it does not exceed the allocated RTCP bandwidth as defined in RFC 4585. An ITT4RT-Rx client may use immediate/early RTCP feedback in addition to the periodic feedback as long as the allocated RTCP bandwidth requirements are met. An ITT4RT-Tx client may define a viewport feedback trigger value for an early/immediate feedback and signal this value to the ITT4RT-Rx client in the SDP. The ITT4RT-Tx client should select a threshold value that is suitable for the margin configuration that it intends to use for that stream. The threshold value should be defined within the viewport margin region such that the ITT4RT-Tx client would update the high-quality region (viewport and viewport margin) if the viewport breaches this threshold.
If an ITT4RT-Rx client does not have the capability to provide an RTCP viewport feedback at the viewport feedback threshold value provided by the ITT4RT-Tx client in an SDP offer, it may respond with the minimum threshold value it can support. The ITT4RT-Tx client may adjust its viewport margin configuration based on the threshold value in the answer. If an ITT4RT-Rx client only supports periodic feedback, it shall remove the viewportfb_trigger parameter from the response.
An ITT4RT-Rx client that supports a viewport feedback trigger shall include the parameter viewportfb_trigger with the minimum threshold value it can support in an SDP offer. The ITT4RT-Tx client may remove the parameter if it does not support this value or respond with an acceptable value that is equal to or higher than the one in the ITT4RT-Rx client's offer.
If both sides acknowledge the support of viewportfb_trigger, the ITT4RT-Rx client shall use event-driven/early viewport feedback in addition to periodic feedback. If viewportfb_trigger is not defined by the ITT4RT-Tx client, the ITT4RT-Rx client may still use immediate/early feedback. An ITT4RT-Rx client may use the velocity of the viewport during head motion and the viewport margin (if known) to trigger an immediate feedback. Alternatively, it may use the spherical distance between the viewport in the last feedback and the current viewport to trigger an immediate feedback. The spherical distance can be selected based on viewport margins (if known). An ITT4RT-Rx client may suppress an immediate/early feedback if the time to the next periodic viewport feedback is less than an application-defined threshold.
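As an informative illustration, the following minimal Python sketch shows one way an ITT4RT-Rx client could decide whether to send an early viewport feedback, using the spherical distance since the last feedback and the optional suppression close to the next periodic report. The function name, the parameter names and the choice of seconds and degrees as units are assumptions made for illustration.

```python
def should_send_early_feedback(distance_deg: float,
                               trigger_deg: float,
                               seconds_to_next_periodic: float,
                               suppress_below_s: float) -> bool:
    """Decide whether an early/immediate viewport feedback is sent.

    distance_deg: spherical distance between the viewport in the last
        feedback and the current viewport.
    trigger_deg: negotiated or locally selected trigger threshold.
    suppress_below_s: application-defined threshold; an early feedback is
        skipped when the next periodic feedback is closer than this.
    """
    if distance_deg < trigger_deg:
        return False  # the viewport has not moved far enough
    if seconds_to_next_periodic < suppress_below_s:
        return False  # the periodic feedback will report the change shortly
    return True


send_now = should_send_early_feedback(12.0, 10.0, 0.50, 0.05)
```

The suppression step is optional behaviour; a client that never suppresses can pass 0 for `suppress_below_s`.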
Still image backgrounds may be supported by ITT4RT clients. The format and signaling shall follow the static image format and signaling as defined in clauses 5.2.4, 6.2.11, and 7.4.8. An ITT4RT-Tx client should send the image/image sequence as a video bitstream if still images are not supported.
The signaling in clause Y.6.2 shall apply to indicate that the still background is 360 degree. The 3gpp_360video or 3gpp_fisheye attribute shall be used for that purpose.