
Content for TS 26.118, Word version 17.0.0


7  Metadata

7.1  Presentation without Pose Information to 2D Screens

On some devices, the VR sensor providing pose information may not be available, or may be disabled by the user, and the VR360 presentation is rendered on a 2D screen. In this case, the receiver needs to rely on other information to determine a proper rendering of the VR360 content that is to be presented at a specific media time.
For this purpose, a VR Media Presentation may include the recommended viewport metadata. The recommended viewport metadata may be encapsulated in a timed metadata track in either a file or a DASH representation.
Receivers presenting on a 2D screen and not implementing a viewport sensor should implement the recommended viewport processing to process the recommended viewport metadata and to render VR video or audio accordingly.
Receivers implementing a viewport sensor may implement the recommended viewport processing.
If the viewport sensor is not implemented at the receiver, or if the viewport sensor is disabled (permanently or temporarily), the receiver should process the recommended viewport metadata, if present.
If the viewport metadata is provided in the VR Media Presentation, and if its processing is supported and applied, then the Receiver shall render the viewport indicated by the metadata.

8  VR Presentation

8.1  Definition

A VR presentation provides an omnidirectional audio-visual experience. A 3GPP VR Presentation is a VR Presentation for which each of the VR Tracks it contains is aligned to the 3GPP DOF Reference System as defined in clause 4 and is time-synchronized with the other tracks.

8.2  3GPP VR File

A 3GPP VR Presentation may be provided in an ISO BMFF conforming file. A 3GPP VR File is defined as a file that conforms to ISO BMFF [17] and for which at least two tracks are present whereby:
  • at least one track conforms to a 3GPP VR Track according to a video media profile defined in clause 5,
  • at least one track conforms to a 3GPP VR Track according to an audio media profile defined in clause 6.
Conformance to a 3GPP VR File may be signalled with a compatibility brand '3gvr'.
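As an illustration, a receiver can detect the '3gvr' brand by inspecting the ftyp box at the start of the file. The following is a minimal sketch (not part of this specification); it assumes a plain 32-bit box size field and that ftyp is the first box:

```python
import struct

def has_vr_brand(data: bytes, brand: bytes = b"3gvr") -> bool:
    """Check whether an ISO BMFF 'ftyp' box lists the given brand.

    Minimal sketch: assumes the ftyp box is the first box in the file
    and uses a 32-bit size field (no 'largesize' handling).
    """
    if len(data) < 16:
        return False
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp" or size > len(data):
        return False
    major_brand = data[8:12]
    # compatible_brands follow major_brand (4 bytes) + minor_version (4 bytes)
    compatible = [data[i:i + 4] for i in range(16, size, 4)]
    return brand == major_brand or brand in compatible
```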

8.3  3GPP VR DASH Media Presentation

A 3GPP VR Presentation may be provided in a DASH Media Presentation. A 3GPP VR DASH Media Presentation is defined as a DASH Media Presentation that conforms to 3GP-DASH and for which at least two Adaptation Sets are present whereby:
  • at least one Adaptation Set conforms to an Adaptation Set for a video media profile defined in clause 5,
  • at least one Adaptation Set conforms to an Adaptation Set for an audio media profile defined in clause 6.
Conformance to a 3GPP VR DASH Media Presentation may be signalled with the MPD @profiles parameter 'urn:3gpp:vrstream:presentation'.
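A client can check for this profile URN on the MPD root element. The sketch below (illustrative only) ignores XML namespaces, which a real MPD parser would have to handle, and treats @profiles as the comma-separated list defined by DASH:

```python
import xml.etree.ElementTree as ET

def mpd_signals_vr_presentation(mpd_xml: str) -> bool:
    """Return True if the MPD root's @profiles attribute lists the
    3GPP VR presentation profile URN. Sketch: only inspects the MPD
    root element; @profiles is a comma-separated list per DASH.
    """
    root = ET.fromstring(mpd_xml)
    profiles = root.get("profiles", "")
    return "urn:3gpp:vrstream:presentation" in (p.strip() for p in profiles.split(","))
```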

9  VR Metrics |R16|

9.1  General

VR metrics is a functionality whereby the client collects specific quality-related metrics during a session. These collected metrics can then be reported back to a network-side node for further analysis. The metrics functionality is based on the QoE metrics concept in 3GP-DASH (TS 26.247), but is extended to also cover VR-specific metrics. A VR client supporting VR metrics shall support all metrics listed in clause 9.3, and shall handle metric configuration and reporting as specified in clauses 9.4 and 9.5.

9.2  VR Client Reference Architecture

9.2.1  Architecture

The client reference architecture for VR metrics, shown in Figure 9.2.1-1 below, is based on the client architecture in Figure 4.3-1. It also contains a number of observation points where specific metric-related information can be made available to the Metrics Collection and Computation (MCC) function. The MCC can use and combine information from the different observation points to calculate more complex metrics.
Note that these observation points are only defined conceptually, and might not always directly interface with the MCC. For instance, an implementation might relay information from the actual observation points to the MCC via the VR application. It is also possible that the MCC is not separately implemented, but simply included as an integral part of the VR application.
Also note that in this version of this specification not all of the described observation points are necessarily used to produce VR metrics.
[Figure 9.2.1-1: Client reference architecture for VR metrics]

9.2.2  Observation Point 1

The access engine fetches the MPD, constructs and issues segment requests for relevant adaptation sets or preselections as ordered by the VR application, and receives segments or parts of segments. It may also adapt between different representations due to changes in available bitrate. The access engine provides a conforming 3GPP VR track to the file decoder.
The interface from the access engine towards MCC is referred to as observation point 1 (OP1) and is defined to monitor:
  • The sequence of transmitted network requests, each defined by its transmission time, contents, and the TCP connection on which it is sent
  • For each network response, the reception time and contents of the response header and the reception time of each byte of the response body
  • The projection/orientation metadata carried in the network manifest file, if applicable
  • The reception time and intended playout time of each received segment

9.2.3  Observation Point 2

The file decoder processes the 3GPP VR Track and typically includes a file parser and a media decoder. The file parser processes the file or segments, extracts elementary streams, and parses the metadata, if present. The processing may be supported by dynamic information provided by the VR application, for example which tracks to choose based on static and dynamic configurations. The media decoder decodes media streams of the selected tracks into the decoded signals. The file decoder outputs the decoded signals and metadata which is used for rendering.
The interface from the file decoder towards MCC is referred to as observation point 2 (OP2) and is defined to monitor:
  • Media resolution
  • Media codec
  • Media frame rate
  • Media projection metadata, such as region-wise packing, region-wise quality ranking, and content coverage
  • Mono vs. stereo 360 video
  • Media decoding time

9.2.4  Observation Point 3

The sensor extracts the current pose according to the user's head and/or eye movement and provides it to the renderer for viewport generation. The current pose may also be used by the VR application to control the access engine on which adaptation sets or preselections to fetch.
The interface from the sensor towards MCC is referred to as observation point 3 (OP3) and is defined to monitor:
  • Head pose
  • Gaze direction
  • Pose timestamp
  • Depth

9.2.5  Observation Point 4

The VR Renderer uses the decoded signals and rendering metadata, together with the pose and the knowledge of the horizontal/vertical field of view, to determine a viewport and render the appropriate part of the video and audio signals.
The interface from the VR renderer towards MCC is referred to as observation point 4 (OP4) and is defined to monitor:
  • The media type
  • The media sample presentation timestamp
  • Wall clock counter
  • Actual presentation viewport
  • Actual presentation time
  • Actual playout frame rate
  • Audio-to-video synchronization
  • Video-to-motion latency
  • Audio-to-motion latency

9.2.6  Observation Point 5 |R17|

The VR application manages the complete device, and controls the access engine, the file decoder and the rendering based on media control information, the dynamic user pose, and the display and device capabilities.
The interface from the VR application towards MCC is referred to as observation point 5 (OP5) and is defined to monitor:
  • Display resolution
  • Max display refresh rate
  • Field of view, horizontal and vertical
  • Eye to screen distance
  • Lens separation distance
  • OS support, e.g. OS type, OS version

9.3  Metrics Definitions

9.3.1  General

As the VR metrics functionality is based on the DASH QoE metrics TS 26.247, all metrics already defined in TS 26.247 are valid also for a VR client. Thus the following sub-clauses only define additional VR-related metrics.

9.3.2  Comparable quality viewport switching latency

The comparable quality viewport switching latency metric reports the latency and the quality-related factors when viewport movement causes quality degradations, such as when low-quality background content is briefly shown before the normal higher-quality content is restored. Note that this metric is only relevant if the Advanced Video Media profile with region-wise packing is used. Also note that the metric currently does not report factors related to foveated rendering.
The viewport quality is represented by two factors: the quality ranking (QR) value, and the pixel resolution of one or more regions within the viewport. The resolution is defined by the orig_width and orig_height values in ISO/IEC 23090-2 [13] in SRQR (Spherical-Region Quality Ranking) or 2DQR (2-Dimensional Quality Ranking). The resolution corresponds to the monoscopic projected picture from which the packed region covering the viewport is extracted.
In order to determine whether two viewports have a comparable quality, if more than one quality ranking region is visible inside the viewport, the aggregated viewport quality factors are calculated as the area-weighted average for QR and the area-weighted (effective) pixel resolution, respectively.
For instance, if 60% of the viewport is from a region with QR=1, Res=3840 x 2160, and 40% is from a region with QR=2, Res=960 x 540, then the average QR is 0.6 x 1 + 0.4 x 2, and the effective pixel resolution is 0.6 x 3840 x 2160 + 0.4 x 960 x 540 (also see Annex D.1 for more examples).
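The aggregation above can be sketched as follows; the tuple layout and function name are illustrative, not defined by this specification:

```python
def aggregate_viewport_quality(regions):
    """Area-weighted viewport quality factors per clause 9.3.2.

    regions: list of (coverage, qr, width, height) tuples, where the
    coverage fractions over the viewport sum to 1.0.
    Returns (weighted_qr, effective_pixel_resolution).
    """
    weighted_qr = sum(c * qr for c, qr, _, _ in regions)
    effective_res = sum(c * w * h for c, _, w, h in regions)
    return weighted_qr, effective_res

# The example from the text: 60% of the viewport at QR=1, 3840x2160,
# and 40% at QR=2, 960x540.
qr, res = aggregate_viewport_quality([(0.6, 1, 3840, 2160), (0.4, 2, 960, 540)])
# qr -> 1.4, res -> 5184000.0
```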
If the viewport is moved so that the current viewport includes at least one new quality ranking region (i.e. a quality ranking region not included in the previous viewport), a switch event is started. The list of quality factors related to the last evaluated viewport quality before the switch are assigned to the firstViewport log entry. The start time of the switch is also set to the time of the last evaluated viewport before the switch.
The end time for the switch is defined as when both the weighted average QR and the effective resolution for the viewport reach values comparable to the ones before the switch. A value is comparable if it is not more than QRT% (QR threshold) or ERT% (effective resolution threshold) worse than the corresponding values before the switch. If comparable values are not achieved within N milliseconds, a timeout occurs (for instance if an adaptation to a lower bitrate occurs, and the viewport never reaches comparable quality).
Note that smaller QR values and larger resolution values are better. For instance, QRT=5% would require a weighted average QR value equal to or smaller than 105% of the weighted average QR before the switch, while ERT=5% would require an effective resolution equal to or larger than 95% of the effective resolution before the switch.
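The asymmetric threshold test can be written down directly. A minimal sketch (the 5% defaults are illustrative; the specification leaves default values to the client):

```python
def is_comparable(qr_before, res_before, qr_now, res_now, qrt=5.0, ert=5.0):
    """Comparable-quality test per clause 9.3.2.

    Smaller QR and larger resolution are better, so the current
    weighted QR may be at most QRT% higher, and the current effective
    resolution at most ERT% lower, than the pre-switch values.
    """
    return (qr_now <= qr_before * (1 + qrt / 100.0)
            and res_now >= res_before * (1 - ert / 100.0))
```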
The list of quality factors related to the viewport which fulfills both thresholds are assigned to the secondViewport log entry, and the latency (end time minus start time) is assigned to the latency log entry. In case of a timeout, this is indicated under the cause log entry.
During the duration of the switch the worst evaluated viewport is also stored, and assigned to the worstViewport log entry. The worst viewport is defined as the viewport with the worst relative weighted average QR or relative effective resolution, as compared to the values before the switch.
If a new viewport switching event occurs (e.g. yet another new region becomes visible) before an ongoing switch event has ended, only the N milliseconds timeout is reset. The ongoing measurement process continues to evaluate the viewport quality until a comparable viewport quality value is achieved (or a timeout occurs).
The observation points needed to calculate the metrics are:
  • OP2 File Decoder: SRQR/2DQR information
  • OP3 Sensor: Gaze information
  • OP4 VR Renderer: Start of switch event detection (alternatively, region coverage information from SRQR/2DQR can be used as strict rendering pixel-exactness is not required)
  • OP5 VR Application: Field-of-view information of the device
The accuracy of the measured latency depends on how the client implements the viewport switching monitoring. As this might differ between clients, the client shall report the estimated accuracy.
The thresholds QRT and ERT, and the timeout N, can be specified during metrics configuration (see clause 9.4) as attributes within parentheses, e.g. "CompQualLatency (QRT=3.5,ERT=6.8,N=900)". If a threshold or the timeout is not specified, the client shall use appropriate default values.
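A client might parse such a configuration string as follows; this is a sketch of one possible parser, not a normative format definition:

```python
import re

def parse_metric_config(spec, defaults=None):
    """Parse a metric name with optional parenthesised attributes,
    e.g. 'CompQualLatency (QRT=3.5,ERT=6.8,N=900)'.

    Unspecified attributes fall back to the given defaults (a
    client-chosen set; the specification does not mandate particular
    default values).
    """
    m = re.fullmatch(r"\s*(\w+)\s*(?:\(([^)]*)\))?\s*", spec)
    if not m:
        raise ValueError("malformed metric spec: %r" % spec)
    attrs = dict(defaults or {})
    if m.group(2):
        for kv in m.group(2).split(","):
            key, value = kv.split("=")
            attrs[key.strip()] = float(value)
    return m.group(1), attrs
```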
The data type ViewportDataType is defined in Table 9.3.2-1 below, and identifies the direction and coverage of the viewport.
Key               Type     Description
ViewportDataType  Object
centre_azimuth    Integer  Specifies the azimuth of the centre of the viewport in units of 2^-16 degrees. The value shall be in the range of −180 * 2^16 to 180 * 2^16 − 1, inclusive.
centre_elevation  Integer  Specifies the elevation of the centre of the viewport in units of 2^-16 degrees. The value shall be in the range of −90 * 2^16 to 90 * 2^16, inclusive.
centre_tilt       Integer  Specifies the tilt angle of the viewport in units of 2^-16 degrees. The value shall be in the range of −180 * 2^16 to 180 * 2^16 − 1, inclusive.
azimuth_range     Integer  Specifies the azimuth range of the viewport through the centre point of the viewport, in units of 2^-16 degrees.
elevation_range   Integer  Specifies the elevation range of the viewport through the centre point of the viewport, in units of 2^-16 degrees.
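The 2^-16-degree fixed-point units used above convert to and from degrees as follows (trivial helpers, not part of the specification):

```python
def degrees_to_units(deg):
    """Convert degrees to the 2^-16-degree integer units used by
    ViewportDataType; e.g. -180 degrees is stored as -180 * 65536."""
    return round(deg * 65536)

def units_to_degrees(units):
    """Inverse conversion, back to floating-point degrees."""
    return units / 65536.0
```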
The data type ViewportItem is defined as shown in Table 9.3.2-2. ViewportItem is an Object which identifies a viewport and the quality-related factors for the region(s) covered by the viewport.
Key              Type              Description
ViewportItem     Object
 Position        ViewportDataType  Identifies the viewport
 QualityLevels   List              List of the quality ranking regions within the viewport
  Coverage       Float             Percentage of the viewport area covered by this region
  QR             Integer           Quality ranking (QR) value of this region
  Resolution     Object            Resolution for this region
   Width         Integer           Horizontal resolution for this region
   Height        Integer           Vertical resolution for this region
The comparable quality viewport switching latency metric is specified in Table 9.3.2-3 below.
Key               Type          Description
CompQualLatency   List          List of comparable quality viewport switching latencies
 Entry            Object
  firstViewport   ViewportItem  Specifies information about the first viewport
  secondViewport  ViewportItem  Specifies information about the second viewport
  worstViewport   ViewportItem  Specifies information about the worst viewport seen during the switch duration
  time            Real-Time     Wall-clock time when the switch started
  mtime           Media-Time    Media presentation time when the switch started
  latency         Integer       Specifies the switching delay in milliseconds
  accuracy        Integer       Specifies the estimated accuracy of the latency metric in milliseconds
  cause           List          Specifies a list of possible causes for the latency
   Entry          Object
    code          Enum          A possible cause for the latency. The value is equal to one of the following:
                                  0: Segment duration
                                  1: Buffer fullness
                                  2: Availability of comparable quality segment
                                  3: Timeout

9.3.3  Rendered viewports

The rendered viewports metric reports a list of viewports that have been rendered during the media presentation.
The client shall evaluate the current viewport gaze every X ms and potentially add the viewport to the rendered viewport list. To enable frequent viewport evaluations without necessarily increasing the report size too much, consecutive viewports which are close to each other may be grouped into clusters, where only the average cluster viewport data is reported. Also, clusters which have too short durations may be excluded from the report.
The viewport clustering is controlled by an angular distance threshold D. If the centre (i.e. the azimuth and the elevation) of the current viewport is closer than the distance D to the current cluster centre (i.e. the average cluster azimuth and elevation), the viewport is added to the cluster. Note that the distance is only compared towards the current (i.e. last) cluster, not to any earlier clusters which might have been created.
If the distance to the cluster centre is instead equal to or larger than D, a new cluster is started based on the current viewport, and the average old cluster data, together with the start time and duration of the old cluster, is added to the viewport list.
Before reporting a viewport list, a filtering based on viewport duration shall be done. Each entry in the viewport list is first assigned an "aggregated duration" equal to the duration of that entry. Then, for each entry E, the other entries in the viewport list are checked. The duration for a checked entry is added to the aggregated duration for entry E, if the checked entry is both less than T ms away from E, and closer than the angular distance D from E.
After all viewport entries have been evaluated and have received a final aggregated duration, all viewport entries with an aggregated duration of less than T are deleted from the viewport list (and thus not reported). Note that the aggregated duration is only used for filtering purposes, and not itself included in the viewport list reports.
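The clustering step above can be sketched as follows. This is an illustration only: the planar distance measure is a simplification (a client may prefer a great-circle distance), the tuple layout is invented for the example, and the duration filtering described above is omitted:

```python
import math

def cluster_viewports(samples, d, interval_ms):
    """Cluster consecutive viewport samples per clause 9.3.3.

    samples: list of (azimuth, elevation) pairs taken every
    interval_ms milliseconds. A sample joins the current cluster if
    its angular distance to the running cluster centre is below d
    degrees (sketch: planar distance).
    Returns a list of (start_ms, duration_ms, (centre_az, centre_el)).
    """
    clusters, members, start = [], [], 0
    for i, (az, el) in enumerate(samples):
        if members:
            c_az = sum(a for a, _ in members) / len(members)
            c_el = sum(e for _, e in members) / len(members)
            if math.hypot(az - c_az, el - c_el) >= d:
                # Close the current cluster and start a new one here.
                clusters.append((start, len(members) * interval_ms, (c_az, c_el)))
                members, start = [], i * interval_ms
        members.append((az, el))
    if members:
        c_az = sum(a for a, _ in members) / len(members)
        c_el = sum(e for _, e in members) / len(members)
        clusters.append((start, len(members) * interval_ms, (c_az, c_el)))
    return clusters
```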
Some examples of metric calculation are shown in Annex D.2.
The observation points needed to calculate the metrics are:
  • OP3 Sensor: Gaze information
  • OP5 VR Application: Field-of-view information of the device
The viewport sample interval X (in ms), the distance threshold D (in degrees), and the duration threshold T (in ms) can be specified during metrics configuration as attributes within parentheses, e.g. "RenderedViewports (X=50,D=15,T=1500)". Note that if no clustering or duration filtering is wanted, the D and T thresholds can be set to 0 (e.g. specifying "RenderedViewports (X=1000,D=0,T=0)" will just log the viewport every 1000 ms). If no sample interval or threshold values are specified, the client shall use appropriate default values.
The rendered viewports metric is specified in Table 9.3.3-1.
Key                Type              Description
RenderedViewports  List              List of rendered viewports
 Entry             Object
  startTime        Media-Time        Specifies the media presentation time of the first played out media sample when the viewport cluster indicated in the current entry is rendered starting from this media sample.
  duration         Integer           The time duration, in units of milliseconds, of the continuously presented media samples when the viewport cluster indicated in the current entry is rendered starting from the media sample indicated by startTime. "Continuously presented" means that the media clock continued to advance at the playout speed throughout the interval.
  viewport         ViewportDataType  Indicates the average region of the omnidirectional media corresponding to the viewport cluster being rendered starting from the media sample time indicated by startTime.

9.3.4  VR Device information

This metric contains information about the device, and is logged at the start of each session and whenever changed (for instance if the rendered field-of-view for the device is adjusted). If an individual metric cannot be logged, its value shall be set to 0 (zero) or to the empty string.
The observation point needed to report the metrics is:
  • OP5 VR Application: Device Information
Key                     Type        Description
VrDeviceInformation     List        A list of device information objects.
 Entry                  Object      A single object containing new device information.
  start                 Real-Time   Wall-clock time when the device information was logged.
  mstart                Media-Time  The presentation time at which the device information was logged.
  deviceIdentifier      String      The brand, model and version of the device.
  horizontalResolution  Integer     The horizontal display resolution, per eye, in pixels.
  verticalResolution    Integer     The vertical display resolution, per eye, in pixels.
  horizontalFoV         Integer     Maximum horizontal field-of-view, per eye, in degrees.
  verticalFoV           Integer     Maximum vertical field-of-view, per eye, in degrees.
  renderedHorizontalFoV Integer     Current rendered horizontal field-of-view, per eye, in degrees.
  renderedVerticalFoV   Integer     Current rendered vertical field-of-view, per eye, in degrees.
  refreshRate           Integer     Display refresh rate, in Hz.

9.4  Metrics Configuration and Reporting

9.4.1  Configuration

Metrics configuration is done according to clauses 10.4 and 10.5 in DASH TS 26.247, but can also include any metrics defined in clause 9.3.

9.4.2  Reporting

Metrics reporting is done according to clauses 10.4 and 10.6 in DASH TS 26.247, with the type QoeReportType extended to handle the additional VR-specific metrics according to the XML schema in clause 9.4.3. In this version of the specification the element vrMetricSchemaVersion shall be set to 1.

9.4.3  Reporting Format

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:3gpp:metadata:2020:VR:metrics"
    xmlns:hsd="urn:3gpp:metadata:2011:HSD:receptionreport"
    xmlns="urn:3gpp:metadata:2020:VR:metrics" elementFormDefault="qualified">
    <xs:complexType name="VrQoeReportType">
        <xs:complexContent>
            <xs:extension base="hsd:QoeReportType">
                <xs:sequence>
                    <xs:element name="vrMetric" type="VrMetricType"
                        minOccurs="0" maxOccurs="unbounded"/>
                    <xs:element name="vrMetricSchemaVersion" type="xs:unsignedInt"/>
                    <xs:any namespace="##other" processContents="lax"
                        minOccurs="0" maxOccurs="unbounded"/>
                </xs:sequence>
            </xs:extension>
        </xs:complexContent>
        <xs:anyAttribute processContents="skip"/>
    </xs:complexType>
    <xs:complexType name="VrMetricType">
        <xs:choice maxOccurs="unbounded">
            <xs:element name="compQualLatency" type="CompQualLatencyType"
                maxOccurs="unbounded"/>
            <xs:element name="renderedViewports" type="RenderedViewportsType"
                maxOccurs="unbounded"/>
            <xs:element name="vrDeviceInformation" type="VrDeviceInformationType"
                maxOccurs="unbounded"/>
            <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:choice>
        <xs:anyAttribute processContents="skip"/>
    </xs:complexType>
    <xs:complexType name="CompQualLatencyType">
        <xs:sequence>
            <xs:element name="firstViewport" type="ViewportItem"/>
            <xs:element name="secondViewport" type="ViewportItem"/>
            <xs:element name="worstViewport" type="ViewportItem"/>
            <xs:element name="time" type="xs:dateTime"/>
            <xs:element name="mtime" type="xs:duration"/>
            <xs:element name="latency" type="xs:unsignedInt"/>
            <xs:element name="accuracy" type="xs:unsignedInt"/>
            <xs:element name="cause" type="xs:unsignedInt" minOccurs="0" maxOccurs="unbounded"/>
            <xs:any namespace="##other" processContents="lax" 
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute processContents="skip"/>
    </xs:complexType>
    <xs:complexType name="RenderedViewportsType">
        <xs:sequence>
            <xs:element name="startTime" type="xs:duration"/>
            <xs:element name="duration" type="xs:unsignedInt"/>
            <xs:element name="viewport" type="ViewportDataType"/>
            <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute processContents="skip"/>
    </xs:complexType>
    <xs:complexType name="VrDeviceInformationType">
        <xs:sequence>
            <xs:element name="start" type="xs:dateTime"/>
            <xs:element name="mstart" type="xs:duration"/>
            <xs:element name="deviceIdentifier" type="xs:string"/>
            <xs:element name="horizontalResolution" type="xs:unsignedInt"/>
            <xs:element name="verticalResolution" type="xs:unsignedInt"/>
            <xs:element name="horizontalFoV" type="xs:unsignedInt"/>
            <xs:element name="verticalFoV" type="xs:unsignedInt"/>
            <xs:element name="renderedHorizontalFoV" type="xs:unsignedInt"/>
            <xs:element name="renderedVerticalFoV" type="xs:unsignedInt"/>
            <xs:element name="refreshRate" type="xs:unsignedInt"/>
            <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute processContents="skip"/>
    </xs:complexType>
    <xs:complexType name="ViewportItem">
        <xs:sequence>
            <xs:element name="position" type="ViewportDataType"/>
            <xs:element name="qualityLevel" type="QualityLevelEntry" maxOccurs="unbounded"/>
            <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute processContents="skip"/>
    </xs:complexType>
    <xs:complexType name="ViewportDataType">
        <xs:sequence>
            <xs:element name="centreAzimuth" type="xs:int"/>
            <xs:element name="centreElevation" type="xs:int"/>
            <xs:element name="centreTilt" type="xs:int"/>
            <xs:element name="azimuthRange" type="xs:unsignedInt"/>
            <xs:element name="elevationRange" type="xs:unsignedInt"/>
            <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute processContents="skip"/>
    </xs:complexType>
    <xs:complexType name="QualityLevelEntry">
        <xs:sequence>
            <xs:element name="coverage" type="xs:double"/>
            <xs:element name="qr" type="xs:unsignedInt"/>
            <xs:element name="width" type="xs:unsignedInt"/>
            <xs:element name="height" type="xs:unsignedInt"/>
            <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute processContents="skip"/>
    </xs:complexType>
</xs:schema>
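As an informal illustration of the reporting format, a client could assemble a vrDeviceInformation element matching VrDeviceInformationType above. The helper below is a sketch: it emits only the simple child elements, in schema order, and leaves the enclosing VrQoeReportType report to the caller:

```python
import xml.etree.ElementTree as ET

NS = "urn:3gpp:metadata:2020:VR:metrics"

def build_device_information(dev):
    """Build a vrDeviceInformation element per the schema above.

    dev: mapping holding one value for each simple child element of
    VrDeviceInformationType, keyed by element name.
    """
    el = ET.Element("{%s}vrDeviceInformation" % NS)
    for key in ("start", "mstart", "deviceIdentifier",
                "horizontalResolution", "verticalResolution",
                "horizontalFoV", "verticalFoV",
                "renderedHorizontalFoV", "renderedVerticalFoV",
                "refreshRate"):
        ET.SubElement(el, "{%s}%s" % (NS, key)).text = str(dev[key])
    return el
```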
