
TR 26.902, Word version 17.0.0


1  Scope

The present document comprises a technical report on Video Codec Performance, for packet-switched video-capable multimedia services standardized by 3GPP.

2  References

The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
  • References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
  • For a specific reference, subsequent revisions do not apply.
  • For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
[1]  RFC 2429: "RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)".
[2]  RFC 3550: "RTP: A Transport Protocol for Real-Time Applications", H. Schulzrinne et al., July 2003.
[3]  ITU-T Recommendation H.263 (1998): "Video coding for low bit rate communication".
[4]  TS 26.110: "Codec for Circuit Switched Multimedia Telephony Service; General Description".
[5]  TS 26.111: "Codec for Circuit Switched Multimedia Telephony Service; Modifications to H.324".
[6]  ITU-T Recommendation H.263 - Annex X (2004): "Annex X: Profiles and levels definition".
[7]  ITU-T Recommendation H.264 (2003): "Advanced video coding for generic audiovisual services" | ISO/IEC 14496-10:2003: "Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding".
[8]  ISO/IEC 14496-10/FDAM1: "AVC Fidelity Range Extensions".
[9]  RFC 3984: "RTP Payload Format for H.264 Video".
[10] TS 26.141: "IP Multimedia System (IMS) Messaging and Presence; Media formats and codecs".
[11] TS 26.234: "Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs".
[12] TS 26.346: "Multimedia Broadcast/Multicast Service (MBMS); Protocols and codecs".
[13] TS 26.235: "Packet switched conversational multimedia applications; Default codecs".
[14] TS 26.236: "Packet switched conversational multimedia applications; Transport protocols".
[15] TS 26.114: "IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction".
[16] TR 26.936: "Performance characterization of 3GPP audio codecs".
[17] TS 25.101: "User Equipment (UE) radio transmission and reception (FDD)".

3  Abbreviations

For the purposes of the present document, the following abbreviations apply:
APSNR   Average PSNR
AVC     Advanced Video Codec
DCCH    Dedicated Control CHannel
DPDCH   Dedicated Physical Data CHannel
DTCH    Dedicated Traffic CHannel
HSPA    High-Speed Packet Access
IETF    Internet Engineering Task Force
IMS     Internet protocol Multimedia Subsystem
IP      Internet Protocol
MAC     Medium Access Control
MBMS    Multimedia Broadcast/Multicast Service
MSE     Mean Square Error
MTSI    Multimedia Telephony over IMS
NAL     Network Abstraction Layer
NSD     Normalized Square Difference
PANSD   PSNR of Average Normalized Square Difference
PDU     Protocol Data Unit
PDVD    Percentage of Degraded Video Duration
PSC     Packet-Switched Conversational
PSNR    Peak Signal-to-Noise Ratio
PSS     Packet-switched Streaming Service
RFC     IETF Request For Comments
RLC     Radio Link Control
RTCP    RTP Control Protocol
RTP     Real-time Transport Protocol
SDP     Session Description Protocol
TTI     Transmission Time Interval
UDP     User Datagram Protocol
UE      User Equipment
UTRAN   UMTS Terrestrial Radio Access Network

4  Document organization

The present document is organized as discussed below.
  • Clause 5 introduces the service scenarios, including their relationship with 3GPP services. Furthermore, it discusses the performance measurement metrics used in the present document.
  • Clause 6 (performance figures) defines representative test cases and contains a listing, in the form of tables, of the performance of the video codecs for each of the test cases.
  • Clause 7 (supplementary information on figure generation) contains pointers to accompanying files containing video sequences, anchor bit streams, and error-prone test bit streams. It also describes the mechanisms used to generate the anchor compressed video data and the compressed video data exposed to typical error masks, as well as the creation of the error masks themselves.
  • Annex A sketches one possible environment that could be used by interested parties as a starting point for defining a process to assess the performance of a particular video codec against the performance figures.
  • Annex B introduces details on the H.263 encoder and decoder configurations.
  • Annex C introduces details of the H.264 encoder and decoder configurations.
  • Annex D introduces details on the usage of 3G file format in the present document.
  • Annex E introduces details on the usage of RTPdump format in the present document.
  • Annex F introduces details on the simulator, bearers, and dump files.
  • Annex G introduces the details on the Quality Metric Evaluation.
  • Annex H introduces the details on the Video Test Sequences.
  • Annex I provides information on verification of appropriate use of the tools provided in this document.

5  Service scenarios and metrics

Video transmission in a 3GPP packet-switched environment conceptually consists of an Encoder, one or more Channels, and a Decoder. The Encoder, as defined here, comprises the steps of source coding and, when required by the service, packetization into RTP packets, according to the relevant 3GPP Technical Specification for the service and media codec in question. The Channel, as defined here, comprises all steps of conveying the information created by the Encoder to the Decoder. Note that the Channel, in some environments, may be prone to packet erasures, while in others it may be error free. In an erasure-prone environment, it is not guaranteed that all information created by the Encoder can be processed by the Decoder, implying that the Decoder needs to cope to some extent with compressed video data not compliant with the video codec standard. The Decoder, finally, de-packetizes the (potentially erasure-prone and perhaps non-compliant) packet stream and reconstructs it into a video sequence. The only type of error considered at the depacketizer/decoder is RTP packet erasures.
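As an illustration only, the following Python sketch mirrors this conceptual Encoder / Channel / Decoder chain; the function name, the packet representation, and the independent random-erasure model are assumptions made for the sketch and are not taken from the present document or from any 3GPP channel model.

  # Illustrative sketch of the conceptual chain: encoder output -> channel -> decoder input.
  # The independent random-erasure model below is an assumption for illustration only.
  import random

  def channel(rtp_packets, erasure_rate=0.01, seed=0):
      """Deliver RTP packets over an erasure-prone channel: each packet is dropped
      independently with probability 'erasure_rate' (0.0 models an error-free channel)."""
      rng = random.Random(seed)
      return [pkt for pkt in rtp_packets if rng.random() >= erasure_rate]

  # Example: decoder_input = channel(encoder_output, erasure_rate=0.05)
  # The error-tolerant decoder must then reconstruct video from the surviving packets.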

5.1  Service scenarios

3GPP includes video in different services, e.g. PSS [11], MBMS [12], PSC [13], [14], and MTSI [15]. This report lists performance figures for only one service scenario, focusing on an RTP-based conversational service such as PSC or MTSI.
  • Service scenario A (PSC/MTSI-like) relates to conversational services involving compressed video data (erasure-prone transport, low latency requirements, application-layer transport quality feedback, etc.). In this scenario, UE-based video encoding and decoding are assumed. The foremost examples of this service scenario are PSC and MTSI. Within this service scenario, the performance of both an encoder and a decoder is of importance for the service quality. Service scenario A refers to the performance of a decoder consuming possibly non-compliant (due to transmission errors) compressed video data generated by an encoder that provides sufficient quality for this scenario.

5.2  Performance metrics

This clause defines the performance metrics used in clause 6 to numerically and objectively express a Decoder's reaction to compressed video data that is possibly modified due to erasures. Only objective metrics are considered, namely metrics that can be computed from sequences available in the 3G format described in Annex D by using the method detailed in Annex G.
The following subclauses provide a general description of the quality metrics. For the exact computation on sequences available in the 3G format, please refer to Annex G.
The following acronyms are utilized throughout the remainder of this subclause:
  • OrigSeq: The original video sequence that has been used as input for the video encoder.
  • ReconSeq: The reconstructed video sequence, the output of a standard-compliant decoder that operates on the output of the video encoder without channel simulation, i.e. without any errors. Timing alignment between the OrigSeq and the ReconSeq is assumed.
  • ReceivedSeq: The video sequence that has been reconstructed and error-concealed by an error-tolerant video decoder, after a) the video encoder operated on the OrigSeq and produced an error-free packet stream file as output, and b) the channel simulator used the error-free packet stream file and applied errors and delays to it so as to produce an error-prone packet stream file, which is used as the input of the error-tolerant video decoder. For comparison purposes, a constant delay between the OrigSeq and the ReceivedSeq is assumed, whereby this constant delay is removed before comparison.
Each of the following metrics generates a single value when run for a complete video sequence.

5.2.1  Average Peak Signal-to-Noise Ratio (APSNR)

The average Peak Signal-to-Noise Ratio (APSNR) is calculated between all pictures of the OrigSeq and the ReconSeq or the ReceivedSeq, respectively. First, the Peak Signal-to-Noise Ratio (PSNR) of each picture is calculated with a precision sufficient to prevent rounding errors in the subsequent steps. Thereafter, the PSNR values of all pictures are averaged. The result is reported with a precision of two digits.
Only the luminance component of the video signal is used.
If results from several ReceivedSeq are to be combined, the average of the PSNR values over all ReceivedSeq is computed as the final result.
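As a minimal, non-normative sketch of this computation (the exact procedure is defined in Annex G), the following Python code assumes 8-bit luminance planes supplied as NumPy arrays of identical size; the function names and the handling of identical pictures are assumptions of the sketch.

  # Minimal APSNR sketch: per-picture luminance PSNR, then the average over the sequence.
  import numpy as np

  def frame_psnr(orig_y, test_y, peak=255.0):
      """PSNR (dB) of a single 8-bit luminance picture."""
      mse = np.mean((orig_y.astype(np.float64) - test_y.astype(np.float64)) ** 2)
      if mse == 0.0:
          return float("inf")  # identical pictures; Annex G defines the exact handling
      return 10.0 * np.log10(peak * peak / mse)

  def apsnr(orig_frames, test_frames):
      """Average the per-picture PSNR values over the whole sequence."""
      values = [frame_psnr(o, t) for o, t in zip(orig_frames, test_frames)]
      return round(sum(values) / len(values), 2)  # reported with two-digit precision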

5.2.2  PSNR of Average Normalized Square Difference (PANSD)

The PSNR of Average Normalized Square Difference (PANSD) is calculated between all pictures of the OrigSeq and the ReceivedSeq. First, the normalized square difference (NSD), also known as the Mean Square Error (MSE), of each picture is calculated with a precision sufficient to prevent rounding errors in the subsequent steps. Thereafter, the NSD values of all pictures are averaged, and this averaged value is converted into a PSNR value. The result is reported with a precision of two digits.
Only the luminance component of the video signal is used.
If results from several ReceivedSeq are to be combined, the average of all NSD values over all ReceivedSeq is computed, and the final result is the PSNR over this averaged NSD.
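A corresponding non-normative Python sketch, under the same assumptions as the APSNR sketch above (8-bit luminance planes as NumPy arrays, Annex G remains the reference), averages the per-picture NSD/MSE values first and only then converts the averaged value into a PSNR figure.

  # Minimal PANSD sketch: average the per-picture MSE (NSD) first, then convert to PSNR.
  import numpy as np

  def pansd(orig_frames, recv_frames, peak=255.0):
      """PSNR (dB) of the averaged per-picture normalized square difference."""
      nsd_values = [np.mean((o.astype(np.float64) - r.astype(np.float64)) ** 2)
                    for o, r in zip(orig_frames, recv_frames)]
      avg_nsd = sum(nsd_values) / len(nsd_values)  # assumed non-zero for a lossy chain
      return round(10.0 * np.log10(peak * peak / avg_nsd), 2)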

5.2.3  Percentage of Degraded Video Duration (PDVD)

The Percentage of Degraded Video Duration (PDVD) is defined as the percentage of the entire display time for which the PSNR of the erroneous video frames is more than x dB worse than the PSNR of the corresponding reconstructed frames, whereby x is set to 2 dB. The computation of this metric requires three sequences: the OrigSeq, the ReconSeq, and the ReceivedSeq.
Only the luminance component of the video signal is used.
If results from several ReceivedSeq are to be combined, the average of the PDVD values over all ReceivedSeq is computed as the final result.
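The following non-normative Python sketch illustrates the PDVD computation under the same assumptions (8-bit luminance planes as NumPy arrays, one picture per display interval so that the percentage of pictures equals the percentage of display time); Annex G remains the normative reference.

  # Minimal PDVD sketch: share of pictures whose ReceivedSeq PSNR is more than
  # threshold_db below the ReconSeq PSNR for the same picture.
  import numpy as np

  def _psnr(orig_y, test_y, peak=255.0):
      """PSNR (dB) of one luminance picture."""
      mse = np.mean((orig_y.astype(np.float64) - test_y.astype(np.float64)) ** 2)
      return float("inf") if mse == 0.0 else 10.0 * np.log10(peak * peak / mse)

  def pdvd(orig_frames, recon_frames, recv_frames, threshold_db=2.0):
      """Percentage of degraded pictures, used here as a proxy for degraded display time."""
      degraded = sum(1 for o, rec, rcv in zip(orig_frames, recon_frames, recv_frames)
                     if _psnr(o, rcv) < _psnr(o, rec) - threshold_db)
      return 100.0 * degraded / len(orig_frames)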
