
Content for  TR 26.906  Word version:  17.0.0


 

5  Overview of H.265 (HEVC)

5.1  Key coding-tool features of H.265 (HEVC) and differences versus H.264 (AVC)

Like earlier standards based on hybrid video coding, including H.264 (AVC), H.265 (HEVC) employs the following basic video coding design. A prediction signal is first formed by either intra prediction or motion-compensated prediction, and the residual (the difference between the original and the prediction) is then coded. The gains in coding efficiency are achieved by redesigning and improving almost all parts of the codec relative to earlier designs. In addition, H.265 (HEVC) includes several tools that make implementation on parallel architectures easier. Below is a summary of the key H.265 (HEVC) coding-tool features; a more elaborate list can be found in [2]:
  • Quadtree block and transform structure: One of the major tools that contribute significantly to the coding efficiency of H.265 (HEVC) is the usage of flexible coding blocks and transforms, which are defined in a hierarchical quad-tree manner. Unlike H.264 (AVC), where the basic coding block is a macroblock of fixed size 16x16, H.265 (HEVC) defines a Coding Tree Unit (CTU) of a maximum size of 64x64. Each CTU can be divided into smaller units in a hierarchical quad-tree manner and can represent smaller blocks of size 4x4. Similarly, the transforms used in H.265 (HEVC) can have different sizes, starting from 4x4 and going up to 32x32.
Utilizing large blocks and transforms contributes substantially to the gains of H.265 (HEVC), especially at high resolutions.
  • Entropy coding: H.265 (HEVC) uses a single entropy coding engine, which is based on Context Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 (AVC) uses two distinct entropy coding engines. CABAC in H.265 (HEVC) shares many similarities with CABAC of H.264 (AVC), but contains several improvements. Those include improvements in coding efficiency and lowered implementation complexity, especially for parallel architectures.
  • In-loop filtering: H.264 (AVC) includes an in-loop adaptive deblocking filter, where the blocking artefacts around the transform edges in the reconstructed picture are smoothed to improve the picture quality and compression efficiency. In H.265 (HEVC), a similar deblocking filter is employed but with somewhat lower complexity. In addition, pictures undergo a subsequent filtering operation called Sample Adaptive Offset (SAO), which is a new design element in H.265 (HEVC). SAO adds a pixel-level offset in an adaptive manner and usually acts as a de-ringing filter. SAO is observed to improve the picture quality, especially around sharp edges, and contributes substantially to the visual quality improvements of H.265 (HEVC).
  • Motion prediction and coding: There have been a number of improvements in this area that are summarized as follows:
    • Merge and Advanced Motion Vector Prediction (AMVP) modes: The motion information of a prediction block can be inferred from the spatially or temporally neighbouring blocks. This is similar to the DIRECT mode in H.264 (AVC) but includes new aspects to incorporate the flexible quad-tree structure and methods to improve the parallel implementations. In addition, the motion vector predictor can be signalled for improved efficiency.
    • High precision interpolation: The interpolation filter length is increased from 6-tap to 8-tap, which improves the coding efficiency but also comes with increased complexity. In addition, the interpolation filter is defined with higher precision, without any intermediate rounding operations, to further improve the coding efficiency.
  • Intra prediction and intra coding: Similar to motion prediction, intra prediction has many improvements, which can be summarized as:
    • Compared to the 8 directional intra prediction modes of H.264 (AVC), H.265 (HEVC) supports angular intra prediction with 33 directions. This increased flexibility improves both objective coding efficiency and visual quality, as edges can be better predicted and ringing artefacts around the edges are reduced.
    • The reference samples are adaptively smoothed based on the prediction direction. In addition, to avoid contouring artefacts, a new interpolative prediction generation is included to improve the visual quality.
    • Discrete Sine Transform (DST) is utilized instead of traditional Discrete Cosine Transform (DCT) for 4x4 intra transform blocks.
  • Other coding-tool features: H.265 (HEVC) includes some tools for lossless coding and efficient screen content coding:
    • Lossless coding: H.265 (HEVC) allows certain parts of the coded picture to be coded in a lossless manner by setting a dedicated flag equal to 1.
    • Screen content coding: H.265 (HEVC) includes some tools to better code computer generated screen content, such as skipping the transform coding for certain blocks. These tools are particularly useful for example when streaming the user-interface of a mobile device to a large display.
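The hierarchical quad-tree partitioning described in the first bullet above can be illustrated with a minimal sketch. It assumes a 64x64 CTU and a smallest block of 8x8; the function names and the split rule (`detail_in_top_left`) are hypothetical and stand in for the rate-distortion decision a real encoder would make.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) in luma samples


def split_ctu(x: int, y: int, size: int, min_size: int,
              should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively partition a square block in quad-tree fashion."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks: List[Block] = []
        # Visit the four quadrants in z-scan order:
        # top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                blocks += split_ctu(x + dx, y + dy, half, min_size, should_split)
        return blocks
    return [(x, y, size)]  # leaf: this block is coded as one unit


def detail_in_top_left(x: int, y: int, size: int) -> bool:
    # Hypothetical rule: only blocks touching the CTU's top-left corner are
    # split further. A real encoder would compare rate-distortion costs.
    return x == 0 and y == 0


leaves = split_ctu(0, 0, 64, 8, detail_in_top_left)
for block in leaves:
    print(block)
```

Whatever split rule is used, the leaves always tile the CTU exactly, so small blocks are spent only where the content needs them while flat areas stay in large blocks.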

5.2  Complexity of H.265 (HEVC)

Measuring the complexity of a video codec is a difficult task because different architectures impose different constraints. For example, CABAC might not be very problematic for hardware implementations, but it could become a bottleneck for software implementations, especially at higher bitrates. Nevertheless, several studies have analysed the complexity of H.265 (HEVC), and their conclusions can be roughly summarized as follows (see also [3] and [4]):
  • H.265 (HEVC) Decoder: Even though many parts of H.265 (HEVC) are more complex than their counterparts in H.264 (AVC) (e.g. motion compensation, intra prediction), some parts are easier to implement (e.g. CABAC, deblocking filter). Therefore, the additional complexity of H.265 (HEVC) decoder over H.264 (AVC) decoder is not expected to be substantial.
  • H.265 (HEVC) Encoder: As is well known, the standard does not define how encoding is performed, which means there will be various encoders with different complexity-quality trade-offs. However, it is estimated that the encoder complexity of H.265 (HEVC) needs to be higher than that of H.264 (AVC) in order to achieve the coding efficiency gains of H.265 (HEVC). The main reason is that a larger number of combinations has to be tested during rate-distortion optimization, as H.265 (HEVC) supports more flexible partitioning of blocks and transforms. It should be noted that the parallel processing tools are mostly useful for encoders, and their efficient utilization is expected to improve the complexity aspects of H.265 (HEVC) encoders. It is also expected that there will be significant efforts over the coming years to develop efficient methods for H.265 (HEVC) encoding.
Further complexity analyses of H.265 (HEVC) and H.264 (AVC) can be found in [3] to [8]; of these, [3] and [5] to [8] report real-time H.265 (HEVC) decoding with decoder implementations on ARM platforms.
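To give a rough feel for why flexible partitioning enlarges the encoder's search space during rate-distortion optimization, the following sketch counts the distinct quad-tree partitionings of a square block. It is an illustration under simplifying assumptions: only the block quad-tree is counted (prediction modes, prediction partitions and transform trees are ignored), a smallest block of 8x8 is assumed, and the function name is hypothetical.

```python
def count_quadtree_partitions(size: int, min_size: int) -> int:
    """Number of distinct quad-tree partitionings of a size x size block."""
    if size <= min_size:
        return 1  # cannot be split further: exactly one option
    # Either keep the block whole (1 way), or split it into four quadrants
    # that are each partitioned independently of one another.
    return 1 + count_quadtree_partitions(size // 2, min_size) ** 4


for size in (8, 16, 32, 64):
    print(size, count_quadtree_partitions(size, 8))
```

Under these assumptions a 16x16 block has 2 possible partitionings and a 64x64 block has 83522. Practical encoders do not test every combination exhaustively; they prune the search, which is one source of the different complexity-quality trade-offs mentioned above.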

5.3  Systems and transport interfaces of H.265 (HEVC) and differences versus H.264 (AVC)

H.265 (HEVC) inherited the basic systems and transport interface designs. These include the syntax structure based on parameter sets and network abstraction layer (NAL) units; the hierarchical syntax and data unit structure, running from sequence-level parameter sets through multi-picture-level or picture-level parameter sets and slice-level header parameters down to lower-level parameters; the supplemental enhancement information (SEI) message mechanism; and the video buffering model based on the hypothetical reference decoder (HRD).
In the following, a list of differences in these aspects compared to H.264 (AVC) is summarized:
  • Video parameter set: A new type of parameter set, called video parameter set (VPS), was introduced. The VPS provides a "big picture" of a bitstream, including what types of operation points are provided, the profile, tier, and level of the operation points, and some other high-level properties of the bitstream that can be used as the basis for session negotiation and content selection, etc.
  • Profile, tier and level: The profile, tier and level syntax structure, which can be included in both the VPS and the sequence parameter set (SPS), includes 12 bytes of data for the entire bitstream and may include further profile, tier and level information for temporal scalable layers, which are referred to as sub-layers in the H.265 (HEVC) specification:
    • The profile indicator indicates the "best viewed as" profile when the bitstream conforms to multiple profiles, like the major brand in the 3GPP file format and other file formats based on the ISO base media file format (ISOBMFF).
    • The profile, tier and level syntax structure also indicates whether the bitstream is free of frame-packed content and whether it is free of interlaced source and field pictures, i.e. contains only frame pictures of progressive source. Clients/players with no special post-processing support for frame-packed content, or for content with interlaced source or field pictures, can thus avoid such content.
  • Bitstream and elementary stream: H.265 (HEVC) includes a definition of elementary stream, which is new compared to H.264 (AVC). An elementary stream consists of a sequence of one or more bitstreams. An elementary stream that consists of two or more bitstreams would typically have been formed by splicing together two or more bitstreams (or parts thereof). When an elementary stream contains more than one bitstream, the last NAL unit of the last access unit of a bitstream (except the last bitstream in the elementary stream) contains an end of bitstream NAL unit and the first access unit of the subsequent bitstream is an intra random access point (IRAP) access unit. This IRAP access unit may be a clean random access (CRA), broken link access (BLA), or instantaneous decoding refresh (IDR) access unit.
  • Improved random accessibility support: H.265 (HEVC) signals, through NAL unit types in the NAL unit header, IRAP pictures beyond IDR pictures. Three types of IRAP pictures, namely IDR, CRA, and BLA pictures, are supported. IDR pictures are conventionally referred to as closed group-of-pictures (closed-GOP) random access points, while CRA and BLA pictures are conventionally referred to as open-GOP random access points.
    • BLA pictures usually originate from splicing of two bitstreams or parts thereof at a CRA picture, e.g. during stream switching.
    • To enable better systems usage of IRAP pictures, altogether six different NAL unit types are defined to signal the properties of the IRAP pictures. These can be used to better match the stream access point (SAP) types defined in the ISOBMFF, which are utilized for random access support in both 3GP-DASH and MPEG DASH.
    • Pictures following an IRAP picture in decoding order and preceding the IRAP picture in output order are referred to as leading pictures associated with the IRAP picture. There are two types of leading pictures, namely random access decodable leading (RADL) pictures and random access skipped leading (RASL) pictures. RADL pictures are decodable when random access starts at the associated IRAP picture, and RASL pictures are not decodable when random access starts at the associated IRAP picture and are usually discarded.
    • H.265 (HEVC) provides mechanisms to specify the conformance of bitstreams from which RASL pictures have been discarded, thus providing a standard-compliant way for systems components to discard RASL pictures when needed.
  • Improved temporal scalability support: H.265 (HEVC) includes improved support of temporal scalability through the signalling of the temporal ID in the NAL unit header, the restriction that pictures of a particular temporal sub-layer cannot be used for inter prediction reference by pictures of a lower temporal sub-layer, the sub-bitstream extraction process, and the requirement that each sub-bitstream extraction output be a conforming bitstream. Media-aware network elements (MANEs) can utilize the temporal ID in the NAL unit header for stream adaptation purposes based on temporal scalability.
  • Improved temporal layer switching support: H.265 (HEVC) specifies, through NAL unit types present in the NAL unit header, the signalling of temporal sub-layer access (TSA) and stepwise temporal sub-layer access (STSA):
    • A TSA picture and pictures following the TSA picture in decoding order do not use pictures prior to the TSA picture in decoding order with TemporalId greater than or equal to that of the TSA picture for inter prediction reference. A TSA picture enables up-switching, at the TSA picture, to the sub-layer containing the TSA picture or any higher sub-layer, from the immediately lower sub-layer.
    • An STSA picture does not use pictures with the same TemporalId as the STSA picture for inter prediction reference. Pictures following an STSA picture in decoding order with the same TemporalId as the STSA picture do not use pictures prior to the STSA picture in decoding order with the same TemporalId as the STSA picture for inter prediction reference. An STSA picture enables up-switching, at the STSA picture, to the sub-layer containing the STSA picture, from the immediately lower sub-layer.
  • Sub-layer reference or non-reference pictures: The concept and signalling of reference/non-reference pictures in H.265 (HEVC) are different from H.264 (AVC). In H.264 (AVC), if a picture may be used by any other picture for inter prediction reference, it is a reference picture; otherwise it is a non-reference picture, and this is signalled by two bits in the NAL unit header. In H.265 (HEVC), a picture is called a reference picture only when it is marked as "used for reference". In addition, the concept of sub-layer reference picture was introduced. If a picture may be used by any other picture with the same TemporalId for inter prediction reference, it is a sub-layer reference picture; otherwise it is a sub-layer non-reference picture. Whether a picture is a sub-layer reference picture or a sub-layer non-reference picture is signalled through NAL unit type values.
  • Improved extensibility: Besides the temporal ID in the NAL unit header, H.265 (HEVC) also includes the signalling of a six-bit layer ID in the NAL unit header, which is equal to 0 for a single-layer bitstream. Extension mechanisms have been included in the VPS, SPS, picture parameter set (PPS), SEI NAL unit, slice headers, and so on. All these extension mechanisms enable future extensions in a backward compatible manner, such that bitstreams encoded according to potential future H.265 (HEVC) extensions can be fed to then-legacy decoders (e.g. H.265 (HEVC) version 1 decoders), which can decode and output the base layer bitstream.
  • Bitstream extraction: H.265 (HEVC) includes a bitstream extraction process as an integral part of the overall decoding process, and specifies the use of this process in the description of bitstream conformance tests as part of the hypothetical reference decoder (HRD) specification.
  • Improved reference picture management: H.265 (HEVC) includes a different way of reference picture management, including reference picture marking and removal from the decoded picture buffer (DPB) as well as reference picture list construction (RPLC). Instead of the sliding window plus adaptive memory management control operation (MMCO) based reference picture marking mechanism in H.264 (AVC), H.265 (HEVC) specifies a reference picture set (RPS) based reference picture management and marking mechanism, and the RPLC is consequently based on the RPS mechanism.
    • A reference picture set is a set of reference pictures associated with a picture, consisting of all reference pictures that are prior to the associated picture in decoding order and that may be used for inter prediction of the associated picture or any picture following the associated picture in decoding order. The reference picture set consists of five lists of reference pictures: RefPicSetStCurrBefore, RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and RefPicSetLtCurr contain all reference pictures that may be used in inter prediction of the current picture and that may be used in inter prediction of one or more of the pictures following the current picture in decoding order. RefPicSetStFoll and RefPicSetLtFoll consist of all reference pictures that are not used in inter prediction of the current picture but may be used in inter prediction of one or more of the pictures following the current picture in decoding order.
    • RPS provides an "intra-coded" signalling of the DPB status, instead of an "inter-coded" signalling, mainly for improved error resilience.
    • The RPLC process in H.265 (HEVC) is based on the RPS, by signalling an index to an RPS subset for each reference index. The RPLC process has been simplified compared to that in H.264 (AVC), by removal of the reference picture list modification (also referred to as reference picture list reordering) process.
  • Ultralow delay support: H.265 (HEVC) specifies a sub-picture-level HRD operation to support so-called ultralow delay. The mechanism specifies a standard-compliant way to enable delay reduction below one picture interval. Sub-picture-level coded picture buffer (CPB) and DPB parameters may be signalled, and the use of this information for the derivation of CPB timing (wherein the CPB removal time corresponds to decoding time) and DPB output timing (display time) is specified. Decoders are allowed to operate the HRD at the conventional access-unit level, even when the sub-picture-level HRD parameters are present.
  • Parallel processing support: H.265 (HEVC) is the first video coding standard that includes features specifically intended to enable parallel coding, particularly parallel encoding. These tools are tiles and wavefront parallel processing (WPP), which cannot be applied at the same time within a coded video sequence (as defined in the H.265 (HEVC) specification).
    • In WPP, the picture is partitioned into single rows of CTUs. Entropy decoding and prediction are allowed to use data from CTUs in other partitions. Parallel processing is possible through parallel decoding of CTU rows, where the start of the decoding of a CTU row is delayed by two CTUs, so as to ensure that data related to the CTU above and to the right of the subject CTU is available before the subject CTU is decoded. Using this staggered start (which appears like a wavefront when represented graphically), parallelization is possible with up to as many processors/cores as the picture contains CTU rows. Because in-picture prediction between neighbouring CTU rows within a picture is permitted, the required inter-processor/inter-core communication to enable in-picture prediction can be substantial. WPP partitioning does not produce additional NAL units compared to when it is not applied; thus WPP is not a tool for MTU size matching. However, if MTU size matching is required, slices and dependent slice segments can be used with WPP, at the cost of some coding overhead.
    • Tiles define horizontal and vertical boundaries that partition a picture into tile columns and rows. The scan order of CTUs is changed to be local within a tile (in the order of a CTU raster scan of a tile), before decoding the top-left CTU of the next tile in the order of the tile raster scan of a picture. Similar to slices, tiles break in-picture prediction dependencies as well as entropy decoding dependencies. However, they do not need to be included into individual NAL units (same as WPP in this regard); hence tiles cannot be used for MTU size matching, though slices and dependent slice segments can be used in combination for that purpose. Each tile can be processed by one processor/core, and the inter-processor/inter-core communication required for in-picture prediction between processing units decoding neighbouring tiles is limited to conveying the shared slice header in cases where a slice spans more than one tile, and to loop-filtering-related sharing of reconstructed samples and metadata. When more than one tile or WPP segment is included in a slice, the entry point byte offset for each tile or WPP segment other than the first one in the slice is signalled in the slice header.
  • New SEI messages: H.265 (HEVC) inherits many SEI messages from H.264 (AVC) with changes in syntax and/or semantics to make them applicable to H.265 (HEVC). Additionally, H.265 (HEVC) includes some new SEI messages; some of them are summarized below.
    • The display orientation SEI message signals the recommended anticlockwise rotation of the decoded picture (after applying horizontal and/or vertical flipping when needed) prior to display. This SEI message was also agreed to be included into H.264 (AVC).
    • The active parameter sets SEI message includes the IDs of the active video parameter set and the active sequence parameter set, and can be used to activate VPSs and SPSs. In addition, the SEI message includes the following indications:
      • An indication of whether "full random accessibility" is supported (when supported, all parameter sets needed for decoding the remainder of the bitstream when random accessing from the beginning of the current coded video sequence, by completely discarding all access units earlier in decoding order, are present in the remaining bitstream, and all coded pictures in the remaining bitstream can be correctly decoded).
      • An indication of whether there is any parameter set within the current coded video sequence that updates another parameter set of the same type preceding in decoding order. An update of a parameter set refers to the use of the same parameter set ID but with some other parameters changed. If this property is true for all coded video sequences in the bitstream, then all parameter sets can be sent out-of-band before session start.
    • The region refresh information SEI message can be used together with the recovery point SEI message (present in both H.264 (AVC) and H.265 (HEVC)) for improved support of gradual decoding refresh (GDR). This supports random access from inter-coded pictures, wherein complete pictures can be correctly decoded or recovered after an indicated number of pictures in output/display order.
    • The decoding unit information SEI message provides coded picture buffer removal delay information for a decoding unit. The message can be used in very-low-delay buffering operation.
    • The structure of pictures SEI message provides information on the NAL unit types, picture order count values and prediction dependencies of a sequence of pictures. The SEI message can be used, for example, to determine the impact a lost picture has on other pictures.
    • The decoded picture hash SEI message provides a checksum derived from the sample values of a decoded picture. It can be used for detecting whether a picture was correctly received and decoded.
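Several of the items above (NAL unit types for IRAP pictures, the temporal ID and the six-bit layer ID) are carried in the NAL unit header. The following minimal sketch shows how a MANE might read that header and drop higher temporal sub-layers. It assumes the two-byte header layout of the H.265 (HEVC) specification (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) and the value range 16 to 21 for the six BLA, IDR and CRA NAL unit types, neither of which is spelled out in this clause. The helper names are hypothetical, and the filter is a simplification of the normative sub-bitstream extraction process.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NalHeader:
    nal_unit_type: int  # 6 bits
    nuh_layer_id: int   # 6 bits, 0 for a single-layer bitstream
    temporal_id: int    # nuh_temporal_id_plus1 - 1


def parse_nal_header(data: bytes) -> NalHeader:
    """Parse the first two bytes of an H.265 NAL unit."""
    if len(data) < 2:
        raise ValueError("an H.265 NAL unit header is two bytes long")
    header = (data[0] << 8) | data[1]
    if header >> 15:
        raise ValueError("forbidden_zero_bit must be 0")
    temporal_id_plus1 = header & 0x7
    if temporal_id_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(
        nal_unit_type=(header >> 9) & 0x3F,
        nuh_layer_id=(header >> 3) & 0x3F,
        temporal_id=temporal_id_plus1 - 1,
    )


def is_irap(header: NalHeader) -> bool:
    # Assumed range: 16..18 are BLA types, 19..20 IDR types, 21 is CRA.
    return 16 <= header.nal_unit_type <= 21


def keep_sub_layers(nal_units: Iterable[bytes], max_temporal_id: int) -> List[bytes]:
    # Simplified stream thinning: keep NAL units whose temporal ID does not
    # exceed the target. The normative process has further rules.
    return [nal for nal in nal_units
            if parse_nal_header(nal).temporal_id <= max_temporal_id]


# Header-only example units: VPS, IDR, and two trailing pictures in
# temporal sub-layers 1 and 2 (payload bytes omitted for brevity).
stream = [bytes([0x40, 0x01]), bytes([0x26, 0x01]),
          bytes([0x02, 0x02]), bytes([0x00, 0x03])]
for nal in stream:
    print(parse_nal_header(nal), is_irap(parse_nal_header(nal)))
print(len(keep_sub_layers(stream, 0)))
```

Because the decision depends only on the fixed-length header, a network element can thin a stream without parsing slice data, which is the point of signalling the temporal ID at this level.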

5.4  H.265 (HEVC) for image coding

H.265 (HEVC) includes a Main Still Picture profile to efficiently code still images. This profile utilizes the same coding tools as the Main profile of H.265 (HEVC) but is intended for encoding/decoding of still images. The Main Still Picture profile is believed to be very useful for coding still images for the following reasons:
  • High coding efficiency: Compared to legacy still picture codecs, H.265 (HEVC) provides significant benefits in compression capability.
  • Tile support: H.265 (HEVC) includes a mechanism to divide a picture into regions called tiles and to code those independently. This "spatial random access" provides various useful functionalities, such as easy browsing of extremely large pictures.
  • Using the same coding engine as for video coding: The H.265 (HEVC) Main Still Picture profile uses the same tools as the Main profile for video coding. This means that H.265 (HEVC) implementations will most likely also support the Main Still Picture profile, because no extra codec implementation is needed, which makes deployment of this image codec relatively easy.
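The "spatial random access" enabled by tiles can be sketched as follows: given a viewing region in a very large picture, work out which tiles need to be decoded. This is an illustration only. It assumes a 64x64 CTU, uniformly spaced tile boundaries aligned to CTUs, and hypothetical function names; locating the coded data of each tile in the bitstream and any filtering across tile boundaries are out of scope.

```python
from typing import List, Set, Tuple


def uniform_boundaries(size_in_ctus: int, parts: int) -> List[int]:
    """Boundaries (in CTUs) that divide a span into `parts` near-equal pieces."""
    return [(i * size_in_ctus) // parts for i in range(parts + 1)]


def tiles_covering(region: Tuple[int, int, int, int], pic_w: int, pic_h: int,
                   cols: int, rows: int, ctu: int = 64) -> Set[Tuple[int, int]]:
    """Return (row, col) indices of the tiles that intersect the region.

    `region` is (x0, y0, x1, y1) in luma samples, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = region
    w_ctus = -(-pic_w // ctu)  # ceiling division: picture width in CTUs
    h_ctus = -(-pic_h // ctu)
    col_bd = [b * ctu for b in uniform_boundaries(w_ctus, cols)]
    row_bd = [b * ctu for b in uniform_boundaries(h_ctus, rows)]
    return {
        (r, c)
        for r in range(rows) if row_bd[r] < y1 and row_bd[r + 1] > y0
        for c in range(cols) if col_bd[c] < x1 and col_bd[c + 1] > x0
    }


# A 4096x2048 picture split into 4 tile columns and 2 tile rows; the viewer
# shows a 300x400 window that straddles the first vertical tile boundary.
needed = tiles_covering((900, 100, 1200, 500), 4096, 2048, cols=4, rows=2)
print(sorted(needed))
```

In this example only two of the eight tiles intersect the window, so a browser of a very large image would need to decode only a fraction of the picture.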