August 7, 2016

Contemporary Video Compression Standards. H.265/HEVC, VP9, VP10, Daala

In this paper we compare the compression efficiency of the latest video coding standards H.265/HEVC, VP9, VP10 and Daala with that of H.264/AVC, using the available reference video encoders.

This blog post discusses the results published in our research paper:
Sharabayko M.P., Markov N.G. Contemporary Video Compression Standards. H.265/HEVC, VP9, VP10, Daala. Control and Communications (SIBCON), 2016 International Siberian Conference on, Moscow, 12-14 May, 2016, pp. 1-4.
which can be found on IEEE Xplore.

In our previous experiments [1,3] H.265/HEVC provided superior compression efficiency, saving up to 50% of bitrate compared to H.264/AVC. Because VP9 has to work around patented compression techniques, it was less efficient even in intra-frame coding, but it still showed better results than H.264/AVC. Google started the development of VP10 [4] to further improve the compression efficiency of the techniques used in VP9. Daala, which was at an early development stage at the time of the previous experiments, showed poor results: almost 10 times higher bitrates compared to H.264/AVC at the same distortion levels.
Our new research on the compression efficiency of the contemporary video compression standards H.264/AVC and H.265/HEVC and the video encoders Google VP9, VP10 and Xiph Daala aims to outline the current state of royalty-free codecs.

Encoder Implementations Used

In this research we aim to compare the maximum video compression efficiency provided by the latest compression standards, to get an updated view of the capabilities of modern mainstream video compression techniques. This is one of the reasons we prefer reference test model implementations of the encoders to commercial versions.

H.264/AVC Encoder

The reference JM encoder for the H.264/AVC standard implements a quasi-full Rate-Distortion Optimization (RDO) model for coding decisions, which makes it the best choice for our experiments. The reference model has many configuration properties, and the most common property sets are collected in several configuration files shipped with the encoder sources.
We use JM ver. 19.0 with the default configuration “JM_LB_HE”, which sets up a hierarchical B-frame structure, disables rate control and enables all the available prediction block sizes. Additional options are passed with the following command line arguments:
lencod -p FrameRate=<FR> -p QPISlice=<QP> -p QPPSlice=<QP> -p QPBSlice=<QP> -p SourceWidth=<W> -p SourceHeight=<H> -p OutputWidth=<W> -p OutputHeight=<H>
The command line sets a fixed quantization parameter (QP) for every frame and defines the frame rate and video resolution. Additional tweaking disables the QP offset that is applied to hierarchical B-frames by default.

H.265/HEVC Encoder

We use the reference HM encoder ver. 16.5 with its simplified RDO model to estimate the compression efficiency of the H.265/HEVC standard. We use the “Low Delay Main” configuration provided with the source code, which defines a hierarchical B-frame structure and enables almost all available coding tools. To get a constant QP on each frame we modified the 'Qpoffset' values of the GOP structure in the configuration file.

VP9 Encoder

To test VP9 compression efficiency we use the open source 'libvpx' encoder ver. 1.5.0 provided by the WebM project [6]. The encoder provides a command line interface for configuring most of the coding options. It is run with the following parameters:
vpxenc --codec=vp9 --fps=<FR> --i420 --min-q=<Q> --max-q=<Q> --cq-level=<Q> --kf-min-dist=1000 --kf-max-dist=1000 --passes=1 -w=<W> -h=<H>
Parameters '--fps', '-w' and '-h' define the frame rate, width and height of the input video sequence. Parameters '--min-q', '--max-q' and '--cq-level' define the range of quantization values; setting them all to the same value forces constant quantization mode. Parameters '--kf-min-dist' and '--kf-max-dist' specify the key frame distance; we force a key frame to be used only at the start of the sequence, as with the other encoders.

VP10 Encoder

The open source 'libvpx' encoder ver. 1.5.0 also includes an implementation of VP10. The encoder is at an early development stage, which makes it possible to analyze the compression efficiency of its most recent coding tools. The options are the same as for VP9, except that the '--codec' parameter is set to 'vp10' instead of 'vp9':
vpxenc --codec=vp10 --fps=<FR> --i420 --min-q=<Q> --max-q=<Q> --cq-level=<Q> --kf-min-dist=1000 --kf-max-dist=1000 --passes=1 -w=<W> -h=<H>

Daala Encoder

We use the Xiph Daala encoder and decoder implementations [7] (master version of January 21, 2016). The test configuration forces a key frame to be placed only at the start of the sequence and uses four B-frames. The quantization (quality) level is controlled by the parameter '-v':
daala -k 9999 -b 4 -v <Q>
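
As a rough sketch, the command lines listed in this section could be filled in for each sequence and quantizer value with a small helper like the one below. The templates are copied from the post as written (they omit input and output file arguments, and no HM command line is given, so HM is left out). The function name `build_command` and its parameters are hypothetical and are not part of any encoder's tooling.

```python
# Command templates copied verbatim from this post; <FR>, <QP>, <Q>, <W>, <H>
# are the post's own placeholders. Input/output file arguments are not shown
# in the post and are therefore not added here.
TEMPLATES = {
    "jm": ("lencod -p FrameRate=<FR> -p QPISlice=<QP> -p QPPSlice=<QP> "
           "-p QPBSlice=<QP> -p SourceWidth=<W> -p SourceHeight=<H> "
           "-p OutputWidth=<W> -p OutputHeight=<H>"),
    "vp9": ("vpxenc --codec=vp9 --fps=<FR> --i420 --min-q=<Q> --max-q=<Q> "
            "--cq-level=<Q> --kf-min-dist=1000 --kf-max-dist=1000 --passes=1 "
            "-w=<W> -h=<H>"),
    "daala": "daala -k 9999 -b 4 -v <Q>",
}
# Per the post, VP10 uses the VP9 options with only the codec name changed.
TEMPLATES["vp10"] = TEMPLATES["vp9"].replace("--codec=vp9", "--codec=vp10")


def build_command(encoder, quant, frame_rate=None, width=None, height=None):
    """Fill a template and return the command as an argument list.

    Hypothetical helper: `quant` is substituted for both <QP> and <Q>.
    """
    values = {"<QP>": quant, "<Q>": quant,
              "<FR>": frame_rate, "<W>": width, "<H>": height}
    command = TEMPLATES[encoder]
    for placeholder, value in values.items():
        if placeholder in command:
            if value is None:
                raise ValueError(f"{encoder} needs a value for {placeholder}")
            command = command.replace(placeholder, str(value))
    return command.split()


if __name__ == "__main__":
    print(" ".join(build_command("vp9", 32, frame_rate=30, width=1280, height=720)))
    print(" ".join(build_command("daala", 30)))
```

Returning an argument list rather than a single string keeps the result ready to pass to `subprocess.run` once the file arguments are appended.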

Results and Discussion

Experiments are carried out on the JCT-VC test sequences [8]. The test set provides diverse video sequences typical of video conferencing, surveillance systems, desktop capture and other applications of video compression.
Compression efficiency is compared in terms of distortion levels at the same bitrate values. Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) are used as two measures of distortion. PSNR corresponds to the metric used in the Rate-Distortion Optimization module of most encoders. SSIM, on the other hand, correlates better with the subjective distortion perception of the human visual system, which should make a notable difference for Daala, as it is partially focused on reducing blocking artifacts.
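
For reference, PSNR follows directly from the mean squared error between the original and the decoded samples. The sketch below is a minimal illustration, assuming 8-bit samples (peak value 255) passed as flat lists. The post does not state how PSNR was averaged over frames or colour planes, so this shows only the per-block formula, not the exact measurement procedure we used.

```python
import math


def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equally sized sample sequences.

    Assumes 8-bit samples by default (peak=255); this is an assumption,
    not a detail taken from the experiments.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Hypothetical 4-sample example, not data from the experiments.
    print(round(psnr([100, 100, 100, 100], [101, 99, 100, 100]), 2))
```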

Table 1. Compression efficiency compared to JM in terms of BD-Rate
(ΔRate in %, measured by PSNR and by SSIM; negative values are bitrate savings relative to JM, positive values are bitrate overhead)

Class  Sequence                      HM vs JM         VP9 vs JM        VP10 vs JM       Daala vs JM
(ΔRate, %)                      PSNR     SSIM     PSNR     SSIM     PSNR     SSIM     PSNR     SSIM
A      Traffic                -37.80   -40.59    -9.03    -5.03   -10.38    -6.32    -0.77   -30.71
       PeopleOnStreet         -53.00   -54.65    -8.94    -7.97   -10.38    -9.54    17.93    -9.92
B      Kimono                 -42.23   -48.07   -27.24   -33.15   -28.83   -34.53   -13.03   -30.33
       ParkScene              -35.66   -41.40   -10.69   -10.41   -12.35   -12.15     3.81   -23.23
       Cactus                 -35.54   -37.32    -8.14    -5.43   -10.27    -7.92    22.09    -5.28
       BQTerrace              -45.02   -52.90   -12.71   -14.09   -14.67   -16.76     6.29   -33.26
       BasketballDrive        -48.38   -50.76   -27.80   -25.30   -29.62   -27.13    10.81   -14.93
C      RaceHorses             -26.33  -31.026    -8.15    -5.78    -9.41    -7.54    28.42     7.47
       BQMall                 -32.58   -35.61    -5.85    -0.51    -7.09    -2.67    30.21     1.49
       PartyScene             -30.96   -32.57    -6.79    -3.06    -7.76    -3.87    14.48    -7.22
       BasketballDrill        -42.00   -42.45   -12.84   -12.77   -13.30   -12.78    17.15   -11.51
D      RaceHorses             -25.77   -27.27    -6.79    -1.23    -7.35    -1.85    23.69     6.31
       BQSquare               -39.14   -39.98   -17.15    -9.16   -18.48    -9.99     2.28   -32.98
       BlowingBubbles         -27.40   -30.00    -5.46    -2.50    -6.47    -3.34    15.75    -7.06
       BasketballPass         -28.32   -31.12     2.56    10.38     2.61    11.20    11.30   -21.29
E      FourPeople             -35.82   -35.77     2.03     6.27     0.71     4.62    29.36    -3.54
       Johnny                 -47.46   -53.70   -11.84   -21.17   -14.09   -22.91    23.44   -14.16
       KristenAndSara         -45.34   -47.02   -11.06   -15.73   -13.30   -18.71    29.94   -14.18
F      BasketballDrillText    -40.49   -39.96   -10.62   -10.49   -11.19   -10.48    23.13    -7.54
       ChinaSpeed             -37.16   -39.31     5.25   -10.70     4.65   -11.25    96.08     9.26
       SlideEditing           -26.37   -29.35    -7.16    -8.59    -5.63    -6.71   137.69   109.56
       SlideShow              -37.15   -31.63   -27.40   -23.66   -27.51   -23.15   155.25   113.34
On average:                   -37.27   -39.66   -10.26    -9.55   -11.37   -10.63    40.54     8.63

Table 1 provides the experimental results in terms of BD-Rate (the average bitrate difference over the common interval of distortion levels) [9]. As can be seen, the HM encoder provides results superior to all other codecs. There are many published comparisons of H.265/HEVC to H.264/AVC, so we focus instead on comparing the royalty-free codecs to the JM implementation of H.264/AVC. Another clear observation is that all the codecs except Daala achieve almost the same compression results in terms of PSNR and SSIM. This once again highlights the fact that those codecs are based on much the same techniques, which produce similar distortion features.
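
To make the BD-Rate figures easier to interpret, here is a minimal sketch of one common formulation of the Bjontegaard metric: fit a cubic polynomial to log10(bitrate) as a function of distortion for each encoder, average both curves over their common distortion interval, and convert the difference back to a percentage. It assumes exactly four rate points per encoder (so the cubic fit is plain interpolation) and is not necessarily identical to the tool used for Table 1. The function names and the sample data are hypothetical.

```python
import math


def _interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(points, lo, hi):
    """Mean of the cubic log10(rate) curve over the interval [lo, hi]."""
    rates, dists = zip(*points)
    log_rates = [math.log10(r) for r in rates]
    mid = (lo + hi) / 2
    # Simpson's rule integrates a cubic polynomial exactly.
    return (_interpolate(dists, log_rates, lo)
            + 4 * _interpolate(dists, log_rates, mid)
            + _interpolate(dists, log_rates, hi)) / 6


def bd_rate(anchor, test):
    """BD-Rate of `test` vs `anchor` in percent (negative = bitrate savings).

    Each argument is a list of four (bitrate, distortion) pairs with
    distinct distortion values, e.g. (kbit/s, PSNR in dB).
    """
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch expects four rate points per encoder")
    lo = max(min(d for _, d in anchor), min(d for _, d in test))
    hi = min(max(d for _, d in anchor), max(d for _, d in test))
    if hi <= lo:
        raise ValueError("distortion ranges do not overlap")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (10 ** diff - 1) * 100


if __name__ == "__main__":
    # Hypothetical rate points, not measurements from the experiments.
    anchor = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
    test = [(rate / 2, dist) for rate, dist in anchor]  # half the bitrate
    print(round(bd_rate(anchor, test), 2))
```

The same function can be used with SSIM on the distortion axis, which is how the two ΔRate columns per encoder in Table 1 differ.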
Test sequences of Classes A–D share the features of photo-realistic video and may be a good benchmark for, e.g., video surveillance content.
Class A test sequences have the highest resolution in the test set, 2560×1600 pixels. They mostly share the features of video surveillance content: highway traffic (Traffic) and a crowd of people passing a crossroad (PeopleOnStreet). Both videos contain plain textured regions as well as regions with smooth borders between objects.
Fig. 1. Bitrate-PSNR plot for Traffic test sequence
Fig. 2. Bitrate-SSIM plot for Traffic test sequence
The results for the sequences of this class are peculiar. Fig. 1 and Fig. 2 show rate-distortion curves for the Traffic test sequence with PSNR and SSIM as distortion measures, respectively. In terms of rate–SSIM, Daala compression efficiency is very close to HM and is superior to the other encoders tested. At the same time, the efficiency of both VP9 and VP10 is almost the same as that of JM.
Different results are obtained on Class B test sequences, which have a resolution of 1920×1080 pixels. The sequences contain a lot of detail and noticeable motion. Compared to JM by PSNR, the HM encoder provides 41% bitrate savings, VP9 provides 17% and VP10 provides 19%, while Daala shows a 16% bitrate overhead by PSNR and 15% bitrate savings by SSIM.
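
The HM, VP9 and VP10 class averages quoted above are plain arithmetic means of the per-sequence PSNR-based ΔRate values in Table 1. The short sketch below reproduces those three figures for Class B; the variable and function names are illustrative only.

```python
# PSNR-based BD-Rate values (%) for Class B, copied from Table 1.
# Tuple order: (HM vs JM, VP9 vs JM, VP10 vs JM).
CLASS_B_PSNR = {
    "Kimono": (-42.23, -27.24, -28.83),
    "ParkScene": (-35.66, -10.69, -12.35),
    "Cactus": (-35.54, -8.14, -10.27),
    "BQTerrace": (-45.02, -12.71, -14.67),
    "BasketballDrive": (-48.38, -27.80, -29.62),
}


def class_average(rows):
    """Column-wise arithmetic mean over all sequences of a class."""
    columns = list(zip(*rows.values()))
    return [sum(column) / len(column) for column in columns]


if __name__ == "__main__":
    for name, value in zip(("HM", "VP9", "VP10"), class_average(CLASS_B_PSNR)):
        print(f"{name}: {value:.1f}%")
```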
Fig. 3. Bitrate-PSNR plot for BQTerrace test sequence
Fig. 4. Bitrate-SSIM plot for BQTerrace test sequence
Fig. 3 and Fig. 4 show rate-distortion plots of the compared encoders for the BQTerrace test sequence, whose results deserve a closer look. When distortion is measured by PSNR, Daala shows a 6.29% bitrate overhead compared to JM, while by SSIM it provides 33.26% bitrate savings (very close to HM). Daala looks better at lower bitrates, which is presumably due to its lapped transforms and pre- and post-filtering. At higher bitrates VP9 and VP10 achieve better results than Daala. Even compared to HM, both VP9 and VP10 show competitive results with SSIM-based distortion measurement.
Class C sequences have a resolution of 832×480 pixels. VP9 and VP10 provide slightly better compression efficiency than JM (7–13% bitrate savings by PSNR). By SSIM, the compression efficiency of Daala on the RaceHorses and BQMall sequences is slightly worse than that of JM; however, 7–10% bitrate savings are achieved on the PartyScene and BasketballDrill sequences.
Class D test sequences have the smallest resolution, 416×240 pixels, which is not a target use case for contemporary compression systems but still needs to be considered. On the BQSquare and BasketballPass test sequences Daala shows results very close to HM with SSIM distortion measurement. Its results for the other two Class D sequences are close to JM efficiency, as are the results for VP9 and VP10.
Class E test sequences represent the video conferencing use case at a resolution of 1280×720 pixels, which may be considered one of the target uses of royalty-free video codecs. On this test set VP9 and VP10 achieve good results for Johnny and KristenAndSara, but not for the FourPeople sequence. SSIM-based results of Daala are comparable to those of VP9 and VP10.
Class F contains video sequences with fully or partially artificial content: desktop capture (SlideShow, SlideEditing), a video game (ChinaSpeed) and subtitles (BasketballDrillText). The results in Table 1 clearly show that Daala performs poorly on these test sequences, as it tends to smooth texture edges. On artificial content this property of the Daala tools does not work as well as it does on photo-realistic content.

Conclusion

The results show the superior compression efficiency of H.265/HEVC coding tools over H.264/AVC tools and the studied royalty-free encoders. The compression efficiency of the royalty-free codecs is not very stable and lies between that of the HM and JM encoders. Nevertheless, VP9 and potentially VP10 may be considered a good substitute for H.264/AVC-based encoders. Daala compression efficiency is the least stable: its best results are achieved on photo-realistic content, while compression of artificial content with sharp edges is its weak point.

Rate-PSNR results of the Daala encoder are not competitive with those of the other encoders, while rate-SSIM measurements show relatively good results. To further assess Daala efficiency, the compression distortion of the Daala encoder needs to be compared to H.264/AVC, H.265/HEVC, VP9 and VP10 using subjective distortion measurements.

References

  1. M.P. Sharabayko, Next Generation Video Codecs: HEVC, VP9 and Daala, In Youth and Contemporary Information Technologies, Tomsk Polytechnic University: Tomsk, Russia, 2013; Vol. 13, 35-37.
  2. Recommendation H.265: High efficiency video coding, ITU-T, April 2015.
  3. M.P. Sharabayko, O.G. Ponomarev, R.I. Chernyak, Intra Compression Efficiency in VP9 and HEVC, Applied Mathematical Sciences, 7 (2013), 6803-6824. http://dx.doi.org/10.12988/ams.2013.311644.
  4. D. Mukherjee, H. Su, J. Bankoski, A. Converse, J. Han, Z. Liu, Y. Xu. An Overview of new Video Coding Tools under consideration for VP10 the successor to VP9. Proc. SPIE 9599, Applications of Digital Image Processing XXXVIII, 95991E. September 22, 2015.
  5. A. Grange, H. Alvestrand. A VP9 Bitstream Overview (Internet-Draft), Google, August 2013.
  6. The WebM Project. – URL: http://www.webmproject.org/ (07.02.2015).
  7. Xiph.org Daala video. – URL: https://xiph.org/daala/ (07.02.2015).
  8. F. Bossen. Common test conditions and software reference configurations. In Document of ITU-T Q.6/SG16 JCTVC-K1100. ITU-T: Shanghai, CN, 2012.
  9. G. Bjontegaard, Improvements of the BD-PSNR model. ITU-T SG16/Q6, 35th VCEG Meeting Doc. VCEG-AI11, Berlin, Germany, 16-18 July, 2008.


