close

Вход

Забыли?

вход по аккаунту

?

j.image.2017.09.008

код для вставкиСкачать
Accepted Manuscript
A fast inter CU decision algorithm for HEVC
Zhe Xu, Biao Min, Ray C.C. Cheung
PII:
DOI:
Reference:
S0923-5965(17)30168-6
https://doi.org/10.1016/j.image.2017.09.008
IMAGE 15281
To appear in:
Signal Processing: Image Communication
Received date : 19 September 2016
Revised date : 14 September 2017
Accepted date : 15 September 2017
Please cite this article as: Z. Xu, B. Min, R.C.C. Cheung, A fast inter CU decision algorithm for
HEVC, Signal Processing: Image Communication (2017),
https://doi.org/10.1016/j.image.2017.09.008
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form.
Please note that during the production process errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
Manuscript
A Fast Inter CU Decision Algorithm for HEVC
Zhe Xu, Biao Min, Ray C. C. Cheung∗
Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
Abstract
A fast CU decision algorithm is very desirable for High Efficiency Video Coding
(HEVC) due to its high encoding complexity. In this paper, a fast inter CU
decision algorithm is proposed, with the motion correlations between neighboring CUs discussed. Decision for splitting of collocated CU has a strong impact
on current CU, high-motion CUs are early split by means of calculating motion
diversity of collocated CUs. On the other hand, SKIP mode indicates a motion
sharing relation among neighboring CUs and it can be used to early determine
CU termination. A discriminant function minimizing expected risk is defined for
both early SKIP mode detection and SKIP mode based termination decision.
Experimental results show that the proposed algorithm can reduce computational complexity by 48.2% with only 0.46% BDBR increase for random access
configuration. For the low-delay B configuration, it can reduce complexity by
45.9% with 0.55% BDBR increase penalty. The results show our algorithm
achieves less BDBR increase compared with other state-of-the-art works.
Keywords: High Efficiency Video Coding (HEVC), fast inter coding, motion
correlation.
1. Introduction
Though current dominant video coding standards, such as H.264/AVC standard [1], have high coding efficiency, research work for better compression perfor∗ Corresponding
author
Email addresses: zhexu22-c@my.cityu.edu.hk (Zhe Xu), biaomin3-c@my.cityu.edu.hk
(Biao Min), r.cheung@cityu.edu.hk (Ray C. C. Cheung)
Preprint submitted to Signal Processing: Image Communication
September 14, 2017
Depth 0
64x64
Depth 1
32x32
Depth 2
16x16
Depth 3
8x8
Fig. 1. Quad-tree based CU partition and its corresponding coding tree in
HEVC.
mance continues due to the strong demand for emerging high definition (HD)
or even ultra-HD videos. Thus, the latest developed video coding standard,
High Efficiency Video Coding (HEVC) [2] is released under the efforts of Joint
Collaborative Team on Video Coding (JCT-VC). It contains new coding tools
and can achieve 50% bit rate reduction with similar perceptual video quality
compared with H.264/AVC standard [3].
An important improvement in HEVC is the use of a quad-tree based coding
unit (CU) structure for block partitioning instead of MacroBlock structure in
H.264/AVC, as shown in Fig. 1. A slice is partitioned into multiple coding tree
units (CTUs). CTU is the largest CU in HEVC with depth equal to 0. The
size of a CTU is allowed to be 64x64, 32x32 and 16x16. A CTU can be split
into four sub-blocks with depth one larger than current depth. Each sub-block
can be split recursively to form a coding tree structure until the allowed maximum depth is reached. Each leaf node in the coding tree is a CU and can be
further divided into prediction units (PUs). Each PU has its own prediction
information, such as the prediction angle for intra coding and motion vector
for inter coding. Fig. 2 shows 8 PU modes available for inter coding, including
PART 2Nx2N, PART NxN, PART Nx2N, PART 2NxN, and four asymmetric
modes PART nLx2N, PART nRx2N, PART 2NxnU, PART 2NxnD. The coding
tree structure and multiple PU modes allow better flexibility for block partitioning and highly increase the coding efficiency.
However, the adoption of CU structure as well as other advanced coding tools
2
PART_2Nx2N
PART_NxN
PART_Nx2N
PART_2NxN
PART_nLx2N
PART_nRx2N
PART_2NxnU
PART_2NxnD
Fig. 2. 8 PU modes available for HEVC inter prediction.
makes the computational complexity of HEVC encoder enormously increase
compared with H.264/AVC [4]. In the original HEVC test model, HM software, searching for the best block partitioning is a depth-first traversal method.
Nearly all possible partitioning patterns are evaluated and then the optimum
partitioning is selected. Although this method achieves the best coding efficiency, it makes HEVC coding several times more complex than H.264/AVC,
which becomes a real concern.
Some fast algorithms for HEVC intra prediction have been proposed. In [5],
an early CU splitting and termination decision for intra prediction is proposed
based on the statistical analysis of rate distortion (RD) cost. In [6], global and
local edge complexities in horizontal, vertical, 45◦ diagonal, and 135◦ diagonal
directions are used to decide CU partitioning. In [7], the authors propose a fast
intra coding algorithm based on spatial correlations and analysis of probabilities. In [8], a fast PU skip and termination algorithm is proposed using the
ratio of RD costs between the current CU and the spatially adjacent or upper
depth CU. In [9], convolutional neural network (CNN) is adopted to early decide CU partitioning. The algorithm does not depend on the correlations among
CU depths or spatially neighboring CUs so it is friendly to parallel processing.
In [10], two support machine vectors (SVMs) which employ depth difference
and RD cost ratio as features are used for fast CU size decision. These fast
intra CU decision algorithms can drastically decrease intra coding time without
3
large coding efficiency loss, but they are not suitable for inter prediction configurations which are commonly used. Thus a fast inter CU decision algorithm is
crucial.
Significant efforts have been made in searching for faster CU decision algorithms for HEVC inter prediction and corresponding approaches have been
proposed. Based on the features employed to make early decision, these algorithms can be classified into several types, as follows listed three of them.
The first type is RD cost based algorithm. Kim et al. [11] propose a fast
SKIP mode decision by comparing the RD cost with a threshold. The decision
is based on an adaptive linear prediction utilizing neighboring PUs and previous
coded PUs. In [12], Lee et al. propose a fast CU decision algorithm including
SKIP mode decision, CU skip estimation and early CU termination mainly
based on Bayesian decision rule. It is assumed that the probability density
function of RD cost is a Gaussian distribution function. To further improve the
decision accuracy, Kim et al. [13] use RD cost of both inter and intra modes
to reflect spatial and temporal characteristics. Thus, the RD cost distribution
is a 2-D Gaussian distribution function. Hu et al. [14] partition the SKIP RD
cost distribution space into high distinction region (HDR) and low distinction
region (LDR). A straightforward decision and a Baysian decision are proposed
for HDR and LDR respectively. Jung et al. [15] propose a fast mode decision
algorithm by predicting RD cost and bit cost. Mode candidates with high cost
prediction are early skipped. In [16], Hu et al. adopt RD cost as the decision
feature and the mode decision and CU partitioning are modeled as the binary
classification problems. The Neyman-Pearson-based method is employed to
balance the coding efficiency loss and complexity reduction. In [17], Xiong et
al. propose an inter CU decision method employing the characteristics of both
RD cost and latent SAD values. The relationship between RD cost and SAD
values are explored and the threshold of decision is selected accordingly.
The second type uses spatial and temporal correlations to predict current
CU size and mode. In [18], the depth information of collocated CTU in the
reference frame is utilized to early predict current CTU depth situation. Zhang
4
et al. [19] propose a fast CU depth decision by reducing depth search range. The
depth search range is set based on similarity among neighboring CTUs. Shen
et al. [20] propose an integrated CU decision algorithm jointly utilizing interlevel correlation and spatiotemporal correlation. Mode information of adjacent
and collocated CUs is employed to early determine SKIP mode and exclude
unlikely modes. In [21], a fast motion estimation method is proposed by restraining search window based on neighboring coded blocks. In [22], CTUs are
firstly classified as static and motive and then depth range and SKIP mode
are determined respectively based on the information of collocated CU. In [23],
depth evaluations that are rarely used in the previous frame and neighboring
CUs are skipped to reduce computational complexity. In [24], relationship of
depth range between 13 neighboring CTUs and current CTU is analyzed. Most
relative CTUs are used to decide the optimal depth of current CTU. In [25], if
the final best search point of PART 2Nx2N mode is equal to its initial search
point, the block matching process is skipped for the motion estimation of rest
modes. Thus the motion estimation time is reduced.
The third type employs multiple coding signals and flags. In [26], the differential motion vector (DMV) and coded block flag (CBF) are utilized to early
detect SKIP mode. Zhang et al. [27] propose a fast mode decision algorithm
using distribution of prediction distortions to skip some unlikely modes. Wu
et al. [28] propose a fast CU encoding algorithm utilizing the best PU mode,
the second-best PU mode and the CBF. In [29], several parameters such as
sample-adaptive-offset (SAO) parameters, motion vectors, TU sizes and CBF
information are employed to estimate spatial and temporal complexity, thus
early CU size decision is made. In [30], multiple features including variance of
PART 2Nx2N prediction errors and the variance of four SATDs in sub-blocks
are utilized in a Baysian decision. In [31], Shen et al. propose a weighted SVM
using multiple features to early determine CU termination. In [32], optical flow
estimation is used to calculate pyramid motion divergence feature. Then the
k-NN decision is performed to decide CU splitting. In [33], Tan et al. propose a
fast CU termination method using both the variance of two PART Nx2N PUs
5
and two PART 2NxN PUs of current CU. If both two variances are smaller than
the threshold, the current CU is early terminated.
In this paper, a fast inter CU decision algorithm is proposed to reduce computational complexity without noticeable coding efficiency loss. The algorithm
includes three methods, namely, motion based split-to-split method (MBS2S),
minimum risk SKIP detection (MRSD) and SKIP mode based termination decision (SMBT). Motion correlations among neighboring CUs are discussed firstly
and in MBS2S, motion diversity and split decision of the collocated CU in
neighboring frame are employed to early split non-homogeneous CUs. SKIP
mode is fully discussed for not only the early detection but also its potential in
early termination decision. To make accurate decisions, a discriminant function
minimizing expected risk is defined for MRSD and SMBT. In the proposed algorithm, we utilize an RD cost decision based on Baysian rule in MRSD and
SMBT. In addition, some new approaches are adopted to help reduce computational complexity as well as ensure the coding efficiency which include:
1. The MBS2S decision with motion diversity and split decision of collocated
CU fully utilized is adopted to help early split current CU. Besides, a turnoff scheme is employed to avoid potential coding efficiency loss.
2. New minimum risk decision is adopted in both MRSD and SMBT to
ensure the coding efficiency while the effect on computational complexity
reduction is minimal.
The algorithm achieves a better balance between complexity reduction and
coding efficiency compared with other existing algorithms. Experimental results
show that the algorithm can save near 50% coding time with only about 0.5%
BDBR increase for both RA and LB configurations.
The rest of this paper is organized as follows. In Section 2, a brief overview
of inter encoding process adopted in HM software is presented. The proposed
fast algorithm is introduced in detail in Section 3. Experimental results are
provided in Section 4. At last, Section 5 gives the conclusion.
6
2. HM Inter Encoding Overview
In this section, the inter encoding process adopted in HM software is introduced. HEVC applies many advanced coding techniques for inter prediction,
including merge mode, CU structure and multiple PU modes. For each intercoded CU, HEVC encoder will evaluate PU modes shown in Fig. 2 and for each
PU mode, rate distortion optimum (RDO) is performed. The PU mode with
minimum RD cost is selected as the optimum mode. RD cost is calculated as
J =D+λ·R
(1)
where J is the RD cost, D stands for reconstruction distortion, which is calculated as the sum of squared error (SSE) between the original input image block
and the predicted block. λ is the Lagrangian multiplier and R is the coding bits
cost.
Since neighboring MVs have correlations in both time domain and space
domain, a new prediction mode called merge mode [34] is performed in inter
prediction. The merge mode derives motion information including MVs and
reference indices from spatially or temporally neighboring blocks and creates
an MV candidate list. The encoder will evaluate all the candidates and the
MV with minimum RD cost is selected for merge mode. In HEVC, the coded
block flag (CBF) is employed to indicate the occurrence of non-zero quantized
transform coefficients in the transform unit (TU). When all CBFs are equal to
zero in a determined CU, then only a skip flag and its corresponding merge index
are signaled to further save bit rate. This is a special case of merge mode called
SKIP mode. According to [20], the SKIP mode takes up the major proportion
ranging from 30% to 90% in different sequences and the average percent is 63%.
It is very common in inter prediction since it can present low-motion scenes such
as background efficiently.
Besides the mode decision of each CU, in HM software, there exist two
processes to decide CU partitioning: CU splitting and CU pruning. CU splitting
is a top-down manner to split a CU into four sub-CUs until it reaches the
7
maximum depth. While CU pruning is a down-top manner to decide whether
the CU is split or not. To determine the CU size, RD cost of the current CU
JCU is compared with the total RD cost of its four sub-CUs, defined as Jsplit .
Jsplit =
3
∑
Jsub−CUi
(2)
i=0
If Jsplit is smaller than JCU , the current CU is decided to be split and its
RD cost is replaced by Jsplit . Otherwise, the CU is decided not to be split.
After the exhaustive CU splitting and CU pruning process, the optimum CU
partitioning in a CTU and PU mode for each CU are figured out.
3. Proposed Inter CU Decision Algorithm
Inter CU partitioning is a reflection of block motion diversity. In motionhomogeneous region, CU sizes are typically large while small CUs are dominant
in a region with various motions. Thus, by analyzing motion correlations between the current CU and its neighboring CUs, the characteristics of current
CU are acquired and early CU decision can be made. In this section, the proposed fast inter CU decision algorithm is introduced in detail including three
individual methods.
3.1. Motion Based Split-to-Split Method (MBS2S)
Fig. 3 shows large CU partitioning in temporal neighboring frames. Fig. 3(a)
and 3(c) are in the reference list of Fig. 3(b) and 3(d), respectively. A highmotion sequence, BasketballDrill, is selected to explore the CU splitting relation.
Two configurations, random access (RA) and low-delay B (LB) are tested. For
LB configuration, temporal distance is only 1 frame so both content and CU
partitioning are almost the same: both of them are partitioned when representing moving players located in nearly the same region. For RA configuration,
though the maximum distance could be 8 frames, the majority of CU splitting
is similar because two frames are in the same encoding temporal layer and both
tend to be coded by small CUs. In this proposed method, 64x64 and 32x32
8
(a) 9th frame, RA configuration
(b) 17th frame, RA configuration
(c) 6th frame, LB configuration
(d) 7th frame, LB configuration
Fig. 3. CU partitioning in neighboring frames. (BasketballDrill, QP = 32)
CUs are analyzed so the probability of moving object crossing different CUs
in neighboring frames is small. Even the current CU is located in the edge of
moving objects, its collocated CU still contains this edge part and thus these
two CUs have the same partitioning decision. This indicates that motion variety
is similar in these two large collocated CUs and encoded frames are helpful to
decide CUs to be split early in the current frame thus computational complexity
is reduced.
Some works [18, 19, 22] already use depth information of temporal collocated
CUs to set depth range of the current CTU so that some depth evaluations are
skipped. While in our proposed method, partitioning information of collocated
CUs is directly employed to make early split decision for depth 0 and 1, which
is called split-to-split decision. Moreover, motion information is an additional
weight when deciding CU splitting. Thus the whole method is called motion
based split-to-split method (MBS2S).
9
30%
50%
Split CTU
45%
Split CTU
Terminated CTU
40%
Percentage
Percentage
35%
30%
Terminated CTU
25%
Early Split Decision
25%
20%
20%
Early Split Decision
15%
10%
15%
10%
5%
5%
0%
0%
0
20
40
60
80
0
20
40
60
80
MDavg value
MDavg value
(a) RA configuration
(b) LB configuration
Fig. 4. M Davg distribution of split and terminated CTUs. (BasketballDrill,
QP = 32, first 25 frames are tested)
In [35], motion information has been used to analyze block characteristics.
In this paper, to reveal the motion homogeneity as well as not to increase too
much computational complexity, a parameter called average motion diversity
(M Davg ) is defined as the first order absolute central moment value
M Davg =
H−1
−1
∑ W∑
D(M Vi,j , M Vm )
i=0 j=0
H ×W
(3)
where M Vi,j stands for the estimated motion vector of the pixel at (i, j). M Vm
is the average motion vector in a CU and D() is the block distance of two motion
vectors which is the sum of absolute difference. Thus
M Vm =
H−1
−1
∑ W∑
M Vi,j
i=0 j=0
(4)
H ×W
D(M V 1, M V 2) = |M V 1x − M V 2x | + |M V 1y − M V 2y |
H and W stand for the height and width of the block.
To reveal the correlation between M Davg and CU partitioning, we collect
the M Davg distribution of split and terminated CTUs when the collocated CTU
is split and results are shown in Fig. 4. The same sequence as Fig. 3, Basket10
ballDrill is selected. The vertical coordinate stands for the percentage of total
CTUs. We can see that if the collocated CTU is split, the probability of current
CTU to be terminated is low (about 20% to 30%). In addition, the two distributions of split and terminated CTUs are significantly different. Most M Davg
values of terminated CTUs are close to zero, indicating that terminated CTUs
have less motion diversity. While split CTUs have a wider M Davg distribution
from 0 to 80. Thus large M Davg value indicates a split decision. We assume
that the M Davg of collocated CTU is highly relative with the current CTU.
Thus when the collocated CTU is split and its M Davg value is large enough,
the current CTU is early split and mode evaluations are skipped.
Based on the analysis of temporal partitioning correlation and M Davg ,
MBS2S is proposed and introduced as follows.
First, if the collocated CU is split, then the current CU is probably split
as well. The collocated CU is a conceptual CU in the neighboring frame which
covers the same area as current CU, like 64x64 for depth 0 and 32x32 for depth 1.
In MBS2S, we use the first reference frame list of current frame and choose the
first reference candidate in order for S2S decision. The collocated CU is then
selected in this neighboring frame. To simplify the discussion, we define the
probability of a CU to be split given that the collocated CU is split already as
d
d
p(wS,n+1
|wS,n
), where d is the depth of current CU and d ∈ {0, 1}. The variable
n and n+1 are used to indicate that the CU is in neighboring and current frame
d
d
|wS,n
)
respectively. Because MBS2S only considers large CU splitting, p(wS,n+1
in non-homogeneous sequences is high according to Fig. 4. M Davg is adopted
in the next step to avoid potentially wrong decision. Thus, QPI difference
between current and collocated CUs has little influence on the final accuracy.
Experimental results in Section 4.2 aldo prove that the coding efficiency loss
caused by MBS2S is negligible.
M Davg of the collocated CU is calculated with the assumption that M Davg
value of the collocated CU can reflect motion homogeneity of the current CU. It
d
d
is reasonable that CUs with larger M Davg values have higher p(wS,n+1
|wS,n
).
So a threshold thdM Davg is set. If the collocated CU is split with M Davg larger
11
than thdM Davg , the current CU is decided to be split early and thus evaluation
process for current depth is skipped. In this approach, the S2S rate θd is used
to decide thdM Davg and θd is represented as
d
θ =
∫
∞
thd
M Davg
d
d
d
p(M Davg
|wS,n
)dM Davg
(5)
d
d
d
The p(M Davg
|wS,n
) is the probability density of M Davg
when collocated CU is
split. A parameter updating process is performed every n frames to decide θd
d
d
d
and thdM Davg . Statistical parameters p(wS,n+1
|wS,n
) and M Davg
are obtained
d
d
and stored during the updating process. The p(M Davg
|wS,n
) can then be de-
d
rived based on statistical distribution of M Davg
in parameter updating process.
In Fig. 4 it is observed that M Davg distributions of split and terminated CUs
have intersections, so a slack gap is needed and S2S rate θd should be smaller
d
d
than p(wS,n+1
|wS,n
). This setup can enlarge the threshold thdM Davg and ensure
the decision accuracy. So θd is represented as
d
d
) − βd
|wS,n
θd = p(wS,n+1
d
d
))
|wS,n
β d = 0.5 × (1 − p(wS,n+1
(6)
The β d is an experimental value to control the decision accuracy. It should be
d
d
d
d
negatively related to p(wS,n+1
|wS,n
) because when p(wS,n+1
|wS,n
) is large, the
decision accuracy is high and there is less need to set the slack gap. Tests show
β d in Equation (6) can exclude most of terminated CUs.
d
are obtained. In addition,
Then thdM Davg is updated after θd and M Davg
further tests reveal that MBS2S is not suitable for some homogeneous sequences,
d
d
which means p(wS,n+1
|wS,n
) is low. To ensure the validity of θd and exclude
homogeneous sequences, an adaptive turn-off scheme as well as a flag called
d
d
“bIsS2S ” is adopted. Only when p(wS,n+1
|wS,n
) is greater than α will “bIsS2S ”
be true and then MBS2S is performed. α is an experimental value and is set to
be 0.67.
12
3.2. Minimum Risk SKIP Detection (MRSD)
In HEVC, SKIP mode indicates a motion sharing relation. It is important
because it occupies more than half CUs [20] and can indicate homogeneous
regions for early CU termination decision. In this section, the minimum risk
SKIP detection (MRSD) is proposed and the SKIP mode based termination
decision (SMBT) is introduced in Section 3.3.
A few methods for early SKIP mode detection have been proposed, such
as SMD [12], ES-MD [20] and ESD [26]. In this paper, a minimum risk SKIP
detection (MRSD) is proposed. RD cost is utilized as a feature indicating CU
characteristics. SKIP mode is early determined for CUs with small expected
risk. In addition, the discrimination function is updated periodically so it can
adapt to the video content. Thus the decision accuracy is high while much
coding complexity is reduced.
According to [36], PART 2Nx2N mode occupies nearly 20% share in inter CU
modes showing a fundamental role in block partitioning, so PART 2Nx2N evaluation should not be skipped in early SKIP mode detection. The PART 2Nx2N
mode evaluation includes both inter 2Nx2N and merge 2Nx2N mode evaluations.
When SKIP mode is selected after PART 2Nx2N evaluation, the probability of
SKIP mode to be optimum after all mode evaluations is tested and the results
are shown in Table 1. To ensure the validity of the proposed method, test sequences and quantization parameters (QPs) are different from those adopted in
Section 4. Six sequences are selected to cover different types of motion and texture complexity. For example, Tennis contains lots of motion blocks and moving
background while Vidyo1 and Vidyo3 are homogeneous with static background.
In addition, the test sequences are with various resolutions from 1920x1080 to
1024x768, which aims to test general performance of the method. For each
sequence, three QPs (25, 30, 35) are used to test the validity of the proposed
method in a wide QP range. Test results show that on average, the SKIP decision accuracy is more than 95%. Even for those non-homogeneous sequences
like Tennis, over 85% CTUs are SKIP mode after the evaluation of all modes
if they are already decided to be SKIP mode after PART 2Nx2N evaluation.
13
Table 1. The Probability of SKIP Mode to be Optimum after All Mode Evaluations When it is Selected after PART 2Nx2N Evaluation
Resolution
Sequence
Tennis
1920x1080
Cafe
Vidyo1
1280x720
Vidyo3
Balloons
1024x768
Kendo
Random Access
depth 0 depth 1 depth 2
85.2%
94.2%
97.9%
88.5%
95.8%
98.4%
87.7%
95.2%
98.5%
98.8%
99.4%
99.8%
99.2%
99.6%
99.9%
98.9%
99.7%
99.9%
94.4%
98.1%
99.4%
96.6%
99.0%
99.7%
98.2%
99.3%
99.8%
92.0%
97.3%
99.2%
95.5%
98.5%
99.5%
97.0%
99.0%
99.7%
93.6%
98.4%
99.6%
97.1%
99.3%
99.8%
98.5%
99.5%
99.9%
97.4%
99.0%
99.7%
98.0%
99.3%
99.8%
98.6%
99.4%
99.8%
95.3%
98.3%
99.5%
QP
25
30
35
25
30
35
25
30
35
25
30
35
25
30
35
25
30
35
Average
Based on test results shown in Table 1, in our proposed method, the fast SKIP
mode decision is performed after SKIP and PART 2Nx2N mode evaluations to
make sure that the method achieves good coding efficiency.
d
After SKIP and PART 2Nx2N mode evaluations, RD cost JSKIP
is utilized
d
as a feature. JSKIP
is the RD cost of CUs when they are coded by SKIP mode
after PART 2Nx2N mode evaluation. In such approach, the decision criteria
is crucial. To make an accurate decision, a discriminant function minimizing
expected risk is defined and the method is introduced as follows. Two classes
{S d , S̄ d } are classified, S d stands for CUs with SKIP mode as the optimum
mode and S̄ d is the class containing CUs which SKIP mode is not optimum.
The superscript d is the depth of current CU.
14
Decision loss Li,j is defined standing for the loss of deciding a CU in class
j to be i. The variable RSd is used to stand for the expected risk of deciding a
CU to be SKIP mode. It is represented as
d
d
RSd = LdS,S × p(S d |JSKIP
) + LdS,S̄ × p(S̄ d |JSKIP
)
(7)
S and S̄ are SKIP class and non-SKIP class respectively. So LdS,S̄ is the decision
loss of deciding a non-SKIP CU to be SKIP mode. The expected risk of deciding
d
(w ∈
non-SKIP mode RS̄d is similar to RSd . Then the discriminant function gw
{S, S̄}) is defined as


gSd
gS̄d


 = −
RSd
RS̄d


LdS,S
 = −
LdS̄,S


d
p(S d |JSKIP
)


d
d d
LS̄,S̄
p(S̄ |JSKIP )
LdS,S̄
(8)
The decision is to choose the maximum discriminant function value, therefore
the expected risk is small and decision accuracy is ensured. The decision criterion is
wd =


S d ,
gSd > gS̄d

S̄ d , g d ≤ g d
S
S̄
(9)
d
Bayes’ theorem gives the calculation of p(wd |JSKIP
) as
d
)=
p(wd |JSKIP
d
p(JSKIP
|wd ) × p(wd )
d
p(JSKIP
)
(10)
d
In [5, 12, 37], p(JSKIP
|wd ) is assumed to be a Normal distribution function
d
d
and p(wd ), an on-line parameter
N (µdw , σw
). To obtain parameters µdw , σw
updating period is performed so that the decision can adapt to the video content.
The right decision does not result in a loss, so LdS,S and LdS̄,S̄ are set as 0.
Since the decision that a SKIP mode CU is classified as a non-SKIP mode CU
will not affect coding efficiency, LdS̄,S is simply set as 1. In our consideration,
smaller depth has larger decision loss because it will affect CU decisions in
smaller scale, so LdS,S̄ is a function of depth and is positively related to (4 − d).
The constant 4 is selected because the maximum depth level in HEVC is 4 from
15
Tennis
RaceHorses
ChinaSpeed
BasketballDrill
QP25
QP30
QP35
QP40
99%
Decision Accuracy
Decision Accuracy
99%
98%
97%
98%
97%
96%
95%
5
10
15
20
25
30
5
10
15
α value
20
25
30
α value
(a) Different sequences (QP = 30)
(b) Different
QP
values
(Basket-
ballDrill)
Fig. 5. Different α values’ impact on decision accuracy. (RA configuration,
depth 0 is tested)
0 to 3. We test several functions for decision loss LdS,S̄ and find the square root
function is the most suitable. In the proposed method, it is set as
LdS,S̄ = α ×
√
4−d
(11)
d ∈ {0, 1, 2, 3}. α is an experimental coefficient standing for the decision strictness level.
Larger α value means a more strict decision. The final decision is contentd
d
adaptive because the probability distribution p(S d |JSKIP
) and p(S̄ d |JSKIP
)
are collected in the parameter updating process. To decide the α value, we test
the impact of different α values on decision accuracy. Since Table 1 already
shows that decision accuracy for homogeneous sequences is very high, selection
of α value focus on non-homogeneous sequences. Four high-motion sequences,
Tennis, RaceHorses, ChinaSpeed and BasketballDrill are selected and for BasketballDrill, four QPs, 25, 30, 35 and 40 are tested. The results are shown in
Fig. 5. Fig. 5(a) is the decision accuracy of different α values for various sequences when QP is 30 and Fig. 5(b) is for different QP configurations when
testing BasketballDrill. Both figures show that when α is small, the decision accuracy improves rapidly as α increases until it reaches a certain value. Then the
16
accuracy remains nearly stable for increasing α. It is noticed that in Fig. 5(b),
QP 30 has the highest accuracy. It is because for non-homogeneous sequences,
though more SKIP CUs in depth 0 are early decided in larger QP, the wrong
decision number also increases and decision accuracy is affected. Since larger
α will reduce CUs that are early decided to be SKIP mode, considering the
trade-off between coding efficiency and its corresponding coding complexity reduction, the α is set to be 20. Based on Fig. 5, this value can ensure the decision
accuracy for different sequences and QP configurations over 97%.
3.3. SKIP Mode Based Termination Decision (SMBT)
SKIP mode indicates that current block has uniform motion and the motion
is similar to that of neighboring blocks. Simple approach has been proposed
using SKIP mode characteristics [38]. In this part, the characteristics are fully
tested and further utilized.
Table 2 shows the probability of a CU to be terminated given that SKIP
mode is the optimum mode. Test configurations are the same as those in Section 3.2. On average, most SKIP mode CUs (over 95%) are decided to be
terminated. For smaller CUs, the probability is much higher. This result shows
strong relations between SKIP mode and CU termination. In addition, this relation is suitable for various sequences with different QPs. For non-homogeneous
sequences such as Tennis, the probability is over 90% under all QP configurations.
Furthermore, we test the PU modes share in all terminated CUs to verify the
efficiency of the method. Both homogeneous and non-homogeneous sequences
are select to test the efficiency in different types of situations. Test results are
shown in Fig. 6. As can be observed, for sequences with many static regions
like Vidyo1, over 90% terminated CUs are coded by SKIP mode. Even for nonhomogeneous sequences like Tennis, SKIP mode still occupies a large share in
terminated CUs, over 40%. Moreover, the share of SKIP mode increases with
an increasing QP. Motivated by the results, if a CU with SKIP mode can be
early decided to be terminated with high accuracy, vast computational com17
Table 2. The Probability of a CU with SKIP Mode to be Terminated
Resolution
Sequence
1920x1080
Tennis
Cafe
Vidyo1
1280x720
Vidyo3
Balloons
1024x768
Kendo
Random Access
depth 0 depth 1 depth 2
94.1%
98.7%
99.7%
96.5%
99.2%
99.8%
96.6%
99.5%
99.9%
98.2%
99.5%
99.8%
98.9%
99.7%
99.9%
98.9%
99.8%
99.9%
95.1%
98.8%
99.6%
97.5%
99.5%
99.8%
98.1%
99.7%
99.9%
94.5%
98.7%
99.6%
97.1%
99.3%
99.8%
98.0%
99.6%
99.9%
96.0%
99.2%
99.7%
98.0%
99.5%
99.9%
98.9%
99.7%
99.9%
97.8%
99.4%
99.8%
98.3%
99.7%
99.9%
98.7%
99.6%
99.9%
97.3%
99.4%
99.8%
QP
25
30
35
25
30
35
25
30
35
25
30
35
25
30
35
25
30
35
Average
plexity can be reduced for both homogeneous sequences and non-homogeneous
sequences while coding efficiency is not affected much. So our SKIP mode based
termination decision (SMBT) is proposed utilizing both mode information and
similar discriminant function adopted in MRSD. Thus the decision accuracy
improves a lot compared with [38] for non-homogeneous sequences.
For SKIP mode CUs, two classes {T d , T̄ d } are defined to stand for CUs which
d
are terminated or not in depth d (d ∈ {0, 1, 2}). The discriminant function gw
is given as


gTd
gT̄d


 = −
LdT,T
LdT̄ ,T

d
p(T d |JCU
)


d
LdT̄ ,T̄
p(T̄ d |JCU
)
LdT,T̄

(12)
T and T̄ are termination class and non-termination class respectively. LT,T̄ is
18
Percentage
80%
60%
40%
80%
20%
60%
40%
20%
0%
0%
Depth 0
Depth 1
Depth 2
Depth 0
Depth 1
Depth 2
Depth
Depth
(a) Vidyo1, QP = 25
(b) Vidyo1, QP = 35
80%
60%
40%
SKIP
Merge
PART_2Nx2N
PART_2NxN
PART_Nx2N
Others
100%
80%
Percentage
SKIP
Merge
PART_2Nx2N
PART_2NxN
PART_Nx2N
Others
100%
Percentage
SKIP
Merge
PART_2Nx2N
PART_2NxN
PART_Nx2N
Others
100%
Percentage
SKIP
Merge
PART_2Nx2N
PART_2NxN
PART_Nx2N
Others
100%
20%
60%
40%
20%
0%
0%
Depth 0
Depth 1
Depth 2
Depth 0
Depth 1
Depth 2
Depth
Depth
(c) Tennis, QP = 25
(d) Tennis, QP = 35
Fig. 6. PU modes distribution in terminated CUs. (RA configuration)
the decision loss of deciding a split CU to be terminated early. Selection of the
decision loss is the same as MRSD.



LdT,T = LdT̄ ,T̄ = 0



LdT̄ ,T = 1




Ld = α × √4 − d, α = 20
T,T̄
(13)
If the current CU is coded by SKIP mode and gTd is greater than gT̄d , then
d
d
the current CU is terminated. Similarly, p(T d |JCU
) and p(T̄ d |JCU
) is obtained
d
based on the Bayes’ theorem. µdw , σw
and p(wd ) with w ∈ {T, T̄ } are updated
during a parameter updating period every n frames.
SMBT contains two operation stages shown in Fig. 7(a). CUs with SKIP
mode indicates motion-homogeneous regions, then a minimum risk decision
is performed thus the decision accuracy is improved. For non-homogeneous
19
Homogeneous
SKIP Mode
region detection
݃ܶ݀
൐
Decision Accuracy
Minimum risk
decision
[38]
SMBT
100%
݃ܶ݀
Termination
Decision
95%
90%
85%
20
25
30
35
40
QP
(a)
(b)
Fig. 7. (a) Two stages of SMBT. (b) Decision accuracy comparison (BasketballDrill, RA configuration, depth 0 is tested).
B
B
B
B
B
B
B
B
B
I
Encoding
Order
0
4
3
5
2
B
7
6
Random Access
8
B
I
1
0
B
1
2
3
4
Low-delay B
Fig. 8. Example of hierarchical encoding structure of in two configurations.
sequence, the performance of [38] and SMBT is compared. The same nonhomogeneous sequence BasketballDrill is selected and results are shown in Fig. 7(b).
SMBT outperforms [38] in terms of decision accuracy for all QP configurations,
so the coding efficiency is ensured.
3.4. Overall Algorithm
In proposed algorithm, some parameters in three methods need to be collected first. Also in MBS2S, the early split decision may be wrong and such
decision errors may accumulate during the S2S process. Thus a parameter updating period for these three individual methods is essential every n frames.
When selecting frames used to update statistical parameters, encoding structure in different configurations should be considered. Fig. 8 is an example of
20
Start encoding a CU with
depth d
MBS2S
Collocated CU is
split?
No
Yes
Yes
݀
݀
‫݃ݒܽܦܯ‬
൐ ‫ܦܯ݄ݐ‬
?
ܽ‫݃ݒ‬
No
Encode SKIP and
PART_2Nx2N mode
MRSD
݃ܵ݀ ൐ ݃ܵ݀ ˛
Yes
No
Encode remaining modes
Determine the optimal
mode
SMBT
SKIP mode &&
݃ܶ݀ ൐ ݃ܶ݀ ?
Yes
No
Further split CU
if d != 3
Coding sub-CUs with
depth = d + 1
End encoding a CU
Fig. 9. Flowchart of the proposed algorithm. MBS2S, MRSD and SMBT are
three proposed methods introduced previously.
hierarchical encoding structure in RA and LB configurations. “I” means Iframe and “B” is B-frame, the arrow stands for reference relationship between
two frames. Frames in different temporal layers have different QP offsets, and
the RD cost distributions vary as well. In proposed algorithm, the updating
period is adaptive and the training size is set as the GOP size. For example, in
RA-main configuration, the GOP size is 8, so first 8 B-frames are adopted for
21
updating process. For LB-main, first 4 B-frames are used for training. It is a
simple approach but it ensures that parameters in all QP offsets are recorded.
In [12], the overall RD cost distribution is assumed as Gaussian distribution despite of different temporal layers, and the original decision accuracy in MRSD
and SMBT is already high. Thus, in our proposed algorithm, same parameters
are applied for all CUs which simplifies calculation. Further tests show that
this approach works well and does not cause significant coding efficiency loss.
Parameters collected for three methods during updating process are listed as
follows:
d
d
d
1. MBS2S: p(wS,n+1
|wS,n
), M Davg
.
2. MRSD: µdS , σSd , p(S d ), µdS̄ , σS̄d , p(S̄ d ).
3. SMBT: µdT , σTd , p(T d ), µdT̄ , σT̄d , p(T̄ d ).
The parameters are estimated based on records of all CUs with different QP
offsets and depths. For example, in MRSD, the SKIP probability p(S d ) and
average RD cost of SKIP CUs µdS are calculated as
NSd
N
∑totald
J
µdS = i d i
NS
p(S d ) =
(14)
where NSd is number of SKIP CUs, Ntotal is the total number of CUs and Jid
is the RD cost value of every SKIP CU. After the updating process, three
methods are performed to reduce computational complexity. To make sure the
parameters can adapt to the video content and decision errors in MBS2S are
not severe, the updating period n is set to be 50.
Fig. 9 shows the flowchart of the proposed fast inter CU decision algorithm.
MBS2S is firstly performed so that the current depth evaluation of CUs with
various motions is early skipped. Then MRSD can detect SKIP mode before
all possible mode evaluations are performed. And CUs with SKIP mode will
be tested further to decide whether the current CU is terminated or not in
SMBT. Since SKIP mode is more common in homogeneous sequences, these two
22
methods are more effective for them. The whole algorithm can deal with both
homogeneous and non-homogeneous situations so it performs well for various
sequences and detailed results are provided in the next section.
4. Experimental Results
4.1. Test Conditions
In this section, performance of the proposed algorithm is evaluated in detail.
The proposed algorithm is implemented in HEVC test software HM-16.7 and
tested under JCT-VC common test conditions [39]. Test conditions adopted in
this paper are shown as follows:
1. The test software runs on an Intel Core i7-2600 CPU @ 3.4GHz with 8GB
memory.
2. HEVC main profile is used.
3. CTU size and maximum depth are set as 64x64 and 4, respectively.
4. Four QPs: 22, 27, 32 and 37 and two configurations: random access (RA)
and low-delay B (LB) are used.
5. For RA configuration, class A, B, C, D and F are tested. And for LB
configuration, class B, C, D, E and F are tested.
The coding efficiency is evaluated by the BD bit rate (BDBR) using the
Bjontegaard’s method [40]. The computational complexity reduction is measured by time saving (TS). TS is defined as
TS =
Torg − Tpro
× 100%
Torg
(15)
where Torg is the coding time of the original HM-16.7 and Tpro stands for computational complexity of the proposed CU decision algorithm including coding
time of both updating process and the following fast encoding process.
23
Table 3. Performance of The Proposed Methods for RA Configuration
class
sequence
Traffic
PeopleOnStreet
Kimono
ParkScene
Cactus
BasketballDrive
BQTerrace
BasketballDrill
BQMall
PartyScene
RaceHorses
BasketballPass
BQSquare
BlowingBubbles
RaceHorses
BasketballDrillText
ChinaSpeed
SlideEditing
SlideShow
Average
A
B
C
D
F
MBS2S
BDBR (%) TS (%)
0.19
4.2
0.15
9.4
0.16
2.4
0.11
4.0
0.12
3.9
0.21
4.6
0.14
4.2
0.11
3.9
0.14
5.2
0.08
5.8
0.10
6.7
0.10
6.2
0.13
5.0
0.08
6.4
0.13
7.6
0.09
4.6
0.13
6.0
0.00
-0.4
0.18
-1.3
0.12
4.7
MRSD
BDBR (%) TS (%)
0.09
37.4
0.13
17.7
0.04
30.8
0.14
35.8
0.13
33.1
0.12
32.3
0.06
36.9
0.01
26.2
0.17
31.4
0.15
26.2
0.20
18.3
0.11
22.8
0.11
32.2
0.17
26.8
0.24
16.5
0.10
25.6
0.15
22.8
0.08
57.1
0.10
46.3
0.12
30.3
SMBT
BDBR (%) TS (%)
0.08
44.1
0.03
16.7
0.12
36.6
0.15
41.7
0.17
38.8
0.14
37.8
0.09
42.4
0.04
29.2
0.11
35.3
0.09
26.1
0.11
17.7
0.06
22.2
0.07
34.4
0.15
29.1
0.15
14.7
0.05
27.4
0.06
23.8
0.00
73.1
0.03
57.1
0.09
34.2
MRSD + SMBT
BDBR (%) TS (%)
0.42
54.8
0.23
24.1
0.25
46.1
0.49
51.9
0.54
48.6
0.39
47.2
0.36
53.5
0.12
38.4
0.39
45.9
0.38
36.0
0.32
25.5
0.33
31.6
0.34
46.9
0.48
39.0
0.35
22.1
0.21
37.0
0.37
32.2
0.05
86.3
0.08
70.9
0.32
44.1
Table 4. Performance of The Proposed Methods for LB Configuration
class
B
C
D
E
F
sequence
Kimono
ParkScene
Cactus
BasketballDrive
BQTerrace
BasketballDrill
BQMall
PartyScene
RaceHorses
BasketballPass
BQSquare
BlowingBubbles
RaceHorses
Johnny
FourPeople
KristenAndSara
BasketballDrillText
ChinaSpeed
SlideEditing
SlideShow
Average
MBS2S
BDBR (%) TS (%)
0.14
3.7
0.07
5.1
0.14
3.9
0.10
4.2
0.08
4.6
0.14
5.7
-0.04
7.2
0.06
6.4
0.16
9.2
0.04
9.8
0.12
7.8
-0.10
8.8
0.15
10.4
0.09
0.6
0.00
1.6
0.24
2.9
0.06
6.6
0.15
6.4
0.21
1.2
0.27
2.7
0.10
5.4
MRSD
BDBR (%) TS (%)
0.04
26.3
0.20
29.7
0.27
28.4
0.10
26.6
0.14
30.0
0.12
22.5
0.14
25.7
0.19
16.2
0.20
15.4
0.17
19.9
0.31
21.5
0.09
18.5
0.18
13.1
0.23
45.9
0.10
42.1
0.33
43.8
0.14
23.4
0.24
20.6
0.18
57.9
0.24
48.0
0.18
28.8
SMBT
BDBR (%) TS (%)
0.10
28.2
0.09
31.4
0.26
31.4
0.07
29.4
0.08
31.7
0.06
23.4
0.11
27.0
0.08
15.1
0.08
14.2
0.00
18.7
0.09
20.1
-0.08
16.7
0.04
10.3
0.06
57.0
0.17
50.6
0.06
52.7
0.03
23.2
0.08
19.5
-0.12
71.8
0.69
56.3
0.10
31.4
MRSD + SMBT
BDBR (%) TS (%)
0.24
37.4
0.58
40.6
0.72
41.2
0.36
39.2
0.21
44.6
0.38
32.6
0.35
36.4
0.30
22.5
0.41
21.8
0.36
27.1
0.43
28.8
0.31
25.8
0.39
18.8
0.46
69.7
0.51
63.4
0.49
65.7
0.27
32.1
0.43
29.2
0.29
86.1
0.89
70.0
0.42
41.7
4.2. Performance of Individual Methods
Table 3 and 4 show BDBR increase and coding time saving when comparing
three individual methods introduced in Section 3 with the original HM-16.7
reference software for RA and LB configurations, respectively. BDBR column
24
shows the BDBR increase and TS column is the time saving calculated by
Equation (15). The results indicate following features:
1. MBS2S attempts to split large CUs with various motions into smaller
CUs. For non-homogeneous sequences, lots of CUs have various motions
and thus many are early decided to be split. For example, when coding
RaceHorses sequence, MBS2S can save 7.6% and 10.4% coding time for
RA and LB configurations respectively. For homogeneous sequences like
Johnny and SlideEditing, the method is not effective. That is because
the motive objects in these sequences are small and random, which means
even the collocated CU is split, the probability of current CU to be split
is low. MBS2S is turned off based on the video contents and thus the
unexpected coding efficiency loss is avoided.
2. MRSD aims to detect SKIP mode early while SMBT aims to determine
CU termination. Both methods need to test SKIP mode which is more
common in homogeneous sequences. So these two methods are more suitable for homogeneous sequences and achieve significant performance in
terms of complexity reduction. Meanwhile, for those non-homogeneous
sequences, though two methods are less effective relatively because the
share of SKIP mode is low, low-motion regions such as backgrounds still
exist. These regions are mostly coded by SKIP mode and MRSD and
SMBT can still save nearly 20% coding time for these sequences. Overall,
both methods can reduce computational complexity by about 30% with
little coding efficiency loss.
3. Since SKIP mode utilized in SMBT is early determined in MRSD, the decision accuracy of MRSD will strongly affect SMBT and may lead to large
BDBR increase. So the performance of MRSD + SMBT is also tested.
From TABLE 3 and 4 we can see that the BDBR increase are only 0.32%
and 0.42% for RA and LB configurations respectively. The maximum
BDBR increase in less than 1% when performing MRSD + SMBT. This
is quite acceptable and shows the validity of the proposed methods.
25
In addition, three methods are compared with the state-of-the-art algorithms
for further evaluation. MBS2S is compared with CUSE method in [12] for both
methods aim to early split CUs. Though CUSE achieves about 20% time saving
when coding non-homogeneous sequences for RA configuration which is better
than MBS2S. The BDBR increase is also much larger than ours, averagely over
0.50%. When coding homogeneous sequences, the BDBR increase of CUSE
can be up to 0.90% while the coding time performance is not good. MBS2S
can be alternatively turned off for homogeneous sequences to ensure the coding
efficiency. So in MBS2S the maximum BDBR increase is only about 0.20%.
[11] and [26] both propose a fast SKIP mode decision algorithm and are selected for comparison with MRSD. In [11], 41% coding time is saved with 0.27%
BDBR increase for RA configuration. [26] achieves 34.6% and 36.5% time saving
with 0.40% and 0.50% BDBR increase for RA and LB configurations respectively. MRSD can save about 30% coding time which is slightly less than [11]
and [26], but the BDBR increase is much less thus the overall performance is
comparable. Besides, high coding efficiency (with less than 0.20% BDBR increase) makes MRSD easier to implement with other fast coding methods with
a good balance between complexity reduction and coding efficiency.
To evaluate the performance of SMBT, [30] and [31] are chosen for the
comparison. Both works achieve about 40% time saving for RA and LB configurations, but the coding loss is also high. In [30], the maximum BDBR increase
is over 2.5% (2.9% for RA and 2.7% for LB). In [31], the value is 1.8% for RA
configuration and 2.2% for LB configuration. The coding efficiency of SMBT
is much better than [30] and [31], with 0.09% and 0.10% BDBR increase for
RA and LB configurations and maximum value less than 0.70%. In addition,
SMBT averagely achieves 53.4% time saving for LB configuration when coding
class E, which is slightly better than [31] (with 52.8% coding time saving). This
indicates that SMBT has advantage when dealing with homogeneous sequences.
Table 3 and 4 indicate that for various sequences, the proposed methods are
effective and lead to little BDBR increase. So we can infer that the whole algorithm performs well in terms of both complexity reduction and coding efficiency.
26
Table 5. Coding Efficiency and Complexity Reduction of the Proposed Algorithm and HM-Fast for RA and LB Configurations
Class
A
B
C
D
E
F
Sequence
Traffic
PeopleOnStreet
Kimono
ParkScene
Cactus
BasketballDrive
BQTerrace
BasketballDrill
BQMall
PartyScene
RaceHorses
BasketballPass
BQSquare
BlowingBubbles
RaceHorses
Johnny
KristenAndSara
FourPeople
BasketballDrillText
ChinaSpeed
SlideEditing
SlideShow
Average
Proposed Algorithm
Random Access
Low-Delay B
BDBR (%) TS (%) BDBR (%) TS (%)
0.63
57.9
0.40
31.7
0.37
47.1
0.42
41.3
0.62
55.3
0.64
46.0
0.65
51.2
0.73
43.8
0.58
50.3
0.42
42.1
0.49
57.3
0.33
49.3
0.21
41.8
0.38
37.2
0.55
50.7
0.58
41.1
0.49
42.7
0.40
28.7
0.50
31.1
0.51
27.9
0.48
36.9
0.51
35.1
0.39
50.3
0.58
36.6
0.67
45.2
0.46
31.7
0.55
30.4
0.69
26.8
0.80
71.0
0.56
66.9
0.68
65.4
0.31
41.2
0.51
37.1
0.56
36.7
0.60
33.3
0.06
86.2
0.27
86.1
0.31
71.6
0.86
70.0
0.46
48.2
0.55
45.9
HM-Fast
Random Access
Low-Delay B
BDBR (%) TS (%) BDBR (%) TS (%)
2.09
64.9
1.80
31.6
1.71
55.6
1.30
45.5
2.15
63.1
1.72
52.5
2.29
55.2
1.43
46.0
1.30
51.4
0.83
42.3
2.19
62.5
1.02
54.1
1.08
45.1
0.58
34.4
2.86
55.6
1.54
45.4
2.28
46.7
1.33
34.0
1.55
32.5
0.66
23.3
2.49
38.2
1.31
31.0
0.85
57.8
0.89
39.3
1.72
48.6
0.94
33.1
3.18
32.0
1.45
23.9
0.79
76.5
0.76
70.6
0.82
70.1
2.65
42.9
0.77
34.5
3.07
39.7
1.56
33.2
0.08
88.0
0.43
88.2
0.16
74.1
1.07
71.8
1.97
51.9
1.06
47.5
4.3. Performance of Overall Algorithm
Table 5 shows the coding efficiency and complexity reduction of the proposed algorithm for RA and LB configurations. 48.2% of coding complexity is
reduced with 0.46% BDBR increase for RA configuration. For LB configuration,
45.9% of coding time is saved while BDBR only increases by 0.55%. That means
the proposed algorithm can significantly reduce computational complexity with
negligible coding efficiency loss. In addition, the maximum BDBR increase for
RA and LB configurations are only 0.67% and 0.86%, respectively. This shows
the algorithm achieves satisfied coding efficiency for various sequences. Furthermore, since several fast coding methods are already included in HM software
such as early SKIP mode decision (ESD) and early CU (ECU), we also test
the performance of HM-fast anchor for comparison. The results are shown in
Table 5 as well. On average, the HM-fast anchor can reduce 51.9% and 47.5%
coding time with 1.97% and 1.06% BDBR increase for RA and LB configura-
27
HM-16.7
Proposed Algorithm
42
40
40
Y-PSNR (dB)
Y-PSNR (dB)
HM-16.7
Proposed Algorithm
42
38
36
38
36
34
34
0
2000
4000
6000
8000
10000
12000
14000
0
1000
bit rate (kbps)
3000
4000
5000
bit rate (kbps)
(a) Traffic
(b) Kimono
HM-16.7
Proposed Algorithm
40
2000
HM-16.7
Proposed Algorithm
42
40
38
Y-PSNR (dB)
Y-PSNR (dB)
38
36
34
32
36
34
32
30
30
0
1000
2000
3000
4000
0
5000
200
400
600
800
1000
1200
1400
1600
bit rate (kbps)
bit rate (kbps)
(c) RaceHorsesC
(d) BasketballPass
Fig. 10. RD curves of four individual sequences. (RA configuration)
tions, respectively. The proposed algorithm achieves very close performance in
terms of coding time with much better coding efficiency. In addition, we also implement the proposed algorithm into the HM-fast anchor. Experimental results
show that as compared to the original HM-fast anchor, the proposed algorithm
plus HM-Fast can still save 10.8% and 9.1% coding time with 0.21% and 0.23%
BDBR increase for RA and LB configurations, respectively.
To evaluate the RD performance of the proposed algorithm, RD curves of
the HM reference software and proposed algorithm are shown in Fig. 10. Four
sequences with different resolutions, Traffic, Kimono, RaceHorsesC and BasketballPass, are tested for RA configuration. Fig. 10 shows that the proposed
algorithm achieves RD performance very similar to that of the original HM ref28
Table 6. The Performance Comparison with Existing Algorithms for RA Configuration
class
A
B
C
D
sequence
Traffic
PeopleOnStreet
Kimono
ParkScene
Cactus
BasketballDrive
BQTerrace
BasketballDrill
BQMall
PartyScene
RaceHorses
BasketballPass
BQSquare
BlowingBubbles
RaceHorses
Average
proposed
BDBR (%) TS (%)
0.6
57.9
0.4
31.7
0.4
47.1
0.6
55.3
0.7
51.2
0.6
50.3
0.5
57.3
0.2
41.8
0.6
50.7
0.5
42.7
0.5
31.1
0.5
36.9
0.4
50.3
0.7
45.2
0.6
30.4
0.5
45.3
Shen et al. [20]
BDBR (%) TS (%)
0.9
59.3
1.1
38.3
0.7
42.8
1.1
56.4
0.9
54.7
0.6
48.7
0.7
58.8
0.1
46.3
0.9
52.2
0.8
42.7
1.0
34.6
0.4
51.4
0.5
53.5
0.6
44.0
0.9
31.5
0.7
47.7
Wu et al. [28]
BDBR (%) TS (%)
1.3
55.2
1.3
24.9
0.8
35.8
1.2
52.1
1.5
45.3
0.6
38.7
1.0
53.0
0.7
35.6
1.7
36.8
1.1
33.4
1.0
25.1
1.4
38.2
0.9
46.1
1.2
32.9
1.4
22.8
1.1
38.4
Ahn et al. [29]
BDBR (%) TS (%)
0.8
61.6
0.9
26.9
1.3
58.2
1.2
52.6
2.8
56.8
2.0
50.9
1.6
54.5
1.9
45.2
2.2
48.6
0.8
37.7
2.2
33.9
1.5
33.6
0.6
45.1
0.7
38.2
1.1
26.6
1.4
44.7
erence software for different QP configurations and sequences. Therefore, the
RD performance degradation of the proposed algorithm is negligible.
4.4. Performance Compared with Previous Works
To further evaluate the performance of the proposed algorithm, experimental results are compared with those in previous works. Firstly, three algorithms,
Shen et al.’s [20], Wu et al.’s [28] and Ahn et al.’s [29], are chosen for detailed
comparison. Table 6 and Table 7 show the performance comparison in terms
of BDBR increase and time saving for RA and LB configurations, respectively.
HM-10.0, HM-16.4, HM-12.0 are employed in [20], [28], [29], respectively. According to [41], coding results for different HM versions are similar for HM does
not involve significant changes among these versions. Therefore the comparison with other works under different HM versions is acceptable. Comparison
results show that the proposed algorithm achieves better coding efficiency while
maintaining a comparable complexity reduction.
For RA configuration, our algorithm achieves comparable time saving with [20]
and [29] and outperforms [28]. Meanwhile, the BDBR increase of our algorithm
is the least, less than half of [28]and [29]. Compared with [20], our algorithm
29
Table 7. The Performance Comparison with Existing Algorithms for LB Configuration
class
B
C
D
E
sequence
Kimono
ParkScene
Cactus
BasketballDrive
BQTerrace
BasketballDrill
BQMall
PartyScene
RaceHorses
BasketballPass
BQSquare
BlowingBubbles
RaceHorses
Johnny
KristenAndSara
FourPeople
Average
proposed
BDBR (%) TS (%)
0.4
41.3
0.6
46.0
0.7
43.8
0.4
42.1
0.3
49.3
0.4
37.2
0.6
41.1
0.4
28.7
0.5
27.9
0.5
35.1
0.6
36.6
0.5
31.7
0.7
26.8
0.8
71.0
0.6
66.9
0.7
65.4
0.5
43.2
Shen et al. [20]
BDBR (%) TS (%)
0.8
42.7
1.2
54.9
0.0
52.7
1.0
47.8
0.9
57.6
0.7
45.2
1.2
52.2
1.0
45.6
1.0
36.6
1.3
49.6
1.0
45.8
1.8
39.9
1.6
32.9
1.0
71.5
1.4
69.3
1.0
72.3
1.1
51.0
Wu et al. [28]
BDBR (%) TS (%)
0.3
30.0
1.0
41.3
0.7
35.8
0.3
29.9
0.7
43.9
0.4
27.6
1.7
36.8
0.5
24.5
0.3
21.0
0.7
30.5
0.3
27.0
0.8
25.2
0.5
19.3
0.5
62.8
0.4
61.3
0.6
59.7
0.6
36.0
Ahn et al. [29]
BDBR (%) TS (%)
0.8
47.9
1.1
48.3
2.2
48.0
1.2
42.6
0.3
47.5
2.0
37.0
1.2
40.9
0.4
33.0
1.1
26.1
1.2
27.7
0.2
38.8
0.3
31.5
0.7
21.2
-0.3
73.9
0.5
69.6
1.6
65.6
0.9
43.7
also reduces BDBR increase by 0.2%. Thus the proposed algorithm can reduce
much coding complexity with better coding efficiency compared with existing
works.
For LB configuration, although [20] reduces more computational complexity
by about 8%, our algorithm improves a lot in terms of coding efficiency. The
BDBR increase is less than half of [20]. Compared with [28], the proposed
algorithm slightly reduces BDBR increase by 0.1% with better performance
on time saving. Over 7% coding time is further reduced. When compared
with [29], the proposed algorithm has comparable coding time saving while the
BDBR increase is reduced by half, which shows the advance of our proposed
algorithm.
Then our results are compared with more state-of-the-art works [12], [13]
and [35]. HM-10.1, HM-15.0 and HM-9.0 are employed in [12], [13] and [35]
respectively. [13] achieves 53.6% and 48.4% time saving with 0.71% and 0.63%
BDBR increase for RA and LB configurations. Our algorithm has the overall
comparable results compared with it. [12] and [35] achieve about 10% to 15%
more time saving than ours, but they are at the cost of much more coding
30
efficiency loss. The BDBR increase of [12] for RA configuration is 1.43% and
for LB configuration, it is 1.22%. Both are more than twice of ours. [35] has the
2.74% BDBR increase for RA configuration, which is larger than ours by over
2%. This shows our algorithm achieves a better balance between the complexity
reduction and coding efficiency.
In addition, because of the turn-off scheme adopted in MBS2S and minimum
risk decision adopted in MRSD and SMBT, the proposed algorithm achieves
satisfied coding efficiency for all sequences. For RA configuration, the maximum
BDBR increase in [12] [13] and [35] are 2.33%, 1.33% and 3.98%, respectively.
While it is only 0.67% in our proposed algorithm.
5. Conclusion
A fast inter CU decision algorithm has been proposed, including motion
based split-to-split method (MBS2S), minimum risk SKIP detection (MRSD)
and SKIP mode based termination decision (SMBT). Partitioning information
and motion diversity of the collocated CU are utilized to early determine CU
splitting in MBS2S. Then, a discriminant function minimizing expected risk
is defined for MRSD and SMBT. The proposed algorithm is content-adaptive
and can reduce much computational complexity with high accuracy. For random
access configuration, the proposed algorithm can reduce 48.2% coding time with
only 0.46% BDBR increase. For low-delay B configuration, it can also save
coding time by 45.9% with 0.55% BDBR increase. Compared with other existing
works, this algorithm can achieve a better coding efficiency while maintaining a
high coding complexity reduction. The performance could be further improved
with separate parameter models for different temporal layres.
Acknowledgment
This work was supported by CityU Internal Grant (7004440) and Croucher
Startup Allowance (9500015).
31
References
[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the
H.264/AVC video coding standard, IEEE Transactions on Circuits and
Systems for Video Technology 13 (7) (2003) 560–576. doi:10.1109/TCSVT.
2003.815165.
[2] G. J. Sullivan, J. R. Ohm, W. J. Han, T. Wiegand, Overview of the high
efficiency video coding (HEVC) standard, IEEE Transactions on Circuits
and Systems for Video Technology 22 (12) (2012) 1649–1668. doi:10.
1109/TCSVT.2012.2221191.
[3] J. R. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, T. Wiegand, Comparison of the coding efficiency of video coding standards-including high
efficiency video coding (HEVC), IEEE Transactions on Circuits and Systems for Video Technology 22 (12) (2012) 1669–1684. doi:10.1109/TCSVT.
2012.2221192.
[4] F. Bossen, B. Bross, K. Sühring, D. Flynn, HEVC complexity and implementation analysis, IEEE Transactions on Circuits and Systems for Video
Technology 22 (12) (2012) 1685–1696. doi:10.1109/TCSVT.2012.2221255.
[5] S. Cho, M. Kim, Fast CU spliting and pruning for suboptimal CU partitioning in HEVC intra coding, IEEE Transactions on Circuits and Systems
for Video Technology 23 (9) (2013) 1555–1564. doi:10.1109/TCSVT.2013.
2249017.
[6] B. Min, R. C. C. Cheung, A fast CU size decision algorithm for the HEVC
intra encorder, IEEE Transactions on Circuits and Systems for Video Technology 25 (5) (2015) 892–896. doi:10.1109/TCSVT.2014.2363739.
[7] L. Zhao, X. Fan, S. Ma, D. Zhao, Fast intra-encoding algorithm for high
efficiency video coding, Signal Processing: Image Communication 29 (9)
(2014) 935–944. doi:10.1016/j.image.2014.06.008.
32
[8] K. Lim, J. Lee, S. Kim, S. Lee, Fast PU skip and split termination algorithm
for HEVC intra prediction, IEEE Transactions on Circuits and Systems for
Video Technology 25 (8) (2015) 1335–1346. doi:10.1109/TCSVT.2014.
2380194.
[9] Z. Liu, X. Yu, Y. Gao, S. Chen, X. Ji, D. Wang, Cu partition mode decision
for hevc hardwired intra encoder using convolution neural network, IEEE
Transactions on Image Processing 25 (11) (2016) 5088–5103. doi:10.1109/
TIP.2016.2601264.
[10] T. Zhang, M. T. Sun, D. Zhao, W. Gao, Fast intra mode and cu size
decision for hevc, IEEE Transactions on Circuits and Systems for Video
Technology PP (99) (2016) 1–1. doi:10.1109/TCSVT.2016.2556518.
[11] M. Kim, N. Ling, L. Song, Z. Gu, Fast SKIP mode decision with ratedistortion optimization for high efficiency video coding, in: Multimedia
and Expo Workshops (ICMEW), 2014 IEEE International Conference on,
2014, pp. 1–6. doi:10.1109/ICMEW.2014.6890721.
[12] J. Lee, S. Kim, K. Lim, S. Lee, A fast CU size decision algorithm for HEVC,
IEEE Transactions on Circuits and Systems for Video Technology 25 (3)
(2015) 411–421. doi:10.1109/TCSVT.2014.2339612.
[13] H. S. Kim, R. H. Park, Fast CU partitioning algorithm for HEVC using
online learning based Bayesian decision rule, IEEE Transactions on Circuits
and Systems for Video Technology 26 (1) (2016) 130–138. doi:10.1109/
TCSVT.2015.2444672.
[14] Q. Hu, Z. Shi, X. Zhang, Z. Gao, Early skip mode decision based on
bayesian model for hevc, in: 2015 Visual Communications and Image Processing (VCIP), 2015, pp. 1–4. doi:10.1109/VCIP.2015.7457828.
[15] S. h. Jung, H. W. Park, A fast mode decision method in hevc using adaptive
ordering of modes, IEEE Transactions on Circuits and Systems for Video
Technology 26 (10) (2016) 1846–1858. doi:10.1109/TCSVT.2015.2473303.
33
[16] Q. Hu, X. Zhang, Z. Shi, Z. Gao, Neyman-pearson-based early mode decision for hevc encoding, IEEE Transactions on Multimedia 18 (3) (2016)
379–391. doi:10.1109/TMM.2015.2512799.
[17] J. Xiong, H. Li, F. Meng, Q. Wu, K. N. Ngan, Fast hevc inter cu decision
based on latent sad estimation, IEEE Transactions on Multimedia 17 (12)
(2015) 2147–2159. doi:10.1109/TMM.2015.2491018.
[18] W. Zhao, T. Onoye, T. Song, Hierarchical structure-based fast mode decision for H.265/HEVC, IEEE Transactions on Circuits and Systems for
Video Technology 25 (10) (2015) 1651–1664. doi:10.1109/TCSVT.2015.
2395751.
[19] Y. Zhang, H. Wang, Z. Li, Fast coding unit depth decision algorithm for
interframe coding in HEVC, in: Data Compression Conference (DCC),
2013, 2013, pp. 53–62. doi:10.1109/DCC.2013.13.
[20] L. Shen, Z. Zhang, Z. Liu, Adaptive inter-mode decision for HEVC jointly
utilizing inter-level and spatiotemporal correlations, IEEE Transactions on
Circuits and Systems for Video Technology 24 (10) (2014) 1709–1722. doi:
10.1109/TCSVT.2014.2313892.
[21] P. Nalluri, L. N. Alves, A. Navarro, Complexity reduction methods for fast
motion estimation in HEVC, Signal Processing: Image Communication 39,
Part A (2015) 280–292. doi:10.1016/j.image.2015.09.015.
[22] F. Chen, P. Li, Z. Peng, G. Jiang, M. Yu, F. Shao, A fast inter coding
algorithm for HEVC based on texture and motion quad-tree models, Signal
Processing: Image Communication 47 (2016) 271 – 279. doi:10.1016/j.
image.2016.07.002.
[23] L. Shen, Z. Liu, X. Zhang, W. Zhao, Z. Zhang, An effective cu size decision
method for hevc encoders, IEEE Transactions on Multimedia 15 (2) (2013)
465–470. doi:10.1109/TMM.2012.2231060.
34
[24] Z. Liu, T.-L. Lin, C.-C. Chou, Efficient prediction of CU depth and PU
mode for fast HEVC encoding using statistical analysis, Journal of Visual
Communication and Image Representation 38 (2016) 474 – 486. doi:10.
1016/j.jvcir.2016.03.025.
[25] Z. Pan, J. Lei, Y. Zhang, X. Sun, S. Kwong, Fast motion estimation based
on content property for low-complexity h.265/hevc encoder, IEEE Transactions on Broadcasting 62 (3) (2016) 675–684. doi:10.1109/TBC.2016.
2580920.
[26] J. Kim, J. Yang, K. Won, B. Jeon, Early determination of mode decision
for HEVC, in: Picture Coding Symposium (PCS), 2012, 2012, pp. 449–452.
doi:10.1109/PCS.2012.6213251.
[27] J. Zhang, B. Li, H. Li, An efficient fast mode decision method for inter
prediction in HEVC, IEEE Transactions on Circuits and Systems for Video
Technology 26 (8) (2016) 1502–1515. doi:10.1109/TCSVT.2015.2461991.
[28] J. Wu, B. Guo, J. Hou, Y. Yan, J. Jiang, A fast CU encoding scheme based
on the joint constraint of best and second-best PU modes for HEVC inter
coding, in: 2015 IEEE International Conference on Imaging Systems and
Techniques (IST), 2015, pp. 1–5. doi:10.1109/IST.2015.7294566.
[29] S. Ahn, B. Lee, M. Kim, A novel fast CU encoding scheme based on spatiotemporal encoding parameters for HEVC inter coding, IEEE Transactions on Circuits and Systems for Video Technology 25 (3) (2015) 422–435.
doi:10.1109/TCSVT.2014.2360031.
[30] X. Shen, L. Yu, J. Chen, Fast coding unit size selection for hevc based
on bayesian decision rule, in: 2012 Picture Coding Symposium, 2012, pp.
453–456. doi:10.1109/PCS.2012.6213252.
[31] X. Shen, L. Yu, Cu splitting early termination based on weighted svm,
EURASIP Journal on Image and Video Processing 2013 (1) (2013) 4. doi:
10.1186/1687-5281-2013-4.
35
[32] J. Xiong, H. Li, Q. Wu, F. Meng, A fast hevc inter cu selection method
based on pyramid motion divergence, IEEE Transactions on Multimedia
16 (2) (2014) 559–564. doi:10.1109/TMM.2013.2291958.
[33] H. L. Tan, C. C. Ko, S. Rahardja, Fast coding quad-tree decisions using
prediction residuals statistics for high efficiency video coding (hevc), IEEE
Transactions on Broadcasting 62 (1) (2016) 128–133. doi:10.1109/TBC.
2015.2505406.
[34] P. Hele, S. Oudin, B. Bross, D. Marpe, M. O. Bici, K. Ugur, J. Jung,
G. Clare, T. Wiegand, Block merging for quadtree-based partitioning in
HEVC, IEEE Transactions on Circuits and Systems for Video Technology
22 (12) (2012) 1720–1731. doi:10.1109/TCSVT.2012.2223051.
[35] J. Xiong, H. Li, F. Meng, S. Zhu, Q. Wu, B. Zeng, Mrf-based fast HEVC
inter CU decision with the variance of absolute differences, IEEE Transactions on Multimedia 16 (8) (2014) 2141–2153. doi:10.1109/TMM.2014.
2356795.
[36] J. Vanne, M. Viitanen, T. D. Hämäläinen, Efficient mode decision schemes
for HEVC inter prediction,, IEEE Transactions on Circuits and Systems
for Video Technology 24 (9) (2014) 1579–1593. doi:10.1109/TCSVT.2014.
2308453.
[37] M. Bystrom, I. Richardson, Y. Zhao, Efficient mode selection for h.264 complexity reduction in a bayesian framework, Signal Processing: Image Communication 23 (2) (2008) 71–86. doi:10.1016/j.image.2007.11.001.
[38] K. Choi, E. S. Jang, Fast coding unit decision method based on coding tree
pruning for high efficiency video coding, Optical Engineering 51 (3) (2012)
030502–1–030502–3. doi:10.1117/1.OE.51.3.030502.
[39] F. Bossen, Common test conditions and software reference configuration,
document JCTVC-L1100, Geneva.
36
[40] G. Bjontegaard, Calcuation of average PSNR differences between RDcurves, document VCEG-M33 ITU-T Q6/16, Austin, TX, USA.
[41] T. Zhang, M. T. Sun, D. Zhao, W. Gao, Fast intra mode and cu size
decision for hevc, IEEE Transactions on Circuits and Systems for Video
Technology PP (99) (2016) 1–1. doi:10.1109/TCSVT.2016.2556518.
37
*Highlights (for review)
Highlights:



An integrated fast inter CU decision algorithm for HEVC with good balance between
computational complexity reduction and coding efficiency is proposed.
Motion diversity and split decision of collocated CU are utilized to help early split current
CU.
New minimum risk decision is adopted to ensure the coding efficiency while the effect on
computational complexity reduction is minimal.
Документ
Категория
Без категории
Просмотров
2
Размер файла
5 831 Кб
Теги
2017, 008, image
1/--страниц
Пожаловаться на содержимое документа