close

Вход

Забыли?

вход по аккаунту

?

978-3-319-69471-9 21

код для вставкиСкачать
Predicting Vulnerable Software Components
Using Software Network Graph
Shengjun Wei1(&), Xiaojiang Du2, Changzhen Hu1, and Chun Shan1
1
2
Beijing Key Laboratory of Software Security Engineering Technique,
Beijing Institute of Technology, Beijing 100081, China
{sjwei,chzhoo,sherryshan}@bit.edu.cn
Department of Computer and Information Sciences, Temple University,
Philadelphia, PA 19122, USA
dxj@ieee.org
Abstract. Vulnerability Prediction Models (VPMs) are used to predict
vulnerability-prone modules and now many software security metrics have been
proposed. In this paper, we predict vulnerability-prone components. Based on
software network graph we define component cohesion and coupling metrics
which are used as security metrics to build the VPM. To validate the prediction
performance, we conduct an empirical study on Firefox 3.6. We compare the
results with other works’, it shows that our model has a good performance in the
accuracy, precision, and recall, and indicate that the proposed metrics are also
effective in vulnerability prediction.
Keywords: Software security Vulnerability prediction Component cohesion
and coupling Software network
1 Introduction
With rapid development of information technology, Information security issues are
becoming increasingly prominent. Among these issues, software vulnerability and
software security are the core points, which have been studied in several papers
(e.g., [1–11]) from multiple perspectives. If a single software vulnerability is utilized by
a hostile attacker, severe damage will be caused to an organization. Software developer
are unlikely to find all vulnerabilities in a software product because of the limited time
and budget. If those vulnerability-prone modules can be predicted, testing and
inspection efforts will be prioritized on these modules, more security vulnerabilities
would be found.
“High Cohesion and Low Coupling” are thought as marks of high quality software,
which are always the ultimate goal pursued by a software developer [12]. High complexity and coupling and low cohesion can lead to the difficulty of software understanding, developing, testing and maintaining, and so may introduce vulnerabilities into
software systems [13, 14]. Many software security metrics including complexity,
cohesion and coupling metrics have been proposed to predict vulnerabilities, the correlations between the metrics and vulnerabilities are generalized [14–18]. Paper [18]
discovers that the sole classic nine code complexity metrics have weak correlation with
© Springer International Publishing AG 2017
S. Wen et al. (Eds.): CSS 2017, LNCS 10581, pp. 280–290, 2017.
https://doi.org/10.1007/978-3-319-69471-9_21
Predicting Vulnerable Software Components Using Software Network Graph
281
security problems for Mozilla JavaScript Engine, and paper [16] believes that univariate
analysis with only one of the complexity, coupling and cohesion metrics is not effective
to identify vulnerability-prone modules under many situations, and multivariate interactions should be taken into account. In this paper, we conduct vulnerability prediction
research at component level. We use component cohesion and coupling metrics
simultaneously which are defined based on software network graph to build prediction
models. To validate the efficiency of the model, experiments are carried out targeting an
open source software Firefox based on their public vulnerabilities.
The rest of this paper is organized as follows: Sect. 2 provides related work.
Section 3 introduces metrics we collect for this study. Section 4 describes the case
study design and evaluation criteria. Section 5 provides the results and discusses our
findings and limitations and Sect. 6 summarizes our study.
2 Related Work
The code metrics which include code complexity and developing process metrics
which include code churn, developer’s experience and the development team’s organizational structure, etc. are firstly used to build Defect Prediction Models (DPMs) to
predict potential defect-proneness of program modules [19–21], and then by reference,
these metrics are used to build VPMs because of regarding software security vulnerabilities as special defects.
Shin et al. [18, 22] study the correlations between nine classic code complexity
metrics and security vulnerabilities at function level targeting JavaScript Engine, the
experimental results show that the correlations are weak, the models have high false
negative rate. Later, they take the code complexity metrics, the dependency network
complexity metrics and execution complexity metrics into account together, the
granularity is at file-level, the experimental results show that the false negative rate
decreases [23]. Later on, they use the complexity, code churn and past fault history
metrics, as well as the complexity, code churn, and developer activity (CCD) metrics,
they conduct experiments on Mozilla Firefox web browser and the Red Hat Enterprise
Linux kernel at file level, the recalls are over 80% and the false positives are also over
20% [15, 24].
Zimmermann et al. [25] adopt the churn, coverage, dependency measures, and
organizational structure of the company metrics, they perform their analysis at binary
level, conduct experiment targeting windows vista, they observe that classical metrics
predict vulnerabilities with a high precision but low recall values. However, the
dependencies predict vulnerabilities with a lower precision but higher recall.
Nguyen et al. [26] propose code metrics based on dependency graphs, they conduct
a prediction model which targets the JavaScript Engine at component level. They
obtain a good accuracy but a very low recall rate.
Chowdhury et al. [27] carry on researches on the classical object-oriented architecture metrics which are class complexity, coupling and cohesion metrics, they perform their analysis at file level, evaluate the model on Mozilla Firefox and achieve a
means recall of 74.22%.
282
S. Wei et al.
Neuhaus et al. [28] discover a correlation between import/function calls and vulnerabilities, they use them to build a classifier at file level. Their experiment targeting
on the Mozilla Firefox show that a recall of 45% and a precision of 70% can be
achieved.
Scandariato et al. [29] treat a source code file as a textfile, with the help of text
mining, they build a classifier. The features are a set of words mined out from the code
file text. In following research [30], they compare the performance of software metrics
and text mining based on the same vulnerability database built by themselves, they
discover that text mining models had higher recall than software metrics. Jimenez et al.
[31] also compare the performance of software metrics, text mining and includes and
function calls, in the context of the Linux kernel, they find that models based on code
metrics perform poorly.
3 Metrics
A software network graph describes the architecture of a software system, at different
granularities, the nodes of which could be component, package, class, method, and data
etc., and the edges of which are dependencies among the nodes.
A software system can be regarded as a set of components and dependencies among
these components. The members of a component are data items and executable codes.
The relations between data item and executable code are data write and data read, and
relations between executable codes are function call and function return.
We firstly construct the software network graph at method and data granularities,
then the software network graph at component granularity is obtained by treating all the
methods and data belonging to a component as a whole, as well as retaining all edges
between two components.
Fig. 1. Software network graph
Predicting Vulnerable Software Components Using Software Network Graph
3.1
283
Software Network Graph
Definition 1. The Software
Network Graph of a software system is a direct graph
GSN V d ; V m ; E c ; Er ; E d , where: V d : is a set of data nodes; V m : is a set of function
nodes; Ec V m V m : is a set of call-edges; E r V m V m : is a set of return-edges;
E d V d V m : is a set of data-edges, including data write and data read.
As shown in Fig. 1, it is a software network graph of a software system. This
system has four components: A, B, C, and D, which are represented by four big dashed
quadrangles. In every component, there are some functions which are represented by
elliptical nodes and data items which are represented by rectangular nodes. The solid
Table 1. Cohesion and coupling metrics based on software network graph
Cohesion metrics
AveDegofIntNod
AveCltCoeofIntNod
Maxz-ValofIntNod
Coupling metrics
DegofCom
ClsCoeofCom
BetofCom
z-ValofCom
Description and rationale
The degree of a node is the number of all the edges connected in or out
the node. The AveDegofIntNod is the average degree of all the nodes
in a component. the larger the value, the stronger the dependency
between the nodes, and the more important the component
The definition of clustering coefficient Ci of a node vi : Suppose node vi
has ki adjacent nodes and there have Mi edges among these ki adjacent
nodes, but the possible maximal number of edges among these ki
adjacent nodes is ki ðki 1Þ, so Ci ¼ Mi =½ki ðki 1Þ.
AveCltCoeofIntNod is the average clustering coefficient of all the
nodes in a component. AveCltCoeofIntNod reflects the cohesion of
elements in a component. The larger the value, the higher the cohesion
We define the z-value of a node vi is zi: zi ¼ ðki kC Þ=rkC , where, ki is
the degree of node vi in the component, kC and rkC are the average
degree and standard deviation of degrees of all nodes in the component
respectively. Maxz-ValofIntNod is the maximal z-value in the
component. zi reflects the connection strength between vi and the other
nodes. The larger the value, the higher the node cohesion
Description and rationale
DegofCom is the degree of a component node: the number of edges
connected in or out the component. The larger the value, the higher the
coupling
ClsCoeofCom is the clustering coefficient of a component node. The
larger the value, the higher the coupling
BetofCom is the betweenness of a component node. There has a
shortest path between any two reachable nodes in the graph. The
betweenness of a node is the number of the shortest paths across the
node. The larger the value, the higher the coupling
The z-ValofCom is the z-value of a component. The z-value of a
component c is zC ¼ ðk k Þ=rk , where, k is the degree of component
c, k is the average degree of all components, and rk is the standard
deviation of the degrees of all components. The larger the value, the
higher the coupling
284
S. Wei et al.
line represents both data-edge and call-edge regarding to the source and the target. The
dash line represents the return-edge.
3.2
Metrics
We perform our analysis aiming at a component. The high cohesion of members in a
component and low coupling between components indicate a high quality for a software, we adopt component cohesion and coupling metrics to build a classifier. We
define the component cohesion metrics and coupling metrics based on the software
network graph described in Sect. 3.1, as shown in Table 1. The component cohesion is
the dependency among members in a component, the component coupling is the
dependency between components.
4 Case Studies
We perform an empirical case study on Mozilla Firefox 3.6. According to the security
advisories for Firefox 3.6 [32], Firefox 3.6 contains 36 releases (main and minor)
which are reported owning security advisories. Firefox 3.6 has 8,571 C/C++ files
which are combined into 6,025 components. We assume that the same vulnerability
exists in all previous revisions.
4.1
Vulnerability Data Collection
The discovered vulnerabilities in Mozilla Firefox are reported as Mozilla Foundation
Security Advisories (MFSAs) [32]. For Firefox 3.6, there are 124 MFSAs, in which
bug IDs are included for the bug reports in the bug database. From these bug reports,
the files that are fixed to mitigate vulnerabilities can be identified. Among those bug
reports linked from MFSAs for Firefox 3.6, if there exists a bug report that has bug
patches for the file, the file is considered vulnerable. As a file is contained in a
component, whether a component is vulnerable or not is determined.
There are three steps for collecting vulnerabilities data:
(1) Searching vulnerabilities from MFSA,
(2) For every vulnerability obtained by step (1), found out the corresponding bug
links in Bugzilla,
(3) For every bug obtained by step (2), found out the patched files for fixing the bug,
then plus 1 to the vulnerability number of the files.
We develop a crawler tool for collecting vulnerability data, this tool are working
with 4 loops, as shown in Fig. 2. The tool’s run results show that among the 6,025
components of Firefox 3.6, there are 212 vulnerable components (3.5% of the total
components).
Predicting Vulnerable Software Components Using Software Network Graph
285
Start
If finished traversing the list of all
vulnerabiliƟes in MFSA?
The final vulnerability
database
Yes
No
Access the next page of vulnerabiliƟes.
End
Fetch serial numbers of vulnerabiliƟes.
Yes
If finished traversing the corresponding
bug links in Bugzilla?
No
Access the next bug links in
Bugzilla.
Yes
If finished traversing the records of all bug
fixing?
Fetch the corresponding bug ID.
No
Read the next record of bug fixing.
Yes
If finished traversing all the patched files?
No
Fetch the vulnerable files’
names.
Save to the vulnerability
dataBase.
Fig. 2. Flowchart of the crawler tool
4.2
Data Collection
Doxygen is an open source tool for code analysis [33]. We develop a software network
graph generation tool based on the Doxygen. We design the required functions. We
analyze code structure, extract the private and static members of a component, fetch
dependency information for these members, generate the network graph finally, and
then compute the metric values associated with the nodes and edges based on the
network graph.
Figure 3 is a part of the software network graph of Firefox 3.6, it shows the
dependencies between component nsJsWinProfile and component nsWinProfile, as well
as the dependencies between component nsJsWinProfile and component nsSoftware
UpdateRun.
4.3
Prediction Performance Measures
To compare our results with [26], we select the five same machine learning techniques
with [26] which are Bayesian Network (BN), Naïve Bayes (NB), Neural Network
(NN), Random Forest (RF), and Support Vector Machine (SVM).
In our experiment, we set the threshold at 0.5. That is, a component is classified as
vulnerable when its estimated probability of vulnerability is over 0.5. Otherwise, a
component is classified as neutral.
286
S. Wei et al.
Fig. 3. A part of software network graph of Firefox 3.6
We use four metrics: accuracy (Acc), precision (Pr), recall (Re), and false positive
(FP) rate to evaluate the performances of our models. For the two-class problem
(vulnerable or neutral), these performance measures are explained using a confusion
matrix, shown in Table 2.
Table 2. Confusion matrix
Actual
Predicted as
Neutral
Vulnerable
Neutral
TN = True Negative FP = False Positives
Vulnerable FN = False Negatives TP = True Positives
The four prediction performance measures are defined in the following formula:
Acc ¼ ðTP þ TN Þ=ðTP þ FP þ TN þ FN Þ
ð1Þ
Pr ¼ TP=ðTP þ FPÞ
ð2Þ
FP ¼ FP=ðFP þ TN Þ
ð3Þ
Re ¼ TP=ðTP þ FN Þ
ð4Þ
Predicting Vulnerable Software Components Using Software Network Graph
287
5 Results and Discussions
5.1
Predictive Power
Before using machine learning techniques to build the VPMs, we employ the Wilcoxon
rank sum test to evaluate the discriminative power of the metrics. The null hypothesis is
that there is no statistically significant difference between the metric values for vulnerable components and neutral components. The results show that the measures of all
seven metrics for the vulnerable components and the neutral components in Firefox 3.6
are significantly different at the 0.05 significance level.
We choose Weka [34] to implement the five machine learning techniques introduced in Sect. 4.3. The parameters in the models are initialized with the default settings. The source code of Firefox 3.6 contains 6,025 components, among which, 212
are vulnerable (less than 3.5%). Such a dataset is heavily imbalanced. We use
undersampling to balance the dataset. With undersampling, all the vulnerable components in the dataset are retained, while only a subset of the neutral components is
selected. The sample of neutral components is randomly chosen that the number of
vulnerable components matches the number of neutral components. The dataset has
424 items, each is 50%. To some degree, undersampling provides better results than
using the unbalanced data. We split the dataset into 10 folds of equal size randomly for
10 10 cross validation and use one fold for testing and the other 9 folds for training,
rotating each fold as the test fold. The entire process is then repeated 10 times and the
results are averaged. Table 3 shows the Acc, Pr, FP, and Re in the experiments.
Table 3. Prediction results in our experiments
Technique
BN
NB
NN
RF
SVM
Average
5.2
Acc (%)
85.99
85.47
87.92
88.73
87.03
87.03
Pr (%)
84.35
84.24
91.75
87.80
84.56
86.54
FP (%)
16.79
16.32
8.020
13.11
16.60
14.17
Re (%)
88.77
87.26
83.87
90.57
90.66
88.23
Discussion
From Table 3, we can make some conclusions as follows.
• From the experimental results, five techniques have similar performance, but NN
has the best level at Pr and FP, which are over 90% and below 10% respectively.
From the average value of five results, Acc, Pr, and Re are all over 85%, and FN is
14.17%. That is, among all the 424 components, 87.03% are classified correctly,
and among the components classified as vulnerable, 86.54% are actual vulnerable
ones. Among all the 212 vulnerable component, 88.23% can be classified correctly.
Among all the normal components, 14.17% are classified falsely. These results
288
S. Wei et al.
Table 4. Results comparison with others’
Acc
Pr
FP
Re
Our results
0.8703
0.8654
0.1417
0.8823
Results of [26]
0.8461
0.6801
0.0870
0.5953
indicate that the proposed metrics are effective for vulnerable components
prediction.
• Compared with the results of [26], there is an increase in the FP (still below 15%),
but the Pr and Re are improved obviously, as shown in Table 4. Pr is increased by
27.2%, Re is increased by 48.2%, up to 50%, and Acc is also increased simultaneously. Figure 4 shows the results comparison with a histogram. We believe this is
worthwhile under the circumstance of a little loss of the FP.
• A security vulnerability can cause severe damage to an organization, we think we
should pay more attentions on the Recall rate, but relax FP rate within a rational
scope. We believe that it is more important to identify the vulnerable components,
even at the expense of incorrectly predicting some non-vulnerable components as
vulnerability-prone. On the other hand, our FP rate of 14.17% is still below 15%.
Fig. 4. Comparison of our results with others’ results in [26]
5.3
Threats to Validity
We collect vulnerability data from the published security advisories, the unpublished
security advisories and vulnerabilities would be missed. The assumption of the same
vulnerability existing in all previous revisions is not correct complete. Therefore, the
quality of vulnerability data may be impacted. However, we believe the overall
comparative tendency is correct.
Predicting Vulnerable Software Components Using Software Network Graph
289
6 Summary and Future Work
“High Cohesion and Low Coupling” are thought as marks of high quality software, we
use cohesion and coupling metrics to predict vulnerability-proneness. We define the
component cohesion and coupling metrics based on a software network graph. We
conduct experiments to validate the efficiency of these metrics and compare the results
with other works. We discover that the proposed metrics are effective, and the accuracy, precision, and recall rate are improved according to the comparison with other
works. In our future work, we will conduct more experiments targeting on more
projects and make more comparison with other different works.
Acknowledgments. This work was supported by National Natural Science Foundation of China
(NSFC) (Grant No. U1636115).
References
1. Liang, S., Du, X.: Permission-combination-based scheme for android mobile malware
detection. In: Proceedings of the IEEE ICC 2014, Sydney, Australia (2014)
2. Du, X., Rozenblit, M., Shayman, M.: Implementation and performance analysis of SNMP on
a TLS/TCP base. In: 7th IFIP/IEEE International Symposium on Integrated Network
Management, Seattle, WA, pp. 453–466 (2001)
3. Xiao, Y., Chen, H., Du, X., Guizani, M.: Stream-based cipher feedback mode in wireless
error channel. IEEE Trans. Wireless Commun. 8(2), 662–666 (2009)
4. Yao, X., Han, X., Du, X., Zhou, X.: A lightweight multicast authentication mechanism for
small scale IoT applications. IEEE Sens. J. 13(10), 3693–3701 (2013)
5. Cheng, Y., Fu, X., Du, X., Luo, B., Guizani, M.: A lightweight live memory forensic
approach based on hardware virtualization, vol. 379, pp. 23–41. Elsevier Information
Sciences (2017)
6. Fu, X., Graham, B., Bettati, R., Zhao, W.: On countermeasures to traffic analysis attacks. In:
4th IEEE SMC Information Assurance Workshop (2003)
7. Ling, Z., Luo, J., Yu, W., Fu, X., Xuan, D., Jia, W.: A new cell counting based attack against
tor. IEEE/ACM Trans. Network. (ToN) 20(4), 1245–1261 (2012)
8. Yue, Q., Ling, Z., Fu, X., Liu, B., Ren, K., Zhao, W.: Blind recognition of touched keys on
mobile devices. In: 21st ACM Conference on Computer and Communications Security,
Scottsdale, Arizona, USA (2014)
9. Qian, Y., Moayeri, N.: Design of secure and application-oriented VANETs. In: Proceedings
of IEEE VTC2008-Spring, Singapore (2008)
10. Zhou, J., Hu, R., Qian, Y.: Scalable distributed communication architectures to support
advanced metering infrastructure in smart grid. IEEE Trans. Parallel Distrib. Syst. 23(9),
1632–1642 (2012)
11. Wei, L., Hu, R., Qian, Y., Wu, G.: Enabling device-to-device communications underlaying
cellular networks: challenges and research aspects. IEEE Commun. 52(6), 90–96 (2014)
12. Taube-Schock, C., Walker, R.J., Witten, I.H.: Can we avoid high coupling? In: Mezini, M.
(ed.) ECOOP 2011. LNCS, vol. 6813, pp. 204–228. Springer, Heidelberg (2011). doi:10.
1007/978-3-642-22655-7_10
13. Viega, J., Mcgraw, G.: Building Secure Software. Addison-Wesley, Boston (2002)
290
S. Wei et al.
14. Morrison, P., Herzig, K., Murphy, B., Williams, L.: Challenges with applying vulnerability
prediction models. In: Proceedings of the 2015 Symposium and Bootcamp on the Science of
Security. ACM-Association for Computing Machinery (2015)
15. Shin, Y., Meneely, A., Williams, L., Osborne, J.A.: Evaluating complexity, code churn, and
developer activity metrics as indicators of software vulnerabilities. IEEE Trans. Softw. Eng.
37(6), 772–787 (2011)
16. Chowdhury, I., Zulkernine, M.: Using complexity, coupling, and cohesion metrics as early
indicators of vulnerabilities. J. Syst. Archit. 57(3), 294–313 (2011)
17. Zimmermann, T., Nagappan, N., Williams, L.: Searching for a needle in a haystack:
predicting security vulnerabilities for windows vista. In: Software Testing, Verification and
Validation (ICST), pp. 421–428. IEEE (2010)
18. Shin, Y., Williams, L.: Is complexity really the enemy of software security? In: Proceedings
of the ACM Workshop Quality Protection, pp. 47–50 (2008)
19. Fenton, N., Krause, P., Neil, M.: A probabilistic model for software defect prediction. IEEE
Trans. Softw. Eng. 2143, 444–453 (2001)
20. Emam, K., Melo, W., Machado, J.C.: The prediction of faulty classes using object-oriented
design metrics. J. Syst. Softw. 56, 63–75 (2001)
21. Succi, G., Pedrycz, W., Stefanovic, M., Miller, J.: Practical assessment of the models for
identification of defect-prone classes in object-oriented commercial systems using design
metrics. J. Syst. Softw. 65, 1–12 (2003)
22. Shin, Y., Williams, L.: An empirical model to predict security vulnerabilities using code
complexity metrics. In: Proceedings of the International Symposium Empirical Software
Engineering and Measurement, pp. 315–317 (2008)
23. Shin, Y., Williams, L.: An initial study on the use of execution complexity metrics as
indicators of software vulnerabilities. In: SESS 2011, Waikiki, Honolulu, HI, USA (2011)
24. Shin, Y., Williams, L.: Can traditional fault prediction models be used for vulnerability
prediction? Empir. Softw. Eng. 18, 25–59 (2013)
25. Zimmermann, T., Nagappan, N., Williams, L.: Searching for a needle in a haystack:
predicting security vulnerabilities for windows vista. In: Third International Conference on
Software Testing, Verification and Validation (ICST), pp. 421–428. IEEE (2010)
26. Nguyen, V.H., Tran, L.M.S.: Predicting vulnerable software components with dependency
graphs. In: MetriSec2010, Bolzano-Bozen, Italy (2010)
27. Chowdhury, I., Zulkernine, M.: Using complexity, coupling, and cohesion metrics as early
indicators of vulnerabilities. J. Syst. Architect. 57, 294–313 (2011)
28. Neuhaus S., Zimmermann T., Holler C., Zeller A.: Predicting vulnerable software
components. In: CCS’07, pp. 529–540 (2007)
29. Scandariato, R., Walden, J., Hovsepyan, A., Joosen, W.: Predicting vulnerable software
components via text mining. IEEE Trans. Softw. Eng. 40(10), 993–1006 (2014)
30. Walden, J., Stuckman, J., Scandariato, R.: Predicting vulnerable components: software
metrics vs text mining. In: IEEE 25th International Symposium on Software Reliability
Engineering, pp. 23–33 (2014)
31. Jimenez, M., Papadakis, M., Traon, Y.L.: Vulnerability prediction models: a case study on
the linux kernel. In: IEEE International Working Conference on Source Code Analysis and
Manipulation (SCAM), pp. 1–10 (2016)
32. Mozilla Foundation Security Advisories. https://www.mozilla.org/en-US/security/knownvulnerabilities/. Accessed July 2017
33. Doxygen. http://www.doxygen.org. Accessed July 2017
34. WeKa. http://www.cs.waikato.ac.nz/ml/weka/. Accessed July 2017
Документ
Категория
Без категории
Просмотров
0
Размер файла
955 Кб
Теги
978, 69471, 319
1/--страниц
Пожаловаться на содержимое документа