A Comparison of Semi-Supervised Classification Approaches for Software Defect Prediction

Journal of Intelligent Systems 23 (1):75-82 (2014)

Abstract

Predicting defect-prone modules when few previous defect labels are available is a challenging problem in the software industry. Supervised classification approaches cannot build high-performance prediction models from so little labeled defect data, which creates a need for new methods, techniques, and tools. One solution is to combine labeled data points with unlabeled data points during the learning phase: semi-supervised classification methods use both to improve generalization capability. In this study, we evaluated four semi-supervised classification methods for defect prediction: low-density separation (LDS), support vector machine (SVM), expectation-maximization (EM-SEMI), and class mass normalization (CMN). The methods were investigated on four NASA data sets: CM1, KC1, KC2, and PC1. Experimental results showed that the SVM and LDS algorithms outperform the CMN and EM-SEMI algorithms. In addition, the LDS algorithm performs much better than SVM when the data set is large. We therefore suggest the LDS-based prediction approach for software defect prediction when fault data are limited.
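To illustrate the general idea of learning from labeled and unlabeled points together, the following is a minimal sketch of semi-supervised expectation-maximization on a two-class, one-dimensional Gaussian model. It is not the EM-SEMI implementation evaluated in the study. The function names (`semi_supervised_em`, `predict`), the single "complexity" metric, and the synthetic data are all assumptions made for illustration, and no NASA data are used.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_iter=50, min_var=1e-6):
    """Fit a two-class 1-D Gaussian model from labeled (x, y) pairs and unlabeled x values.

    Labeled points keep hard class memberships; unlabeled points receive soft
    memberships that are re-estimated in every E-step. Each class needs at
    least one labeled point. Returns [[prior, mean, var], [prior, mean, var]].
    """
    # Initialise class parameters from the labeled points only.
    params = []
    for c in (0, 1):
        xs = [x for x, y in labeled if y == c]
        mean = sum(xs) / len(xs)
        var = max(sum((x - mean) ** 2 for x in xs) / len(xs), min_var)
        params.append([len(xs) / len(labeled), mean, var])

    all_x = [x for x, _ in labeled] + list(unlabeled)
    for _ in range(n_iter):
        # E-step: posterior probability of class 1 for each unlabeled point.
        resp = []
        for x in unlabeled:
            p0 = params[0][0] * gaussian_pdf(x, params[0][1], params[0][2])
            p1 = params[1][0] * gaussian_pdf(x, params[1][1], params[1][2])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)

        # M-step: weighted re-estimation using hard labels and soft responsibilities.
        for c in (0, 1):
            weights = [1.0 if y == c else 0.0 for _, y in labeled]
            weights += [r if c == 1 else 1.0 - r for r in resp]
            w_sum = sum(weights)
            mean = sum(w * x for w, x in zip(weights, all_x)) / w_sum
            var = max(sum(w * (x - mean) ** 2 for w, x in zip(weights, all_x)) / w_sum, min_var)
            params[c] = [w_sum / len(all_x), mean, var]
    return params


def predict(params, x):
    """Return the class (0 or 1) with the larger prior-weighted density at x."""
    scores = [prior * gaussian_pdf(x, mean, var) for prior, mean, var in params]
    return 0 if scores[0] >= scores[1] else 1


# Synthetic example: class 0 = not defect-prone, class 1 = defect-prone.
# The feature is a hypothetical complexity metric, not a real data set attribute.
rng = random.Random(0)
labeled = [(1.5, 0), (2.0, 0), (2.6, 0), (7.4, 1), (8.1, 1), (8.5, 1)]
unlabeled = [rng.gauss(2.0, 1.0) for _ in range(100)]
unlabeled += [rng.gauss(8.0, 1.0) for _ in range(100)]

params = semi_supervised_em(labeled, unlabeled)
print("class 0 (prior, mean, var):", params[0])
print("class 1 (prior, mean, var):", params[1])
print("prediction for x = 9.0:", predict(params, 9.0))
```

In this sketch only six points carry labels, while two hundred unlabeled points refine the class means and variances. This mirrors the setting described in the abstract, where labeled defect data are scarce but unlabeled modules are plentiful.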
