Ensemble feature ranking approach for software fault prediction

Agrawalla, Bikash; Shukla, Alok Kumar; Tripathi, Diwakar; Singh, Koushlendra Kumar; Ramachandra Reddy, B.

doi:10.3233/JIFS-219431

Article type: Research Article

Authors: Agrawalla, Bikash^a | Shukla, Alok Kumar^b | Tripathi, Diwakar^c | Singh, Koushlendra Kumar^a | Ramachandra Reddy, B.^{a; *}

Affiliations: [a] Department of CSE, National Institute of Technology Jamshedpur, India | [b] Thapar Institute of Engineering & Technology, Patiala, Punjab, India | [c] Indian Institute of Information Technology Sonepat, India

Correspondence: [*] Corresponding author. B. Ramachandra Reddy, Department of CSE, National Institute of Technology Jamshedpur, India. E-mail: [email protected].

Abstract: Software fault prediction, which aims to find and fix probable flaws before they appear in real-world settings, is an essential component of software quality assurance. This article analyses the use of feature ranking algorithms for software fault prediction. Feature ranking approaches are essential for choosing and prioritising the software metrics or attributes most important to fault prediction models. The proposed work focuses on applying an ensemble feature ranking algorithm to software fault datasets, addressing the challenge posed by their high dimensionality. We examined the effectiveness of multiple machine learning classifiers, combined with feature selection strategies, on six software projects: jedit, ivy, prop, xerces, tomcat, and poi. To determine the most relevant features for each project, we evaluated classifier performance under two scenarios: one with the top 10 features and another with the top 15 features. The support vector machine (SVM) performed consistently well across the six datasets, achieving 98.74% accuracy on “jedit” (top 10 features) and 91.88% on “tomcat” (top 10 features). Random Forest (RF) achieved 89.20% accuracy on “ivy” with the top 15 features. In contrast, Naive Bayes (NB) repeatedly recorded the lowest accuracy rates, such as 51.58% on “poi” and 50.45% on “xerces” (top 15 features). These findings highlight SVM and RF as the top performers, whereas NB was consistently the least successful classifier. The findings suggest that the choice of feature ranking algorithm has a substantial impact on the predictive accuracy and effectiveness of fault prediction models. The research also analyses the trade-offs between computational complexity and prediction accuracy when different ranking methods are used.
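The abstract does not state which individual rankers are combined or how their outputs are aggregated, so the following is only an illustrative sketch of one common form of ensemble feature ranking: each ranker scores every feature, the scores are converted to ranks, and the ranks are averaged before the top k features are kept. The function names, the mean-rank aggregation rule, and the example scores are all assumptions made for illustration, not the authors' method or data.

```python
def ranks_from_scores(scores):
    """Convert scores to ranks (1 = highest score); ties go to the lower index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks


def ensemble_top_k(score_lists, k):
    """Average the per-ranker ranks and return the indices of the k best features.

    Mean-rank aggregation is an assumed rule; the source does not specify one.
    """
    per_ranker = [ranks_from_scores(scores) for scores in score_lists]
    n_features = len(score_lists[0])
    mean_rank = [
        sum(ranks[j] for ranks in per_ranker) / len(per_ranker)
        for j in range(n_features)
    ]
    # Lower mean rank is better; ties go to the lower feature index.
    order = sorted(range(n_features), key=lambda j: (mean_rank[j], j))
    return order[:k]


# Hypothetical scores for five software metrics from three rankers.
scores_a = [0.9, 0.1, 0.5, 0.7, 0.2]
scores_b = [0.8, 0.3, 0.6, 0.4, 0.1]
scores_c = [0.2, 0.1, 0.9, 0.8, 0.3]

top3 = ensemble_top_k([scores_a, scores_b, scores_c], k=3)
print(top3)
```

In the study's setting, k would be 10 or 15, and the selected feature indices would then be used to train each classifier (SVM, RF, NB) on the reduced dataset.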

Keywords: Software fault prediction, ensemble techniques, feature ranking, random forests, support vector machine

Journal: Journal of Intelligent & Fuzzy Systems, vol. Pre-press, no. Pre-press, pp. 1-14, 2024

Published: 26 April 2024
