Comparison of Classification Algorithms for Software Defect Prediction with the CRISP-DM Approach
Abstract
Software defect prediction is a key part of software quality testing, which aims to assess how well a software product meets its functional and performance requirements. Machine learning methods find software defects more effectively than manual inspection. Classification algorithms that have been applied to software defect prediction include k-Nearest Neighbor (k-NN), Naïve Bayes (NB), and the Decision Tree (CART). This study compares the performance of the k-NN, NB, and CART classification algorithms for software defect prediction using the CRISP-DM approach. CRISP-DM is a data mining process model with six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment, which frame the comparison of the classification algorithms. The software metrics used in this study come from seven datasets of the NASA MDP repository. The results show that the mean accuracy of the CART algorithm, 0.867, is higher than that of k-NN and NB, whose mean accuracies are 0.859 and 0.778, respectively.
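As a minimal sketch of such a comparison (not the authors' exact pipeline), the three classifiers can be evaluated on a NASA MDP-style defect dataset with scikit-learn; the file name "kc1.csv" and the label column "defects" are assumptions for illustration only.

# Illustrative sketch: comparing k-NN, Naive Bayes, and CART on a
# NASA MDP-style defect dataset using cross-validated accuracy.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Load one dataset of static code metrics plus a defect label.
data = pd.read_csv("kc1.csv")              # hypothetical file name
X = data.drop(columns=["defects"])         # software metrics (features)
y = data["defects"].astype(int)            # 1 = defective module, 0 = clean

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "NB":   GaussianNB(),
    "CART": DecisionTreeClassifier(random_state=0),
}

# 10-fold cross-validated accuracy, averaged per algorithm.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")

Repeating this for each of the seven datasets and averaging the per-dataset accuracies would yield summary figures comparable to those reported in the abstract.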