Cross-project Software Defect Prediction Based on Multiple Kernel Ensemble Learning - Computer Technology and Development

Article Info

Title:: Cross-project Software Defect Prediction Based on Multiple Kernel Ensemble Learning

Article ID:: 1673-629X(2019)06-0027-05

Author(s):: HUANG Lin; JING Xiao-yuan; DONG Xi-wei; School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Keywords:: cross-project software defect prediction; multiple kernel learning; ensemble learning; cost-sensitive learning; supervised learning

CLC number:: TP181

DOI:: 10.3969/j.issn.1673-629X.2019.06.006

Abstract:: Software defect prediction uses historical defect data to predict the defect proneness of new software modules, and thereby helps improve the quality of software systems. Defective software modules have complex structure and an imbalanced class distribution, and the available historical data are limited. To address these problems, we propose a cross-project software defect prediction method based on multiple kernel ensemble learning. Cross-project defect prediction is an effective way to cope with the lack of training data in the early stage of a project. Multiple kernel learning combines kernel functions with different characteristics so that the data are better represented in the new feature space, which improves prediction accuracy. Ensemble learning addresses the class imbalance problem. Because misclassifying a defective module as defect-free is far riskier than misclassifying a defect-free module as defective, a cost-sensitive matrix is introduced when computing the error. The NASA and AEEEM datasets are used to evaluate all compared methods, and the experimental results show that the proposed algorithm performs well.
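
The two ingredients named in the abstract, combining several kernels and weighting errors with a cost matrix, can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel choices (RBF and polynomial), the fixed combination weights, the function names, and the 5:1 cost ratio are all assumptions made only for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian (RBF) kernel; gamma is an assumed value.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, y, degree=2, coef0=1.0):
    # Polynomial kernel; degree and coef0 are assumed values.
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree

def combined_kernel(x, y, weights=(0.5, 0.5)):
    # A non-negative weighted sum of valid kernels is itself a valid kernel.
    # Fixed weights are used here; a real method would learn them.
    w_rbf, w_poly = weights
    return w_rbf * rbf_kernel(x, y) + w_poly * poly_kernel(x, y)

# Hypothetical cost matrix keyed by (true_label, predicted_label),
# with 1 = defective and 0 = defect-free. Missing a defect costs more.
COST = {(1, 0): 5.0, (0, 1): 1.0, (0, 0): 0.0, (1, 1): 0.0}

def cost_sensitive_error(y_true, y_pred, sample_weights=None, cost=COST):
    # Weighted error in which each mistake is scaled by its cost.
    n = len(y_true)
    if sample_weights is None:
        sample_weights = [1.0 / n] * n  # uniform weights by default
    return sum(w * cost[(t, p)]
               for t, p, w in zip(y_true, y_pred, sample_weights))
```

With these assumed costs, a classifier that misses one defective module out of two samples is penalized five times as heavily as one that raises a single false alarm, which is the asymmetry the abstract describes.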

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]
