[1] XIAO Liang, HAN Lu, WEI Peng-fei, et al. Bagging Ensemble Learning Based Multiset Class-imbalanced Learning[J]. Computer Technology and Development, 2021, 31(10): 1-6. [doi:10.3969/j.issn.1673-629X.2021.10.001]

Bagging Ensemble Learning Based Multiset Class-imbalanced Learning

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
No. 10, 2021
Pages:
1-6
Column:
Big Data Analysis and Mining
Publication date:
2021-10-10

Article Info

Title:
Bagging Ensemble Learning Based Multiset Class-imbalanced Learning
Article ID:
1673-629X(2021)10-0001-06
Author(s):
XIAO Liang¹, HAN Lu², WEI Peng-fei¹, ZHENG Xin-hao¹, ZHANG Shang¹, WU Fei¹
1. School of Automation and AI, Nanjing University of Posts and Telecommunications, Nanjing 210003, China;
2. School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
class-imbalanced learning; resampling; linear discriminant analysis; ensemble learning; multiset learning
CLC number:
TP181
DOI:
10.3969/j.issn.1673-629X.2021.10.001
Abstract:
Class-imbalanced classification is an active research problem in pattern recognition and machine learning, and it arises widely in real-world applications such as software defect prediction, medical diagnosis, and object detection. Existing class-imbalanced learning algorithms usually focus on balancing the dataset by reducing the number of majority-class samples or increasing the number of minority-class samples. They tend to overlook the noise samples and the distribution overlap between classes that are common in class-imbalanced data, which leaves room to improve classification performance. To address these problems, we present a multiset class-imbalanced learning algorithm based on Bagging ensemble learning. It consists of two modules: Bagging-based multiset construction, and feature extraction with multiset fusion. The Bagging-based multiset construction module builds multiple balanced training sets and removes noise samples from the majority class through an improved resampling technique. The feature extraction and multiset fusion module uses linear discriminant analysis to improve the separation between samples of different classes and fuses the predictions of the classifiers trained on the multiple training sets. Experimental results show that the proposed method achieves good class-imbalanced classification performance.
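The two modules described in the abstract can be illustrated with a rough Python sketch. The abstract does not specify the improved resampling technique or the fusion rule, so this is not the authors' implementation. The stand-ins used here are assumptions: a k-nearest-neighbour filter for noisy majority samples, random undersampling of the majority class to build each balanced set, a two-class Fisher linear discriminant as the per-set classifier, and majority voting for fusion. All function names (`drop_noisy_majority`, `fit_lda`, `fit_ensemble`, `predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def drop_noisy_majority(maj, mino, k=3):
    """Assumed noise filter: drop a majority sample when more than half
    of its k nearest neighbours belong to the minority class."""
    kept = []
    for i, p in enumerate(maj):
        nbrs = [(dist2(p, q), 0) for j, q in enumerate(maj) if j != i]
        nbrs += [(dist2(p, q), 1) for q in mino]
        minority_hits = sum(label for _, label in sorted(nbrs)[:k])
        if 2 * minority_hits <= k:
            kept.append(p)
    return kept

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_lda(maj, mino, ridge=1e-6):
    """Two-class Fisher discriminant: w = Sw^-1 (m1 - m0), with the
    decision threshold at the projected midpoint of the class means.
    The small ridge term is an assumption to keep Sw invertible."""
    m0, m1 = mean(maj), mean(mino)
    d = len(m0)
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((maj, m0), (mino, m1)):
        for row in rows:
            diff = [v - u for v, u in zip(row, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [b - a for a, b in zip(m0, m1)])
    threshold = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m0, m1))
    return w, threshold

def fit_ensemble(maj, mino, n_sets=5, seed=0):
    """Build n_sets balanced training sets and fit one model per set."""
    rng = random.Random(seed)
    clean = drop_noisy_majority(maj, mino)
    size = min(len(clean), len(mino))
    # Each set pairs all minority samples with a random majority subset.
    return [fit_lda(rng.sample(clean, size), mino) for _ in range(n_sets)]

def predict(models, x):
    """Fuse the per-set predictions by majority vote (1 = minority class)."""
    votes = Counter(
        1 if sum(wi * xi for wi, xi in zip(w, x)) > t else 0
        for w, t in models
    )
    return votes.most_common(1)[0][0]

# Toy data: majority near the origin, minority near (10, 10), plus one
# mislabelled majority point sitting inside the minority cluster.
maj = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (-0.4, 0.3), (0.3, -0.5),
       (-0.2, -0.3), (0.6, 0.6), (-0.5, -0.6), (0.2, 0.9), (0.9, 0.1),
       (10.0, 10.1)]
mino = [(10.0, 10.0), (10.4, 9.8), (9.7, 10.3), (10.2, 10.5)]

models = fit_ensemble(maj, mino, n_sets=5, seed=0)
print(predict(models, (10.1, 10.0)), predict(models, (0.1, 0.1)))
```

An odd `n_sets` avoids tied votes. A real experiment would replace the toy data and the stand-in filter with the resampling procedure defined in the full paper.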

Last Update: 2021-10-10