[1] WANG Qi, ZHOU Peng, ZHANG Yan-ping. Online Stable Streaming Feature Selection Algorithm Using Multi-level Ensemble Learning[J]. Computer Technology and Development, 2025, (03): 1-8. [doi:10.20165/j.cnki.ISSN1673-629X.2024.0346]

Online Stable Streaming Feature Selection Algorithm Using Multi-level Ensemble Learning

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2025, Issue 03
Pages:
1-8
Column:
Big Data and Cloud Computing
Publication date:
2025-03-10

Article Info

Title:
Online Stable Streaming Feature Selection Algorithm Using Multi-level Ensemble Learning
Article ID:
1673-629X(2025)03-0001-08
Author(s):
WANG Qi (王琦), ZHOU Peng (周鹏), ZHANG Yan-ping (张燕平)
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Keywords:
feature selection; streaming feature; stability; ensemble learning; extreme learning machine
CLC number:
TP311.13
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0346
Abstract:
Feature selection is an essential part of the preprocessing phase of data mining, aiming to select the most relevant subset of features from the original data set. Traditional feature selection methods assume that the data set is static. In real applications, however, data may be generated and processed dynamically. Online streaming feature selection methods, which handle features that arrive one at a time as a stream, emerged to address this setting. Most existing online streaming feature selection methods focus on scalability, high accuracy, and low time overhead while ignoring the stability of the algorithm. Yet only stable feature selection results can strengthen users' trust in an algorithm and give it practical value. To address the stability problem of online feature selection, a new online stable streaming feature selection framework, Multi-level Ensemble Learning Stream Feature Selection (MESFS), is proposed based on a multi-level ensemble learning strategy. Specifically, at the data set level, an Extreme Learning Machine (ELM) is used to group and map samples to improve the accuracy of the algorithm; at the feature selection level, multiple iterations and an adaptive threshold adjustment strategy are used to compute feature weights and select features, reducing the volatility and randomness of the selection results. MESFS is compared against four traditional static feature selection algorithms and five advanced online streaming feature selection algorithms in extensive experiments on 12 public data sets from sources including UCI, ARFF, and NIPS. The results show that the proposed method achieves an excellent balance between prediction accuracy and stability under perturbation of the training data.
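To make the notion of stability under training data disturbance concrete, the following is a minimal illustrative sketch and not the MESFS algorithm itself. It assumes a simple relevance score (absolute Pearson correlation with the label), a fixed threshold in place of the adaptive threshold described in the abstract, bootstrap resampling as the source of perturbation, and mean pairwise Jaccard similarity as the stability measure. It omits the ELM grouping step entirely. All function names and parameters are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def select_once(rows, labels, threshold):
    """One streaming pass: features arrive one at a time and are kept if relevant."""
    selected = set()
    for j in range(len(rows[0])):  # feature j "arrives"
        column = [row[j] for row in rows]
        if abs_correlation(column, labels) >= threshold:
            selected.add(j)
    return selected


def ensemble_select(rows, labels, rounds=20, threshold=0.5, min_votes=0.6, seed=0):
    """Repeat the streaming pass on bootstrap resamples; keep features by vote share."""
    rng = random.Random(seed)
    n = len(rows)
    subsets = []
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # perturbed training data
        subsets.append(
            select_once([rows[i] for i in idx], [labels[i] for i in idx], threshold)
        )
    votes = {j: sum(j in s for s in subsets) / rounds for j in range(len(rows[0]))}
    final = {j for j, share in votes.items() if share >= min_votes}
    return final, subsets


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity of selected subsets (1.0 means identical)."""
    sims = []
    for a, b in combinations(subsets, 2):
        union = a | b
        sims.append(len(a & b) / len(union) if union else 1.0)
    return mean(sims) if sims else 1.0


# Synthetic data (an assumption for illustration): feature 0 tracks the label,
# features 1 and 2 are pure noise.
data_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
rows = [[y + data_rng.gauss(0, 0.3), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
        for y in labels]

final, subsets = ensemble_select(rows, labels)
stability = jaccard_stability(subsets)
print("selected features:", sorted(final), "stability:", stability)
```

A stability value close to 1.0 means the per-round subsets barely change when the training samples are resampled. Aggregating by vote share is one common way an ensemble can damp the randomness of any single selection run.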

Last Update: 2025-03-10