[1] ZHI Wei-mei, FAN Ming. Impact of Sample Size for Rare-Class Classification[J]. Computer Technology and Development, 2011, (05): 9-12.

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2011, No. 05
Pages:
9-12
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
Impact of Sample Size for Rare-Class Classification
Article ID:
1673-629X(2011)05-0009-04
Author(s):
ZHI Wei-mei, FAN Ming
College of Information Engineering, Zhengzhou University
Keywords:
classification; rare class; principal component analysis; sample size
CLC number:
TP311
Document code:
A
Abstract:
Rare-class classification is widely used in many real-world domains, but common classification algorithms, which assume a relatively balanced class distribution, often fail on rare classes. This paper discusses the factors that affect rare-class classification and studies one of them in detail: sample size. Experiments were run on three datasets from the UCI machine learning repository, using Rotation Forest on the Weka platform. For a fixed class ratio, the unsupervised resample preprocessing method was used to grow the sample from small to large. The results show that, at a given class ratio, enlarging the sample reduces the classification errors caused by the imbalanced class distribution, and that common classification algorithms can then often achieve good results on rare classes.
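The sampling step described in the abstract, growing the sample while the class ratio stays fixed, can be illustrated with a minimal Python sketch. This is not Weka's unsupervised resample filter and not the authors' code: the function name `resample_fixed_ratio`, its parameters, and the toy data are all hypothetical, and drawing with replacement is an assumption made so that the sample can grow beyond the original pool.

```python
import random


def resample_fixed_ratio(majority, minority, total_size, rare_fraction, seed=0):
    """Draw total_size labeled examples with replacement.

    The rare class (label 1) always makes up about rare_fraction of the
    result, so only the sample size changes between calls.
    Hypothetical helper for illustration; not a Weka API.
    """
    rng = random.Random(seed)
    n_rare = max(1, round(total_size * rare_fraction))
    n_common = total_size - n_rare
    rare = [(rng.choice(minority), 1) for _ in range(n_rare)]
    common = [(rng.choice(majority), 0) for _ in range(n_common)]
    sample = rare + common
    rng.shuffle(sample)
    return sample


# Toy data (assumed): 90 common-class items and 10 rare-class items.
majority_pool = list(range(90))
minority_pool = list(range(1000, 1010))

# Same class ratio (10% rare), increasing sample size.
for size in (100, 200, 400):
    sample = resample_fixed_ratio(majority_pool, minority_pool, size, 0.1)
    n_rare = sum(label for _, label in sample)
    print(size, n_rare, n_rare / size)
```

Each resampled set would then be passed to a classifier, and the rare-class error compared across sizes. In the paper that classifier is Rotation Forest on Weka, which this sketch does not reproduce.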

Memo:
Supported by the Natural Science Foundation of Henan Province (0211050100). ZHI Wei-mei (1977-), female, lecturer, master's degree, CCF member; research area: data mining.
Last Update: 1900-01-01