[1] LI Zheng-jie, HUANG Gang. Research on SVM KNN Classification Algorithm Based on Hadoop Platform[J]. Computer Technology and Development, 2016, 26(03): 75-79.

 Research on SVM KNN Classification Algorithm Based on Hadoop Platform

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
26
Issue:
2016, No. 03
Pages:
75-79
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2016-03-10

Article Info

Title:
 Research on SVM KNN Classification Algorithm Based on Hadoop Platform
Article ID:
1673-629X(2016)03-0075-05
Author(s):
 LI Zheng-jie, HUANG Gang
 College of Computer Science, Nanjing University of Posts and Telecommunications
Keywords:
 data mining; Hadoop; parallelization; SVM KNN
CLC number:
TP301.6
Document code:
A
Abstract:
 The data revolution has brought unprecedented development. Monitoring, analyzing, collecting, storing, and applying rich and complex structured, semi-structured, and unstructured data has become the mainstream of the information age, and classifying and processing the information contained in massive data calls for better solutions. Traditional data mining classification methods can no longer meet this demand. To address these problems, this paper analyzes and improves several classification algorithms in data mining and combines them to propose an improved SVM KNN classification algorithm. On this basis, the algorithm is parallelized in the MapReduce model on the Hadoop cloud computing platform so that it can be applied to big data processing. Finally, the algorithm is verified experimentally on data sets. Compared with the traditional SVM classification algorithm, the results show that the improved algorithm meets the requirements of efficiency, speed, accuracy, and low cost, and can effectively classify big data.

Last Update: 2016-06-12