[1] SUN Yuan, HUANG Gang. Analysis and Study of C4.5 Algorithm Based on Hadoop Platform[J]. Computer Technology and Development, 2014, 24(11): 83-86.

 Analysis and Study of C4.5 Algorithm Based on Hadoop Platform

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
24
Issue:
No. 11, 2014
Pages:
83-86
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2014-11-10

Article Information

Title:
 Analysis and Study of C4.5 Algorithm Based on Hadoop Platform
Article ID:
1673-629X(2014)11-0083-04
Author(s):
 SUN Yuan (孙媛), HUANG Gang (黄刚)
 School of Computer Science, Nanjing University of Posts and Telecommunications
Keywords:
 Hadoop; MapReduce; data mining; C4.5 algorithm
CLC number:
TP301.6
Document code:
A
Abstract:
 How to mine valuable information from massive data more quickly, more efficiently, and at lower cost has become a new challenge for data mining technology. After studying the characteristics of the Hadoop platform and the C4.5 decision tree algorithm, this paper introduces cloud computing ideas into the field of decision tree algorithms, parallelizes C4.5 on the Hadoop platform, and uses the MapReduce model to address massive data mining. The new algorithm is then validated on the play-golf data set. The experimental results show that, for massive data, the Hadoop-based decision tree algorithm significantly improves the efficiency of data mining and offers good efficiency and scalability. To some extent, it resolves the heavy computation and long tree-construction time that C4.5 faces when processing massive data.
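
 As an illustration of how the attribute-selection step of C4.5 can be expressed in MapReduce terms, the following is a minimal single-process sketch, not the authors' implementation. It assumes the commonly cited 14-record nominal version of the play-golf data set; the function names (map_record, reduce_counts, gain_ratios) are hypothetical, and the map and reduce phases are only simulated in plain Python rather than run on Hadoop.

```python
import math
from collections import Counter, defaultdict

# Assumed data: the widely used 14-record nominal "play golf" table.
ATTRS = ["outlook", "temperature", "humidity", "windy"]
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def map_record(row):
    """Simulated map phase: emit ((attribute, value, label), 1) per attribute."""
    *values, label = row
    for attr, value in zip(ATTRS, values):
        yield (attr, value, label), 1


def reduce_counts(pairs):
    """Simulated reduce phase: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def entropy(counts):
    """Shannon entropy (base 2) of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def gain_ratios(totals):
    """Compute the C4.5 gain ratio of each attribute from the reduced counts."""
    # attribute -> value -> label -> count
    table = defaultdict(lambda: defaultdict(Counter))
    for (attr, value, label), n in totals.items():
        table[attr][value][label] += n

    ratios = {}
    for attr, by_value in table.items():
        label_totals = Counter()
        for labels in by_value.values():
            label_totals.update(labels)
        n = sum(label_totals.values())
        base = entropy(label_totals.values())
        sizes = [sum(labels.values()) for labels in by_value.values()]
        conditional = sum(
            size / n * entropy(labels.values())
            for size, labels in zip(sizes, by_value.values())
        )
        split_info = entropy(sizes)
        ratios[attr] = (base - conditional) / split_info if split_info else 0.0
    return ratios


pairs = (pair for row in ROWS for pair in map_record(row))
ratios = gain_ratios(reduce_counts(pairs))
best_attribute = max(ratios, key=ratios.get)
```

 On this assumed data, best_attribute is "outlook", which would become the root split. In a real Hadoop job the counting would be distributed across mappers and reducers, and the same procedure would be repeated for each subset of records as the tree grows.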

Last Update: 2015-04-13