[1] ZHANG Fa-yang, LI Ling-juan, CHEN Yu. Implementation Scheme of VFDT Algorithm on Storm Platform[J]. Computer Technology and Development, 2016, 26(09): 192-196.

Implementation Scheme of VFDT Algorithm on Storm Platform

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
26
Issue:
2016, No. 09
Pages:
192-196
Section:
Intelligence, Algorithms and Systems Engineering
Publication Date:
2016-09-10

Article Information

Title:
 Implementation Scheme of VFDT Algorithm on Storm Platform
Article No.:
1673-629X(2016)09-0192-05
Author(s):
 ZHANG Fa-yang, LI Ling-juan, CHEN Yu
 College of Computer, Nanjing University of Posts and Telecommunications
Keywords:
 stream data; Very Fast Decision Tree (VFDT); distributed; parallelization; Storm
CLC Number:
TP311
Document Code:
A
Abstract:
 With the aim of improving the classification efficiency of stream data, this paper studies how to implement the Very Fast Decision Tree (VFDT) algorithm on Storm, a stream data processing platform, and designs a distributed, parallel implementation scheme for VFDT based on Storm. The VFDT algorithm is divided into three modules: a tree-building module, a classification module, and an evaluation module. The tree-building module initializes the decision tree and builds it incrementally; the classification module labels the samples awaiting classification; and the evaluation module evaluates the VFDT decision tree using labeled samples. The functions of each module are realized by properly designing the Spouts and Bolts of the Storm topology, and the classification module is parallelized by assigning multiple tasks to the Classification Bolt. The in-memory database Redis connects the three modules and stores the decision tree, and the message middleware Kafka improves the algorithm's tolerance of bursts in the data stream. Implementing and testing VFDT according to this scheme shows that its classification efficiency in a Storm cluster is significantly higher than on a single machine, and that setting the number of tasks of the Classification Bolt appropriately improves classification efficiency further.
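The abstract does not spell out how the tree-building module decides when a leaf has seen enough samples to split. In VFDT as originally proposed by Domingos and Hulten, a leaf that has accumulated n samples is split on its best attribute once the observed difference in the split criterion (for example, information gain) between the best and second-best attributes exceeds the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the criterion (log2 c for information gain with c classes) and δ is the allowed probability of choosing the wrong attribute; a tie threshold τ lets the leaf split when two attributes are nearly equal. The Python sketch below illustrates only this standard test under those assumptions; it is not the paper's Storm code, and the function names and the default τ = 0.05 are hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return the Hoeffding bound epsilon.

    With probability 1 - delta, the true mean of a random variable whose
    values span `value_range` lies within epsilon of the mean observed
    over n independent samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test for a leaf that has seen n samples.

    best_gain / second_gain: split-criterion values (e.g. information gain)
    of the best and second-best attributes at this leaf.
    tie_threshold: hypothetical default for the tie-breaking parameter tau.
    """
    epsilon = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute wins with high confidence, or if epsilon
    # is so small that the two candidates are treated as a tie.
    return (best_gain - second_gain > epsilon) or (epsilon < tie_threshold)


if __name__ == "__main__":
    # Information gain with two classes has range R = log2(2) = 1.
    for n in (200, 2000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.10, 1.0, 1e-7, n))
```

With R = 1 and δ = 10⁻⁷, a gain difference of 0.2 is not yet enough to split after 200 samples (ε ≈ 0.20) but is after 2,000 samples (ε ≈ 0.063). Because the test needs only running counts at each leaf, the tree can grow incrementally as samples arrive, which is what makes VFDT suited to stream data.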

Last Update: 2016-10-26