[1] WAN Xin-gui, LI Ling-juan, MA Ke. Distributed Data Stream Clustering Algorithm and Its Implementation with Storm[J]. Computer Technology and Development, 2017, 27(07): 150-155.

 Distributed Data Stream Clustering Algorithm and Its Implementation with Storm

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
27
Issue:
2017, No. 07
Pages:
150-155
Column:
Applied Development Research
Publication date:
2017-07-10

Article Information/Info

Title:
 Distributed Data Stream Clustering Algorithm and Its Implementation with Storm
Article ID:
1673-629X(2017)07-0150-06
Author(s):
 WAN Xin-gui (万新贵), LI Ling-juan (李玲娟), MA Ke (马可)
 School of Computer Science, Nanjing University of Posts and Telecommunications
Keywords:
 data stream clustering; distributed; centroid distance; density grid; Storm
CLC number:
TP311
Document code:
A
Abstract:
 To improve the efficiency of data stream clustering, a data stream clustering algorithm based on centroid distance and density grids, named CDD-Stream, is proposed. A parallelization strategy is then applied to the updating of its grid structure, yielding a distributed data stream clustering algorithm, DCD-Stream (Distributed Centroid Distance D-Stream). The algorithm consists of an online part and an offline part. The online part receives the data stream in real time and uses local and global nodes to update the grid structure in parallel, completing an incremental update of the overall grid structure. The offline part performs global clustering on the updated grid structure and stores grid frames so that users can query historical clusters. Taking advantage of Storm's ability to process data streams quickly in real time, and thereby to improve the performance of data stream mining algorithms significantly, a scheme for implementing DCD-Stream on Storm is designed and implemented. The scheme uses the in-memory database Redis and the messaging middleware Kafka to deploy and implement the DCD-Stream topology. Comparative experiments show that, relative to other algorithms, DCD-Stream achieves fairly high clustering accuracy and better timeliness on data streams, and that the Storm-based implementation scheme is feasible and effective.
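
The abstract describes the online/offline split only at a high level and does not give DCD-Stream's formulas. The following Python sketch is therefore an illustration of the general idea rather than the authors' algorithm. It assumes that local nodes summarise batches of points into grid cells, that a global node merges those summaries incrementally, and that the offline step links adjacent dense cells whose centroids are close. The constants `CELL`, `DENSE`, and `MAX_CENTROID_GAP` and all function names are hypothetical. Storm, Redis, Kafka, any time-based fading of densities, and grid-frame storage are left out.

```python
from math import dist, floor

# All three constants are hypothetical values chosen for illustration.
CELL = 1.0              # width of a grid cell along every dimension
DENSE = 3.0             # minimum point count for a cell to count as dense
MAX_CENTROID_GAP = 1.5  # maximum centroid distance for linking two cells


def local_update(points):
    """Local node: summarise a batch as {cell: (count, coordinate sums)}."""
    grid = {}
    for p in points:
        key = tuple(floor(x / CELL) for x in p)
        count, sums = grid.get(key, (0, (0.0,) * len(p)))
        grid[key] = (count + 1, tuple(s + x for s, x in zip(sums, p)))
    return grid


def global_merge(global_grid, partial):
    """Global node: fold one local summary into the overall grid in place."""
    for key, (count, sums) in partial.items():
        g_count, g_sums = global_grid.get(key, (0, (0.0,) * len(sums)))
        global_grid[key] = (g_count + count,
                            tuple(a + b for a, b in zip(g_sums, sums)))
    return global_grid


def offline_cluster(global_grid):
    """Offline step: link adjacent dense cells whose centroids are close."""
    # Centroid of a cell = coordinate sums divided by point count.
    centroids = {k: tuple(s / c for s in sums)
                 for k, (c, sums) in global_grid.items() if c >= DENSE}
    clusters, seen = [], set()
    for start in centroids:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first walk over linked dense cells
            cell = stack.pop()
            members.append(cell)
            for other in centroids:
                if other in seen:
                    continue
                adjacent = max(abs(a - b) for a, b in zip(cell, other)) == 1
                close = dist(centroids[cell], centroids[other]) <= MAX_CENTROID_GAP
                if adjacent and close:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(members))
    return clusters
```

Because each local summary stores only counts and coordinate sums, merging is associative: batches can be processed on separate workers in any order and still produce the same global grid, which is what makes the update step easy to parallelise in this sketch.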

Last Update: 2017-08-24