[1] MA Ke, LI Ling-juan, SUN Du-jing. Distributed Parallel Algorithm of Mining Frequent Pattern on Data Stream[J]. Computer Technology and Development, 2016, 26(07): 75-79.

 Distributed Parallel Algorithm of Mining Frequent Pattern on Data Stream

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
26
Issue:
2016, No. 07
Pages:
75-79
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2016-07-10

Article Info

Title:
 Distributed Parallel Algorithm of Mining Frequent Pattern on Data Stream
Article ID:
1673-629X(2016)07-0075-05
Author(s):
 MA Ke, LI Ling-juan, SUN Du-jing
 School of Computer Science, Nanjing University of Posts and Telecommunications
Keywords:
 data stream; frequent pattern; distributed parallelization; Storm
CLC number:
TP311
Document code:
A
Abstract:
 To improve the efficiency of frequent pattern mining on data streams, this paper designs a distributed parallel algorithm, DPFP-Stream (Distributed Parallel Algorithm of Mining Frequent Pattern on Data Stream), based on the classical FP-Stream algorithm and the principles of distributed parallel computing. The algorithm divides the task of building the frequent pattern tree into a local part and a global part and introduces a "current time" parameter. Arriving stream data is distributed evenly across multiple local nodes. Each local node uses the FP-Growth algorithm to generate its candidate frequent itemsets for each unit of time, then packages them with their support counts by unit of time and sends them to the global node. The global node merges the intermediate results from the local nodes according to the "current time" and updates the global Pattern-Tree. Implementation and performance tests on Storm, a distributed data stream computing platform, show that the computing efficiency of DPFP-Stream increases linearly with the number of local nodes or local bolt threads, and that the algorithm is suitable for efficiently mining frequent patterns from data streams.
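
The local/global split described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the names `split_round_robin`, `local_mine`, and `GlobalNode` are hypothetical, brute-force itemset enumeration stands in for FP-Growth, a plain dictionary of counters stands in for the Pattern-Tree, and the assumption that the global node waits for every local node before advancing "current time" is ours. In this sketch, an itemset that falls below the local threshold on some shard is not reported by that shard, so merged counts may undercount.

```python
from collections import Counter, defaultdict
from itertools import combinations


def split_round_robin(transactions, n_local):
    """Distribute one time unit's transactions evenly across n_local nodes."""
    shards = [[] for _ in range(n_local)]
    for i, txn in enumerate(transactions):
        shards[i % n_local].append(txn)
    return shards


def local_mine(shard, min_count):
    """Return {itemset: support count} for itemsets reaching min_count in a shard.

    Assumption: brute-force enumeration replaces FP-Growth to keep this short.
    """
    counts = Counter()
    for txn in shard:
        items = sorted(set(txn))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


class GlobalNode:
    """Merges local results per time unit (a dict stands in for the Pattern-Tree)."""

    def __init__(self, n_local):
        self.n_local = n_local
        self.current_time = 0
        self.pending = defaultdict(list)  # time unit -> local results received so far
        self.pattern_counts = {}          # time unit -> merged support counts

    def receive(self, time_unit, local_result):
        self.pending[time_unit].append(local_result)
        # Merge only when every local node has reported for the current time unit,
        # so results that arrive early for a later unit simply wait.
        while len(self.pending.get(self.current_time, [])) == self.n_local:
            merged = Counter()
            for result in self.pending.pop(self.current_time):
                merged.update(result)  # adds support counts itemset by itemset
            self.pattern_counts[self.current_time] = merged
            self.current_time += 1


if __name__ == "__main__":
    batch = [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"]]
    node = GlobalNode(n_local=2)
    for shard in split_round_robin(batch, 2):
        node.receive(0, local_mine(shard, min_count=1))
    print(node.pattern_counts[0].most_common(3))
```

On Storm, each `local_mine` call would presumably run inside a local bolt and `GlobalNode.receive` inside a single global bolt; that mapping is an assumption based on the abstract's mention of local bolt threads.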


Last Update: 2016-09-28