[1] CHEN Jing, ZHENG Yan. Parallel Algorithm of Frequent Itemset Mining Based on Binary-tree[J]. Computer Technology and Development, 2015, 25(10): 80-83.

Parallel Algorithm of Frequent Itemset Mining Based on Binary-tree

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 25
Issue: 2015, No. 10
Pages: 80-83
Section: Intelligence, Algorithms and Systems Engineering
Publication date: 2015-10-10

Article Info

Title: Parallel Algorithm of Frequent Itemset Mining Based on Binary-tree
Article ID: 1673-629X(2015)10-0080-04
Author(s): CHEN Jing, ZHENG Yan
Affiliation: College of Computer, Nanjing University of Posts and Telecommunications
Keywords: frequent itemset mining; MapReduce; parallel computing; binary-tree
CLC number: TP311
Document code: A
Abstract:
With the arrival of the big-data era, demands on data-processing speed and data utilization have grown. Among parallel frequent itemset mining algorithms, Count Distribution (CD) and Data Distribution (DD) are the classical ones, but because they need large storage space and incur heavy communication overhead during mining, their efficiency is far from ideal. This paper proposes a parallel frequent itemset mining algorithm based on a binary tree that exploits the parallelism of MapReduce. First, all subsets of a fixed size in the database are found by traversing a binary tree. Next, the number of occurrences of each subset is counted and compared with a fixed threshold set in advance; every subset whose count exceeds the threshold is a frequent itemset. Comparison and analysis of the experimental results show that the proposed algorithm completes mining in a single MapReduce job and makes full use of cluster parallelism without iterative mining. Its performance is clearly superior to that of the CD and DD algorithms; in other words, it achieves higher mining efficiency.
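The abstract only outlines the method, so the following is a minimal single-machine sketch of that outline in Python, not the authors' implementation. It assumes the "binary tree" is an include/exclude decision tree over a transaction's sorted items, simulates the map, shuffle and reduce phases in one process instead of on a Hadoop cluster, and keeps a subset only when its count strictly exceeds the threshold, as the abstract states. The names `k_subsets`, `mapper`, `reducer` and `mine_frequent_k_itemsets` are hypothetical.

```python
from itertools import chain
from typing import Iterable, Iterator

Itemset = tuple[str, ...]


def k_subsets(items: list[str], k: int) -> Iterator[Itemset]:
    """Enumerate all size-k subsets of `items` by walking a binary decision tree.

    Assumption: at depth i the right branch includes items[i] and the left
    branch excludes it; a leaf is reached once k items have been chosen.
    """
    def walk(i: int, chosen: list[str]) -> Iterator[Itemset]:
        if len(chosen) == k:  # leaf: a complete k-subset
            yield tuple(chosen)
            return
        if len(items) - i < k - len(chosen):  # prune: too few items remain
            return
        chosen.append(items[i])  # right branch: include items[i]
        yield from walk(i + 1, chosen)
        chosen.pop()
        yield from walk(i + 1, chosen)  # left branch: exclude items[i]

    return walk(0, [])


def mapper(transaction: Iterable[str], k: int) -> Iterator[tuple[Itemset, int]]:
    """Map phase: emit (k-subset, 1) for every k-subset of one transaction."""
    items = sorted(set(transaction))  # canonical order so equal itemsets share a key
    for subset in k_subsets(items, k):
        yield subset, 1


def reducer(key: Itemset, counts: Iterable[int]) -> tuple[Itemset, int]:
    """Reduce phase: total the occurrences of one k-subset."""
    return key, sum(counts)


def mine_frequent_k_itemsets(
    transactions: Iterable[list[str]], k: int, threshold: int
) -> dict[Itemset, int]:
    """One simulated MapReduce pass: map, group by key (shuffle), reduce, filter."""
    pairs = chain.from_iterable(mapper(t, k) for t in transactions)
    groups: dict[Itemset, list[int]] = {}
    for key, value in pairs:  # shuffle: collect all values emitted for a key
        groups.setdefault(key, []).append(value)
    result: dict[Itemset, int] = {}
    for key, values in groups.items():
        _, count = reducer(key, values)
        if count > threshold:  # "exceeds the threshold", per the abstract
            result[key] = count
    return result


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    print(mine_frequent_k_itemsets(baskets, k=2, threshold=2))
```

Because a transaction with n distinct items emits C(n, k) key-value pairs, this single-pass design avoids the iterations of CD and DD at the cost of a potentially large map output for long transactions; the abstract does not say how the paper addresses this.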

Last Update: 2015-11-12