[1] QIN Jun, HAO Tian-shu, DONG Qian-qian. Parallel Improvement of Apriori Algorithm Based on MapReduce[J]. Computer Technology and Development, 2017, 27(04): 64-68.

Parallel Improvement of Apriori Algorithm Based on MapReduce

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
27
Issue:
2017, No. 04
Pages:
64-68
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2017-04-10

Article Info

Title:
 Parallel Improvement of Apriori Algorithm Based on MapReduce
Article number:
1673-629X(2017)04-0064-05
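Author(s):
 QIN Jun[1], HAO Tian-shu[2], DONG Qian-qian[2]
1. School of Educational Science and Technology, Nanjing University of Posts and Telecommunications; 2. School of Computer Science, Nanjing University of Posts and Telecommunications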
Keywords:
 association rules; data mining; MapReduce; Apriori
Classification number:
TP301.6
Document code:
A
Abstract:
 The MapReduce-based parallel Apriori algorithm avoids the repeated database scans of the traditional Apriori algorithm, but its candidate sets are still produced by a serial self-join of the frequent itemsets, which generates a large volume of intermediate candidate data. To improve the efficiency of mining frequent itemsets, this paper parallelizes the join step of the MapReduce-based Apriori algorithm and proposes CApriori, a new algorithm for mining frequent itemsets in big-data environments. CApriori derives the candidate (k+1)-itemsets from the frequent k-itemsets in parallel through Map and Reduce phases, so that the entire process of generating frequent itemsets runs in parallel; this reduces the number of candidate sets produced during iteration and saves storage space and time. A time-complexity analysis and comparison indicates that the improved algorithm greatly reduces the time spent in the join step when processing large-scale data. Experiments with CApriori on the Hadoop platform show that the improved algorithm is more efficient both on large data sets and at smaller support thresholds, and that it achieves excellent speedup.
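
The idea of producing (k+1)-item candidates from frequent k-itemsets through Map and Reduce phases can be illustrated with a small single-process sketch. This is not the CApriori implementation from the paper, whose details the abstract does not give. It is the classical Apriori prefix join and subset pruning, recast as a map step and a reduce step under the assumption that itemsets are keyed by their first k-1 items. The function names `map_join`, `reduce_join`, and `generate_candidates` are hypothetical, and a dictionary stands in for the shuffle phase that Hadoop would perform.

```python
from collections import defaultdict
from itertools import combinations


def map_join(itemset):
    """Map step (assumed design): key a frequent k-itemset by its first k-1 items."""
    items = tuple(sorted(itemset))
    yield items[:-1], items[-1]


def reduce_join(prefix, last_items, frequent):
    """Reduce step: join itemsets that share a prefix, then prune.

    A candidate is kept only if every k-item subset is frequent
    (the Apriori property).
    """
    for a, b in combinations(sorted(set(last_items)), 2):
        candidate = prefix + (a, b)
        k = len(candidate) - 1
        if all(sub in frequent for sub in combinations(candidate, k)):
            yield candidate


def generate_candidates(frequent_k):
    """Run the map and reduce steps locally; each key could be handled in parallel."""
    frequent = {tuple(sorted(s)) for s in frequent_k}
    groups = defaultdict(list)  # stands in for the MapReduce shuffle phase
    for itemset in frequent:
        for key, value in map_join(itemset):
            groups[key].append(value)
    candidates = []
    for prefix, last_items in groups.items():
        candidates.extend(reduce_join(prefix, last_items, frequent))
    return sorted(candidates)


if __name__ == "__main__":
    frequent_2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
    print(generate_candidates(frequent_2))
```

Because each reduce call only needs the itemsets that share one prefix, the join work can be spread across reducers instead of running as a single serial self-join. In the example, `("B", "C", "D")` is pruned because its subset `("C", "D")` is not frequent.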


Last Update: 2017-06-16