[1] XU De-xin, LI Ling-juan. Research on Parallelization of Association Rules Mining Algorithm Based on Spark[J]. Computer Technology and Development, 2019, 29(03): 30-34. [doi:10.3969/j.issn.1673-629X.2019.03.006]

Research on Parallelization of Association Rules Mining Algorithm Based on Spark

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 03
Pages:
30-34
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2019-03-10

Article Info

Title:
Research on Parallelization of Association Rules Mining Algorithm Based on Spark
Article ID:
1673-629X(2019)03-0030-05
Author(s):
XU De-xin, LI Ling-juan
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Keywords:
Apriori; association rules; parallelization; Spark; recommendation algorithm; frequent itemset mining
CLC number:
TP301.6
DOI:
10.3969/j.issn.1673-629X.2019.03.006
Abstract:
Association rule mining is an important data mining task. Association rule mining algorithms uncover latent relationships in data, and the Apriori algorithm is a typical representative. Spark is a distributed, memory-based big data framework well suited to iterative computation. To improve the efficiency of mining strong association rules, we design a parallelization scheme for the Apriori algorithm based on Spark. The scheme uses the distributed architecture and cluster scheduling mechanism of the Spark platform to distribute the transaction data set to multiple sub-nodes. Each sub-node invokes transformation operations to obtain local candidate itemsets and their support, and stores them in memory. The local candidate itemsets from the nodes are then aggregated to produce the global candidate itemsets and global frequent itemsets. The process iterates until no next-level candidate itemsets exist. Performance tests show that the parallel Apriori algorithm based on the Spark platform can effectively mine frequent itemsets from large data sets and extract strong association rules, with high accuracy and timeliness.
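The iteration described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and does not use Spark: it is plain Python that simulates partitions as lists, with hypothetical function names (`local_counts`, `next_candidates`, `parallel_apriori`, `strong_rules`) and made-up sample transactions. On a real cluster, the per-partition counting and the merge would presumably be expressed with RDD transformations such as `flatMap` and `reduceByKey`.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def local_counts(partition, candidates):
    """Per-partition step: count how many local transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                counts[candidate] += 1
    return counts


def next_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune by the Apriori property."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    }


def parallel_apriori(partitions, min_support):
    """Return {itemset: global support count} for every frequent itemset."""
    total = sum(len(p) for p in partitions)
    threshold = min_support * total
    candidates = {frozenset([item]) for p in partitions for t in p for item in t}
    result, k = {}, 1
    while candidates:  # stop when no next-level candidates exist
        # Each partition is counted independently (the part a cluster would parallelize).
        partials = [local_counts(p, candidates) for p in partitions]
        # Merge the local counts into global counts.
        global_counts = reduce(lambda a, b: a + b, partials, Counter())
        frequent = {c: n for c, n in global_counts.items() if n >= threshold}
        result.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return result


def strong_rules(frequent, min_confidence):
    """Derive rules antecedent -> consequent whose confidence meets the threshold."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical sample data: four transactions split across two simulated partitions.
partitions = [
    [{"bread", "milk"}, {"bread", "diaper", "beer"}],
    [{"milk", "diaper", "beer"}, {"bread", "milk", "diaper"}],
]
frequent_sets = parallel_apriori(partitions, min_support=0.5)
rules = strong_rules(frequent_sets, min_confidence=0.8)
```

Here the support threshold is a fraction of all transactions and confidence is computed as support(itemset) / support(antecedent). Every antecedent lookup succeeds because each subset of a frequent itemset is itself frequent.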

Last Update: 2019-03-10