LI Xue-di, ZHENG Yan. Frequent Itemset Mining Based on Distributed Inverted Index[J]. Computer Technology and Development, 2016, 26(03): 101-104.

Frequent Itemset Mining Based on Distributed Inverted Index

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume: 26
Issue: No. 3, 2016
Pages: 101-104
Section: Intelligence, Algorithms and Systems Engineering
Publication date: 2016-03-10

Article Info

Title: Frequent Itemset Mining Based on Distributed Inverted Index
Article ID: 1673-629X(2016)03-0101-04
Authors: LI Xue-di, ZHENG Yan
Affiliation: College of Computer, Nanjing University of Posts and Telecommunications
Keywords: Eclat algorithm; frequent itemset; inverted index; parallel computing
CLC number: TP311
Document code: A
Abstract:
Frequent itemset mining is the core of association rule mining and directly affects how efficiently frequent itemsets are generated. When mining frequent itemsets from massive data, the Eclat algorithm runs short of memory and computing resources. To address this, this paper designs DiiEclat, an algorithm that mines frequent itemsets through a distributed inverted index. An inverted index is equivalent to a vertical layout of the data. The inverted index is partitioned by transaction ID and stored across different index nodes; each node intersects the transaction lists it holds, and a retrieval agent then merges the partial intersection results. The execution times of DiiEclat, Eclat, Diffset and Eclat opt were compared on four datasets: chess, mushroom, T40I10D100K and T10I4D100K. The results show that, by vertically partitioning the transaction sets and computing in parallel, DiiEclat overcomes the low efficiency of intersection operations and the memory shortage that arise during mining, and that the algorithm is efficient and scalable.
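The partition, local-intersect and merge scheme summarized in the abstract can be illustrated with a short Python sketch. It is a hypothetical illustration under stated assumptions, not the authors' DiiEclat implementation: the placement rule (TID modulo node count), the helper names `build_index_nodes`, `local_support` and `dii_eclat`, and the use of threads to stand in for index nodes are all assumptions. Because every transaction ID lives on exactly one node, the global support of an itemset equals the sum of the per-node intersection sizes, which is what the retrieval-agent step computes.

```python
"""Sketch of frequent itemset mining over a TID-partitioned inverted index.

Hypothetical illustration of the partition / local-intersect / merge idea;
not the DiiEclat implementation from the paper.
"""
from concurrent.futures import ThreadPoolExecutor


def build_index_nodes(transactions, num_nodes):
    """Give each index node an inverted index (item -> set of TIDs) for its share of transactions."""
    nodes = [{} for _ in range(num_nodes)]
    for tid, items in enumerate(transactions):
        # Hypothetical placement rule: TID modulo node count, so every TID lives on exactly one node.
        local_index = nodes[tid % num_nodes]
        for item in items:
            local_index.setdefault(item, set()).add(tid)
    return nodes


def local_support(node, itemset):
    """Intersect this node's TID fragments for every item; return the local count."""
    tid_sets = [node.get(item) for item in itemset]
    if any(tids is None for tids in tid_sets):
        return 0  # some item never occurs on this node
    return len(set.intersection(*tid_sets))


def dii_eclat(transactions, min_support, num_nodes=3):
    """Depth-first (Eclat-style) enumeration; supports are merged from all nodes."""
    nodes = build_index_nodes(transactions, num_nodes)
    items = sorted({item for t in transactions for item in t})
    frequent = {}

    # Threads stand in for index nodes; the GIL means no real CPU parallelism here.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:

        def support(itemset):
            # Retrieval agent: TIDs are disjoint across nodes, so global support
            # is the sum of the per-node intersection sizes.
            return sum(pool.map(lambda node: local_support(node, itemset), nodes))

        def extend(prefix, candidates):
            for i, item in enumerate(candidates):
                itemset = prefix + (item,)
                count = support(itemset)
                if count >= min_support:
                    frequent[itemset] = count
                    # Only frequent itemsets are extended (anti-monotonicity of support).
                    extend(itemset, candidates[i + 1:])

        extend((), items)
    return frequent


transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]
result = dii_eclat(transactions, min_support=2, num_nodes=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

Classic Eclat keeps the prefix's TID-list and intersects it with one more item at each level; for clarity this sketch recomputes the full intersection for each candidate so that the split between node-local work and the agent's merge stays visible.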

Last update: 2016-06-12