
[1] LI Yuan-fang, DENG Shi-kun, WEN Yu-biao, et al. PageRank Matrix Partitioned Algorithm Using Hadoop-MapReduce[J]. Computer Technology and Development, 2011, (08): 6-9.

PageRank Matrix Partitioned Algorithm Using Hadoop-MapReduce

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
Issue: 2011, No. 08

Pages: 6-9

Section: Intelligence, Algorithms, Systems Engineering

Publication date: 1900-01-01

Article Info

Title: PageRank Matrix Partitioned Algorithm Using Hadoop-MapReduce

Article ID: 1673-629X(2011)08-0006-04

Author(s): LI Yuan-fang (李远方); DENG Shi-kun (邓世昆); WEN Yu-biao (闻玉彪); HAN Yue-yang (韩月阳); College of Information, Yunnan University

Keywords: PageRank; MapReduce; Hadoop; partitioned matrix

CLC number: TP301.6

Document code: A

Abstract: PageRank is a classic algorithm for Web structure mining and has been highly successful in the Google search engine. However, it requires many iterations, consumes considerable time and space, and is still slow in both execution and convergence. After a detailed discussion of the execution flow and internal mechanisms of Hadoop-MapReduce, this paper proposes a PageRank algorithm that uses parallel MapReduce to implement matrix partitioning. In essence, it reduces the number of iterations of the Map and Reduce phases in the MapReduce framework, thereby reducing time and space overhead. Finally, an open-source Hadoop-MapReduce platform is set up and a crawl of Web structure is simulated to compare the performance of the traditional and the improved algorithms. The results show that the improved algorithm needs fewer iterations and achieves higher parallel efficiency, and that its PageRank-based ranking of web pages performs better in the simulated environment.
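The abstract does not spell out the partitioning scheme, so the following is only a minimal sketch of the general idea: a block-wise matrix-vector PageRank step written in a map/reduce style. It runs in a single Python process rather than on Hadoop, and every name in it (`build_blocks`, `map_block`, `reduce_rows`, `pagerank_blocked`) is hypothetical. It assumes every page has at least one outgoing link and uses the conventional damping factor of 0.85. It does not reproduce the paper's reduction in iteration count.

```python
from collections import defaultdict


def build_blocks(links, block_size):
    """Split the column-stochastic link matrix into square blocks.

    links maps each page index to the list of pages it links to.
    Returns {(block_row, block_col): {(row, col): weight}} with nonzeros only.
    """
    blocks = defaultdict(dict)
    for src, targets in links.items():
        weight = 1.0 / len(targets)  # assumption: no page has zero outlinks
        for dst in targets:
            key = (dst // block_size, src // block_size)
            blocks[key][(dst, src)] = weight
    return blocks


def map_block(entries, rank):
    """Map step: multiply one matrix block by the rank vector.

    Emits (row, partial_sum) pairs for the rows that the block touches.
    """
    partial = defaultdict(float)
    for (row, col), weight in entries.items():
        partial[row] += weight * rank[col]
    return list(partial.items())


def reduce_rows(pairs, n, damping):
    """Reduce step: add the partial sums per row and apply damping."""
    sums = [0.0] * n
    for row, value in pairs:
        sums[row] += value
    return [(1.0 - damping) / n + damping * s for s in sums]


def pagerank_blocked(links, n, block_size=2, damping=0.85,
                     tol=1e-10, max_iter=200):
    """Iterate map and reduce until the L1 change in rank drops below tol."""
    blocks = build_blocks(links, block_size)
    rank = [1.0 / n] * n
    for iteration in range(1, max_iter + 1):
        pairs = []
        for entries in blocks.values():  # each block is an independent map task
            pairs.extend(map_block(entries, rank))
        new_rank = reduce_rows(pairs, n, damping)
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank, iteration


if __name__ == "__main__":
    # Hypothetical 4-page graph used only for illustration.
    demo_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
    ranks, iterations = pagerank_blocked(demo_links, n=4)
    print(ranks, iterations)
```

Because each block only reads the slice of the rank vector for its own columns, the map calls are independent of one another, which is what would let a framework such as Hadoop schedule them in parallel.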


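Memo

Supported by the Natural Science Foundation of Yunnan Province (2007F174M) and the Yunnan University Graduate Research Project Fund (ynny200928). LI Yuan-fang (born 1986), male, from Sichuan, master's student; research interests: cloud computing networks and distributed computing. DENG Shi-kun, professor; research interests: computer networks and intelligent buildings.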

