[1] LI Chen, YANG Zi-jiang, ZHU Shi-wei, et al. Design and Implementation of Network Consensus Monitoring System Based on Hadoop[J]. Computer Technology and Development, 2016, 26(02): 144-149.

 Design and Implementation of Network Consensus Monitoring System Based on Hadoop

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
26
Issue:
2016, No. 02
Pages:
144-149
Column:
Application Development Research
Publication date:
2016-02-10

Article Info

Title:
 Design and Implementation of Network Consensus Monitoring System Based on Hadoop
Article ID:
1673-629X(2016)02-0144-06
Author(s):
 LI Chen (李晨), YANG Zi-jiang (杨子江), ZHU Shi-wei (朱世伟), YU Jun-feng (于俊凤)
 Information Research Institute, Shandong Academy of Sciences
Keywords:
 Hadoop; MapReduce; public opinion monitoring; text clustering; hot topic detection; topic tracking
CLC number:
TP311.1
Document code:
A
Abstract:
 This paper designs and implements a network public opinion monitoring system based on Hadoop. The system uses HDFS as the underlying storage layer and builds an HBase-based distributed database on top of it to store and manage public opinion data in a unified way. First, a distributed web crawler based on MapReduce collects the data, which addresses the low efficiency and poor scalability of a single-machine crawler. Second, a two-stage clustering algorithm that combines Canopy with K-means overcomes the shortcomings of K-means used alone and improves the efficiency and accuracy of text clustering. Finally, a query-based topic tracking strategy is implemented to track and analyze hot topics effectively. Simulation experiments show that, compared with the traditional K-means method, the Canopy-Kmeans clustering method lowers the miss rate and the false alarm rate by 1.24% and 0.09% respectively, and lowers the minimum normalized cost by 1.681%. By providing visualized public opinion analysis reports, the system gives enterprises and institutions scientific and systematic technical support for tracking hot topics in a timely manner and formulating public opinion strategies.
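The two-stage clustering idea named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' MapReduce implementation: the thresholds `t1` and `t2`, the Euclidean distance, and the helper names `canopy_centers` and `kmeans` are all assumptions made for illustration, and the input is assumed to be documents already converted to numeric vectors (for example TF-IDF weights). Canopy makes one cheap pass to choose the number of clusters and rough centers, and K-means then refines those centers instead of starting from random ones.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def canopy_centers(points, t1, t2):
    """Stage 1: derive rough centers with Canopy (loose t1 > tight t2 > 0)."""
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Every remaining point within the loose threshold joins this canopy.
        members = [p for p in remaining if distance(seed, p) <= t1]
        centers.append(mean(members))
        # Points within the tight threshold cannot seed another canopy.
        remaining = [p for p in remaining if distance(seed, p) > t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Stage 2: refine the given initial centers with standard K-means."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: distance(p, centers[k]))
            for p in points
        ]
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(mean(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Toy vectors standing in for document features (hypothetical data).
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = canopy_centers(docs, t1=4.0, t2=2.0)
final_centers, labels = kmeans(docs, initial)
print(len(final_centers), labels)
```

In this sketch the number of clusters is never passed in; it follows from the Canopy thresholds, which is the main reason to run Canopy before K-means. In a MapReduce setting each stage would typically be split into map and reduce steps, which is not shown here.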


Last Update: 2016-04-15