[1] LIU Hua-chun, WANG Xing-jie. Research and Implementation of Information Extraction Technology in Network Public Opinion[J]. Computer Technology and Development, 2016, 26(09): 8-11.

 Research and Implementation of Information Extraction Technology in Network Public Opinion

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
26
Issue:
2016, No. 09
Pages:
8-11
Section:
Application Development Research
Publication date:
2016-09-10

Article Info

Title:
 Research and Implementation of Information Extraction Technology in Network Public Opinion
Article ID:
1673-629X(2016)09-0008-04
Author(s):
 LIU Hua-chun, WANG Xing-jie
 College of Engineering and Technology, Chengdu University of Technology
Keywords:
 public opinion information; Web information extraction; topic clues; DOM tree
CLC number:
TP391
Document code:
A
Abstract:
 Extracting online public opinion information is the most critical part of a public opinion analysis system and provides the data foundation for public opinion analysis and statistics. To this end, a public opinion information extraction scheme based on topic clues is designed and implemented. The scheme logically partitions a public opinion page using topics as clues; designs an extraction algorithm based on breadth-first search of the DOM tree; and achieves effective extraction by setting a minimum repeated-topic threshold, letting users customize the extraction format, and removing duplicate and noisy information. Extraction experiments on several forums show that the scheme performs well, with high recall, precision, and F-measure, and that it can effectively extract public opinion information such as forum posts and comments.
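The abstract only outlines the approach, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes that a "topic" is a child element whose tag and class repeat under one parent, that the first parent found in breadth-first order with at least `min_repeat` such children is the topic region, and that noise removal means skipping script and style content. All names (`Node`, `TreeBuilder`, `extract_topics`, `min_repeat`, `SAMPLE_PAGE`) are hypothetical.

```python
from collections import Counter, deque
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
NOISE_TAGS = {"script", "style"}  # assumption: treated as noise and skipped


class Node:
    """A very small DOM node; text is stored as '#text' child nodes."""

    def __init__(self, tag, attrs=(), text=""):
        self.tag = tag
        self.attrs = dict(attrs)
        self.text = text
        self.children = []

    def elements(self):
        return [c for c in self.children if c.tag != "#text"]

    def signature(self):
        # Siblings with the same tag and class count as the same pattern.
        return (self.tag, self.attrs.get("class") or "")

    def full_text(self):
        if self.tag == "#text":
            return self.text
        if self.tag in NOISE_TAGS:
            return ""
        parts = (c.full_text() for c in self.children)
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.stack[-1].children.append(Node("#text", text=text))


def extract_topics(html, min_repeat=3):
    """Breadth-first search for the first node with >= min_repeat
    same-signature children; return their de-duplicated text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    queue = deque([builder.root])
    while queue:
        node = queue.popleft()
        elems = node.elements()
        counts = Counter(e.signature() for e in elems)
        if counts:
            sig, n = counts.most_common(1)[0]
            if n >= min_repeat:
                seen, topics = set(), []
                for e in elems:
                    text = e.full_text() if e.signature() == sig else ""
                    if text and text not in seen:  # drop exact duplicates
                        seen.add(text)
                        topics.append(text)
                return topics
        queue.extend(elems)
    return []


SAMPLE_PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a><a href="/login">Login</a></div>
<div id="posts">
  <div class="post"><span>alice</span><p>First reply</p></div>
  <div class="post"><span>bob</span><p>Second reply</p></div>
  <div class="post"><span>bob</span><p>Second reply</p></div>
  <div class="post"><span>carol</span><p>Third reply</p></div>
  <div class="ad"><script>track()</script>Buy now</div>
</div>
</body></html>
"""
```

Calling `extract_topics(SAMPLE_PAGE, min_repeat=3)` on this made-up page skips the two-link navigation bar, which falls below the threshold, and returns the three distinct posts. Stopping at the first qualifying node is a simplification: on a real forum page, a long menu could satisfy the threshold before the post list does, which is where a user-defined extraction format would help.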
Last Update: 2016-10-24