SU Jin-bo, ZHU Jian-yu, YANG Liu, et al. Research on Harmful Information Crawler System Based on Keywords Correlation[J]. Computer Technology and Development, 2014, 24(03): 143-146.

Research on Harmful Information Crawler System Based on Keywords Correlation

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 24
Issue: 2014, No. 03
Pages: 143-146
Section: Security and Protection
Publication date: 2014-03-31

Article Information

Title: Research on Harmful Information Crawler System Based on Keywords Correlation
Article ID: 1673-629X(2014)03-0143-04
Authors: SU Jin-bo, ZHU Jian-yu, YANG Liu, LIU Yue
Affiliation: Network Security Detachment, Hefei Municipal Public Security Bureau
Keywords: meta-search; crawler; keyword expansion; index
CLC number: TP302.1
Document code: A
Abstract:
Traditional methods for discovering harmful information on the Internet rely on search engines such as Google and Baidu: users enter keywords, run a search, and then analyze the results. However, users often cannot describe precisely what they are looking for, so the keywords they supply are inaccurate; the results then contain junk data the user does not care about, while some relevant data is not returned at all. This paper explores a crawler method based on meta-search that introduces keyword expansion. The method expands the input keywords both during web page crawling and at user query time, improving search coverage and precision. It requires little investment, performs well, and can be extended to other domains.
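The abstract does not say how keyword correlation is measured. As a minimal sketch, assuming correlation is approximated by document co-occurrence (Jaccard similarity over the sets of crawled pages containing each term), keyword expansion could look like the following. The function names, the `threshold` and `max_new` parameters, and the sample pages are hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: each "document" is the token list of one crawled page.
SAMPLE_DOCS = [
    ["gambling", "casino", "bet"],
    ["gambling", "casino", "odds"],
    ["gambling", "bet", "lottery"],
    ["weather", "forecast"],
]


def cooccurrence_counts(docs):
    """Count document frequency per term and co-occurrence per term pair."""
    df = Counter()
    pairs = Counter()
    for tokens in docs:
        terms = sorted(set(tokens))  # sorted so each pair key is (smaller, larger)
        df.update(terms)
        pairs.update(combinations(terms, 2))
    return df, pairs


def jaccard(a, b, df, pairs):
    """Assumed correlation measure: |pages with a and b| / |pages with a or b|."""
    key = (a, b) if a < b else (b, a)
    both = pairs[key]
    if both == 0:
        return 0.0
    return both / (df[a] + df[b] - both)


def expand_keywords(seeds, docs, threshold=0.5, max_new=3):
    """Append up to max_new terms per seed whose correlation >= threshold."""
    df, pairs = cooccurrence_counts(docs)
    expanded = list(seeds)
    for seed in seeds:
        scored = [
            (jaccard(seed, term, df, pairs), term)
            for term in df
            if term not in seeds
        ]
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        for score, term in scored[:max_new]:
            if score >= threshold and term not in expanded:
                expanded.append(term)
    return expanded


if __name__ == "__main__":
    print(expand_keywords(["gambling"], SAMPLE_DOCS))  # ['gambling', 'bet', 'casino']
```

The expanded list could then be sent to the meta-search step (for example as an OR query) or used to filter crawled pages. Jaccard similarity is used here only because it is simple and bounded in [0, 1]; any other correlation measure could be swapped in.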

