[1] XIAO Hongyu, HE Hui, HUANG Zhuodong, et al. Research on Employment Vertical Search Engine Based on Nutch[J]. Computer Technology and Development, 2019, 29(02): 207-211. [doi:10.3969/j.issn.1673-629X.2019.02.043]

Research on Employment Vertical Search Engine Based on Nutch

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
29
Issue:
2019, No. 02
Pages:
207-211
Section:
Application Development Research
Publication date:
2019-02-10

Article Info

Title:
Research on Employment Vertical Search Engine Based on Nutch
Article number:
1673-629X(2019)02-0207-05
Author(s):
XIAO Hong-yu, HE Hui, HUANG Zhuo-dong, CAI Zhao-yang
School of Information Technology, Beijing Normal University, Zhuhai, Zhuhai, Guangdong 519087, China
Keywords:
vertical search engine; LinkRank algorithm; employment; Nutch
Classification number:
TP302
DOI:
10.3969/j.issn.1673-629X.2019.02.043
Abstract:
General-purpose search engines lack domain specialization and have a low precision rate. To address this, we build an employment vertical search engine on top of Nutch, an open-source search engine. Chinese word segmentation uses a forward-iteration, finest-granularity segmentation algorithm backed by a local lexicon and a dynamically loaded lexicon. Topic relevance to the employment domain is determined with a vector space model based on feature words and metadata tags. As part of the secondary development of Nutch, the LinkRank ranking algorithm is improved on MapReduce by introducing an inbound-link weight factor and a time decay factor. Employment domain ontology information is applied to web page crawling and filtering, employment information search, and feature word recommendation. The user query interface is redeveloped with Spring MVC, a Java framework, and provides extended query functions such as intelligent keyword suggestions, customized crawlers, secondary search, date restriction on query results, and subscription queries. Experimental results show that the resulting Nutch-based employment vertical search engine achieves a high precision rate and can meet users' needs for specialized retrieval.
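
As an illustration of the vector space model mentioned in the abstract, the following is a minimal Python sketch of a topic relevance check. It is not the authors' implementation: the function names, the `meta_boost` weight for metadata-tag terms, the `threshold` value, and the `TOPIC` feature-word vector are all assumptions chosen for the example, since the abstract gives no concrete weights or thresholds.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)


def page_vector(body_terms, meta_terms, meta_boost=2.0):
    """Build a term-frequency vector for a page.

    Terms found in metadata tags are counted meta_boost times
    (meta_boost is an assumed value, not taken from the paper).
    """
    vec = Counter()
    for term in body_terms:
        vec[term] += 1.0
    for term in meta_terms:
        vec[term] += meta_boost
    return dict(vec)


def is_on_topic(page_vec, topic_vec, threshold=0.3):
    """Accept the page when its similarity to the topic vector reaches the threshold."""
    return cosine_similarity(page_vec, topic_vec) >= threshold


# Hypothetical feature-word vector for the employment domain.
TOPIC = {"job": 1.0, "recruitment": 1.0, "salary": 0.8, "resume": 0.8}

if __name__ == "__main__":
    page = page_vector(["job", "job", "news"], ["recruitment"])
    print(is_on_topic(page, TOPIC))
```

In a crawler, a check of this kind would run after segmentation, so that only pages judged relevant to the employment topic are kept for indexing.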

Last Update: 2019-02-10