[1] ZHANG Ling, XU Liang, JIANG Hua. Research on Self-learning of Information Combination in Web Collecting[J]. Computer Technology and Development, 2013, (11): 216-218.

Research on Self-learning of Information Combination in Web Collecting

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
No. 11, 2013
Pages:
216-218
Column:
Application Development Research
Publication date:
1900-01-01

Article Information/Info

Title:
Research on Self-learning of Information Combination in Web Collecting
Article ID:
1673-629X(2013)11-0216-03
Author(s):
ZHANG Ling, XU Liang, JIANG Hua
Department of Information Science and Engineering, Hunan First Normal University
Keywords:
Web crawlers; linkage value; topic searching; searching strategy; Web information combination
Document code:
A
Abstract:
To collect topic-relevant Web pages as accurately as possible, a Web crawler usually combines several kinds of Web information to predict the value of the links waiting to be crawled. To improve the accuracy of this link-value prediction, this paper proposes a Web crawler that adjusts the importance of each kind of Web information on its own, based on the pages it has already crawled. The crawler has learning ability: by crawling a training set, it analyzes how important each kind of Web information is for predicting link value, then adjusts the combination weights of the Web information used during crawling, yielding a better search strategy that matches actual Web conditions. With "computer" as the crawling topic, the algorithm is compared with the traditional algorithm that combines Web information with fixed weights. The experimental results show that the crawler using this algorithm achieves higher Web search precision than the traditional Web crawler.
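The idea of learning combination weights from a training crawl can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that link value is a weighted sum of per-feature scores in [0, 1], and that a feature's importance is estimated as one minus its mean absolute error against the observed relevance of the fetched page. The feature names (`anchor_text`, `url_tokens`, `parent_page`) and the functions `learn_weights` and `link_value` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical feature names; the abstract does not list the Web
# information the crawler actually combines.
FEATURES = ["anchor_text", "url_tokens", "parent_page"]

# One training sample: feature scores in [0, 1] for a link, plus the
# observed topic relevance of the page that link led to.
Sample = Tuple[Dict[str, float], float]


def learn_weights(samples: List[Sample]) -> Dict[str, float]:
    """Estimate a combination weight for each feature from a training crawl.

    Assumption (not from the paper): importance is 1 minus the mean
    absolute error between the feature score and the observed relevance,
    normalized so that the weights sum to 1.
    """
    raw: Dict[str, float] = {}
    for name in FEATURES:
        err = sum(abs(feats[name] - rel) for feats, rel in samples) / len(samples)
        raw[name] = max(1.0 - err, 0.0)
    total = sum(raw.values())
    if total == 0:
        # No feature is informative: fall back to fixed, equal weights.
        return {name: 1.0 / len(FEATURES) for name in FEATURES}
    return {name: value / total for name, value in raw.items()}


def link_value(feats: Dict[str, float], weights: Dict[str, float]) -> float:
    """Predict the value of an uncrawled link as a weighted sum of scores."""
    return sum(weights[name] * feats[name] for name in FEATURES)


# Toy training set in which anchor text tracks relevance closely and
# URL tokens are misleading.
training: List[Sample] = [
    ({"anchor_text": 0.9, "url_tokens": 0.2, "parent_page": 0.5}, 1.0),
    ({"anchor_text": 0.1, "url_tokens": 0.8, "parent_page": 0.5}, 0.0),
]
weights = learn_weights(training)
candidate = {"anchor_text": 0.9, "url_tokens": 0.1, "parent_page": 0.5}
print(weights, link_value(candidate, weights))
```

In this toy data the learned weights favor anchor text over URL tokens, so a crawler ordering its frontier by `link_value` would prioritize links whose anchor text matches the topic. A fixed-weight crawler corresponds to skipping `learn_weights` and passing constant weights.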

