[1] LI Ying, WU Xiao-jun. Web Structure Mining Based on Maximum Flow and Page Similar Value[J]. Computer Technology and Development, 2011, (10): 112-115.

Web Structure Mining Based on Maximum Flow and Page Similar Value

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2011, No. 10
Pages:
112-115
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
Web Structure Mining Based on Maximum Flow and Page Similar Value
Article ID:
1673-629X(2011)10-0112-04
Author(s):
LI Ying (李莹), WU Xiao-jun (吴晓军)
School of Computer Science, Shaanxi Normal University
Keywords:
Web structure mining; topic drift; page similarity value
CLC number:
TP301.6
Document code:
A
Abstract:
Web structure mining algorithms are prone to "topic drift" and to multiple mutually reinforcing relationships between hosts. To address these problems, a hyperlink structure mining method based on maximum flow and page similarity values is presented. Building on the traditional hyperlink structure mining algorithm HITS, the method introduces page similarity values to construct the adjacency matrix and combines this with Web community discovery based on maximum flow to build a feature vector space model. Iterative computation then yields the highest-value authority set and hub set. Experimental results show that the method achieves good precision and recall and effectively suppresses topic drift, giving it some practical value.
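
The abstract gives no formulas, so the following is only a minimal sketch of one plausible reading of "HITS with a similarity-weighted adjacency matrix". It assumes that each link is weighted by the cosine similarity of the term-frequency vectors of its two endpoint pages; the function names (`cosine_similarity`, `normalize`, `weighted_hits`) and the sample data are hypothetical, and the maximum-flow community discovery step of the paper is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency dicts (assumed similarity measure)."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(scores):
    """Scale a score dict to unit Euclidean length; leave an all-zero dict unchanged."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / norm for p, v in scores.items()} if norm else dict(scores)


def weighted_hits(links, terms, iterations=50):
    """HITS iteration on a link graph whose edges are weighted by page similarity.

    links: dict mapping a page to the list of pages it links to.
    terms: dict mapping a page to its term-frequency dict.
    Returns (authority, hub) score dicts.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    # Assumption: the adjacency entry for link p -> q is the similarity of p and q.
    weight = {
        (p, q): cosine_similarity(terms.get(p, {}), terms.get(q, {}))
        for p, outs in links.items()
        for q in outs
    }
    auth = dict.fromkeys(pages, 1.0)
    hub = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        # Authority update: weighted sum of the hub scores of in-linking pages.
        new_auth = dict.fromkeys(pages, 0.0)
        for (p, q), w in weight.items():
            new_auth[q] += w * hub[p]
        auth = normalize(new_auth)
        # Hub update: weighted sum of the authority scores of linked-to pages.
        new_hub = dict.fromkeys(pages, 0.0)
        for (p, q), w in weight.items():
            new_hub[p] += w * auth[q]
        hub = normalize(new_hub)
    return auth, hub


# Hypothetical data: pages "a" and "b" link to an on-topic page "c" and an off-topic page "spam".
links = {"a": ["c", "spam"], "b": ["c", "spam"]}
terms = {
    "a": {"web": 2, "mining": 1},
    "b": {"web": 1, "mining": 2},
    "c": {"web": 1, "mining": 1},
    "spam": {"casino": 3},
}
auth, hub = weighted_hits(links, terms)
```

In this toy graph the off-topic page shares no terms with the pages linking to it, so its incoming edge weights are zero and it receives no authority score, whereas unweighted HITS would rank it equal to "c". That is the intuition behind using similarity weights to limit topic drift.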

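Memo:
Supported by the Fundamental Research Funds for the Central Universities (GK201002005) and the Shaanxi Provincial Industrial Research Program (2009K09-21). LI Ying (born 1984), female, master's student; research interests: embedded development and pattern recognition. WU Xiao-jun, associate professor; research interests: systems engineering, pattern recognition, embedded systems, intelligent systems, and computer software.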
Last Update: 1900-01-01