
[1] MENG Ren, SHAO Yan-zhen, YUAN Ding-rong. A Web Information Extraction Algorithm Based on Web Page[J]. Computer Technology and Development, 2010, (01): 193-197.

A Web Information Extraction Method Based on Page Blocks

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:: 2010, No. 01

Pages:: 193-197

Column:: Applied Development Research

Publication date:: 1900-01-01

Article Info

Title:: A Web Information Extraction Algorithm Based on Web Page

Article ID:: 1673-629X(2010)01-0197-04

Author(s):: MENG Ren (蒙韧); SHAO Yan-zhen (邵延振); YUAN Ding-rong (袁鼎荣); Guangxi Normal University

Keywords:: semantic block; block weight; block topic extraction; web data mining

Classification number:: TP311

Document code:: A

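Abstract:: Information extraction based on page structure is one of the three major research areas in Web data mining. The key technique is recognizing how a Web page is organized and mining the required information from it. Based on the semantic blocks of a page, this paper presents a new block topic extraction algorithm. Compared with traditional Web information extraction that treats the whole page as the unit, it matches real pages more closely and has a clear advantage in granularity. The algorithm assigns different weights to the blocks of a page according to their importance, then keeps or discards page information based on those weights before presenting it to the user. Simulation experiments were carried out, and the results indicate that the algorithm is practical and effective to a certain degree.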
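The abstract describes weighting blocks by importance and keeping or discarding them by weight, but this record does not give the paper's actual weight formula. The sketch below is only an illustration under stated assumptions: the `Block` structure, the weight definition (a block's share of page text, discounted by its link density), and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A semantic block of a page (hypothetical structure, not from the paper)."""
    name: str
    text_chars: int   # characters of visible text in the block
    link_chars: int   # characters of that text that sit inside links


def block_weight(block: Block, page_text_chars: int) -> float:
    """Assumed weight: share of the page's text, discounted by link density."""
    if page_text_chars == 0 or block.text_chars == 0:
        return 0.0
    text_share = block.text_chars / page_text_chars
    link_density = block.link_chars / block.text_chars
    return text_share * (1.0 - link_density)


def select_blocks(blocks, threshold=0.2):
    """Keep blocks whose weight reaches the threshold, heaviest first."""
    total = sum(b.text_chars for b in blocks)
    weighted = [(block_weight(b, total), b) for b in blocks]
    kept = [(w, b) for w, b in weighted if w >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(b.name, round(w, 3)) for w, b in kept]


# Made-up example page with three blocks.
page = [
    Block("navigation", text_chars=200, link_chars=190),
    Block("article", text_chars=1500, link_chars=60),
    Block("sidebar_ads", text_chars=300, link_chars=240),
]
print(select_blocks(page))
```

With these made-up numbers, only the text-heavy `article` block passes the threshold, while the link-dominated navigation and sidebar blocks are dropped. Any other importance measure could be substituted in `block_weight` without changing the selection step.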

Memo

Memo:: MENG Ren (1973-), male, engineer; research interest: data mining. Supported by the Guangxi Natural Science Foundation (桂科自0640069).


Last Update: 1900-01-01