[1] QIAN Fei, YUAN Chun-feng. A Definition Extraction Algorithm Combining Hard Pattern Matching and Soft Pattern Matching[J]. Computer Technology and Development, 2012, (09): 32-36.
A Definition Extraction Algorithm Combining Hard Pattern Matching and Soft Pattern Matching
Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]
- Volume:
- Issue: 2012, No. 09
- Pages: 32-36
- Section: Intelligence, Algorithms, and Systems Engineering
- Publication date: 1900-01-01
Article Info
- Title: A Definition Extraction Algorithm Combining Hard Pattern Matching and Soft Pattern Matching
- Article number: 1673-629X(2012)09-0032-05
- Author(s): QIAN Fei (钱菲); YUAN Chun-feng (袁春风)
- Department of Computer Science and Technology, Nanjing University
- Keywords: definition extraction; hard pattern matching; soft pattern matching; N-gram language model; word class lattice
- Classification number: TP391
- Document code: A
- Abstract: Definition extraction is an important topic in information extraction. This paper proposes an automatic term definition extraction method that combines hard pattern matching and soft pattern matching. First, a hard pattern library is matched against the input text to extract an initial set of candidate definition sentences. Second, a soft pattern matching model based on an N-gram language model computes a matching score between each sentence and the soft pattern. An upper threshold on this score recalls sentences with a high matching score, and a lower threshold filters out non-definition sentences that were wrongly recalled by hard matching. The experimental results show that the proposed method is far superior to using hard pattern matching or soft pattern matching alone.
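The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the regular expressions in `HARD_PATTERNS`, the add-one-smoothed bigram scorer `BigramSoftPattern`, the length-normalized log-probability score, and the function `extract_definitions` are all assumptions introduced here, and the paper's actual pattern library, N-gram order, and word class lattice are not reproduced.

```python
import math
import re
from collections import Counter

# Hypothetical hard patterns; the paper's real pattern library is not given here.
HARD_PATTERNS = [
    re.compile(r"^\w[\w ]* is an? \w+"),
    re.compile(r"^\w[\w ]* refers to \w+"),
]


def tokenize(sentence):
    """Lowercase word tokens wrapped in sentence-boundary markers."""
    return ["<s>"] + re.findall(r"\w+", sentence.lower()) + ["</s>"]


class BigramSoftPattern:
    """Assumed soft pattern: an add-one-smoothed bigram model
    trained on sentences already known to be definitions."""

    def __init__(self, definition_sentences):
        self.contexts = Counter()
        self.bigrams = Counter()
        vocab = set()
        for sentence in definition_sentences:
            toks = tokenize(sentence)
            vocab.update(toks)
            self.contexts.update(toks[:-1])
            self.bigrams.update(zip(toks, toks[1:]))
        # One extra slot stands in for any unseen token.
        self.vocab_size = len(vocab) + 1

    def score(self, sentence):
        """Average log-probability per bigram (higher means a closer match)."""
        toks = tokenize(sentence)
        pairs = list(zip(toks, toks[1:]))
        total = 0.0
        for prev, cur in pairs:
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total / len(pairs)


def extract_definitions(sentences, model, upper, lower):
    """Keep hard-matched sentences scoring at least `lower`, and
    recall unmatched sentences scoring at least `upper`."""
    kept = []
    for sentence in sentences:
        hard_hit = any(p.search(sentence) for p in HARD_PATTERNS)
        threshold = lower if hard_hit else upper
        if model.score(sentence) >= threshold:
            kept.append(sentence)
    return kept
```

In this sketch, `upper` should be set higher than `lower`: a sentence that no hard pattern matches needs stronger soft-pattern evidence to be recalled than a hard-matched sentence needs to survive filtering. Both thresholds would have to be tuned on held-out data.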
Memo
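- Memo: Supported by the National Natural Science Foundation of China (61072152, 61021062). QIAN Fei (1989-), female, from Taizhou, master's degree; main research interest: natural language processing. YUAN Chun-feng, professor, CCF senior member; main research interests: Web information retrieval and text mining, multimedia document processing, etc.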
Last Update:
1900-01-01