[1] LIU Jing-cheng, LIU Feng. An Improved Automatic and Dictionary-Free Chinese Word Segmentation Method Based on Suffix Array [J]. Computer Technology and Development, 2011, (11): 49-52.

An Improved Dictionary-Free Word Segmentation Method Based on Suffix Array

Computer Technology and Development (计算机技术与发展) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2011, No. 11
Pages:
49-52
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
1900-01-01

Article Info

Title:
An Improved Automatic and Dictionary-Free Chinese Word Segmentation Method Based on Suffix Array
Article ID:
1673-629X(2011)11-0049-04
Author(s):
LIU Jing-cheng (刘京城), LIU Feng (刘锋)
Anhui University (安徽大学)
Keywords:
automatic word segmentation; dictionary-free word segmentation; suffix array
Classification number:
TP31
Document code:
A
Abstract:
This paper improves a dictionary-free Chinese word segmentation algorithm based on suffix arrays. The original algorithm builds a suffix array over the input text, sorts it lexicographically, and filters the resulting co-occurrence patterns of Chinese characters to form a candidate word set. It then filters the candidate set by comparing confidence values to obtain the final set of segmented words. This paper improves the method used to count candidate word frequencies and greatly reduces the number of pairwise checks for whether two candidate words stand in a parent-child relationship when the candidate set is filtered. Experiments show that the improved algorithm builds and filters the candidate word set more quickly without the help of a dictionary. The method is particularly suitable for Chinese information processing applications that are sensitive to term frequency and require high computation speed.
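The pipeline outlined in the abstract (sort the suffixes, count candidate frequencies, then filter candidates by confidence) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `max_len`, `min_freq`, and `threshold` parameters, and the confidence definition (frequency of the longer "parent" string divided by the frequency of the shorter "child" string it contains) are all hypothetical, since the abstract does not specify them. The filter below is the naive pairwise check; the reduction in pairwise checks that the paper reports is not reproduced here.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes, sorted lexicographically.

    Naive construction by sorting slices; adequate for a short illustration.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_candidates(text: str, max_len: int = 4, min_freq: int = 2) -> dict[str, int]:
    """Count substrings of length 2..max_len by scanning the sorted suffixes.

    Suffixes that share a prefix are adjacent in the suffix array, so each
    distinct n-character prefix forms one contiguous run whose length is
    that substring's frequency.
    """
    sa = build_suffix_array(text)
    freq: dict[str, int] = {}
    for n in range(2, max_len + 1):
        run_word, run_len = None, 0
        for start in sa:
            if start + n > len(text):
                continue  # suffix is too short to hold an n-character prefix
            word = text[start:start + n]
            if word == run_word:
                run_len += 1
            else:
                # A new run begins; record the finished one if frequent enough.
                if run_word is not None and run_len >= min_freq:
                    freq[run_word] = run_len
                run_word, run_len = word, 1
        if run_word is not None and run_len >= min_freq:
            freq[run_word] = run_len
    return freq


def filter_candidates(freq: dict[str, int], threshold: float = 0.9) -> dict[str, int]:
    """Drop a shorter candidate when a longer one containing it is almost as frequent.

    Assumed confidence: freq[parent] / freq[child], where the parent is the
    longer string. Naive check over all candidate pairs.
    """
    kept = dict(freq)
    for parent, parent_count in freq.items():
        for child, child_count in freq.items():
            if len(child) < len(parent) and child in parent:
                if parent_count / child_count >= threshold:
                    kept.pop(child, None)
    return kept


sample = "我们学习后缀数组我们使用后缀数组"
candidates = count_candidates(sample)
words = filter_candidates(candidates)
print(words)
```

On this sample, the fragments 后缀, 缀数, 数组, 后缀数, and 缀数组 each occur as often as the full string 后缀数组, so the filter discards them and only 我们 and 后缀数组 remain.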


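Memo:
Supported by the Natural Science Research Project of the Anhui Provincial Department of Education (KJ2009A60). LIU Jing-cheng (born 1986), male, from Huangshan, Anhui, is a master's student whose research interests are data mining and software engineering. LIU Feng is a professor and master's supervisor whose research interests are software engineering and parallel computing.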
Last Update: 1900-01-01