[1] WEN Yu-biao, JIA Shi-yin, DENG Shi-kun, LI Yuan-fang. An Improved Algorithm for Maximum Matching of Chinese Word Segmentation[J]. Computer Technology and Development, 2011, (10): 92-94.

An Improved Algorithm for Maximum Matching of Chinese Word Segmentation

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Issue:
2011, No. 10
Pages:
92-94
Section:
Intelligence, Algorithms, Systems Engineering

Article Info

Title:
An Improved Algorithm for Maximum Matching of Chinese Word Segmentation
Article No.:
1673-629X(2011)10-0092-03
Author(s):
WEN Yu-biao (闻玉彪), JIA Shi-yin (贾时银), DENG Shi-kun (邓世昆), LI Yuan-fang (李远方)
College of Information, Yunnan University
Keywords:
maximum matching; index; thesaurus; word segmentation
CLC number:
TP391.1
Document code:
A
Abstract:
Maximum matching, which comprises forward maximum matching and backward (reverse) maximum matching, is a foundational algorithm in Chinese word segmentation and is widely used in many fields. Building on a detailed analysis of its strengths and weaknesses, this paper proposes an improved maximum matching segmentation algorithm. Before segmentation, the improved algorithm normalizes the lexicon in a preprocessing step. During segmentation, it uses each Chinese character to retrieve the phrases that begin with that character, and then applies traditional maximum matching against the lexicon, trying phrase lengths from longest to shortest. The aim is to overcome the low matching efficiency of traditional methods and their inability to segment long words. Analysis of the algorithm shows that the improved version is more efficient than traditional maximum matching and has stronger segmentation capability.
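The abstract contrasts traditional forward and backward maximum matching with a variant that, for each character, looks up the lexicon words beginning with that character and tries their lengths from longest to shortest. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The lexicon is assumed to be a plain set of words, `max_len` is the fixed window of the traditional method, and the first-character index built by the hypothetical `build_first_char_index` only stands in for the paper's lexicon normalization, whose exact rules the abstract does not give.

```python
# Minimal sketch of maximum matching (MM) segmentation; not the paper's code.
# Assumption: the lexicon is a plain Python set of words.


def forward_max_match(text, lexicon, max_len):
    """Traditional forward MM: scan left to right with a fixed window."""
    words, i = [], 0
    while i < len(text):
        # Try the longest window first, shrinking one character at a time;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len):
    """Traditional backward (reverse) MM: scan right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def build_first_char_index(lexicon):
    """Hypothetical preprocessing: group words by first character and
    length, with lengths sorted from longest to shortest."""
    grouped = {}
    for word in lexicon:
        if word:
            grouped.setdefault(word[0], {}).setdefault(len(word), set()).add(word)
    return {
        ch: sorted(by_len.items(), key=lambda item: item[0], reverse=True)
        for ch, by_len in grouped.items()
    }


def indexed_forward_match(text, index):
    """Forward MM that only tries lengths of words starting with text[i]."""
    words, i = [], 0
    while i < len(text):
        match = text[i]  # fallback: emit the single character
        for length, group in index.get(text[i], []):
            if i + length <= len(text) and text[i:i + length] in group:
                match = text[i:i + length]
                break
        words.append(match)
        i += len(match)
    return words


if __name__ == "__main__":
    lexicon = {"研究", "研究生", "生命", "起源", "命"}
    print(forward_max_match("研究生命起源", lexicon, 3))   # ['研究生', '命', '起源']
    print(backward_max_match("研究生命起源", lexicon, 3))  # ['研究', '生命', '起源']

    long_lexicon = {"中华", "人民", "共和国", "中华人民共和国", "成立"}
    print(forward_max_match("中华人民共和国成立", long_lexicon, 4))
    # ['中华', '人民', '共和国', '成立']
    print(indexed_forward_match("中华人民共和国成立", build_first_char_index(long_lexicon)))
    # ['中华人民共和国', '成立']
```

On the classic string 研究生命起源, forward and backward matching disagree, which is why the two directions are often compared. With a fixed window of 4, traditional forward matching cannot produce the 7-character word 中华人民共和国, whereas the indexed variant tries only the lengths that actually occur for the current first character and so has no fixed upper limit; this is one plausible reading of the abstract's point about long words.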

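Memo:
Supported by the Natural Science Foundation of Yunnan Province (2007F174M) and the Yunnan University Graduate Research Project (200928). WEN Yu-biao (1984-), male, native of Yunnan, master's student; research interests: Web information mining and extraction, Chinese information processing. DENG Shi-kun, professor; research interests: computer networks, intelligent buildings.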