WEI Bo-cheng, WANG Ai-ping, SHA Xian-jun, WANG Yong. A Method about Removing Overlapping Ambiguity Producing in Chinese Matching [J]. Computer Technology and Development, 2011, (05): 60-63.

A Method for Eliminating Overlapping Ambiguity in Chinese Word Segmentation

Computer Technology and Development (计算机技术与发展) [ISSN: 1006-6977 / CN: 61-1281/TN]

Issue: 2011, No. 05
Pages: 60-63
Section: Intelligence, Algorithms and Systems Engineering
Publication date: 1900-01-01

Article Info

Title: A Method about Removing Overlapping Ambiguity Producing in Chinese Matching
Article No.: 1673-629X(2011)05-0060-04
Authors: WEI Bo-cheng (魏博诚), WANG Ai-ping (王爱平), SHA Xian-jun (沙先军), WANG Yong (王永)
Affiliation: Ministry of Education Key Laboratory of Intelligent Computing & Signal Processing, Anhui University
Keywords: Chinese word segmentation; mutual information; overlapping ambiguity
CLC number: TP31
Document code: A
Abstract: Segmentation speed and accuracy are the two main performance indicators of a Chinese word segmentation system. To address the slow speed and limited accuracy of traditional Chinese word segmentation, the system uses a dictionary built on a double-layer hash structure to speed up segmentation, and it resolves the overlapping ambiguous strings that appear in the matching results by means of mutual information, which improves segmentation accuracy. The segmentation system has been implemented. Compared with a traditional Chinese word segmentation system in both speed and segmentation quality, it shows improvements in speed and accuracy and produces better segmentation results.
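The abstract names its two mechanisms but does not spell out their details. The sketch below illustrates one plausible reading in Python, and every specific in it is an assumption rather than the authors' design. The first hash layer is keyed by a word's first character and the second layer is a hash set of the words sharing that character. Matching is forward maximum matching. Mutual information is taken as the pointwise form MI(x, y) = log2(P(xy) / (P(x)P(y))), estimated from character unigram and bigram counts in a training corpus. Only the simplest overlap is resolved: a string ABC in which both AB and BC are two-character dictionary words, where the adjacent pair with the higher MI is kept together. The names TwoLevelHashDict, CharMI and segment, as well as the toy dictionary and corpus, are hypothetical.

```python
import math
from collections import Counter


class TwoLevelHashDict:
    """Hypothetical double-layer hash dictionary (an assumed design).

    Layer 1: dict keyed by a word's first character.
    Layer 2: hash set of the full words that start with that character,
    plus the longest word length for that character to bound matching.
    """

    def __init__(self, words):
        self.index = {}    # first char -> set of words starting with it
        self.max_len = {}  # first char -> length of the longest such word
        for w in words:
            head = w[0]
            self.index.setdefault(head, set()).add(w)
            self.max_len[head] = max(self.max_len.get(head, 0), len(w))

    def contains(self, word):
        bucket = self.index.get(word[0])              # layer-1 lookup
        return bucket is not None and word in bucket  # layer-2 lookup

    def longest_match(self, text, start):
        """Length of the longest dictionary word at text[start:], or 1 if none."""
        limit = min(self.max_len.get(text[start], 1), len(text) - start)
        for n in range(limit, 1, -1):
            if self.contains(text[start:start + n]):
                return n
        return 1


class CharMI:
    """Pointwise mutual information of adjacent characters, from a corpus."""

    def __init__(self, corpus):
        self.uni, self.bi = Counter(), Counter()
        for sent in corpus:
            self.uni.update(sent)  # single-character counts
            self.bi.update(sent[k:k + 2] for k in range(len(sent) - 1))
        self.n_uni = sum(self.uni.values())
        self.n_bi = sum(self.bi.values())

    def __call__(self, x, y):
        joint = self.bi[x + y]
        if joint == 0 or self.uni[x] == 0 or self.uni[y] == 0:
            return float("-inf")  # unseen pair: no evidence of association
        p_xy = joint / self.n_bi
        p_x, p_y = self.uni[x] / self.n_uni, self.uni[y] / self.n_uni
        return math.log2(p_xy / (p_x * p_y))


def segment(text, dictionary, mi):
    """Forward maximum matching; ABC overlaps (AB and BC both words) go to MI."""
    words, i = [], 0
    while i < len(text):
        n = dictionary.longest_match(text, i)
        end = i + n
        if n == 2 and end < len(text) and dictionary.contains(text[end - 1:end + 1]):
            a, b, c = text[i], text[i + 1], text[end]
            if mi(b, c) > mi(a, b):  # B binds more tightly to C: A | BC
                words.append(a)
                i += 1               # re-match starting from B
                continue
        words.append(text[i:end])    # keep the forward match (AB | C by default)
        i = end
    return words


if __name__ == "__main__":
    d = TwoLevelHashDict(["老师", "和尚", "尚未", "报名", "学生"])
    mi = CharMI(["尚未完成", "尚未开始", "我和你", "和尚念经"])
    print(segment("老师和尚未报名的学生", d, mi))
    # prints ['老师', '和', '尚未', '报名', '的', '学生']
```

On this toy input, plain forward maximum matching would return 老师/和尚/未/报名/的/学生. Because 尚未 co-occurs more strongly than 和尚 in the toy corpus, the MI check instead yields 老师/和/尚未/报名/的/学生. A practical system would also need smoothing for unseen bigrams and handling of longer overlap chains.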

Memo:
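Supported by the Natural Science Foundation of Anhui Province (090412054). WEI Bo-cheng (1985-), male, master's student, research interest: data mining. WANG Ai-ping, professor, research interests: data mining, artificial intelligence, compiler technology, computer simulation, and the convergence of filtering algorithms.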
Last Update: 1900-01-01