[1] LIANG Xi-tao (梁喜涛), GU Lei (顾磊). Study on Word Segmentation and Part-of-speech Tagging[J]. Computer Technology and Development, 2015, 25(02): 175-179.

Research on Chinese Word Segmentation and Part-of-Speech Tagging

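Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume: 25
Issue: No. 02, 2015
Pages: 175-179
Section: Application Development Research
Publication date: 2015-02-10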

Article Info

Title: Study on Word Segmentation and Part-of-speech Tagging
Article number: 1673-629X(2015)02-0175-05
Authors: LIANG Xi-tao (梁喜涛), GU Lei (顾磊)
Affiliation: College of Computer, Nanjing University of Posts and Telecommunications
Keywords: Chinese word segmentation; active learning; part-of-speech (POS) tagging; Chinese language processing (CLP); joint model
CLC number: TP311
Document code: A
Abstract:
 Word segmentation and part-of-speech (POS) tagging are fundamental tasks in Chinese language processing (CLP) and are widely used in semantic understanding, machine translation, information retrieval, and other fields. Drawing on a survey of current research and applications, this paper classifies and discusses the basic methods of Chinese word segmentation (CWS) and POS tagging. For word segmentation, dictionary-based and statistics-based methods are introduced in detail, and the results of three word segmentation competitions are listed. For POS tagging, rule-based and statistics-based methods are described. The main methods for building joint CWS and POS tagging models are then presented. Finally, the paper analyzes the strengths and weaknesses of each segmentation and tagging method and, on that basis, offers suggestions for further development of the field.
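The abstract names dictionary-based segmentation without showing how it operates. The sketch below illustrates forward maximum matching (FMM), one widely used dictionary-based strategy. It is not necessarily the specific method the paper details. The function name `forward_max_match` and both toy dictionaries are hypothetical and were chosen only for illustration.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str],
                      max_word_len: int | None = None) -> list[str]:
    """Hypothetical helper: forward maximum matching (FMM) segmentation.

    Scans `text` left to right and, at each position, takes the longest
    dictionary word starting there. A character that begins no dictionary
    word is emitted as a single-character token.
    """
    if max_word_len is None:
        # The longest dictionary entry bounds the candidate length.
        max_word_len = max((len(w) for w in dictionary), default=1)
    max_word_len = max(max_word_len, 1)  # guard against 0 or negative limits
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # length 1 always succeeds, so the scan always advances.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


if __name__ == "__main__":
    # Toy dictionaries, invented for this example only.
    toy_dict = {"中文", "分词", "中文分词", "词性", "标注", "研究"}
    print(forward_max_match("中文分词与词性标注研究", toy_dict))
    # ['中文分词', '与', '词性', '标注', '研究']

    ambiguous_dict = {"研究", "研究生", "生命", "起源"}
    print(forward_max_match("研究生命起源", ambiguous_dict))
    # ['研究生', '命', '起源']
```

Greedy left-to-right matching is fast, but it cannot resolve overlapping ambiguity by itself. With the second toy dictionary, it splits 研究生命起源 ("study the origin of life") into 研究生/命/起源. A right-to-left (backward) maximum-matching pass over the same dictionary would instead produce 研究/生命/起源. Errors of this kind are a commonly cited limitation of purely dictionary-based matching.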

Last update: 2015-04-30