[1] DIAO Qi, Gulimila·KEZIERBIEKE, ZHONG Li-feng, et al. Research on Chinese Word Segmentation Method of Sequence Labeling Based on Recurrent Neural Networks[J]. Computer Technology and Development, 2017, 27(10): 65-68.

Research on Chinese Word Segmentation Method of Sequence Labeling Based on Recurrent Neural Networks

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
27
Issue:
2017, No. 10
Pages:
65-68
Section:
Intelligence, Algorithms, Systems Engineering
Publication date:
2017-10-10

Article Info

Title:
Research on Chinese Word Segmentation Method of Sequence Labeling Based on Recurrent Neural Networks
Article ID:
1673-629X(2017)10-0065-04
Author(s):
 DIAO Qi[1]; Gulimila·KEZIERBIEKE[1]; ZHONG Li-feng[2]; ZHANG Jian[3]; ZHANG Zhi-qiang[1]
 1. College of Computer and Information Engineering, Xinjiang Agricultural University; 2. Xinjiang Uygur Autonomous Region Library; 3. Xinjiang Honglian Software Co., Ltd.
Keywords:
 natural language processing; recurrent neural network; sequence labeling; Chinese word segmentation; supervised learning
Classification number:
TP301.6
Document code:
A
Abstract:
 Word segmentation is a key technology in Chinese natural language processing, and sequence labeling plays a very important role in it. The current mainstream Chinese word segmentation methods are based on supervised learning and extract feature information from Chinese text. However, these methods do not make full use of context information when segmenting Chinese text, and they lack the ability to impose long-distance constraints. To address these problems, Chinese word segmentation is carried out with a bidirectional recurrent neural network model under a sequence labeling formulation. This avoids the limit that a fixed window places on context size and captures the context both before and after a word. Adding this context effectively alleviates the gradient explosion and vanishing problems, and pre-trained context word vectors are then added at the input layer to obtain relatively good segmentation results. The experimental results show that the method achieves 97.3% accuracy in Chinese word segmentation, a notable improvement over traditional machine learning segmentation algorithms.
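
The abstract does not state which tag set or network configuration the authors used, so the following is only an illustrative sketch under stated assumptions. It assumes the common four-tag B/M/E/S scheme (begin, middle, end, single) for casting segmentation as per-character sequence labeling, and a plain tanh recurrent layer with random, untrained weights to show how a bidirectional pass gives each character a representation built from both its left and right context. All function names, dimensions, and the sample sentence are hypothetical.

```python
import math
import random


def words_to_tags(words):
    """Map a segmented sentence to one B/M/E/S tag per character (assumed tag set)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without an E or S tag
        words.append(current)
    return words


def rnn_pass(inputs, w_in, w_rec):
    """Plain tanh recurrent layer; returns the hidden state at every position."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        states.append(hidden)
    return states


def birnn_states(inputs, fwd, bwd):
    """Concatenate left-to-right and right-to-left hidden states per position."""
    forward = rnn_pass(inputs, *fwd)
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical example sentence, already segmented into words.
words = ["中文", "分词", "是", "关键", "技术"]
chars = list("".join(words))
tags = words_to_tags(words)
print(tags)  # ['B', 'E', 'B', 'E', 'S', 'B', 'E', 'B', 'E']
print(tags_to_words(chars, tags) == words)  # True

# Random character vectors stand in for the pre-trained vectors the abstract mentions.
rng = random.Random(0)
embed_dim, hidden_dim = 4, 3
embedding = {ch: [rng.uniform(-1, 1) for _ in range(embed_dim)]
             for ch in sorted(set(chars))}
inputs = [embedding[ch] for ch in chars]
fwd = (random_matrix(hidden_dim, embed_dim, rng),
       random_matrix(hidden_dim, hidden_dim, rng))
bwd = (random_matrix(hidden_dim, embed_dim, rng),
       random_matrix(hidden_dim, hidden_dim, rng))
states = birnn_states(inputs, fwd, bwd)
print(len(states), len(states[0]))  # one vector of size 2 * hidden_dim per character
```

In a full tagger, each concatenated state would feed a classifier over the four tags, and the weights would be trained on a labeled corpus. The sketch omits training and makes no attempt to reproduce the reported accuracy.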

Last Update: 2017-11-23