
Research on Chinese-Uyghur Phrase Table Filtering Integrating Deep Learning Features


Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 28
Issue:: 2018, No. 07

Pages:: 149-154

Column:: Application Development Research

Publication date:: 2018-07-10

Article Info

Title:: Research on Chinese-Uyghur Phrase Table Filtering Integrating Deep Learning Features

Article ID:: 1673-629X(2018)07-0149-06

Author(s):: ZHU Shun-le; Zhejiang Ocean University, Zhoushan 316000, Zhejiang, China

Keywords:: recurrent neural network; Naïve Bayes; skip-gram; phrase table filtering; Chinese-Uyghur translation

Classification number:: TP301

DOI:: 10.3969/j.issn.1673-629X.2018.07.032

Document code:: A

Abstract:: Chinese-Uyghur machine translation faces several challenges: large differences in word formation and word order between the two languages, redundant and unreasonable entries in the phrase table, scarce bilingual resources, and poorly performing morphological analysis tools. Together these seriously degrade translation quality. To address the many unreasonable phrase pairs in the Chinese-Uyghur phrase table, which harm both translation performance and decoding efficiency, we propose a phrase table filtering model that integrates deep learning features (a recurrent neural network feature and a context feature of Chinese-Uyghur phrase pairs) with a shallow feature (the average word co-occurrence of a phrase pair). The probabilities of these features, together with the training instances, are fed into a phrase table filtering model based on a Naive Bayes classifier for training. The model thereby draws on richer semantic and contextual information between candidate Chinese-Uyghur phrases. Experimental results show that the proposed method effectively removes unreasonable phrase pairs from the Chinese-Uyghur phrase table and improves both translation performance and decoding efficiency.
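
The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes each phrase pair has already been scored by three feature functions (RNN feature, context feature, average word co-occurrence), each giving a value in [0, 1], and that the classifier treats each score as Gaussian within a class. The abstract does not say how the feature probabilities are modeled, so the Gaussian choice, the function names, and the toy scores below are all illustrative assumptions, not the author's implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature (mean, variance) for each label."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        stats = []
        for column in zip(*members):
            mean = sum(column) / n
            # A small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / n + 1e-6
            stats.append((mean, var))
        model[label] = (n / len(rows), stats)
    return model


def log_posterior(model, row, label):
    """Unnormalized log posterior: log prior plus summed Gaussian log-likelihoods."""
    prior, stats = model[label]
    total = math.log(prior)
    for x, (mean, var) in zip(row, stats):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return total


def keep_phrase_pair(model, row):
    """Keep a phrase pair when the 'keep' class is more probable than 'drop'."""
    return log_posterior(model, row, "keep") > log_posterior(model, row, "drop")


# Hypothetical scores: (rnn_feature, context_feature, avg_word_cooccurrence).
TRAIN_ROWS = [
    (0.90, 0.80, 0.70),
    (0.85, 0.75, 0.80),
    (0.80, 0.90, 0.60),
    (0.20, 0.10, 0.15),
    (0.10, 0.30, 0.20),
    (0.25, 0.20, 0.10),
]
TRAIN_LABELS = ["keep", "keep", "keep", "drop", "drop", "drop"]

MODEL = fit_gaussian_nb(TRAIN_ROWS, TRAIN_LABELS)
```

Calling `keep_phrase_pair(MODEL, scores)` on each phrase table entry and discarding the entries for which it returns `False` would yield the filtered table. In practice the three scores would come from trained models rather than hand-written values.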

