[1] NING Qiu-yi, SHI Xiao-jing, DUAN Xiang-yu. Unsupervised Domain Adaptation E-commerce Machine Translation Based on Phrase Post-validation[J]. Computer Technology and Development, 2021, 31(12): 1-6. [doi:10.3969/j.issn.1673-629X.2021.12.001]

Unsupervised Domain Adaptation E-commerce Machine Translation Based on Phrase Post-validation

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
No. 12, 2021
Pages:
1-6
Section:
Artificial Intelligence
Publication date:
2021-12-10

Article Info

Title:
Unsupervised Domain Adaptation E-commerce Machine Translation Based on Phrase Post-validation
Article ID:
1673-629X(2021)12-0001-06
Author(s):
NING Qiu-yi (宁秋怡), SHI Xiao-jing (史小静), DUAN Xiang-yu (段湘煜)
Soochow University, Suzhou 215006, Jiangsu, China
Keywords:
unsupervised; domain adaptation; word pair matching; phrase post-validation; e-commerce domain
CLC number:
TP183
DOI:
10.3969/j.issn.1673-629X.2021.12.001
Abstract:
Machine translation training currently relies on large-scale parallel corpora, but in the e-commerce domain publicly available large-scale parallel corpora are almost nonexistent and difficult to construct. Unsupervised domain adaptation addresses this scarcity of parallel resources by transferring large amounts of out-of-domain parallel data to in-domain non-parallel data. However, domain adaptation for neural machine translation suffers from poor word pair matching in the in-domain setting. To solve this problem, an unsupervised domain adaptation method for e-commerce machine translation based on phrase post-validation is proposed. Chinese and English monolingual e-commerce data are collected. During the iterative process of unsupervised domain adaptation, a step-by-step data mixing training strategy is adopted to improve translation performance, and on this basis phrase post-validation is introduced to alleviate the word pair matching problem. Comparative experiments on e-commerce machine translation benchmarks show that the proposed method outperforms the strongest baseline system by about 1.5 BLEU points in both Chinese-English translation directions in the e-commerce domain.


Last Update: 2021-12-10