Tibetan-Chinese Speech-to-speech Translation Based on End-to-end - Computer Technology and Development

Article Info

Title:: Tibetan-Chinese Speech-to-speech Translation Based on End-to-end

Article ID:: 1673-629X(2025)06-0166-09

Author(s):: BU Yin-shuo (步寅硕)1,2,3; Renzeng Duojie (仁增多杰)1,2,3; Kalzang Gyatso (格桑加措)1,2,3; Lhamao Kyi (拉毛吉)1,2,3; Nyima Trashi (尼玛扎西)1,2,3
1. School of Information Science and Technology, Tibet University, Lhasa 850000, China;
2. Key Laboratory of Tibetan Information Technology and Artificial Intelligence of Tibet, Lhasa 850000, China;
3. Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa 850000, China

Keywords:: end-to-end; speech-to-speech translation; low-resource; Tibetan; HuBERT

CLC number:: TP391

DOI:: 10.20165/j.cnki.ISSN1673-629X.2025.0042

Abstract:: Speech-to-speech translation (S2ST) plays an important role in cross-language communication: it removes language barriers and enables real-time communication between speakers of different languages. Traditional cascaded S2ST systems suffer from error compounding and high latency. In contrast, end-to-end models simplify the processing pipeline, which reduces latency and improves translation accuracy. Current research on end-to-end S2ST focuses mainly on high-resource language pairs, and there is little work on low-resource pairs, Tibetan-Chinese in particular. To address this gap, we propose an end-to-end Tibetan-Chinese speech-to-speech translation method. First, a speech data augmentation technique based on acoustic feature perturbation is introduced to mitigate the scarcity of Tibetan-Chinese speech translation data. Second, bilateral perturbation is used to fine-tune the HuBERT model, reducing the influence of acoustic multimodality on translation through a style normalization stage and an information enhancement stage. Third, the speech-to-unit translation (S2UT) model is introduced to convert source-language speech into target-language discrete units, which avoids the confusion between linguistic content and acoustic features that arises in Mel-spectrogram mapping. Finally, an auxiliary speech recognition task for the target language is added to the model, and joint decoding is used to improve translation performance. Experimental results on the Tibetan-Chinese speech-to-speech translation task show that the BLEU score improves by 12.61 over the baseline model, demonstrating the effectiveness of the proposed model for low-resource multimodal language pairs.
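The abstract does not say how speech is turned into discrete units. As a minimal sketch, assuming the common practice of assigning each frame-level feature vector to its nearest cluster centroid and then merging consecutive repeats, the idea can be illustrated as follows. The function names, the toy 2-D features, and the three centroids are hypothetical stand-ins for illustration, not the authors' implementation or real HuBERT outputs.

```python
def nearest_unit(frame, centroids):
    """Return the index of the centroid closest to one frame (squared Euclidean distance)."""
    best_index, best_distance = 0, float("inf")
    for index, centroid in enumerate(centroids):
        distance = sum((x - c) ** 2 for x, c in zip(frame, centroid))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def quantize_to_units(frames, centroids):
    """Map every frame-level feature vector to a discrete unit id."""
    return [nearest_unit(frame, centroids) for frame in frames]


def collapse_repeats(units):
    """Merge runs of identical consecutive units into a single unit."""
    reduced = []
    for unit in units:
        if not reduced or reduced[-1] != unit:
            reduced.append(unit)
    return reduced


# Hypothetical toy data: three centroids and six frames in a 2-D feature space.
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.1], [1.2, 0.8], [4.8, 5.1], [0.0, 0.1]]

frame_units = quantize_to_units(frames, centroids)
reduced_units = collapse_repeats(frame_units)
print(frame_units)    # [0, 0, 1, 1, 2, 0]
print(reduced_units)  # [0, 1, 2, 0]
```

Under this assumption, a translation model predicts a short sequence of unit ids rather than continuous spectrogram frames, which is one way to keep linguistic content separate from speaker-dependent acoustic detail.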


Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]
