[1] BU Yin-shuo, Renzeng Duojie, Kalzang Gyatso, et al. Tibetan-Chinese Speech-to-speech Translation Based on End-to-end[J]. Computer Technology and Development, 2025, (06): 166-174. [doi:10.20165/j.cnki.ISSN1673-629X.2025.0042]

Tibetan-Chinese Speech-to-speech Translation Based on End-to-end

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2025, No. 06
Pages:
166-174
Column:
Artificial Intelligence
Publication date:
2025-06-10

Article Info

Title:
Tibetan-Chinese Speech-to-speech Translation Based on End-to-end
Article ID:
1673-629X(2025)06-0166-09
Author(s):
BU Yin-shuo (1,2,3), Renzeng Duojie (1,2,3), Kalzang Gyatso (1,2,3), Lhamao Kyi (1,2,3), Nyima Trashi (1,2,3)
1. School of Information Science and Technology, Tibet University, Lhasa 850000, China;
2. Key Laboratory of Tibetan Information Technology and Artificial Intelligence of Tibet, Lhasa 850000, China;
3. Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa 850000, China
Keywords:
end-to-end; speech-to-speech translation; low resource; Tibetan; HuBERT
CLC number:
TP391
DOI:
10.20165/j.cnki.ISSN1673-629X.2025.0042
Abstract:
Speech-to-speech translation (S2ST) matters for cross-language communication: it removes language barriers and enables real-time communication between speakers of different languages. Traditional cascaded S2ST systems suffer from error compounding and high latency. In contrast, end-to-end models simplify the processing pipeline, which reduces latency and improves translation accuracy. Current research on end-to-end speech-to-speech translation focuses mainly on high-resource language pairs, and there is little work on low-resource pairs, particularly Tibetan-Chinese. To address this gap, we propose an end-to-end Tibetan-Chinese speech-to-speech translation method. First, a speech data augmentation technique based on acoustic feature perturbation is introduced to address the scarcity of Tibetan-Chinese speech translation data. Second, bilateral perturbation is used to fine-tune the HuBERT model, reducing the influence of acoustic multimodality on translation through style normalization and information enhancement stages. Third, a speech-to-units (S2UT) model is introduced to convert source-language speech into target-language discrete units, which addresses the confusion between linguistic content and acoustic features that arises in Mel-spectrogram mapping. Finally, an auxiliary speech recognition task for the target language is added to the model, and joint decoding improves speech-to-speech translation performance. Experimental results show that, on the Tibetan-Chinese speech-to-speech translation task, the BLEU score improves by 12.61 over the baseline model, which demonstrates the effectiveness of the proposed model for low-resource multimodal language pairs.
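To make the idea of "target-language discrete units" more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes, as is common in S2UT-style setups, that frame-level speech features are mapped to the index of their nearest cluster centroid and that consecutive repeated indices are then collapsed; the abstract does not state how units are obtained. The function names, the toy 2-D features, and the centroids are all hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def nearest_unit(frame: Sequence[float],
                 centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `frame`
    (squared Euclidean distance)."""
    def sq_dist(centroid: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(frame, centroid))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def frames_to_units(frames: Sequence[Sequence[float]],
                    centroids: Sequence[Sequence[float]]) -> List[int]:
    """Quantize each feature frame to a discrete unit index.

    Assumption: in a real system the frames would be encoder features
    (for example from HuBERT) and the centroids would be learned by
    clustering; here both are toy values.
    """
    return [nearest_unit(frame, centroids) for frame in frames]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit."""
    return [unit for unit, _ in groupby(units)]


if __name__ == "__main__":
    # Hypothetical 2-D "features" and three hypothetical centroids.
    toy_centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    toy_frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.2],
                  [1.1, 0.8], [4.8, 5.1], [0.0, 0.1]]
    units = frames_to_units(toy_frames, toy_centroids)
    print(units)                    # frame-level unit indices
    print(collapse_repeats(units))  # reduced unit sequence
```

Predicting such a unit sequence instead of a Mel-spectrogram is what lets the translation model concentrate on linguistic content rather than on speaker-specific acoustic detail, which is the motivation the abstract gives for the S2UT step.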

Last Update: 2025-06-10