XIANG Zi-han, GU Xiao, RAO Chong-zhi, et al. Research on Low-resource Qingdao Dialect Speech Recognition Method[J]. Computer Technology and Development, 2024, 34(04): 146-152. [doi:10.3969/j.issn.1673-629X.2024.04.022]

Research on Low-resource Qingdao Dialect Speech Recognition Method

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
34
Issue:
2024, No. 04
Pages:
146-152
Section:
Artificial Intelligence
Publication date:
2024-04-10

Article Info

Title:
Research on Low-resource Qingdao Dialect Speech Recognition Method
Article ID:
1673-629X(2024)04-0146-07
Author(s):
XIANG Zi-han, GU Xiao, RAO Chong-zhi, JIAN Ling
School of Economics and Management, China University of Petroleum (East China), Qingdao 266580, China
Keywords:
speech recognition; end-to-end; low resource; data augmentation; Qingdao dialect
CLC number:
TP391.42; TN912.34
DOI:
10.3969/j.issn.1673-629X.2024.04.022
Abstract:
Dialect recognition is an important research direction in automatic speech recognition. Common speech recognition systems are trained on the standard language, so they perform poorly on dialects. In view of this, we take the Qingdao dialect as an application case for dialect speech recognition research. Dialect corpora are scarce and deep network models are difficult to train on so little data, which limits recognition accuracy. To address these problems, we apply data augmentation and build a dialect speech recognition model based on an improved Conformer. First, multi-source speech data is collected to construct a small-scale dialect corpus. Second, data augmentation techniques are applied to expand the training data and alleviate data scarcity. Finally, to extract information more effectively, the down-sampling structure of the Conformer model is improved by introducing dilated convolution and the Mish activation function, and the model maps speech directly to text. Experimental results show that the end-to-end model with the improved down-sampling module, combined with data augmentation, achieves a character error rate (CER) of 25.96%, showing that it can effectively recognize dialect speech under low-resource conditions.
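The abstract does not say which augmentation techniques were used to expand the corpus. One widely used option for Conformer-style end-to-end models is SpecAugment-style masking, which zeroes random bands of mel-frequency bins and random runs of frames in the input features. The sketch below assumes that technique; `spec_augment`, its parameters, and their default values are hypothetical and are not taken from the paper.

```python
import random
from typing import List

Spectrogram = List[List[float]]  # [num_frames][num_mel_bins], e.g. log-mel features


def spec_augment(
    spec: Spectrogram,
    num_freq_masks: int = 2,
    max_freq_width: int = 10,
    num_time_masks: int = 2,
    max_time_width: int = 20,
    seed: int = 0,
) -> Spectrogram:
    """Return a copy of `spec` with SpecAugment-style frequency and time masks set to 0.

    All parameter values are hypothetical defaults, not the paper's settings.
    """
    rng = random.Random(seed)
    out = [row[:] for row in spec]  # copy each row so the input is never modified
    num_frames = len(out)
    num_bins = len(out[0]) if out else 0

    # Frequency masking: zero a band of consecutive mel bins across every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for row in out:
            row[start:start + width] = [0.0] * width

    # Time masking: zero every bin in a run of consecutive frames.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_bins
    return out


# Stand-in for 100 frames of 80-dimensional log-mel features.
features = [[1.0] * 80 for _ in range(100)]
masked = spec_augment(features, seed=42)
```

Masking of this kind is usually applied on the fly during training, so each epoch sees a differently masked copy of every utterance; the seeded generator here only makes the sketch reproducible.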
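The improved down-sampling module is described only at a high level: dilated convolution and the Mish activation, Mish(x) = x · tanh(softplus(x)), are introduced into the Conformer's convolutional front end. The following is a simplified, single-channel 1-D sketch of that idea in plain Python, not the authors' module. Common Conformer implementations sub-sample with two 2-D convolutions (kernel 3, stride 2) over time and frequency; here the layer configurations, the averaging kernel `w`, and all helper names are hypothetical. The point is to show how a dilated layer widens the span of input frames behind each output frame while keeping roughly 4× time reduction.

```python
import math
from typing import List, Sequence, Tuple

# (weights, stride, dilation, padding) for one single-channel conv layer
Layer = Tuple[Sequence[float], int, int, int]


def mish(x: float) -> float:
    """Mish activation x * tanh(softplus(x)), using a numerically stable softplus."""
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def conv_out_len(length: int, kernel: int, stride: int, dilation: int = 1, padding: int = 0) -> int:
    """Output length of a 1-D convolution (the formula PyTorch documents for Conv1d)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def dilated_conv1d(x: Sequence[float], weights: Sequence[float],
                   stride: int, dilation: int, padding: int) -> List[float]:
    """Strided, dilated 1-D convolution with implicit zero padding and no bias."""
    out = []
    for i in range(conv_out_len(len(x), len(weights), stride, dilation, padding)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = i * stride + k * dilation - padding  # input frame read by tap k
            if 0 <= j < len(x):                      # positions outside x act as zero padding
                acc += w * x[j]
        out.append(acc)
    return out


def subsample(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Apply each conv layer followed by Mish, shortening the sequence at every stride."""
    h = list(x)
    for weights, stride, dilation, padding in layers:
        h = [mish(v) for v in dilated_conv1d(h, weights, stride, dilation, padding)]
    return h


def receptive_field(layers: List[Layer]) -> int:
    """Number of input frames that can influence a single output frame."""
    rf, jump = 1, 1
    for weights, stride, dilation, _ in layers:
        rf += (len(weights) - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical configurations: a plain baseline vs. one with a dilated second layer.
w = [0.25, 0.5, 0.25]
baseline = [(w, 2, 1, 0), (w, 2, 1, 0)]
dilated = [(w, 2, 1, 0), (w, 2, 2, 2)]
frames = [math.sin(t / 5) for t in range(100)]

print(len(subsample(frames, baseline)), receptive_field(baseline))  # 24 7
print(len(subsample(frames, dilated)), receptive_field(dilated))    # 25 11
```

With the same kernel size and strides, dilating the second layer grows the receptive field of each output frame from 7 to 11 input frames while the output length stays essentially the same (24 vs. 25 frames for 100 input frames).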
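The reported result is a character error rate (CER). CER is the character-level edit distance between the recognized text and the reference transcript (substitutions + deletions + insertions), divided by the number of reference characters, so 25.96% corresponds to roughly 26 character edits per 100 reference characters. A minimal sketch of the metric follows; the function names and the example sentence are illustrative only.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two character sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference character r
                cur[j - 1] + 1,          # insert hypothesis character h
                prev[j - 1] + (r != h),  # substitute, free when the characters match
            )
        prev = cur
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER: total character edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned one-to-one")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


# Illustrative Mandarin example: one substitution (气 -> 汽) and one deletion (很).
print(edit_distance("今天天气很好", "今天天汽好"))        # 2
print(round(cer(["今天天气很好"], ["今天天汽好"]), 4))    # 0.3333
```

Summing edits and reference lengths over the whole test set, rather than averaging per-utterance rates, weights long utterances more heavily; because insertions are counted, CER can exceed 100%.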


Last Update: 2024-04-10