[1] LI Quan-bing, WEN Zhao*, TIAN Yan-mei*, et al. Research on Audio Keywords Recognition Based on Wasserstein Generative Adversarial Network[J]. Computer Technology and Development, 2021, 31(08): 26-32. [doi:10.3969/j.issn.1673-629X.2021.08.005]

Research on Audio Keywords Recognition Based on WGAN

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
31
Issue:
2021, No. 08
Pages:
26-32
Section:
Big Data Analysis and Mining
Publication date:
2021-08-10

Article Information

Title:
Research on Audio Keywords Recognition Based on Wasserstein Generative Adversarial Network
Article ID:
1673-629X(2021)08-0026-07
Author(s):
LI Quan-bing (李全兵) 1,2,3; WEN Zhao (文钊) 4,*; TIAN Yan-mei (田艳梅) 4,*; ZHAN Mao-hao (詹茂豪) 1; YU Qin-yong (余秦勇) 2,3; YANG Hui (杨辉) 2,3
1. China Electronic Technology Cyber Security Co., Ltd., Chengdu 610041, China;
2. Big Data Application on Improving Government Governance Capabilities National Engineering Laboratory, Guiyang 550022, China;
3. CETC Big Data Research Institute Co., Ltd., Guiyang 550022, China;
4. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
Keywords:
speech recognition; audio spoken keyword detection; deep learning; Wasserstein generative adversarial network (WGAN); keyword targeting
CLC number:
TP183
DOI:
10.3969/j.issn.1673-629X.2021.08.005
Abstract:
Keyword recognition methods built on speech recognition add to the workload of keyword recognition, reduce its efficiency, make accuracy depend on both the speech recognizer and the text search method, and cannot be applied to languages without a written form. To address this, a Wasserstein generative adversarial network (WGAN) is applied to spoken keyword recognition, and the sequence output by the generator is used to determine whether an utterance contains a keyword. To obtain the position of the keyword within the speech, a positioning loss function is defined for the WGAN so that the generated mask sequence accurately locates the keyword. Experiments on datasets in three languages (Sichuan dialect, Mandarin, and Cantonese) show that the proposed method can recognize keywords in languages without a written form and is significantly faster than the template matching method.
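The abstract does not state the form of the positioning loss, so the following is only a minimal sketch of how such a term might sit alongside the standard Wasserstein losses. It assumes the generator emits one mask value in [0, 1] per frame and that a frame-level ground-truth mask is available. The frame-wise cross-entropy term, the function names, the weight, and the threshold are all illustrative assumptions and not the authors' implementation.

```python
import math

def critic_loss(real_scores, fake_scores):
    """Standard Wasserstein critic loss: mean(fake) - mean(real)."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)

def localization_loss(mask, target, eps=1e-7):
    """Hypothetical positioning term: frame-wise binary cross-entropy
    between the generated mask and a ground-truth 0/1 mask."""
    total = 0.0
    for m, t in zip(mask, target):
        m = min(max(m, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(m) + (1 - t) * math.log(1.0 - m)
    return total / len(mask)

def generator_loss(fake_scores, mask, target, weight=1.0):
    """Standard Wasserstein generator loss plus the assumed positioning term."""
    adversarial = -sum(fake_scores) / len(fake_scores)
    return adversarial + weight * localization_loss(mask, target)

def locate_keyword(mask, threshold=0.5):
    """Return (start, end) of the longest run of frames at or above the
    threshold (end exclusive), or None if no frame reaches it."""
    best, start = None, None
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, m in enumerate(list(mask) + [float("-inf")]):
        if m >= threshold and start is None:
            start = i
        elif m < threshold and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

if __name__ == "__main__":
    mask = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
    target = [0, 0, 1, 1, 1, 0]
    print(locate_keyword(mask))
    print(generator_loss([0.0, 1.0], mask, target))
```

Under these assumptions, a mask with no frame above the threshold is read as "no keyword present", while a run of high-valued frames gives the keyword's position in the utterance.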


Last Update: 2021-08-10