
Research on Audio Keyword Recognition Based on WGAN


Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 31
Issue:: 2021, No. 08

Pages:: 26-32

Section:: Big Data Analysis and Mining

Publication Date:: 2021-08-10

Article Info

Title:: Research on Audio Keywords Recognition Based on Wasserstein Generative Adversarial Network

Article No.:: 1673-629X(2021)08-0026-07


Author(s):: LI Quan-bing^{1,2,3}; WEN Zhao^{4*}; TIAN Yan-mei^{4*}; ZHAN Mao-hao^{1}; YU Qin-yong^{2,3}; YANG Hui^{2,3}
1. China Electronic Technology Cyber Security Co., Ltd., Chengdu 610041, China;
2. Big Data Application on Improving Government Governance Capabilities National Engineering Laboratory, Guiyang 550022, China;
3. CETC Big Data Research Institute Co., Ltd., Guiyang 550022, China;
4. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China


Keywords:: speech recognition; audio spoken keyword detection; deep learning; Wasserstein generative adversarial network (WGAN); keyword localization

CLC Number:: TP183

DOI:: 10.3969/j.issn.1673-629X.2021.08.005


Abstract:: Keyword recognition methods that rely on speech recognition add to the recognition workload and lower efficiency; their accuracy also depends on the speech recognizer and on the text-search method, and they cannot be applied to languages that have no written form. To address these problems, the Wasserstein generative adversarial network (WGAN) is applied to spoken keyword recognition, and the sequence produced by the generator is used to determine whether a keyword is present in the speech. To obtain the position of the keyword in the speech, we define a localization loss function for the WGAN that ensures the generated mask sequence accurately locates the keyword. Experiments on datasets in three languages, Sichuan dialect, Mandarin and Cantonese, show that the proposed method can recognize keywords in languages without a written form and that its recognition speed is significantly higher than that of the template matching method.
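The abstract names the ingredients (a WGAN, a generated mask sequence, a localization loss) but not their exact form. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It shows the standard WGAN critic and generator objectives, a *hypothetical* localization loss (frame-wise binary cross-entropy against 0/1 per-frame keyword labels), and one simple way to read keyword presence and position off a frame-level mask. The names `localization_loss`, `generator_loss` and `decode_mask`, the weight `lam`, the 0.5 threshold and `min_frames` are all illustrative assumptions; the paper's actual loss and decoding may differ.

```python
import numpy as np


def critic_loss(real_scores: np.ndarray, fake_scores: np.ndarray) -> float:
    """Standard WGAN critic objective to minimize: E[D(fake)] - E[D(real)].

    The Lipschitz constraint the critic also needs (weight clipping or a
    gradient penalty) is not shown here.
    """
    return float(np.mean(fake_scores) - np.mean(real_scores))


def generator_adv_loss(fake_scores: np.ndarray) -> float:
    """Standard WGAN generator adversarial term to minimize: -E[D(fake)]."""
    return float(-np.mean(fake_scores))


def localization_loss(mask, target, eps: float = 1e-7) -> float:
    """HYPOTHETICAL localization loss (assumption, not from the paper):
    frame-wise binary cross-entropy between the generated mask (values in
    [0, 1]) and a 0/1 target that marks the frames containing the keyword."""
    m = np.clip(np.asarray(mask, dtype=float), eps, 1.0 - eps)
    t = np.asarray(target, dtype=float)
    return float(-np.mean(t * np.log(m) + (1.0 - t) * np.log(1.0 - m)))


def generator_loss(fake_scores, mask, target, lam: float = 1.0) -> float:
    """Adversarial term plus a weighted localization term; lam is an assumed weight."""
    return generator_adv_loss(fake_scores) + lam * localization_loss(mask, target)


def decode_mask(mask, threshold: float = 0.5, min_frames: int = 3):
    """Read keyword presence and position off a frame-level mask.

    Returns (found, spans): spans lists inclusive (start_frame, end_frame)
    pairs for runs of at least min_frames frames at or above threshold.
    """
    active = np.asarray(mask) >= threshold
    spans, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                         # a run of active frames begins
        elif not is_active and start is not None:
            if i - start >= min_frames:       # keep only runs that are long enough
                spans.append((start, i - 1))
            start = None
    if start is not None and len(active) - start >= min_frames:
        spans.append((start, len(active) - 1))  # run reaches the last frame
    return len(spans) > 0, spans


if __name__ == "__main__":
    mask = np.array([0.1, 0.2, 0.8, 0.9, 0.95, 0.7, 0.1, 0.6, 0.2])
    print(decode_mask(mask))  # (True, [(2, 5)])
```

Thresholding the mask and requiring a minimum run length is only one simple decoding choice; the short spike at frame 7 in the example is discarded as noise. Converting frame indices to seconds would additionally require the feature hop size, which the abstract does not state.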

