Research on Multimodal Sound Source Separation Method for Mechanical Equipment - Computer Technology and Development

Article Info

Title:: Research on Multimodal Sound Source Separation Method for Mechanical Equipment

Article No.:: 1673-629X(2023)06-0208-07

Author(s):: JIAN Bin (简斌)1; XIAO Xiao-ping (肖晓萍)2*; LI Zi-sheng (李自胜)1; ZHANG Kai (张楷)3; YUAN Hao (袁昊)1
1. School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang 621010, China;
2. Engineering Technology Center, Southwest University of Science and Technology, Mianyang 621010, China;
3. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China

Keywords:: mechanical equipment; multimodal data; feature fusion; sound source separation; convolutional neural network

CLC number:: TP391.4

DOI:: 10.3969/j.issn.1673-629X.2023.06.031

Abstract:: Single-modal mixed-signal separation methods cannot determine which piece of mechanical equipment a separated sound source corresponds to. To address this problem, a sound source separation method for mechanical equipment based on multimodal feature fusion is proposed. First, multiple groups of feature extraction layers at different scales are used to construct a Res2Net18 network with a multi-scale feature extraction structure, which extracts fine-grained visual features of the mechanical equipment. The direct skip connections in the UNet network are then replaced with coordinate attention modules to strengthen the representation of spatial position information for the different audio features in the encoder. Second, the visual features of the mechanical equipment are fused into the mixed audio features to generate the corresponding sound source mask, and the mask is combined with the mixed audio spectrum to obtain the spectrum of the independent sound source. The sound of each machine is thus separated according to its visual features, which resolves the inability of single-modal mixed-signal separation methods to establish the correspondence between mechanical equipment and sound sources. Finally, on the mechanical equipment dataset, the SDR, SIR and SAR reach 6.14 dB, 8.59 dB and 18.33 dB, respectively. Compared with three existing multimodal sound source separation models, the proposed method achieves the best SDR and SAR, which verifies its effectiveness.
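The abstract replaces the UNet's direct skip connections with coordinate attention modules. The NumPy sketch below follows the general coordinate attention design: pool along each spatial axis separately, apply a shared 1x1 transform, then produce one sigmoid gate per axis and reweight the input. The abstract gives no further wiring details, so the following are assumptions: batch normalization and biases are dropped, ReLU is used as the non-linearity, the weights are random, and the hypothetical `attention_skip` simply gates the encoder map before the usual channel concatenation. For a spectrogram feature map, H and W would plausibly correspond to the frequency and time axes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Coordinate attention on one feature map x of shape (C, H, W).

    w_reduce: (C_mid, C) shared 1x1 transform for the pooled descriptors
    w_h, w_w: (C, C_mid) 1x1 transforms producing the height / width gates
    Assumption: no batch norm, no biases, ReLU as the non-linearity.
    """
    h = x.shape[1]
    z_h = x.mean(axis=2)                     # (C, H): average over width
    z_w = x.mean(axis=1)                     # (C, W): average over height
    z = np.concatenate([z_h, z_w], axis=1)   # (C, H + W)
    f = np.maximum(w_reduce @ z, 0.0)        # (C_mid, H + W)
    f_h, f_w = f[:, :h], f[:, h:]            # split back per axis
    g_h = sigmoid(w_h @ f_h)                 # (C, H) gate per row
    g_w = sigmoid(w_w @ f_w)                 # (C, W) gate per column
    return x * g_h[:, :, None] * g_w[:, None, :]

def attention_skip(encoder_feat, decoder_feat, weights):
    """Hypothetical UNet skip: gate the encoder map, then concatenate channels."""
    gated = coordinate_attention(encoder_feat, *weights)
    return np.concatenate([decoder_feat, gated], axis=0)

rng = np.random.default_rng(0)
C, H, W, r = 16, 32, 24, 4                   # r: hypothetical channel reduction ratio
enc = rng.standard_normal((C, H, W))
dec = rng.standard_normal((C, H, W))
weights = (rng.standard_normal((C // r, C)) * 0.1,
           rng.standard_normal((C, C // r)) * 0.1,
           rng.standard_normal((C, C // r)) * 0.1)
out = attention_skip(enc, dec, weights)
print(out.shape)  # (32, 32, 24)
```

Because both gates lie in (0, 1), the module can only rescale encoder activations position by position; it never changes the tensor shape, which is why it can drop into an existing skip path.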
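The abstract's second step obtains each machine's spectrum by combining a mask with the mixed audio spectrum. Below is a minimal NumPy sketch of that combination, assuming a magnitude ratio mask that is applied to the mixture magnitude while the mixture phase is reused; both are common choices in mask-based separation, but the abstract does not state them. The network that fuses visual and audio features to predict the mask is not reproduced: the hypothetical `oracle_ratio_mask`, computed from the known sources, stands in for it, and the two sinusoidal "machines" are synthetic.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Hann-windowed short-time spectrum, shape (n_fft // 2 + 1, frames)."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=1).T

def apply_mask(mix_spec, mask):
    """Masked mixture magnitude with the mixture phase (equals mask * mix_spec)."""
    mask = np.clip(mask, 0.0, 1.0)
    return mask * np.abs(mix_spec) * np.exp(1j * np.angle(mix_spec))

def oracle_ratio_mask(target_spec, other_spec, eps=1e-8):
    """Ideal ratio mask; a stand-in for the mask a separation network predicts."""
    t, o = np.abs(target_spec), np.abs(other_spec)
    return t / (t + o + eps)

fs = 8000
t = np.arange(fs) / fs
machine_a = np.sin(2 * np.pi * 500 * t)          # hypothetical machine A
machine_b = 0.5 * np.sin(2 * np.pi * 1500 * t)   # hypothetical machine B
mix = machine_a + machine_b

spec_a, spec_b = spectrogram(machine_a), spectrogram(machine_b)
mix_spec = spectrogram(mix)                       # linear: equals spec_a + spec_b
mask_a = oracle_ratio_mask(spec_a, spec_b)
est_a = apply_mask(mix_spec, mask_a)
rel_err = np.linalg.norm(est_a - spec_a) / np.linalg.norm(spec_a)
print(f"relative spectral error for machine A: {rel_err:.4f}")
```

Since the two tones occupy well-separated frequency bins, the mask is close to 1 where machine A dominates and close to 0 elsewhere, so the masked mixture recovers A's spectrum almost exactly. An inverse STFT with overlap-add would turn `est_a` back into a waveform.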
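SDR, SIR and SAR come from the BSS Eval framework, which splits an estimated source into a target part, an interference part (leakage from the other true sources) and an artifact part (everything else). The sketch below computes that decomposition with plain least-squares projections. This is a simplification and an assumption: the commonly used BSS Eval implementations project with time-invariant distortion filters (typically 512 taps), so this gain-only version is illustrative and will not reproduce the paper's 6.14 / 8.59 / 18.33 dB evaluation.

```python
import numpy as np

def bss_decompose(estimate, references, j):
    """Split `estimate` into target, interference and artifact components.

    references: (n_sources, n_samples) true sources
    j:          index of the source that `estimate` should recover
    Simplification: zero-lag (gain-only) projections instead of FIR filters.
    """
    s_j = references[j]
    s_target = (estimate @ s_j) / (s_j @ s_j) * s_j          # projection onto s_j
    gains, *_ = np.linalg.lstsq(references.T, estimate, rcond=None)
    p_all = references.T @ gains                            # projection onto all sources
    e_interf = p_all - s_target
    e_artif = estimate - p_all
    return s_target, e_interf, e_artif

def sdr_sir_sar(estimate, references, j, eps=1e-12):
    """Energy ratios (dB) of the BSS Eval-style decomposition above."""
    s_target, e_interf, e_artif = bss_decompose(estimate, references, j)

    def energy(v):
        return float(np.sum(v ** 2)) + eps

    sdr = 10 * np.log10(energy(s_target) / energy(e_interf + e_artif))
    sir = 10 * np.log10(energy(s_target) / energy(e_interf))
    sar = 10 * np.log10(energy(s_target + e_interf) / energy(e_artif))
    return sdr, sir, sar

n = np.arange(1000)
src_a = np.sin(2 * np.pi * 5 * n / 1000)
src_b = np.cos(2 * np.pi * 7 * n / 1000)
artifact = np.sin(2 * np.pi * 11 * n / 1000)
refs = np.stack([src_a, src_b])
estimate = src_a + 0.1 * src_b + 0.05 * artifact     # leaked B plus an artifact
print(sdr_sir_sar(estimate, refs, j=0))
```

With these mutually orthogonal test signals (each of energy 500), the interference is 0.1 × `src_b` and the artifact is 0.05 × `artifact`, so SIR = 10·log10(500/5) = 20 dB, SDR = 10·log10(500/6.25) ≈ 19.03 dB and SAR = 10·log10(505/1.25) ≈ 26.06 dB. A high SAR with a lower SIR, as in the reported results, means the residual error is dominated by leakage from other machines rather than by processing artifacts.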

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]
