[1] JIAN Bin, XIAO Xiao-ping*, LI Zi-sheng, et al. Research on Multimodal Sound Source Separation Method for Mechanical Equipment[J]. Computer Technology and Development, 2023, 33(06): 208-214. [doi:10.3969/j.issn.1673-629X.2023.06.031]

Research on Multimodal Sound Source Separation Method for Mechanical Equipment

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
33
Issue:
2023, No. 06
Pages:
208-214
Section:
New Computing Application Systems
Publication date:
2023-06-10

Article Info

Title:
Research on Multimodal Sound Source Separation Method for Mechanical Equipment
Article ID:
1673-629X(2023)06-0208-07
Author(s):
JIAN Bin1 XIAO Xiao-ping2* LI Zi-sheng1 ZHANG Kai3 YUAN Hao1
1. School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang 621010, China;
2. Engineering Technology Center, Southwest University of Science and Technology, Mianyang 621010, China;
3. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
Keywords:
mechanical equipment; multimodal data; feature fusion; sound source separation; convolutional neural network
CLC number:
TP391.4
DOI:
10.3969/j.issn.1673-629X.2023.06.031
Abstract:
Single-modal mixed-signal separation methods cannot determine which sound source corresponds to which piece of mechanical equipment. To address this problem, a sound source separation method for mechanical equipment based on multimodal feature fusion is proposed. First, a Res2Net18 network with a multi-scale feature extraction structure is constructed from multiple groups of feature extraction layers of different scales to extract fine-grained visual features of mechanical equipment; a coordinate attention module then replaces the direct skip connections in the UNet network to enhance the expression of spatial position information of the different audio features in the encoder. Second, the visual features of the mechanical equipment are fused into the mixed audio features to generate a mask for the corresponding sound source, and the mask is combined with the mixed audio spectrum to obtain the spectrum of each independent sound source, so that the sound source of a given piece of equipment is separated according to its visual features. This resolves the inability of single-modal mixed-signal separation methods to determine the correspondence between equipment and sound sources. Finally, on the mechanical equipment dataset, SDR, SIR and SAR reach 6.14 dB, 8.59 dB and 18.33 dB, respectively. Compared with three existing multimodal sound source separation models, the proposed method achieves the best results in both SDR and SAR, which verifies its effectiveness.
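The mask step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the visual features are fused into the audio features, so the channel-wise weighting used here is an assumption, as are the function names `fuse_to_mask` and `apply_mask`, the sigmoid activation, and all array shapes. It is meant only to show the general idea of a visually conditioned mask applied to a mixture spectrogram, not the authors' network.

```python
import numpy as np


def fuse_to_mask(audio_feats, visual_vec):
    """Hypothetical fusion: weight K audio feature maps by a K-dim visual vector.

    audio_feats: array of shape (K, F, T), e.g. decoder output of an audio network.
    visual_vec:  array of shape (K,), e.g. pooled visual feature of one machine.
    Returns a mask of shape (F, T) with values in (0, 1).
    """
    # Sum over the channel axis, weighting each channel by its visual feature.
    logits = np.tensordot(visual_vec, audio_feats, axes=([0], [0]))
    # Sigmoid squashes the fused response into a soft mask (assumed activation).
    return 1.0 / (1.0 + np.exp(-logits))


def apply_mask(mix_spec, mask):
    """Element-wise product of the mask and the mixture magnitude spectrogram."""
    return mask * mix_spec


# Toy data standing in for real features and a real spectrogram.
rng = np.random.default_rng(0)
K, F, T = 4, 8, 5
audio_feats = rng.standard_normal((K, F, T))
visual_vec = rng.standard_normal(K)
mix_spec = np.abs(rng.standard_normal((F, T)))

mask = fuse_to_mask(audio_feats, visual_vec)
source_spec = apply_mask(mix_spec, mask)
print(mask.shape, source_spec.shape)
```

Because the mask depends on the visual vector, passing the visual feature of a different machine yields a different mask over the same mixture, which is how a visually guided method can tie each separated source to a specific piece of equipment.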
Last Update: 2023-06-10