[1] LU Hui-jun, CAI Dun-bo, HUANG Zhi-guo, et al. Research State and Frontiers of Cocktail Party Problem Based on Deep Audio-visual Models[J]. Computer Technology and Development, 2021, 31(Suppl.): 8-15. [doi:10.3969/j.issn.1673-629X.2021.S.002]

Research State and Frontiers of Cocktail Party Problem Based on Deep Audio-visual Models
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
31
Issue:
2021 Supplement
Pages:
8-15
Column:
Artificial Intelligence
Publication date:
2021-12-31

Article Info

Title:
Research State and Frontiers of Cocktail Party Problem Based on Deep Audio-visual Models
Article ID:
1673-629X(2021)S0008-08
Author(s):
LU Hui-jun, CAI Dun-bo, HUANG Zhi-guo, HANG Tao, FENG Qing-song, QIAN Ling
China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215000, China
Keywords:
cocktail party problem; multi-speaker speech separation; deep learning; deep visual-audio model; visual-audio datasets
CLC number:
TP183
DOI:
10.3969/j.issn.1673-629X.2021.S.002
Abstract:
The "cocktail party problem" remains a challenging problem in speech processing, and its core is multi-speaker speech separation. Research on this problem has made considerable progress, but a systematic and concise analysis and summary is still lacking. Focusing on solutions to the cocktail party problem, this paper summarizes the development of multi-speaker speech separation methods in speech processing. First, classic speech separation methods are analyzed, including spectral subtraction, Wiener filtering, and computational auditory scene analysis. Second, speech separation methods that emerged after the introduction of deep learning are analyzed, including the early deep audio methods and the later deep audio-visual methods, with emphasis on recent progress in the main algorithmic ideas and performance of deep learning based audio-visual methods. Third, the characteristics of the audio-visual datasets commonly used by deep audio-visual methods are summarized. Finally, the current state of deep audio-visual models for the cocktail party problem and the remaining challenges are reviewed, and future research directions are discussed.
Last Update: 2021-09-10