
A Survey of Deep Multimodal Depression Recognition Based on Audio-visual Cues


Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 33
Issue:: 2023, No. 07

Pages:: 1-11

Section:: Survey

Publication date:: 2023-07-10

Article Info

Title:: A Survey of Deep Multimodal Depression Recognition Based on Audio-visual Cues

Article ID:: 1673-629X(2023)07-0001-11

Author(s):: ZHANG Shi-qing (张石清)¹˒²; ZHANG Xing-nan (张星楠)¹˒²; ZHAO Xiao-ming (赵小明)²
1. School of Information, Zhejiang Sci-Tech University, Hangzhou 310023, China;
2. Institute of Intelligent Information Processing, Taizhou University, Taizhou 318000, China

Keywords:: depression; deep learning; audio; video; feature extraction; multimodality; fusion method

Classification No.:: TP301

DOI:: 10.3969/j.issn.1673-629X.2023.07.001

Abstract:: Depression is a mental illness that can lead to suicidal behavior in severe cases. The number of people with depression is growing, and the condition is becoming more common and affecting younger people. Using machine learning methods for multimodal depression recognition based on audio, video, and other modalities has become a hot interdisciplinary topic spanning computer science, psychology, and medicine. In recent years, newly developed deep learning techniques have also gradually been applied to deep feature extraction in multimodal depression recognition from audio, video, and other modalities. To systematically summarize recent progress of deep learning in multimodal depression recognition, we first introduce the clinical manifestations of depression and its psychological diagnosis methods, then briefly summarize existing depression datasets and describe the basic principles and progress of representative deep learning techniques. Next, we systematically analyze and summarize the key technologies involved in audio-visual multimodal depression recognition, including hand-crafted feature extraction, deep feature extraction, and multimodal information fusion strategies. Finally, we point out the opportunities and challenges in this field and summarize and discuss future research directions.
