
Video Facial Expression Recognition Based on Enhanced Features and an Attention Mechanism


Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 32
Issue:: No. 11, 2022

Pages:: 183-189

Column:: Artificial Intelligence

Publication date:: 2022-11-10

Article Info

Title:: Video Facial Expression Recognition Based on ECNN-SA

Article ID:: 1673-629X(2022)11-0183-07

Author(s):: LI Fei (李飞)¹; CHEN Rui (陈瑞)²; TONG Ying (童莹)²; CHEN Le (陈乐)³
1. School of Electric Power Engineering, Nanjing Institute of Technology, Nanjing 211167, China;
2. School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, China;
3. School of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Keywords:: facial expression recognition; video sequence; self-attention mechanism; enhanced feature; convolutional neural network

CLC number:: TP391

DOI:: 10.3969/j.issn.1673-629X.2022.11.027

Abstract:: The end-to-end CNN-LSTM model uses a convolutional neural network (CNN) to extract spatial features from images and a long short-term memory (LSTM) network to extract temporal features between video frames, and it has been widely used in video expression recognition. However, when learning hierarchical representations of video frames, the CNN-LSTM model has high complexity and is prone to overfitting. To address these problems, an efficient, low-complexity video expression recognition model named ECNN-SA (Enhanced Convolutional Neural Network with Self-Attention) is proposed. First, a video is divided into several video segments, and the feature vector of each frame in a segment is extracted by a CNN with an enhanced feature branch followed by a global average pooling layer. Second, a self-attention mechanism is used to obtain the correlations between the feature vectors, and a weight vector is constructed from these correlations so that the model focuses mainly on the key frames of expression change within the segment, guiding the classifier toward more accurate results. Experimental results on the CK+ and AFEW datasets show that the self-attention module, which has low computational complexity, makes the model focus mainly on the key frames of expression change in the time series, and that, compared with single-layer and multi-layer LSTM networks, the ECNN-SA model classifies the emotion information of video sequences more effectively.
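
The two steps described in the abstract (per-frame feature vectors from global average pooling, then a weight vector built from correlations between those vectors) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the scaled dot-product similarity, the softmax normalization, the rule "a frame's weight is the average attention it receives", and the names `global_average_pool` and `attention_pool` are all assumptions made for illustration. Random arrays stand in for the CNN feature maps.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Reduce CNN feature maps of shape (T, C, H, W) to per-frame vectors (T, C)."""
    maps = np.asarray(feature_maps, dtype=np.float64)
    return maps.mean(axis=(2, 3))


def softmax(x, axis=-1):
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(frame_vectors):
    """Aggregate per-frame vectors (T, D) into one segment vector (D,).

    Assumed scheme (not taken from the paper): scaled dot-product
    similarity between frames, row-wise softmax, and a per-frame weight
    equal to the average attention that frame receives.
    """
    feats = np.asarray(frame_vectors, dtype=np.float64)
    num_frames, dim = feats.shape
    scores = feats @ feats.T / np.sqrt(dim)   # (T, T) pairwise correlations
    attn = softmax(scores, axis=1)            # each row sums to 1
    weights = attn.mean(axis=0)               # (T,), sums to 1
    pooled = weights @ feats                  # weighted sum of frame vectors
    return pooled, weights


# Hypothetical segment: 8 frames, 16 channels, 7x7 feature maps.
rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(8, 16, 7, 7))
frame_vectors = global_average_pool(feature_maps)
pooled, weights = attention_pool(frame_vectors)
```

In this sketch `weights` plays the role of the weight vector over frames, and `pooled` is the segment-level feature that a classifier would consume. Unlike an LSTM, the aggregation step has no trainable recurrent state, which is one way to read the abstract's low-complexity claim.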

