Deepfake Video Detection Method Based on Facial Key Regions - Computer Technology and Development (《计算机技术与发展》)

Article Information

Author(s): DONG Jia-le1; DENG Zheng-jie1,2*; ZHANG Bao1; LI Xi-yan1. 1. School of Information Science and Technology, Hainan Normal University, Haikou 571127, China; 2. Guangxi Key Laboratory of Image and Graphic Intelligent Processing, Nanning 541004, China

Keywords: deepfake detection; EfficientNet_b0 network; facial key regions; image masking; channel attention mechanism

Abstract: To address the threats that deepfake technology poses to personal, social, and national security, this paper proposes a deepfake video detection method based on facial key regions. The method uses EfficientNet_b0 as the backbone network, introduces a channel attention module, and applies masking to occlude non-key regions, so that the model focuses on facial key regions and secondary features interfere less with its judgment, which improves detection performance. Experiments on the FaceForensics++ video dataset show that, compared with the EfficientNet_b0 method, the proposed method improves accuracy by 2.15, 2.79, 1.92, and 2.51 percentage points on the DeepFake, FaceSwap, Face2Face, and NeuralTextures datasets, respectively, while keeping the parameter count low; it also clearly outperforms existing mainstream methods. On the Celeb-DF video dataset, the method achieves an accuracy of 99.63% and an AUC of 99.99%. In addition, a robustness study verifies the method's stability, and Grad-CAM visualization is used for interpretability analysis, showing that the method accurately localizes forged regions. In conclusion, the proposed model detects face-forgery videos effectively and with high accuracy.
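The two preprocessing and attention ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which facial regions count as key, how they are located, or which channel attention variant is used. The sketch assumes rectangular key regions passed in as hypothetical `(top, left, bottom, right)` boxes and a squeeze-and-excitation-style attention with hypothetical weight matrices `w1` and `w2`; the function names are illustrative only.

```python
import numpy as np


def mask_non_key_regions(face, key_boxes, fill=0):
    """Keep pixels inside the key boxes and overwrite everything else.

    face: H x W x C image array.
    key_boxes: iterable of (top, left, bottom, right) rectangles. These are
    hypothetical stand-ins for regions a landmark detector would supply.
    """
    keep = np.zeros(face.shape[:2], dtype=bool)
    for top, left, bottom, right in key_boxes:
        keep[top:bottom, left:right] = True
    masked = np.full_like(face, fill)
    masked[keep] = face[keep]
    return masked


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation-style attention on a C x H x W feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is an
    assumed reduction ratio. In a real model both would be learned.
    """
    squeezed = features.mean(axis=(1, 2))            # global average pool, (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)          # bottleneck with ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate per channel
    return features * weights[:, None, None]         # rescale each channel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.integers(1, 256, size=(8, 8, 3), dtype=np.uint8)
    # Two made-up boxes, loosely "eyes" and "mouth", on an 8 x 8 toy image.
    masked = mask_non_key_regions(face, [(1, 1, 3, 7), (5, 2, 7, 6)])
    features = rng.standard_normal((4, 8, 8))
    attended = channel_attention(
        features, rng.standard_normal((2, 4)), rng.standard_normal((4, 2))
    )
    print(masked.shape, attended.shape)
```

In a full pipeline the masked face crops would be fed to the EfficientNet_b0 backbone, with the attention applied to its intermediate feature maps; where exactly the module is inserted is not stated in the abstract.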