[1] LI Shi-feng*, LUO Xi, LIU Xiao-ru, et al. An End-to-end Video Anomaly Detection Method Based on Transformer Architecture[J]. Computer Technology and Development, 2025, (06): 49-55. [doi:10.20165/j.cnki.ISSN1673-629X.2025.0018]

An End-to-end Video Anomaly Detection Method Based on Transformer Architecture

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:

Issue:
2025, No. 06
Pages:
49-55
Column:
Media Computing
Publication Date:
2025-06-10

Article Info

Title:
An End-to-end Video Anomaly Detection Method Based on Transformer Architecture
Article Number:
1673-629X(2025)06-0049-07
Author(s):
LI Shi-feng*, LUO Xi, LIU Xiao-ru, TIAN Ye
Affiliation:
School of Information Science and Technology, Bohai University, Jinzhou 121000, China
Keywords:
video anomaly detection; Transformer architecture; spatio-temporal information fusion model; deep support vector data description (Deep SVDD); joint training
CLC Number:
TP391
DOI:
10.20165/j.cnki.ISSN1673-629X.2025.0018
Abstract:
Although traditional convolutional neural networks can process spatially structured data, their spatio-temporal modeling capability is insufficient for large-scale video data. To solve this problem, an efficient model that can handle massive video data is needed. This paper proposes a new end-to-end video anomaly detection method based on the Transformer architecture. The method combines the Swin Transformer architecture and the Video Vision Transformer (ViViT) model to design a spatio-temporal information fusion model that extracts rich spatio-temporal information from video frame sequences. In addition, by jointly training the spatio-temporal information fusion model with the deep support vector data description (Deep SVDD) method, end-to-end video anomaly detection is achieved. Comparative experiments against 10 recent methods were conducted on two public video datasets: on the UCSD Ped2 dataset the proposed model achieved the highest AUC, 96.5%, and on the CUHK Avenue dataset it achieved an AUC of 80.7%, outperforming most methods. Compared with leading video anomaly detection methods, the proposed method is competitive.
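The Deep SVDD component referenced in the abstract maps inputs to feature embeddings and scores a sample as anomalous by its distance to a learned hypersphere center. The following is a minimal pure-Python sketch of that scoring rule only; the toy 2-D embeddings are hypothetical stand-ins for the output of the paper's Transformer-based fusion model, which is not reproduced here.

```python
from statistics import mean

def hypersphere_center(embeddings):
    """Deep SVDD center c: the mean of the normal training embeddings."""
    dim = len(embeddings[0])
    return [mean(e[i] for e in embeddings) for i in range(dim)]

def anomaly_score(embedding, center):
    """Squared Euclidean distance to the center; larger = more anomalous."""
    return sum((x - c) ** 2 for x, c in zip(embedding, center))

# Hypothetical 2-D embeddings of normal video frames.
normal = [[0.9, 1.1], [1.1, 0.9], [1.0, 1.0]]
c = hypersphere_center(normal)          # c == [1.0, 1.0]
print(anomaly_score([1.0, 1.0], c))     # 0.0: on-center frame looks normal
print(anomaly_score([3.0, 3.0], c))     # 8.0: distant frame flagged as anomalous
```

In the joint training described in the abstract, the feature extractor itself is optimized to pull normal embeddings toward c, so this distance serves as both the training loss and the test-time anomaly score.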
Last Update: 2025-06-10