[1] LI Shi-feng*, LUO Xi, LIU Xiao-ru, et al. An End-to-end Video Anomaly Detection Method Based on Transformer Architecture [J]. Computer Technology and Development, 2025, (06): 49-55. [doi:10.20165/j.cnki.ISSN1673-629X.2025.0018]
An End-to-end Video Anomaly Detection Method Based on Transformer Architecture
Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]
- Volume: -
- Issue: No. 06, 2025
- Pages: 49-55
- Column: Media Computing
- Publication date: 2025-06-10
Article Info
- Title: An End-to-end Video Anomaly Detection Method Based on Transformer Architecture
- Article ID: 1673-629X(2025)06-0049-07
- Author(s): LI Shi-feng*; LUO Xi; LIU Xiao-ru; TIAN Ye
- Affiliation: School of Information Science and Technology, Bohai University, Jinzhou 121000, China
- Keywords: video anomaly detection; Transformer architecture; spatio-temporal information fusion model; deep support vector data description (Deep SVDD); joint training
- CLC number: TP391
- DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0018
- Abstract: Although traditional convolutional neural networks can process spatially structured data, their spatio-temporal modeling capacity is insufficient for large-scale video data. To solve this problem, an efficient model that can handle massive video data is needed. A new end-to-end video anomaly detection method based on the Transformer architecture is proposed. Combining the Swin Transformer architecture and the Video Vision Transformer (ViViT) model, a spatio-temporal information fusion model is designed to extract rich spatio-temporal information from video frame sequences. In addition, by jointly training the spatio-temporal information fusion model with the deep support vector data description (Deep SVDD) method, end-to-end video anomaly detection is realized. A comparison experiment with 10 recent methods was conducted on two public video datasets: on the UCSD Ped2 dataset, the proposed model achieved the highest AUC of 96.5%; on the CUHK Avenue dataset, it achieved an AUC of 80.7%, better than that of most methods. Compared with leading video anomaly detection methods, the proposed method is competitive.
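The joint training described in the abstract pairs a feature extractor with the Deep SVDD one-class objective: embeddings of normal clips are pulled toward a fixed hypersphere centre, and the distance to that centre serves as the anomaly score at test time. A minimal NumPy sketch of that objective follows; the fusion model itself (the Swin Transformer + ViViT stack) is abstracted to a placeholder linear embedding, and all names here are illustrative, not taken from the paper:

```python
import numpy as np

def embed(frames: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Placeholder for the spatio-temporal fusion model: a single
    linear map standing in for the Swin Transformer + ViViT stack."""
    return frames @ W

def deep_svdd_loss(z: np.ndarray, c: np.ndarray) -> float:
    """One-class Deep SVDD objective: mean squared distance of the
    embeddings z to the fixed hypersphere centre c."""
    return float(np.mean(np.sum((z - c) ** 2, axis=1)))

def anomaly_scores(z: np.ndarray, c: np.ndarray) -> np.ndarray:
    """At test time, the squared distance to the centre is the anomaly score."""
    return np.sum((z - c) ** 2, axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                 # stand-in model parameters
normal_frames = rng.normal(size=(32, 8))    # training batch (normal-only clips)
z_train = embed(normal_frames, W)
c = z_train.mean(axis=0)                    # centre = mean of initial embeddings

loss = deep_svdd_loss(z_train, c)           # quantity minimized during joint training
far_frames = normal_frames + 10.0           # a crudely shifted "anomalous" batch
scores_normal = anomaly_scores(z_train, c)
scores_anom = anomaly_scores(embed(far_frames, W), c)
print(scores_anom.mean() > scores_normal.mean())   # anomalies score higher
```

In the actual method, the gradient of this loss would be backpropagated through the fusion model so that feature extraction and the one-class boundary are learned end-to-end, rather than training the extractor and the detector separately.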
Last Update: 2025-06-10