Crowd Activity Semantic Extraction in Video Based on Feature Association - 《计算机技术与发展》 (Computer Technology and Development)

Article Info

Title:: Crowd Activity Semantic Extraction in Video Based on Feature Association

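Authors (Chinese names):: 掌静¹; 陈志¹; 岳文静²; 1. School of Computer, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023; 2. School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003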

Author(s):: ZHANG Jing¹; CHEN Zhi¹; YUE Wen-jing²; 1. School of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; 2. School of Communication and Information Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Keywords:: crowd activity; semantic extraction; target detection; human tracking; feature mask; motion trajectory

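Abstract (translated from the Chinese):: Extracting crowd behavior semantics from video is hampered by mutual occlusion among people and by the difficulty of tracking them. To address these problems, an algorithm for extracting crowd behavior semantics from video based on feature association is constructed. The algorithm first extracts a multi-scale fused feature map from each video frame, uses the feature map to detect people who may be present in the frame, applies a deduplication algorithm to filter out repeated detections, and precisely locates the bounding boxes of the people in the group. Next, it predicts feature masks for the people in the group and tracks their motion trajectories by comparing the degree of difference between the feature masks in adjacent frames. Finally, it combines the motion trajectories to infer the crowd behavior semantics of each frame and extracts the crowd behavior semantics of the video according to the characteristics of crowd behavior. Experimental results show that the algorithm accurately extracts and locates the dynamic cues of the people in the group, resolves the inefficiency in semantic extraction caused by their complex spatio-temporal relationships, and effectively improves the accuracy and robustness of crowd semantic extraction.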

Abstract:: To address mutual occlusion and tracking difficulty in crowd activity semantic extraction from video, a crowd activity semantic extraction algorithm based on feature association is presented. The algorithm first extracts a multi-scale fused feature map from each video frame, detects candidate persons in the frame from that feature map, filters out duplicate detections with a deduplication algorithm, and accurately locates the bounding boxes of the people in the group. It then predicts a feature mask for each person and tracks the group's motion trajectories by comparing the degree of difference between the person feature masks of adjacent frames. Finally, it infers the crowd activity semantics of each frame from the motion trajectories and, using the characteristics of crowd activity, extracts the crowd activity semantics of the video. Experimental results show that the algorithm accurately extracts and locates the dynamic cues of the people in the group, resolves the inefficiency in semantic extraction caused by their complex spatio-temporal relationships, and effectively improves the accuracy and robustness of crowd activity semantic extraction.
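
The deduplication and mask-comparison steps described in the abstract can be illustrated with a small sketch. The abstract does not name the deduplication algorithm or define the "difference degree", so the choices here are assumptions made for illustration only: greedy non-maximum suppression on box IoU for deduplication, and 1 minus mask IoU as the difference between two person masks, with greedy one-to-one matching across adjacent frames. All function names and thresholds (`deduplicate`, `associate`, `iou_thresh`, `max_diff`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box and an array of boxes, each given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def deduplicate(boxes, scores, iou_thresh=0.5):
    """Assumed deduplication step: greedy NMS. Returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # Drop every remaining box that overlaps the kept one too much.
        order = rest[box_iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep


def mask_difference(a, b):
    """Assumed difference degree: 1 - IoU of two boolean masks (0 = identical)."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return 1.0 - np.logical_and(a, b).sum() / union


def associate(prev_masks, curr_masks, max_diff=0.7):
    """Greedy one-to-one matching; returns {current index: previous index}."""
    pairs = sorted(
        (mask_difference(p, c), i, j)
        for i, p in enumerate(prev_masks)
        for j, c in enumerate(curr_masks)
    )
    used_prev, matches = set(), {}
    for diff, i, j in pairs:
        if diff > max_diff:
            break  # all remaining pairs differ even more
        if i in used_prev or j in matches:
            continue
        used_prev.add(i)
        matches[j] = i
    return matches
```

Chaining the result of `associate` over consecutive frame pairs gives each person an identity that persists across frames, which is one simple way to obtain the per-person trajectories the abstract refers to. A current-frame mask left unmatched would start a new trajectory under this scheme.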