[1]代子鑫,翟双姣,秦品乐,等.基于对比学习和扩散模型的多模态活动识别[J].计算机技术与发展,2025,(06):116-123.[doi:10.20165/j.cnki.ISSN1673-629X.2025.0027]
 DAI Zi-xin,ZHAI Shuang-jiao,QIN Pin-le,et al.Multimodal Activity Recognition Based on Contrastive Learning and Diffusion Models[J].Computer Technology and Development,2025,(06):116-123.[doi:10.20165/j.cnki.ISSN1673-629X.2025.0027]

基于对比学习和扩散模型的多模态活动识别

《计算机技术与发展》[ISSN:1006-6977/CN:61-1281/TN]

卷/Volume:
期数/Issue:
No. 06, 2025
页码/Pages:
116-123
栏目/Section:
Artificial Intelligence
出版日期/Publication Date:
2025-06-10

文章信息/Info

Title:
Multimodal Activity Recognition Based on Contrastive Learning and Diffusion Models
文章编号/Article No.:
1673-629X(2025)06-0116-08
作者:
代子鑫, 翟双姣, 秦品乐, 晋赞霞, 曾建潮
中北大学 计算机科学与技术学院, 山西 太原 030051
Author(s):
DAI Zi-xin, ZHAI Shuang-jiao, QIN Pin-le, JIN Zan-xia, ZENG Jian-chao
School of Computer Science and Technology, North University of China, Taiyuan 030051, China
关键词:
活动识别; 对比学习; 扩散模型; 射频; 视频; 多模态融合
Keywords:
activity recognition; contrastive learning; diffusion model; radio frequency; video; multimodal fusion
分类号/CLC Number:
TP391.4
DOI:
10.20165/j.cnki.ISSN1673-629X.2025.0027
摘要:
射频和视频信号融合能够充分利用两者的互补特性,以提升模型的环境感知能力与识别准确性。针对现有多模态融合方法难以有效解决模态间数据的异构性及复杂场景活动识别泛化性能不足的问题,提出了一种基于对比学习和扩散模型的多模态数据融合活动识别算法。首先,去除射频数据和视频数据中的环境噪声,并通过对比学习方法建立两种模态间的映射关系,使得这两种模态的数据能够在统一的特征空间内融合,以缓解模态间数据的异构性问题。然后,针对现有的多模态融合方法在复杂场景下泛化性能不足的问题,利用扩散模型生成高质量的射频数据,以增强射频信号的空间特征表达能力。同时,采用视频几何变换的方法增加视频数据的多样性,进一步提升模型的鲁棒性和在复杂场景下的泛化能力。最后,结合融合特征和增强后的多模态数据,充分利用对比学习和扩散模型在多模态活动识别中的优势,进一步提高模型的识别准确率。实验结果表明,该算法在自建的多模态数据集和公开的 MM-Fi 数据集上的活动识别准确率均超过 90%,优于现有的多模态融合算法。
Abstract:
The fusion of radio frequency (RF) and video signals effectively leverages their complementary characteristics to enhance environmental perception and improve recognition accuracy. To address the difficulty that existing multimodal fusion methods have in handling the heterogeneity of inter-modal data, as well as their insufficient generalisation performance for activity recognition in complex scenes, we propose a multimodal data fusion activity recognition algorithm based on contrastive learning and diffusion models. Firstly, the environmental noise in the RF and video data is removed, and a mapping relationship between the two modalities is established through contrastive learning. This allows data from both modalities to be fused within a unified feature space, thereby alleviating inter-modal data heterogeneity. Subsequently, to improve the generalisation performance of multimodal fusion in complex scenes, diffusion models are used to generate high-quality RF data that enhance the spatial feature representation of RF signals. Meanwhile, geometric transformations are applied to the video data to increase its diversity, further improving the robustness of the model and its generalisation ability in complex scenes. Finally, by combining the fused features with the enhanced multimodal data, the algorithm fully exploits the strengths of contrastive learning and diffusion models in multimodal activity recognition, further improving recognition accuracy. Experimental results show that the proposed algorithm achieves over 90% activity recognition accuracy on both a self-collected multimodal dataset and the publicly available MM-Fi dataset, outperforming existing multimodal fusion algorithms.
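The cross-modal mapping described in the abstract is commonly realised with a symmetric InfoNCE (CLIP-style) contrastive loss that pulls paired RF and video embeddings together in the shared feature space. The paper's exact formulation is not given here; the following NumPy sketch (function name `info_nce_loss` and the temperature value are illustrative assumptions) shows the general idea:

```python
import numpy as np

def info_nce_loss(rf_emb, vid_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired RF/video embeddings.

    rf_emb, vid_emb: (batch, dim) arrays; row i of each describes the same
    activity clip, so matching pairs lie on the diagonal of the similarity matrix.
    """
    # L2-normalise so the dot product is cosine similarity
    rf = rf_emb / np.linalg.norm(rf_emb, axis=1, keepdims=True)
    vid = vid_emb / np.linalg.norm(vid_emb, axis=1, keepdims=True)
    logits = rf @ vid.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))        # positive pair index for each row

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # average the RF->video and video->RF directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimising this loss drives each RF embedding toward its paired video embedding and away from the other clips in the batch, which is what lets the two heterogeneous modalities be fused in one feature space.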
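The diffusion-based RF augmentation rests on the standard DDPM setup: a forward process progressively noises real samples, and a learned reverse process denoises Gaussian noise into new synthetic samples. The abstract does not specify the schedule or sampler, so this sketch (assuming a cosine noise schedule; names `cosine_alpha_bar` and `q_sample` are illustrative) shows only the closed-form forward step:

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal-retention ᾱ_t for t = 1..T (cosine noise schedule)."""
    def f(t):
        return np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    t = np.arange(1, T + 1)
    return f(t) / f(0)

def q_sample(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε, with ε ~ N(0, I)."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
```

A denoising network trained to predict ε from x_t can then be run in reverse from pure noise to generate additional high-quality RF samples, enlarging the training set beyond what was physically recorded.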
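The video-side augmentation is plain geometric transformation. As a minimal sketch (the paper's exact transform set is not stated here; `augment_frames` is a hypothetical helper), random flips and 90° rotations applied clip-wise keep every frame of a clip consistent while diversifying the data:

```python
import numpy as np

def augment_frames(frames, rng):
    """Apply one random geometric transform to a whole video clip (T, H, W, C)."""
    out = frames
    if rng.random() < 0.5:
        out = out[:, :, ::-1]              # horizontal flip along the width axis
    k = rng.integers(0, 4)                 # 0, 90, 180, or 270 degrees
    out = np.rot90(out, k=k, axes=(1, 2))  # rotate every frame identically
    return out
```

Applying the same transform to all frames of a clip preserves the temporal coherence that the activity recogniser depends on.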


更新日期/Last Update: 2025-06-10