[1] ZHANG Shi-fan, YE Hai-bo. Conditional Human-object Interaction Detection with Transformer [J]. Computer Technology and Development, 2023, 33(08): 23-29. [doi:10.3969/j.issn.1673-629X.2023.08.004]

Conditional HOTR: Transformer-Based Human-Object Interaction Detection

Computer Technology and Development [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
33
Issue:
2023, No. 08
Pages:
23-29
Column:
Media Computing
Publication date:
2023-08-10

Article Info

Title:
Conditional Human-object Interaction Detection with Transformer
Article ID:
1673-629X(2023)08-0023-07
Author(s):
ZHANG Shi-fan (张诗凡), YE Hai-bo (叶海波)
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Keywords:
human-object interaction detection; computer vision; transformer; query embedding; interaction point
Classification number:
TP391.4
DOI:
10.3969/j.issn.1673-629X.2023.08.004
Abstract:
Human-object interaction (HOI) detection aims to detect every human and object in an image that are in an interaction relationship, producing triplets of the form <human, action, object>. Conventional methods comprise two-stage and one-stage algorithms; some recent work proposes transformer-based HOI detection methods that make the whole pipeline much simpler. Building on the existing detection model HOTR, we aim to optimize its internal transformer structure so that it better fits the HOI detection task. For the interaction decoder, which is used for interaction detection, we generate reference points for the human and the object from its interaction query embeddings and design an interaction-point generation formula based on them. We then use the interaction-point information to design a conditional interaction query, which serves as the positional embedding and is added to the content embedding to obtain the query; the query is finally dot-multiplied with the key to compute attention. This helps the transformer explicitly locate interaction-related regions, which narrows the search range and eases the dependence on the content embedding. On the benchmark datasets V-COCO and HICO-DET, mAP improves by 2.13 and 8.33 percentage points, respectively, and the accuracy on V-COCO is the best reported at the time of publication.
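The query construction described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract does not state the interaction-point formula, so the midpoint of the human and object reference points is used here purely as an assumption, and the sinusoidal encoding, the projection matrices `w_h` and `w_o`, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sinusoidal_embedding(points, dim, temperature=10000.0):
    """Encode 2-D points in [0, 1] as a `dim`-dimensional sine/cosine vector.

    ASSUMPTION: a DETR-style sinusoidal encoding; `dim` must be even.
    """
    half = dim // 2                                   # channels per coordinate
    freqs = temperature ** (2 * (np.arange(half) // 2) / half)
    scaled = points[:, :, None] * 2 * np.pi / freqs   # (n, 2, half)
    emb = np.empty_like(scaled)
    emb[..., 0::2] = np.sin(scaled[..., 0::2])
    emb[..., 1::2] = np.cos(scaled[..., 1::2])
    return emb.reshape(points.shape[0], dim)

def conditional_interaction_attention(query_embed, content, keys, w_h, w_o):
    """Return interaction points and attention weights for each query."""
    # Reference points for the human and the object, predicted from the
    # interaction query embedding and squashed into normalized coordinates.
    ref_human = sigmoid(query_embed @ w_h)            # (n_q, 2)
    ref_object = sigmoid(query_embed @ w_o)           # (n_q, 2)
    # ASSUMPTION: the interaction point is the midpoint of the two
    # reference points; the paper's actual formula may differ.
    interaction_pt = 0.5 * (ref_human + ref_object)
    # Conditional interaction query: positional embedding of the point.
    pos = sinusoidal_embedding(interaction_pt, content.shape[1])
    query = content + pos                             # position + content
    # Scaled dot product with the keys, followed by a stable softmax.
    logits = query @ keys.T / np.sqrt(query.shape[1])
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return interaction_pt, weights

# Toy usage with random tensors (4 queries, 10 keys, 16 channels).
rng = np.random.default_rng(0)
n_q, n_k, d = 4, 10, 16
points, attn = conditional_interaction_attention(
    query_embed=rng.normal(size=(n_q, d)),
    content=rng.normal(size=(n_q, d)),
    keys=rng.normal(size=(n_k, d)),
    w_h=rng.normal(size=(d, 2)),
    w_o=rng.normal(size=(d, 2)),
)
print(points.shape, attn.shape)
```

Because the positional part of the query is derived from a predicted point rather than learned freely, the attention can concentrate on keys near that location, which is the intuition behind the narrowed search range mentioned in the abstract.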


Last Update: 2023-08-10