Object 6D Pose Estimation Algorithm Based on Feature Fusion and Attention Mechanism

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 33
Issue:: 12, 2023

Pages:: 92-100

Section:: Media Computing

Publication date:: 2023-12-10

Article Info

Title:: Object 6D Pose Estimation Algorithm Based on Feature Fusion and Attention Mechanism

Article ID:: 1673-629X(2023)12-0092-09

Author(s):: GAO Wei-dong (高维东); LIN Lin (林琳); LIU Xian-mei (刘贤梅); ZHAO Ya (赵娅); School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China

Keywords:: object 6D pose estimation; deep learning; feature fusion; attention mechanism; skip connection

CLC number:: TP391.4

DOI:: 10.3969/j.issn.1673-629X.2023.12.013

Abstract:: Object 6D pose estimation is easily degraded by weakly textured and small target objects, complex backgrounds, and occlusion. To address these problems, an object 6D pose estimation algorithm combining feature fusion and an attention mechanism is proposed. First, a Convolutional Block Attention Module is added to the first convolution block of the RGB image feature extraction network to improve the regional saliency of small, weakly textured objects. Second, skip connections based on the Convolutional Block Attention Module are introduced into the encoder-decoder RGB image feature extraction network; they fuse detailed appearance features such as color and texture from the encoding stage into the pose semantic features of the decoding stage, compensating for the lack of appearance detail in those features. Third, a Channel Attention Module is used to improve the Pyramid Pooling Module, strengthening the connection between the visible and occluded regions of the target object and improving robustness to occlusion. Finally, the Convolutional Block Attention Module is used to reconstruct the pose semantic features output by the decoding stage, making similar surface features easier to distinguish and thereby reducing the interference of similar-looking objects with object 6D pose estimation. Experimental results show that the algorithm reaches 73.4% and 99.8% on the ADD(-S) metric on the Occlusion LINEMOD and LINEMOD datasets respectively, which is 7.8 and 0.1 percentage points higher than FFB6D, verifying the feasibility of the algorithm.
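The abstract reports results on the ADD(-S) metric without defining it. As a rough illustration only, the sketch below follows the commonly used definitions: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S (used for symmetric objects) averages the distance from each ground-truth point to its closest predicted point. The function names, the toy `SQUARE` model, and the 10%-of-diameter acceptance threshold are assumptions made for this example; they are not taken from the paper.

```python
import math

def transform(points, R, t):
    """Apply rotation R (3x3 nested list) and translation t (length 3) to each 3D point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def add_distance(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding points under the two poses."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def add_s_distance(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to the closest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)

def pose_is_correct(distance, diameter, ratio=0.1):
    """Assumed convention: a pose counts as correct if the distance is below ratio * diameter."""
    return distance < ratio * diameter

# Hypothetical toy model: four corners of a square, symmetric under a 180-degree turn about z.
SQUARE = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ROT_Z_180 = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
ZERO = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    # The 180-degree rotation maps the square onto itself, so ADD is large but ADD-S is zero.
    print("ADD  :", add_distance(SQUARE, IDENTITY, ZERO, ROT_Z_180, ZERO))
    print("ADD-S:", add_s_distance(SQUARE, IDENTITY, ZERO, ROT_Z_180, ZERO))
```

The toy case shows why the symmetric variant exists: a pose that is visually indistinguishable from the ground truth would be penalized heavily by ADD but not by ADD-S. A reported accuracy such as the percentages in the abstract is then the fraction of test poses for which a check like `pose_is_correct` holds.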
