Fine-Grained Image Classification Method Based on Enhanced Patch Correlation - Computer Technology and Development (《计算机技术与发展》)

Article Info

Title:: Fine-Grained Image Classification Method Based on Enhanced Patch Correlation

Article ID:: 1673-629X(2023)05-0056-06

Author(s):: WANG Kun (王坤); ZHU Zi-qi (朱子奇); School of Computer Science & Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China

Keywords:: vision transformer (ViT); fine-grained image classification; local features; correlation; patch features; encoder

CLC number:: TP391.4

DOI:: 10.3969/j.issn.1673-629X.2023.05.009

Abstract:: In fine-grained image classification, extracting discriminative local features is crucial for recognizing subtle differences between images. Models based on the ViT (vision transformer) framework have achieved excellent performance across many areas of computer vision. However, ViT-based fine-grained classification models pay little attention to local image regions. To address this and to further strengthen the contextual connections among patch features, a fine-grained image classification method based on enhanced patch correlation is proposed. First, a method for assigning correlation weights to patches is proposed and applied in a nested manner across encoder layers to enrich the feature information at different levels, which addresses ViT's insufficient attention to local image features. Second, patch position information is incorporated to strengthen the contextual connections among local features while reducing interference from noise. Finally, a similarity loss function is proposed to learn the subtle feature differences in fine-grained images and improve the model's classification performance. In experiments on two public datasets, CUB-200-2011 and Stanford Dogs, the method achieves accuracies of 91.33% and 92.15%, respectively, which are 0.63 and 0.45 percentage points higher than the baseline ViT model, verifying the effectiveness of the method.
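
The abstract reports gains in percentage points, not relative percentages, so the baseline ViT accuracies it implies can be recovered by simple subtraction. The short sketch below does that arithmetic. The dictionary `REPORTED` and the helper `implied_baseline` are illustrative names, not part of the paper, and the resulting baseline figures are derived from the abstract's numbers, not quoted from it.

```python
from decimal import Decimal

# Figures quoted in the abstract:
# dataset -> (reported accuracy in %, gain over ViT in percentage points).
REPORTED = {
    "CUB-200-2011": ("91.33", "0.63"),
    "Stanford Dogs": ("92.15", "0.45"),
}


def implied_baseline(accuracy: str, gain_points: str) -> Decimal:
    """Baseline accuracy implied by a reported accuracy and a percentage-point gain."""
    # Decimal avoids binary floating-point rounding noise in the subtraction.
    return Decimal(accuracy) - Decimal(gain_points)


for dataset, (accuracy, gain) in REPORTED.items():
    print(f"{dataset}: implied ViT baseline = {implied_baseline(accuracy, gain)}%")
```

Under this reading, the implied ViT baselines are 90.70% on CUB-200-2011 and 91.70% on Stanford Dogs.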

Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]
