
DCT-YOLOv5: Designing Object Detection Algorithms from a Frequency Perspective

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 34
Issue:: No. 10, 2024

Pages:: 69-76

Section:: Media Computing

Publication date:: 2024-10-10

Article Info

Title:: DCT-YOLOv5: Designing Object Detection Algorithms from a Frequency Perspective

Article ID:: 1673-629X(2024)10-0069-08

Author(s):: WANG Tao (王涛); ZHANG Du-zhen (张笃振); School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China

Keywords:: discrete cosine transform; convolutional neural networks; down-sampling; fixed parameter; YOLOv5

CLC number:: TP391.41

DOI:: 10.20165/j.cnki.ISSN1673-629X.2024.0197

Abstract:: The discrete cosine transform (DCT) is one of the core steps of the JPEG compression algorithm; it converts pixel data in the spatial domain of an image into coefficients in the frequency domain. Algorithms that combine the DCT with deep learning are common, but they do not analyze convolutional structures from a frequency perspective. To further improve object detection performance, we propose an improved algorithm that addresses this gap: DCT-YOLOv5. First, we show that convolutional neural networks (CNNs), Transformers, and MLP architectures all implicitly model the frequency domain, which supports two default principles of earlier model design: the effective receptive field is always smaller than the theoretical receptive field, and multiple small convolution kernels are preferable to one large kernel. Second, the number of output channels is chosen from the number of input channels and the kernel size so that the transformation is approximately lossless; the down-sampling stage is the only place where the number of channels changes. Finally, a comparison of the DCT and convolution with fixed parameters shows that the difference between the two stays within ±0.8%. To minimize computation, grouped convolution with a fixed number of channels per group is introduced. Using YOLOv5 as the baseline, extensive experiments on the COCO2017 dataset validate the effectiveness of the proposed method. The model reaches a mAP@.5 of 28.9% at 277.8 FPS, a relative improvement of 1.3% over the baseline. The test results indicate that the improved model is significantly more accurate and can run on platforms with less computing power.
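The idea of a lossless down-sampling step with fixed parameters can be illustrated with an orthonormal DCT basis used as non-trainable stride-n convolution kernels. The sketch below is not the authors' implementation: the function names (`dct_matrix`, `dct_kernels`, `dct_downsample`, `dct_upsample`), the use of NumPy, and the choice of n² output channels per input channel are assumptions made only for illustration. It shows that when an n×n kernel with stride n produces n² output channels, the transform is orthogonal, so no information is lost.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal 1-D DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    x = np.arange(n).reshape(1, -1)  # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)        # rescale the DC row so rows have unit norm
    return mat

def dct_kernels(n: int) -> np.ndarray:
    """n*n fixed 2-D kernels, shape (n*n, n, n), as outer products of 1-D bases."""
    d = dct_matrix(n)
    return np.einsum("ui,vj->uvij", d, d).reshape(n * n, n, n)

def dct_downsample(img: np.ndarray, n: int = 2) -> np.ndarray:
    """Stride-n convolution with fixed DCT kernels: (H, W) -> (n*n, H/n, W/n)."""
    h, w = img.shape
    assert h % n == 0 and w % n == 0, "image size must be divisible by n"
    # Split the image into non-overlapping n x n blocks: (H/n, W/n, n, n).
    blocks = img.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    return np.einsum("kij,hwij->khw", dct_kernels(n), blocks)

def dct_upsample(coeffs: np.ndarray, n: int = 2) -> np.ndarray:
    """Inverse of dct_downsample, using the transpose of the orthogonal basis."""
    _, hn, wn = coeffs.shape
    blocks = np.einsum("kij,khw->hwij", dct_kernels(n), coeffs)
    return blocks.transpose(0, 2, 1, 3).reshape(hn * n, wn * n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((8, 8))
    coeffs = dct_downsample(image, n=2)
    print(coeffs.shape)                                  # channels x H/2 x W/2
    print(np.allclose(dct_upsample(coeffs, n=2), image)) # perfect reconstruction
```

Under these assumptions, halving the spatial resolution multiplies the channel count by four. Using fewer output channels than n² per input channel would discard some frequency components, which is one way to read the abstract's "approximately lossless" choice of output channels.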

