[1] WANG Tao, ZHANG Du-zhen. DCT-YOLOv5: Designing Object Detection Algorithms from a Frequency Perspective[J]. Computer Technology and Development, 2024, 34(10): 69-76. [doi:10.20165/j.cnki.ISSN1673-629X.2024.0197]

DCT-YOLOv5: Designing Object Detection Algorithms from a Frequency Perspective

Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
34
Issue:
No. 10, 2024
Pages:
69-76
Section:
Media Computing
Publication date:
2024-10-10

Article Info

Title:
DCT-YOLOv5: Designing Object Detection Algorithms from a Frequency Perspective
Article ID:
1673-629X(2024)10-0069-08
Author(s):
WANG Tao (王涛), ZHANG Du-zhen (张笃振)
School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China
Keywords:
discrete cosine transform; convolutional neural networks; down-sampling; fixed parameter; YOLOv5
Classification number (CLC):
TP391.41
DOI:
10.20165/j.cnki.ISSN1673-629X.2024.0197
Abstract:
The discrete cosine transform (DCT) is one of the core steps of the JPEG compression algorithm; it converts pixel data in the spatial domain of an image into coefficients in the frequency domain. Algorithms that combine the DCT with deep learning are very common, but they do not analyze the convolutional structure from a frequency perspective. To address this and further improve object detection performance, we propose an improved algorithm, DCT-YOLOv5. First, we show that convolutional neural networks (CNNs), Transformers, and MLP architectures all implicitly model the frequency domain, which supports two rules of thumb in earlier model design: the effective receptive field is always smaller than the theoretical receptive field, and multiple small convolution kernels are preferable to one large kernel. Second, the number of output channels is chosen from the number of input channels and the kernel size so that the transformation is approximately lossless; the down-sampling stage is the only place where the number of channels changes. Finally, DCT and convolution are compared with fixed parameters, and the difference between the two stays within ±0.8%. To minimize computation, grouped convolution with a fixed number of channels per group is introduced. With YOLOv5 as the baseline, extensive experiments on the COCO2017 dataset validate the proposed method: it reaches a mAP@.5 of 28.9% at 277.8 FPS, a relative improvement of 1.3% over the baseline model. The test results indicate that the improved model is significantly more accurate and can run on platforms with less computing power.
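
The abstract's idea of a fixed-parameter, approximately lossless down-sampling step can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a block-wise orthonormal DCT-II applied to non-overlapping k×k patches, which behaves like a stride-k convolution whose k² kernels are fixed DCT basis functions, so each input channel becomes k² output channels. The names `dct_matrix`, `dct_downsample`, and `dct_upsample` are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D of shape (n, n), so that D @ D.T == I."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)    # rescale the DC row to unit norm
    return d


def dct_downsample(x: np.ndarray, k: int = 2) -> np.ndarray:
    """Map (C, H, W) to (C*k*k, H//k, W//k) with fixed DCT kernels.

    Equivalent to a stride-k convolution with k*k fixed k-by-k kernels
    per input channel; nothing here is learned.
    """
    c, h, w = x.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    d = dct_matrix(k)
    # Split the image into non-overlapping k-by-k blocks: (c, y, i, x, j).
    blocks = x.reshape(c, h // k, k, w // k, k)
    # 2D DCT of every block: coeff[c,u,v,y,x] = sum_ij d[u,i] d[v,j] block[i,j].
    coeff = np.einsum("ui,vj,cyixj->cuvyx", d, d, blocks)
    return coeff.reshape(c * k * k, h // k, w // k)


def dct_upsample(y: np.ndarray, k: int = 2) -> np.ndarray:
    """Inverse of dct_downsample: map (C*k*k, H, W) back to (C, H*k, W*k)."""
    ck2, hh, ww = y.shape
    c = ck2 // (k * k)
    d = dct_matrix(k)
    coeff = y.reshape(c, k, k, hh, ww)
    # Inverse 2D DCT: block[i,j] = sum_uv d[u,i] d[v,j] coeff[u,v].
    blocks = np.einsum("ui,vj,cuvyx->cyixj", d, d, coeff)
    return blocks.reshape(c, hh * k, ww * k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((3, 8, 8))
    features = dct_downsample(image, k=2)
    print(features.shape)
    print(np.allclose(dct_upsample(features, k=2), image))
```

Because the basis is orthonormal, the step is invertible only when the channel count grows by k² as the spatial size shrinks by k in each direction. This is one way to read the abstract's statement that the output channel count follows from the input channels and the kernel size.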

Last Update: 2024-10-10