[1] ZHAO Yan-qin, CAI Qian. Roadside Object Detection Method Based on Improved DINO Model[J]. Computer Technology and Development, 2025, (06): 145-151. [doi:10.20165/j.cnki.ISSN1673-629X.2025.0009]

Roadside Object Detection Method Based on Improved DINO Model

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2025, No. 06
Pages:
145-151
Column:
Artificial Intelligence
Publication date:
2025-06-10

Article Info

Title:
Roadside Object Detection Method Based on Improved DINO Model
Article ID:
1673-629X(2025)06-0145-07
Author(s):
ZHAO Yan-qin (赵艳芹), CAI Qian (蔡乾)
School of Computer & Information Engineering, Heilongjiang University of Science & Technology, Harbin 150022, Heilongjiang, China
Keywords:
deep learning; Detection Transformer; DINO; object detection; roadside images
CLC number:
TP391.41
DOI:
10.20165/j.cnki.ISSN1673-629X.2025.0009
Abstract:
Roadside perception is a complex research field, more complex than vehicle-side perception. Because camera positions and angles vary, roadside objects show pronounced multi-scale differences. The wide field of perception introduces more small-scale objects and complex backgrounds, which further complicates object detection. This paper aims to enhance DINO, a current Transformer-based object detection model, and develops a roadside object detection method based on the improved DINO algorithm. In the improvement, ConvNeXt is used as the backbone network of DINO: its inverted bottleneck design helps optimize the model's receptive field, and its large-kernel convolutions strengthen the extraction of roadside object features, which improves detection accuracy. An ODConvNeXt structure is then proposed by introducing the Omni-Dimensional Dynamic Convolution (ODConv) module into the ConvNeXt block. The four-dimensional attention mechanism of its parallel strategy accurately identifies object regions containing various roadside features, which strengthens the network's feature extraction capability and significantly improves detection accuracy, especially for small objects. The Lion optimizer, a memory-efficient stochastic gradient descent method based on the sign operation, dynamically adjusts the learning rate and introduces momentum to accelerate gradient updates, improving convergence, accuracy, and generalization by balancing the parameter distribution. Experiments on the Rope3D dataset show that, compared with the original model, the improved DINO algorithm raises mAP@0.5:0.95 on small objects by 6.3 percentage points and reduces the model weight size by 32.3%. This gain is achieved while maintaining a detection speed of 24 FPS, and at an equal parameter count the accuracy is higher than that of the original method.
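
The abstract describes Lion as a memory-efficient, sign-based optimizer with momentum but does not give its update rule. The sketch below follows the commonly published form of Lion, not the authors' training code. The function name `lion_step`, the list-of-floats parameter representation, and every hyperparameter value are illustrative assumptions. It shows why the method is memory-efficient (one momentum buffer per parameter) and why every step has the same magnitude (only the sign of the interpolated gradient is used).

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One Lion update over flat lists of floats (hypothetical helper).

    params, grads, momentum: lists of equal length.
    Returns the updated (params, momentum) as new lists.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and current gradient, then keep only the sign.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay is added to the signed update.
        new_params.append(p - lr * (update + weight_decay * p))
        # The momentum buffer is refreshed with a second, slower coefficient.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


if __name__ == "__main__":
    # Toy usage: minimize f(x) = x^2, whose gradient is 2x, starting at x = 1.
    x, m = [1.0], [0.0]
    for _ in range(5):
        x, m = lion_step(x, [2.0 * x[0]], m, lr=0.1)
    print(x, m)
```

Because the update is a sign, the step size is set entirely by `lr`, so Lion is usually run with a smaller learning rate than Adam-style optimizers; the values above are chosen only to make the toy example easy to follow.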


Last Update: 2025-06-10