RGB-D Petroglyph Image Segmentation Model Based on Adaptive Multi-scale Feature Fusion

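Journal: Computer Technology and Development (《计算机技术与发展》) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
Issue: 2025, No. 05

Pages: 152-157

Column: New Computing Application Systems

Publication date: 2025-05-10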

Article Info

Title: RGB-D Petroglyph Image Segmentation Model Based on Adaptive Multi-scale Feature Fusion

Article ID: 1673-629X(2025)05-0152-06

Author(s): BAI Chuan-ping (1, 2, 3); ZHI Xin (1, 4); ZHANG Fang-qin (2); WANG Zhi-xue (2); ZHOU Ming-quan (1, 4)
1. School of Information Science & Technology, Northwest University, Xi'an 710127, China;
2. School of Mathematics & Computer Science, Ningxia Normal University, Guyuan 756099, China;
3. Engineering Technology Research Center for Artificial Intelligence and Smart Healthcare, Ningxia Normal University, Guyuan 756099, China;
4. National and Local Joint Engineering Research Center for Digitalization of Cultural Heritage, Northwest University, Xi'an 710127, China

Keywords: petroglyph image segmentation; multimodality; neural networks; deep learning; data fusion; multi-scale; attention mechanism

Classification number: TP301

DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0126

Abstract: Petroglyphs (also called rock art) are an important cultural heritage for understanding ancient human societies, cultures, religions, and environments. To accurately segment the complex, multi-scale patterns and symbols in petroglyphs, we use depth images to supply the spatial geometric information that RGB images lack, and propose an adaptive multi-scale fusion RGB-D (RGB-Depth) petroglyph image segmentation model, the Adaptive Multi-scale Fusion Network (AMFNet). Its adaptive multi-scale feature fusion network combines depth spatial information with 2D image texture information, exploiting their complementarity to improve segmentation performance. The model first applies large convolutional kernels to enlarge the receptive field and then decomposes those kernels. Next, a dynamic spatial selection mechanism selects the feature maps produced by convolutional kernels of different scales and adaptively fuses the multi-scale features, which strengthens the spatial feature representation of targets of different scales in petroglyphs. Experimental results show that, on the 3D-Pitoti petroglyph dataset, the model achieves higher mean intersection over union (mIoU) and pixel accuracy (PA) than the three other methods compared, exceeding the latest BEGL+UNet method by 5.3 percentage points in mIoU and 3.0 percentage points in PA, which validates its effectiveness. The results also confirm that, in petroglyph image segmentation, where foreground and background are highly similar, the spatial geometric information in depth images provides complementary information to the segmentation model.
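The two metrics reported in the abstract, pixel accuracy (PA) and mean intersection over union (mIoU), can be illustrated with a minimal sketch. It uses the standard confusion-matrix definitions of both metrics and is not the authors' evaluation code. The function names and the toy 2x4 masks are hypothetical and chosen only for illustration.

```python
from typing import List

Mask = List[List[int]]  # 2D label map: one integer class id per pixel


def confusion_matrix(pred: Mask, target: Mask, num_classes: int) -> List[List[int]]:
    """Count pixels by (true class, predicted class); cm[t][p] is the count."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            cm[t][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """PA: correctly labelled pixels divided by all pixels."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm: List[List[int]]) -> float:
    """mIoU: average over classes of TP / (TP + FP + FN).

    Classes absent from both prediction and ground truth are skipped
    (an assumption of this sketch; conventions differ between papers).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy example: class 0 = background rock, class 1 = pecked figure (made-up data).
target = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
pred = [[0, 0, 1, 0],
        [0, 1, 1, 1]]

cm = confusion_matrix(pred, target, num_classes=2)
print("PA   =", pixel_accuracy(cm))
print("mIoU =", mean_iou(cm))
```

In this toy case 6 of the 8 pixels are correct, so PA is 0.75, while each class has 3 true positives, 1 false positive, and 1 false negative, so mIoU is 0.6. A gain quoted in percentage points, as in the abstract, is the plain difference between two such scores expressed in percent.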
