LI Zhen, WU Tao. Generative Adversarial Networks for Visual Text Fusion Using BERT Embeddings [J]. Computer Technology and Development, 2025, (06): 131-136. [doi:10.20165/j.cnki.ISSN1673-629X.2025.0010]

Generative Adversarial Networks for Visual Text Fusion Using BERT Embeddings

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue: 2025, No. 06
Pages: 131-136
Column: Artificial Intelligence
Publication date: 2025-06-10

Article Information

Title:
Generative Adversarial Networks for Visual Text Fusion Using BERT Embeddings
Article ID:
1673-629X(2025)06-0131-06
Author(s):
LI Zhen 1, WU Tao 2
1. School of Computer Science, Xi'an Polytechnic University, Xi'an 710048, China;
2. Textile and Garment Intelligent Information Service Institute, School of Computer Science, Xi'an Polytechnic University, Xi'an 710048, China
Keywords:
text-to-image generation; generative adversarial networks; BERT embedding; visual text fusion; gated attention
CLC number:
TP391.4
DOI:
10.20165/j.cnki.ISSN1673-629X.2025.0010
Abstract:
To address semantic entanglement between multi-stage generators, inconsistency between generated images and their text descriptions, and blurred image detail in text-to-image generation, a visual text fusion generative adversarial network using BERT text embeddings (BVT-GAN) is proposed. First, the BERT model's strong text encoding and generalization ability on NLP tasks are exploited to match text semantics deeply. Second, a gated attention module is added that weights each word feature by its contribution to each image region. Finally, a visual text fusion module (VFTBlock) fuses multi-granularity text information with visual features through parallel affine transformations, producing images with richer textures and sharper object edges. Experiments show that, compared with the baseline AttnGAN, the proposed model improves IS by 10.3% and 19.4% on the CUB and COCO datasets, and reduces FID by 8.17 and 3.77, respectively. Compared with previous methods, the model offers significant advantages in visual fidelity and in alignment with the input text description.
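The gated attention the abstract describes (weighting each word feature by its contribution to an image region) can be sketched as follows. This is a minimal NumPy illustration of dot-product word-to-region attention modulated by a sigmoid gate, not the paper's actual implementation; the gate projection `W_gate` and all weights are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def gated_word_attention(regions, words, rng=None):
    """Sketch: each word's contribution to each image region is scored by
    dot-product attention, then rescaled by a per-word sigmoid gate.
    regions: (n_regions, d) visual features; words: (n_words, d) word features."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = regions.shape[1]
    W_gate = rng.standard_normal((d, 1)) * 0.1  # stand-in for a learned gate projection

    # attention: relevance of each word to each region (softmax over words)
    scores = regions @ words.T                            # (n_regions, n_words)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)

    # gate: sigmoid-squashed scalar per word, modulating its contribution
    gate = 1.0 / (1.0 + np.exp(-(words @ W_gate).T))      # (1, n_words)
    weights = attn * gate
    weights /= weights.sum(axis=1, keepdims=True)         # renormalise over words

    return weights @ words                                # word context per region
```

The gate lets the model suppress words that are globally uninformative before the per-region attention is renormalised.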
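The "parallel affine transformations" in the VFTBlock are in the spirit of conditional modulation (FiLM / conditional batch-norm): the text vector predicts a per-channel scale and shift applied to the visual feature map. A minimal NumPy sketch, with illustrative parameter names that are not the paper's:

```python
import numpy as np

def affine_text_fusion(visual, sent_vec, W_gamma, b_gamma, W_beta, b_beta):
    """Sketch: the text vector sent_vec predicts per-channel scale (gamma)
    and shift (beta) that modulate a visual feature map of shape (C, H, W)."""
    gamma = sent_vec @ W_gamma + b_gamma   # (C,) per-channel scale
    beta = sent_vec @ W_beta + b_beta      # (C,) per-channel shift
    return visual * gamma[:, None, None] + beta[:, None, None]
```

Running several such affine branches in parallel on text features of different granularity (word-level and sentence-level) and combining the results is one way to realise the multi-granularity fusion the abstract describes.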
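The FID figures in the results measure the Fréchet distance between Gaussians fitted to feature activations of real and generated images: FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2(C1 C2)^{1/2}). A small NumPy version of this standard formula (independent of the paper's code), computing the trace of the matrix square root via the symmetric form C1^{1/2} C2 C1^{1/2}:

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID between two Gaussians (mu1, cov1) and (mu2, cov2).
    Tr((C1 C2)^{1/2}) equals Tr((C1^{1/2} C2 C1^{1/2})^{1/2}), which is
    symmetric PSD, so an eigendecomposition suffices."""
    diff = mu1 - mu2
    # symmetric square root of cov1 via eigendecomposition
    w, v = np.linalg.eigh(cov1)
    sqrt1 = (v * np.sqrt(np.clip(w, 0, None))) @ v.T
    w2, _ = np.linalg.eigh(sqrt1 @ cov2 @ sqrt1)
    tr_sqrt = np.sqrt(np.clip(w2, 0, None)).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2 * tr_sqrt)
```

Lower FID is better (identical distributions give 0), which is why the reductions of 8.17 and 3.77 are reported as improvements.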


Last Update: 2025-06-10