ZHUANG Xing-wang, DING Yue-wei. Text-to-image Model by Multidimensional Attention and Semantic Regeneration [J]. 计算机技术与发展 (Computer Technology and Development), 2020, 30(12): 27-33. [doi:10.3969/j.issn.1673-629X.2020.12.005]

Text-to-image Model by Multidimensional Attention and Semantic Regeneration (多维度注意力和语义再生的文本生成图像模型)

《计算机技术与发展》 (Computer Technology and Development) [ISSN: 1673-629X]

Volume:
30
Issue:
2020, No. 12
Pages:
27-33
Column:
Intelligence, Algorithms and Systems Engineering
Publication Date:
2020-12-10

Article Info

Title:
Text-to-image Model by Multidimensional Attention and Semantic Regeneration
Article ID:
1673-629X(2020)12-0027-07
Author(s):
ZHUANG Xing-wang (庄兴旺), DING Yue-wei (丁岳伟)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Keywords:
text-to-image; generative adversarial network (GAN); semantic consistency; attention; semantic text regeneration
CLC Number:
TP391.41
DOI:
10.3969/j.issn.1673-629X.2020.12.005
Abstract:
Text-to-image synthesis is a comprehensive task combining computer vision and natural language processing. Generating an image from a given text description has two goals: visual realism and semantic consistency. Although significant progress has been made in generating high-quality and visually realistic images with generative adversarial networks (GANs), guaranteeing semantic consistency between the text description and the visual content remains very challenging. Because of the diverse nature of the text and image modalities, current approaches that apply attention only at the word level cannot ensure global semantic consistency. Therefore, an improved multidimensional collaborative attentive module (MCAM) and a semantic text regeneration module (STRM) are proposed on the basis of MirrorGAN to address these problems. MCAM adopts the more advanced BERT model for text processing, and STRM regenerates a text description from the generated image so that the image is semantically aligned with the given description, which makes the generated images better match the semantics. Together, these modules form a generative adversarial network model based on multidimensional attention and semantic text regeneration (MirrorGAN++). Thorough experiments on two public benchmark datasets demonstrate the superiority of MirrorGAN++ over other representative state-of-the-art methods.
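The pipeline the abstract describes can be illustrated with a minimal PyTorch-style sketch. This is not the authors' code: the module classes, layer sizes, toy vocabulary, and the plain GRU standing in for the BERT encoder of MCAM are all illustrative assumptions. It only shows the overall loop: encode the text, generate an image with attention over word features, then regenerate a description from the image (STRM-style) and penalize its mismatch with the input text.

```python
# Minimal sketch of a MirrorGAN++-style text -> image -> text loop (illustrative only).
import torch
import torch.nn as nn

class TextEncoder(nn.Module):               # stand-in for the BERT encoder used in MCAM
    def __init__(self, vocab=1000, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
    def forward(self, tokens):
        word_feats, h = self.rnn(self.emb(tokens))
        return word_feats, h[-1]             # word-level and sentence-level features

class AttnGenerator(nn.Module):              # attention over word features -> toy 64x64 image
    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.to_img = nn.Sequential(nn.Linear(dim, 3 * 64 * 64), nn.Tanh())
    def forward(self, word_feats, sent_feat):
        q = sent_feat.unsqueeze(1)                       # query with the sentence feature
        ctx, _ = self.attn(q, word_feats, word_feats)    # word-level attention context
        return self.to_img(ctx.squeeze(1)).view(-1, 3, 64, 64)

class TextRegenerator(nn.Module):            # STRM-style captioner: image -> per-step token logits
    def __init__(self, vocab=1000, dim=256, max_len=16):
        super().__init__()
        self.pool = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim), nn.ReLU())
        self.out = nn.Linear(dim, vocab)
        self.max_len = max_len
    def forward(self, img):
        h = self.pool(img)
        return self.out(h).unsqueeze(1).expand(-1, self.max_len, -1)

tokens = torch.randint(0, 1000, (4, 16))     # a batch of 4 toy captions, 16 tokens each
enc, gen, strm = TextEncoder(), AttnGenerator(), TextRegenerator()
word_feats, sent_feat = enc(tokens)
fake_img = gen(word_feats, sent_feat)
logits = strm(fake_img)
# Text-reconstruction loss: the regenerated description should match the input description,
# which is the signal that pulls the generated image toward semantic consistency.
recon_loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), tokens.reshape(-1))
print(fake_img.shape, recon_loss.item())
```

In the full model this reconstruction term would be added to the usual GAN adversarial losses; the sketch omits the discriminators and multi-stage generation for brevity.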


Last Update: 2020-12-10