
CNN Accelerator Based on Multi-parallel Computing and Storage (基于多并行计算和存储的 CNN 加速器)


Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 29
Issue:: 2019, No. 07

Pages:: 11-16

Column:: Intelligence, Algorithms, and Systems Engineering

Publication date:: 2019-07-10

Article Info

Title:: CNN Accelerator Based on Multi-parallel Computing and Storage

Article ID:: 1673-629X(2019)07-0011-06

Author(s):: LI Zong-ling (李宗凌)¹; WANG Lu-yuan (汪路元)¹; YU Ji-yang (禹霁阳)¹; CHENG Bo-wen (程博文)¹; HAO Liang (郝梁)¹; ZHANG Wei-gong (张伟功)²; 1. Institute of Spacecraft System Engineering (北京空间飞行器总体设计部), Beijing 100094, China; 2. School of Information Engineering, Capital Normal University (首都师范大学信息工程学院), Beijing 100048, China

Keywords:: convolutional neural network; parallel computing and storage; accelerator; VGG-16 model; field-programmable gate array (FPGA)

Classification number:: TP39

DOI:: 10.3969/j.issn.1673-629X.2019.07.003

Abstract:: Based on the structural characteristics of forward inference in deep convolutional neural networks (CNNs), a deep CNN accelerator based on multi-parallel computing and storage is designed. The parallel features of the convolution operation are analyzed from two perspectives, computational efficiency and data reuse, and a fully parallel pipelined implementation of the fully connected layer is studied. The accelerator uses a parallel pipelined structure to improve computational efficiency. In the convolutional layers, it makes full use of several parallel convolution architectures to balance computational efficiency against the bandwidth required for loading parameters and data, and it achieves fully pipelined acceleration within the convolutional layer through three acceleration modes. In the fully connected layers, the multiply-accumulate operation is designed as a fully pipelined processing architecture whose pipeline latency does not exceed 20 processing clock cycles, and parallel computing provides a 16-fold speedup. In verification experiments based on the public ImageNet dataset, the accelerator performs up to 2,304 multiply-accumulate operations per cycle. At an operating frequency of 150 MHz, the peak operation rate reaches 691.2 Gops, and the energy efficiency is more than 2,700 times that of an i7-6700 CPU and more than 290 times that of a GTX-1050 GPU. The accelerator achieves a good balance among hardware resources, computing accuracy, speed, and power consumption, which makes it suitable for spaceborne embedded environments.
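The peak figure quoted in the abstract can be checked with a short calculation. The sketch below assumes that each multiply-accumulate is counted as two operations (one multiply and one add); the abstract does not state this convention, but it is the one under which 2 304 operations per cycle at 150 MHz gives 691.2 Gops. The function name `peak_gops` and its parameters are illustrative, not taken from the paper.

```python
def peak_gops(macs_per_cycle: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Return peak throughput in Gops.

    ops_per_mac=2 is an assumption: one multiply plus one add per MAC.
    """
    return macs_per_cycle * clock_hz * ops_per_mac / 1e9


# Figures taken from the abstract.
MACS_PER_CYCLE = 2304   # multiply-accumulate operations per clock cycle
CLOCK_HZ = 150e6        # 150 MHz operating frequency

peak = peak_gops(MACS_PER_CYCLE, CLOCK_HZ)
print(f"{peak:.1f} Gops")
```

Under the same assumption, counting each multiply-accumulate as a single operation would give half this value, so the convention matters when comparing against other accelerators.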
