[1] GUO Xiao-long, NIU Jin-yu, DU Yong-ping. Optimization Method of Efficient Convolution Based on Raspberry Pi [J]. Computer Technology and Development, 2023, 33(05): 96-104. [doi:10.3969/j.issn.1673-629X.2023.05.015]

Optimization Method of Efficient Convolution Based on Raspberry Pi

《计算机技术与发展》(Computer Technology and Development) [ISSN:1006-6977/CN:61-1281/TN]

Volume: 33
Issue: 2023, No. 05
Pages: 96-104
Column: Software Technology and Engineering
Publication Date: 2023-05-10

Article Info

Title:
Optimization Method of Efficient Convolution Based on Raspberry Pi
Article Number:
1673-629X(2023)05-0096-09
Author(s):
GUO Xiao-long (郭晓龙), NIU Jin-yu (牛晋宇), DU Yong-ping (杜永萍)
School of Information Technology, Beijing University of Technology, Beijing 100124, China
Keywords:
deep learning; model inference acceleration; computational graph optimization; operator fusion; convolution optimization; mobile inference framework
CLC Number:
TP303
DOI:
10.3969/j.issn.1673-629X.2023.05.015
Abstract:
Convolutional neural networks (CNN) have a huge number of parameters and a heavy computational load, which makes model inference time-consuming on low-power edge devices such as the Raspberry Pi. An in-depth study and comparative analysis of the existing open-source inference frameworks shows that they are all general-purpose frameworks and cannot be fully optimized for Raspberry Pi devices. Therefore, a quantitative analysis method based on the RoofLine model is proposed to optimize convolution inference for mobile network architectures such as MobileNet from the two dimensions of memory access and computation. Firstly, a computational graph optimization method is adopted, using operator fusion and memory rearrangement as inference preprocessing to reduce the amount of computation and the memory-access overhead during inference. Secondly, according to the convolution parameters and characteristics of each layer, a 9-grid tiling strategy and NEON instruction-pipeline-level optimizations are proposed. Experiments show that, at different input resolutions, the proposed optimization method achieves more than 3x speedup in inference compared with Tencent's open-source framework NCNN, Alibaba's MNN and SenseTime's PPL.NN.
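The RoofLine-based quantitative analysis described in the abstract can be sketched as follows: count a convolution layer's FLOPs and ideal DRAM traffic, take their ratio as the arithmetic intensity, and bound attainable performance by the minimum of the compute roof and bandwidth times intensity. The layer shape and the peak-FLOPS/bandwidth figures below are illustrative assumptions, not values measured in the paper.

```python
# Minimal RoofLine sketch for a single convolution layer.
# peak_flops and peak_bw are placeholder numbers, not Raspberry Pi measurements.

def conv_flops(h, w, c_in, c_out, k):
    """FLOPs for a stride-1, 'same'-padding conv; each multiply-accumulate = 2 FLOPs."""
    return 2 * h * w * c_in * c_out * k * k

def conv_bytes(h, w, c_in, c_out, k, dtype_bytes=4):
    """Ideal bytes moved if input, weights and output each cross DRAM exactly once."""
    return dtype_bytes * (h * w * c_in + k * k * c_in * c_out + h * w * c_out)

def roofline(flops, bytes_moved, peak_flops, peak_bw):
    """Attainable perf = min(compute roof, bandwidth * arithmetic intensity)."""
    ai = flops / bytes_moved                       # FLOPs per byte
    return ai, min(peak_flops, peak_bw * ai)

# Example: a 3x3 conv on a 56x56x64 feature map producing 64 output channels.
flops = conv_flops(56, 56, 64, 64, 3)
moved = conv_bytes(56, 56, 64, 64, 3)
ai, perf = roofline(flops, moved, peak_flops=24e9, peak_bw=4e9)
bound = "compute-bound" if perf == 24e9 else "memory-bound"
print(f"AI = {ai:.1f} FLOPs/byte, attainable = {perf/1e9:.1f} GFLOP/s ({bound})")
```

Layers whose arithmetic intensity falls left of the ridge point are memory-bound, which is where operator fusion and memory rearrangement help most; layers right of it benefit from tiling and NEON pipeline scheduling.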
Last Update: 2023-05-10