[1] GU Wen-jing, SUN Chen, WANG Bin. Research and Application of Parallel Optimization in High Performance Computing Based on OpenACC[J]. Computer Technology and Development, 2018, 28(04): 65-70. [doi:10.3969/j.issn.1673-629X.2018.04.014]

Research and Application of Parallel Optimization in High Performance Computing Based on OpenACC

Computer Technology and Development (《计算机技术与发展》) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
28
Issue:
2018, No. 04
Pages:
65-70
Column:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2018-04-10

Article Information

Title:
Research and Application of Parallel Optimization in High Performance Computing Based on OpenACC
Article ID:
1673-629X(2018)04-0065-06
Author(s):
GU Wen-jing (顾文静), SUN Chen (孙晨), WANG Bin (王彬)
High Performance Computing Division, National Meteorological Information Center, Beijing 100081, China
Keywords:
Sunway TaihuLight system; OpenACC; GRAPES model; long-wave radiation process
Classification number:
TP301
DOI:
10.3969/j.issn.1673-629X.2018.04.014
Document code:
A
Abstract:
GPU acceleration suffers from complex coding and poor portability, which make development and maintenance inefficient. To address this, we rewrite traditional serial code using acceleration based on OpenACC directives, improving development efficiency and simplifying the code. Taking the long-wave radiation process of the GRAPES global model as the object of study, we first apply a preliminary optimization of program performance through compiler options. Then, according to the data dependences and memory-access characteristics of the code, we preprocess its data and loop structures and add OpenACC directives to achieve loop-level parallelism. Experimental results show that the parallel long-wave radiation computation produces correct results and achieves a speedup of 4 to 6 times without changing the original code structure. The optimized performance is comparable to that of an Intel cluster of the same computing capability. Although a gap remains relative to GPU acceleration, the readability and portability of the code are greatly enhanced. As compiler and hardware technology develop, OpenACC has broad room for growth.
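The loop-level, directive-based approach summarized in the abstract can be illustrated with a minimal C sketch. This is not the authors' GRAPES code, which is not reproduced on this page. The function name accumulate_columns, the array names, and the running-sum computation are all hypothetical stand-ins, chosen only to show the general pattern: an outer loop over independent columns is marked parallel, while an inner loop with a carried dependence is kept sequential.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a column-wise physics kernel (NOT the GRAPES code).
 * Columns are independent of each other, so the outer loop can run in parallel.
 * Within a column, each level depends on the running sum from the previous
 * level, so the inner loop is explicitly marked sequential. */
void accumulate_columns(size_t ncol, size_t nlev,
                        const double *restrict absorb, /* input,  ncol*nlev values */
                        double *restrict path)         /* output, ncol*nlev values */
{
    assert(absorb != NULL && path != NULL);

    /* Data clauses state what is copied to and from the accelerator. */
    #pragma acc parallel loop copyin(absorb[0:ncol*nlev]) copyout(path[0:ncol*nlev])
    for (size_t i = 0; i < ncol; ++i) {
        double sum = 0.0; /* private to each column iteration */
        #pragma acc loop seq
        for (size_t k = 0; k < nlev; ++k) {
            sum += absorb[i * nlev + k];
            path[i * nlev + k] = sum;
        }
    }
}
```

A compiler without OpenACC support ignores the pragmas and runs the same loops serially, which is the sense in which the directive approach leaves the original code structure unchanged.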
Last Update: 2018-06-07