[1] WANG Jian-rong, HUA Lian-sheng, TANG Huai-ou, et al. Design of Distributed NWP Data Processing and Storage System [J]. Computer Technology and Development, 2018, 28(02): 167-172. [doi:10.3969/j.issn.1673-629X.2018.02.036]

Design of Distributed NWP Data Processing and Storage System

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
28
Issue:
2018, No. 02
Pages:
167-172
Column:
Application Development Research
Publication date:
2018-02-10

Article Info

Title:
Design of Distributed NWP Data Processing and Storage System
Article ID:
1673-629X(2018)02-0167-06
Author(s):
WANG Jian-rong (王建荣), HUA Lian-sheng (华连生), TANG Huai-ou (唐怀瓯), WANG Yun (王云), WANG Jing (王静)
Anhui Meteorological Information Center, Hefei 230031, Anhui, China
Keywords:
Quartz; decoding log file; Kafka; HBase; Solr; coprocessor
CLC number:
TP302
DOI:
10.3969/j.issn.1673-629X.2018.02.036
Document code:
A
Abstract:
With the rapid growth of global and regional numerical weather prediction (NWP) products, traditional relational databases can no longer store and manage the data adequately, and queries over large volumes of historical data are slow. To address these problems, we design a distributed NWP product processing and storage system. The system collects NWP product files on a schedule using Quartz job scheduling, and uses the Kafka distributed message queue to decouple the product decoding program from the storage program. It writes the decoding log files, the original product files, and the decoded element GRIB files into the HDFS distributed file system, and then uses a distributed MapReduce program to load the decoding log records into HBase. Because HBase supports its first-level index on the Rowkey well but offers little support for multi-condition queries, a Solr index is added to optimize them. When HBase receives data, it automatically triggers a coprocessor that writes the records synchronously to the SolrCloud index, which gives HBase a secondary index. Tests show that product files are written to the Hadoop file system at an average of 82.54 MB/s, that HBase ingests up to 13 677 records per second at its fastest, and that retrieval results are returned within milliseconds, so the system meets the storage and retrieval time requirements for NWP products in operational use.
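The secondary-index idea in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and uses neither the HBase nor the Solr API: `ToyTable` stands in for an HBase table keyed by Rowkey, `ToyIndex` stands in for the Solr index, and the post-write callback plays the role of the coprocessor. The rowkey layout and the field names (`model`, `element`, `level`) are assumptions made purely for illustration.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for an HBase table (hypothetical, not the HBase API)."""

    def __init__(self):
        self.rows = {}        # rowkey -> record; lookup by rowkey is the primary index
        self.observers = []   # callbacks invoked after every put

    def put(self, rowkey, record):
        self.rows[rowkey] = record
        # Plays the role of the coprocessor: every write is mirrored to the index.
        for callback in self.observers:
            callback(rowkey, record)

    def get(self, rowkey):
        return self.rows.get(rowkey)


class ToyIndex:
    """In-memory stand-in for the Solr index: (field, value) -> set of rowkeys."""

    def __init__(self, fields):
        self.fields = fields
        self.postings = defaultdict(set)

    def add(self, rowkey, record):
        # Note: overwriting a rowkey with new values would leave stale postings;
        # this sketch does not handle updates or deletes.
        for field in self.fields:
            self.postings[(field, record[field])].add(rowkey)

    def search(self, **conditions):
        """Return the sorted rowkeys whose records match every condition."""
        sets = [self.postings.get((f, v), set()) for f, v in conditions.items()]
        return sorted(set.intersection(*sets)) if sets else []


table = ToyTable()
index = ToyIndex(fields=("model", "element", "level"))
table.observers.append(index.add)

# Hypothetical rowkey layout: model_inittime_element_level.
table.put("ECMWF_2018021000_TEM_850", {"model": "ECMWF", "element": "TEM", "level": 850})
table.put("ECMWF_2018021000_TEM_500", {"model": "ECMWF", "element": "TEM", "level": 500})
table.put("GRAPES_2018021000_TEM_850", {"model": "GRAPES", "element": "TEM", "level": 850})

# Multi-condition query: the index yields rowkeys, the table yields the records.
hits = index.search(element="TEM", level=850)
records = [table.get(rowkey) for rowkey in hits]
```

The query never scans the table: the index resolves the conditions to rowkeys, and each record is then fetched by rowkey, which is the access pattern the abstract says HBase handles well.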
Last Update: 2018-03-29