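Design of a Distributed Processing and Storage System for Numerical Weather Prediction Products - Computer Technology and Development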

Article Information/Info

Author(s):: WANG Jian-rong; HUA Lian-sheng; TANG Huai-ou; WANG Yun; WANG Jing; Anhui Meteorological Information Center, Hefei 230031, China

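Abstract (translated from the Chinese):: The volume of numerical weather prediction (NWP) product data keeps growing. Traditional relational databases lack the capacity to store and manage it, and querying large volumes of historical data is inefficient. To address these problems, a distributed processing and storage system for NWP products was designed. Quartz task scheduling collects NWP product files at scheduled times; the Kafka distributed message queue decouples the product decoding and database-loading programs; the decoding log files, the original product files, and the decoded element GRIB files are written to the HDFS distributed file system, and a MapReduce distributed program stores the decoding log records in HBase. Because HBase supports primary indexing on the Rowkey well but supports multi-condition queries poorly, a Solr index is needed as a supplement. When HBase receives data, it automatically triggers a coprocessor that synchronizes the records to the Solr index, which gives HBase a secondary index. Test results show that product files are written to the Hadoop file system at an average speed of 82.54 MB/s, that the peak HBase insertion rate reaches 13 677 records per second, and that search results are returned within milliseconds, which meets the storage and retrieval timeliness requirements for NWP products in operational applications.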

Abstract:: With the rapid growth of global and regional numerical weather prediction (NWP) products, traditional relational databases offer insufficient storage and management capacity for this mass of data, and their query efficiency is low when accessing long time series. We therefore design a distributed data processing and storage system. The system collects NWP files from source folders using the Quartz scheduler and decouples the NWP product decoding and storage programs using the Kafka distributed message queue. It also writes the decoding log files, source products, and element GRIB files into HDFS and then inserts the decoding log records into HBase. Because HBase supports the first-level index on the Rowkey well but supports multi-condition queries poorly, the queries need to be optimized with a Solr index. When HBase receives data, it automatically triggers a coprocessor that writes the records synchronously to SolrCloud, which provides a multi-condition index for HBase. Tests show that product files are written to the Hadoop file system at an average of 82.54 MB per second, that the fastest HBase storage rate reaches 13 677 records per second, and that data retrieval responds within milliseconds, so the system meets the storage and retrieval time requirements for NWP data in operational applications.
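
The secondary-index idea in the abstract can be illustrated with a toy, in-memory sketch. It is not the authors' implementation and uses no HBase or Solr API: `ToyTable`, the `model|time|element|level` rowkey layout, and the sample values are all hypothetical, chosen only to show why a rowkey-ordered store answers prefix scans directly while multi-condition queries benefit from an index that a post-write hook keeps in sync.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a rowkey-ordered table.

    A post-put hook plays the role the abstract assigns to the coprocessor:
    every write is mirrored into a separate inverted index.
    """

    def __init__(self):
        self.rows = {}                  # rowkey -> record (dict of fields)
        self.sorted_keys = []           # rowkeys in sorted order, for prefix scans
        self.index = defaultdict(set)   # (field, value) -> set of rowkeys

    def put(self, rowkey, record):
        if rowkey in self.rows:
            # Overwrite: drop the stale index entries first.
            for field, value in self.rows[rowkey].items():
                self.index[(field, value)].discard(rowkey)
        else:
            insort(self.sorted_keys, rowkey)
        self.rows[rowkey] = dict(record)
        self._post_put(rowkey, record)

    def _post_put(self, rowkey, record):
        # Hook run after each write: keep the secondary index in sync.
        for field, value in record.items():
            self.index[(field, value)].add(rowkey)

    def scan_prefix(self, prefix):
        # Primary-index access: a range scan over the sorted rowkeys.
        start = bisect_left(self.sorted_keys, prefix)
        matched = []
        for key in self.sorted_keys[start:]:
            if not key.startswith(prefix):
                break
            matched.append(key)
        return matched

    def query(self, **conditions):
        # Secondary-index access: intersect the rowkey sets of each condition.
        if not conditions:
            return []
        key_sets = [self.index.get(item, set()) for item in conditions.items()]
        return sorted(set.intersection(*key_sets))


def build_demo_table():
    # Placeholder records; the field names and values are illustrative only.
    table = ToyTable()
    table.put("ECMWF|2017010100|TEM|850",
              {"model": "ECMWF", "element": "TEM", "level": 850})
    table.put("ECMWF|2017010100|RHU|850",
              {"model": "ECMWF", "element": "RHU", "level": 850})
    table.put("T639|2017010100|TEM|500",
              {"model": "T639", "element": "TEM", "level": 500})
    return table
```

With this layout, `scan_prefix("ECMWF|")` is served by the rowkey order alone, whereas `query(element="TEM", level=850)` filters on fields that are not a rowkey prefix and so goes through the index first, then fetches rows by key. In the system described above that division of labour would fall to HBase and Solr respectively.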