[1] WANG Zi, LIANG Zheng-he, WU Ying-ying. Improvement of Traditional ETL Based on Kafka and Disruptor Technology[J]. Computer Technology and Development, 2018, 28(11): 26-29. [doi:10.3969/j.issn.1673-629X.2018.11.006]

Improvement of Traditional ETL Based on Kafka and Disruptor Technology

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
28
Issue:
2018, Issue 11
Pages:
26-29
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2018-11-10

Article Information

Title:
Improvement of Traditional ETL Based on Kafka and Disruptor Technology
Article ID:
1673-629X(2018)11-0026-04
Author(s):
WANG Zi (王梓), LIANG Zheng-he (梁正和), WU Ying-ying (吴莹莹)
School of Computer and Information, Hohai University, Nanjing 211100, China
Keywords:
big data; ETL; Kafka; data warehouse; Disruptor
CLC number:
TP311.133.1
DOI:
10.3969/j.issn.1673-629X.2018.11.006
Document code:
A
Abstract:
An ETL system is a basic component for building and maintaining a data warehouse: ETL tools extract, clean, and transform business data from heterogeneous data sources and load it into the warehouse. However, once the data volume grows beyond a certain point, the processing speed and data accuracy of traditional ETL drop sharply, and it cannot keep up with the diverse and changing requirements of data sources. To achieve both efficient data processing and general-purpose data source access, we propose an improved scheme for traditional ETL. The scheme combines Kafka with the Disruptor concurrency framework: data is extracted from the data sources into a Kafka cluster, and Disruptor's high throughput and low latency are used to transfer it efficiently, so that data can be cleaned and transformed between different data sources. The scheme also greatly improves the accuracy of data transmission and ensures its consistency.
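
The abstract describes the data flow only in outline, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a plain Python list as a stand-in for records read from a Kafka topic, and a small hand-written single-producer, single-consumer ring buffer as a stand-in for the Disruptor (it is not the LMAX Disruptor API). The names `RingBuffer`, `extract`, `transform_and_load`, `run_pipeline`, and the `id`/`amount` record fields are all hypothetical.

```python
import threading
import time

SENTINEL = object()  # marks the end of the stream


class RingBuffer:
    """Single-producer, single-consumer ring buffer with preallocated slots."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._slots = [None] * size
        self._size = size
        self._mask = size - 1
        self._published = -1  # last sequence written by the producer
        self._consumed = -1   # last sequence read by the consumer

    def publish(self, item):
        seq = self._published + 1
        # Wait while the slot we would overwrite has not been read yet.
        while seq - self._consumed > self._size:
            time.sleep(0)
        self._slots[seq & self._mask] = item
        self._published = seq

    def take(self):
        seq = self._consumed + 1
        # Wait until the producer has published this sequence.
        while seq > self._published:
            time.sleep(0)
        item = self._slots[seq & self._mask]
        self._consumed = seq
        return item


def extract(topic, ring):
    """Extract stage: hand each record from the (simulated) topic to the ring."""
    for record in topic:
        ring.publish(record)
    ring.publish(SENTINEL)


def transform_and_load(ring, warehouse):
    """Clean and transform records, then append them to the (simulated) warehouse."""
    while True:
        record = ring.take()
        if record is SENTINEL:
            break
        if record.get("amount") is None:  # cleaning: drop incomplete records
            continue
        warehouse.append({"id": record["id"],
                          "amount": round(float(record["amount"]), 2)})


def run_pipeline(records, ring_size=8):
    ring = RingBuffer(ring_size)
    warehouse = []
    consumer = threading.Thread(target=transform_and_load, args=(ring, warehouse))
    consumer.start()
    extract(records, ring)
    consumer.join()
    return warehouse
```

Calling `run_pipeline(records)` with a list of dictionaries returns the cleaned rows in their original order. The fixed-size, preallocated slot array and the two sequence counters are what this sketch borrows from the Disruptor design; a real deployment would replace the list with a Kafka consumer and the ring with the Disruptor library itself.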

Last Update: 2018-11-10