[1] LI Lei. One Scheduling Method for ETL Task Cluster[J]. Computer Technology and Development, 2018, 28(11): 35-38. [doi:10.3969/j.issn.1673-629X.2018.11.008]

One Scheduling Method for ETL Task Cluster

Computer Technology and Development [ISSN:1006-6977/CN:61-1281/TN]

Volume:
28
Issue:
2018, No. 11
Pages:
35-38
Section:
Intelligence, Algorithms, and Systems Engineering
Publication date:
2018-11-10

Article Info

Title:
One Scheduling Method for ETL Task Cluster
Article ID:
1673-629X(2018)11-0035-04
Author(s):
LI Lei (李磊)
School of Computer, North China University of Technology, Beijing 100144, China
Keywords:
data warehouse; extract-transform-load (ETL); quartz cluster scheduling; greedy scheduling algorithm; Kettle
CLC number:
TP311
DOI:
10.3969/j.issn.1673-629X.2018.11.008
Document code:
A
Abstract:
As data warehouses grow, the number of ETL tasks grows with them, and scheduling those tasks on a single machine often leaves many of them running late or not running at all. This paper studies a scheduling method for Kettle-based ETL tasks. Given the characteristics of such tasks, the method operates on a batch of mutually independent tasks and divides scheduling into two stages: task assignment and task execution. To avoid unbalanced load across the cluster, tasks are assigned with a greedy scheduling algorithm based on a key characteristic of each ETL task, the data volume of its data source. To prevent some tasks from never getting a chance to run, task priorities are adjusted dynamically and tasks are executed under the highest response ratio next (HRRN) scheduling algorithm. The method is evaluated on ETL tasks by comparing the CPU and memory consumed during execution, and the total time taken for one full run of all tasks, against a round-robin scheduling algorithm. The results show that the proposed method is more efficient than round-robin.
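
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `EtlTask` fields, the use of data volume as the only load measure, the largest-first greedy rule, and the estimated runtimes are all assumptions made for illustration, and nothing here touches Kettle or Quartz.

```python
import heapq
from dataclasses import dataclass


@dataclass
class EtlTask:
    name: str
    data_volume: float    # assumed load proxy, e.g. rows or MB at the data source
    est_runtime: float    # assumed runtime estimate in seconds, must be > 0
    arrival: float = 0.0  # time at which the task became ready, in seconds


def greedy_assign(tasks, node_count):
    """Stage 1: take the largest tasks first and give each one
    to the node whose accumulated data volume is currently smallest."""
    heap = [(0.0, node) for node in range(node_count)]  # (total volume, node id)
    heapq.heapify(heap)
    plan = {node: [] for node in range(node_count)}
    for task in sorted(tasks, key=lambda t: t.data_volume, reverse=True):
        load, node = heapq.heappop(heap)
        plan[node].append(task)
        heapq.heappush(heap, (load + task.data_volume, node))
    return plan


def response_ratio(task, now):
    """HRRN priority: (waiting time + estimated runtime) / estimated runtime."""
    waiting = max(0.0, now - task.arrival)
    return (waiting + task.est_runtime) / task.est_runtime


def hrrn_order(tasks, start=0.0):
    """Stage 2: run tasks one at a time on a node, recomputing priorities
    at every step and picking the ready task with the highest response ratio."""
    pending, now, order = list(tasks), start, []
    while pending:
        ready = [t for t in pending if t.arrival <= now]
        if not ready:
            # Nothing is ready yet: jump ahead to the next arrival.
            now = min(t.arrival for t in pending)
            continue
        chosen = max(ready, key=lambda t: response_ratio(t, now))
        pending.remove(chosen)
        order.append(chosen.name)
        now += chosen.est_runtime
    return order
```

Because the response ratio grows with waiting time, a long task that has waited long enough eventually outranks shorter tasks that arrived later, which is the starvation-avoidance property the abstract refers to.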


Last Update: 2018-11-10