How many of these common ETL task-scheduling frameworks and components do you know?

Tool resources: reply "软件" ("software") to the WeChat official account 【taskctl】.

1. Cron-like Scheduler

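1.1 APScheduler, a Python task-scheduling framework

A lightweight, in-process task-scheduling framework written in Python. It provides cron-like functionality and was heavily influenced by Java's Quartz.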


Advanced Python Scheduler (APScheduler) is a light but powerful in-process task scheduler that lets you schedule jobs (functions or any python callables) to be executed at times of your choosing.

This can be a far better alternative to externally run cron scripts for long-running applications (e.g. web applications), as it is platform neutral and can directly access your application's variables and functions.

The development of APScheduler was heavily influenced by the Quartz task scheduler written in Java. APScheduler provides most of the major features that Quartz does, but it also provides features not present in Quartz (such as multiple job stores).
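To make "in-process" concrete, here is a minimal sketch that uses Python's standard-library `sched` module instead of APScheduler itself, so nothing below is APScheduler's API. The names `app_state`, `record_tick`, and `run_demo` are hypothetical. The point it illustrates is that a scheduled callable runs inside the application and can read and change its variables directly, which an externally run cron script cannot do.

```python
import sched
import time

# Application state that lives in the same process as the scheduler.
app_state = {"ticks": []}


def record_tick(label):
    """A job: any Python callable that can touch application state directly."""
    app_state["ticks"].append(label)


def run_demo():
    """Schedule three jobs with short delays and run them in-process."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    # enter(delay_seconds, priority, action, argument=...)
    scheduler.enter(0.02, 1, record_tick, argument=("second",))
    scheduler.enter(0.01, 1, record_tick, argument=("first",))
    scheduler.enter(0.03, 1, record_tick, argument=("third",))
    scheduler.run()  # blocks until the queue is empty
    return app_state["ticks"]
```

Calling `run_demo()` returns the labels in the order the jobs fired, regardless of the order in which they were registered. APScheduler builds cron-style triggers and job stores on top of this basic idea.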

1.2 cron4j, a task-scheduling framework

cron4j is a task-scheduling framework for Java, similar to crontab on UNIX systems.
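A crontab-style pattern has five fields: minute, hour, day of month, month, and day of week. The sketch below shows one way such a pattern could be matched against a point in time. It is written in Python for illustration and is not cron4j's implementation; `expand_field` and `cron_matches` are hypothetical names. As a simplification, it requires every field to match, whereas classic cron treats day of month and day of week as alternatives when both are restricted.

```python
from datetime import datetime

# (low, high) bounds for minute, hour, day of month, month, day of week
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field, low, high):
    """Expand one crontab field such as '*/15', '1-5' or '0,30' into a set."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if not (low <= start <= end <= high):
            raise ValueError(f"field {field!r} outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(pattern, moment):
    """Return True if `moment` satisfies the five-field crontab pattern."""
    fields = pattern.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    actual = [
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        moment.isoweekday() % 7,  # crontab convention: 0 = Sunday
    ]
    return all(
        value in expand_field(field, low, high)
        for value, field, (low, high) in zip(actual, fields, FIELD_RANGES)
    )
```

A scheduler built on this would wake once a minute and run every job whose pattern matches the current time, for example `cron_matches("0 2 * * 1-5", datetime.now())` for a job meant to run at 02:00 on weekdays.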


1.3 Conclusion

Not a web-based application

Requires programming

Provides scheduling only

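2. Gearman, a distributed remote-processing framework

2.1 Outline

Gearmand is the job-server component of Gearman. Gearman is an application framework for farming out tasks and can be used in many settings; compared with Hadoop, it focuses more on task distribution. Distributing tasks is very simple, simple enough to be done with nothing more than a script. Gearman was originally used for image resizing at LiveJournal: because resizing images consumes a lot of computing resources, the work had to be dispatched to multiple back-end servers, and the results were returned to the front end for display once the tasks finished.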
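The client, job server, and worker roles described above can be sketched as an in-memory simulation. This is not the Gearman protocol or any Gearman client library: the job server is stood in for by a standard-library `queue.Queue`, the workers are threads rather than separate servers, and `resize`, `worker`, and `dispatch` are hypothetical names.

```python
import queue
import threading


def resize(job):
    """Stand-in for an expensive task: halve an image's (width, height)."""
    width, height = job
    return (width // 2, height // 2)


def worker(jobs, results):
    """Pull jobs from the shared queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        job_id, payload = item
        results[job_id] = resize(payload)


def dispatch(payloads, worker_count=3):
    """Play the job-server role: queue the jobs, let workers drain them."""
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job_id, payload in enumerate(payloads):
        jobs.put((job_id, payload))
    for _ in threads:
        jobs.put(None)  # one stop signal per worker
    for thread in threads:
        thread.join()
    # Return results in submission order, as the front end would expect.
    return [results[i] for i in range(len(payloads))]
```

The caller only submits payloads and collects results; which worker handles which job is decided by whoever pulls from the queue first, which is the essence of the task-distribution model.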


2.2 Features

Open Source - It's free! (in both meanings of the word) Gearman has an active open source community that is easy to get involved with if you need help or want to contribute.

Multi-language - There are interfaces for a number of languages, and this list is growing. You also have the option to write heterogeneous applications with clients submitting work in one language and workers performing that work in another.

Flexible - You are not tied to any specific design pattern. You can quickly put together distributed applications using any model you choose, one of those options being Map/Reduce.

Fast - Gearman has a simple protocol and interface with a new optimized server in C to minimize your application overhead.

Embeddable - Since Gearman is fast and lightweight, it is great for applications of all sizes. It is also easy to introduce into existing applications with minimal overhead.

No single point of failure - Gearman can not only help scale systems, but can also do so in a fault-tolerant way.

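3. A free commercial-grade ETL tool: TASKCTL Web edition

The free Web edition of TASKCTL is billed as the only commercial-grade free software currently available in the ETL scheduling field, guaranteed 100% free and containing no black-box code. It aims to promote the independent development of this field by making scheduling within ETL independent, specialized, and systematic, so that projects are easier to implement and enterprise infrastructure is clearer and easier to manage.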


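(1) Main supported environments

Operating systems: AIX, Linux, UNIX, etc. (because it is built in standard C, it can in theory run on all mainstream UNIX-family systems)

Project size: suited to small and medium-sized ETL projects

ETL tool environment: because TASKCTL uses a task-plugin driver mechanism, it can support all kinds of stored procedures and scripts, as well as tasks from ETL tools such as DataStage, Informatica, and Kettle.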

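(2) Main features

Core scheduling features: serial and parallel execution, dependencies, mutual exclusion, execution plans, timed runs, fault tolerance, loops, conditional branching, remote execution, load balancing, custom conditions, and other core scheduling functions.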
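Serial, parallel, and dependency scheduling can all be derived from a single dependency graph. The sketch below uses Python's standard-library `graphlib` (Python 3.9 or later) to show the idea; it says nothing about how TASKCTL implements these features, and `plan_batches` and the task names in `etl_flow` are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies):
    """Group tasks into batches; tasks within a batch may run in parallel.

    `dependencies` maps each task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites finished
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical ETL flow: two extracts, then a transform, then a load.
etl_flow = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load": {"transform"},
}
```

For `etl_flow`, `plan_batches` yields three batches: the two extracts together (parallel), then `transform`, then `load` (serial). A real scheduler would start each task as soon as its own prerequisites finish rather than waiting for a whole batch.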

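Extension features

Network extension: supports single-machine, multi-server, remote-agent, and cluster deployments.

Application extension: the platform provides a dedicated application API for building further scheduling applications.

Task-type extension: to accommodate different types of task scheduling, the platform can be extended quickly through plugins that share a unified template and a unified interface.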
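Extending task types through plugins with a unified interface can be sketched as a registry of handlers that all share one signature. This is a generic illustration in Python, not TASKCTL's plugin API; `PLUGINS`, `plugin`, `run_task`, and the task types `python` and `noop` are hypothetical.

```python
# Registry mapping a task type to the plugin callable that runs it.
PLUGINS = {}


def plugin(task_type):
    """Decorator that registers a function as the handler for a task type."""
    def register(func):
        PLUGINS[task_type] = func
        return func
    return register


@plugin("python")
def run_python(target):
    """Execute a Python snippet; return 0 on success, 1 on any exception."""
    try:
        exec(target, {})
    except Exception:
        return 1
    return 0


@plugin("noop")
def run_noop(target):
    """Placeholder task type that always succeeds."""
    return 0


def run_task(task_type, target):
    """Unified entry point: every plugin takes a target, returns an exit code."""
    try:
        handler = PLUGINS[task_type]
    except KeyError:
        raise ValueError(f"no plugin registered for task type {task_type!r}")
    return handler(target)
```

Because the scheduler core only ever calls `run_task`, supporting a new kind of task means registering one more function; nothing in the core has to change.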

Application features: configuration, flow design, monitoring, various queries, and manual intervention such as rerun and reset.

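(3) Main innovations

Database-free design: described as the first professional database-free scheduling platform in China.

Plugin mechanism: described as the only platform in the industry that extends task types through plugins with a unified application interface.

Flow design as code: described as the only scheduling platform in the field where flows are designed as text code. Compared with the traditional approach of records, tables, and dialog boxes, text with the characteristics of a programming syntax is easier to work with, more flexible to design, and more readable.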
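As a rough illustration of why a text-defined flow is easy to read, diff, and validate, the sketch below parses a tiny made-up notation in which each line reads `task: dependencies`. This syntax is invented for the example and is not TASKCTL's actual flow language; `parse_flow` and `FLOW_TEXT` are hypothetical names.

```python
def parse_flow(text):
    """Parse lines of the form 'task: dep1, dep2' into {task: [deps]}.

    Blank lines and '#' comments are ignored. This syntax is invented
    for illustration only.
    """
    flow = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, separator, deps = line.partition(":")
        if not separator or not name.strip():
            raise ValueError(f"line {line_number}: expected 'task: deps'")
        flow[name.strip()] = [d.strip() for d in deps.split(",") if d.strip()]
    # Every dependency must itself be a declared task.
    for name, deps in flow.items():
        for dep in deps:
            if dep not in flow:
                raise ValueError(f"{name} depends on undeclared task {dep}")
    return flow


FLOW_TEXT = """
# nightly load
extract:
transform: extract
load: transform   # runs last
"""
```

Since the whole flow is plain text, it can be kept under version control and checked for mistakes, such as an undeclared dependency, before anything runs.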
