Storm reliability and the ack mechanism

orisonchan
2018.08.26 12:44, word count 810

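Whether processing is real-time or offline, one question is unavoidable: how are failed tasks retried? Storm answers it with an ack mechanism. Let's start with the methods of the ISpout interface.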

1 Spout reliability

  • Excerpt from ISpout.java:
public interface ISpout extends Serializable {

    /**
     * Called when a task for this component is initialized within a worker on the cluster.
     * It provides the spout with the environment in which the spout executes.
     *
     * This includes the:
     *
     * @param conf The Storm configuration for this spout. This is the configuration provided to the topology merged in with cluster configuration on this machine.
     * @param context This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
     * @param collector The collector is used to emit tuples from this spout. Tuples can be emitted at any time, including the open and close methods. The collector is thread-safe and should be saved as an instance variable of this spout object.
     */
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);

    /**
     * When this method is called, Storm is requesting that the Spout emit tuples to the 
     * output collector. This method should be non-blocking, so if the Spout has no tuples
     * to emit, this method should return. nextTuple, ack, and fail are all called in a tight
     * loop in a single thread in the spout task. When there are no tuples to emit, it is courteous
     * to have nextTuple sleep for a short amount of time (like a single millisecond)
     * so as not to waste too much CPU.
     */
    void nextTuple();

    /**
     * Storm has determined that the tuple emitted by this spout with the msgId identifier
     * has been fully processed. Typically, an implementation of this method will take that
     * message off the queue and prevent it from being replayed.
     */
    void ack(Object msgId);

    /**
     * The tuple emitted by this spout with the msgId identifier has failed to be
     * fully processed. Typically, an implementation of this method will put that
     * message back on the queue to be replayed at a later time.
     */
    void fail(Object msgId);
}

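As shown, the interface provides two methods, ack() and fail(), each taking a parameter called msgId. The msgId is the message ID the spout attached to the tuple when emitting it. It should uniquely identify that tuple, so the spout can tell which message was acked or failed.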

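The open method receives a SpoutOutputCollector, the collector a spout uses to emit tuples. One of its methods is List<Integer> emit(List<Object> tuple, Object messageId). An emit() overload without a messageId also works, but tuples emitted that way are not tracked, so ack() and fail() are never called for them. When a tuple is emitted with a messageId, Storm tracks it through the topology and hands the same ID back to ack() or fail(). This is the basis of the ack mechanism.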

2 The ack mechanism in detail

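Storm tracks every tuple a spout produces. Giving a tuple an ID tells Storm that the spout wants to be notified of the outcome, success or failure, once the tuple has passed through the nodes of the topology. One rule is essential: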

  • A tuple has been processed successfully when the tuple itself and every tuple derived from it have been processed successfully by every bolt.

  • A tuple has failed when the tuple itself, or any one tuple derived from it, fails to be processed.

The notion of a derived tuple is covered in the "Anchoring" section below.

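Note also that Storm does not resend anything for us: the code that re-emits a failed tuple has to be written by hand in the spout's fail method.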
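A minimal sketch of the bookkeeping such a spout might keep, written in plain Java so it runs without Storm on the classpath. TupleEmitter is a hypothetical stand-in for SpoutOutputCollector.emit(List<Object> tuple, Object messageId), and ReplayBuffer is an illustrative name, not a Storm class. In a real spout the three methods would be called from nextTuple(), ack(Object msgId) and fail(Object msgId).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SpoutOutputCollector.emit(List<Object>, Object).
interface TupleEmitter {
    void emit(List<Object> tuple, Object messageId);
}

// Illustrative helper (not part of Storm): remembers emitted tuples until they are acked.
final class ReplayBuffer {
    private final Map<Object, List<Object>> pending = new HashMap<>();
    private final TupleEmitter emitter;

    ReplayBuffer(TupleEmitter emitter) {
        this.emitter = emitter;
    }

    // Would be called from nextTuple(): remember the tuple, then emit it with its ID.
    void emit(Object msgId, List<Object> tuple) {
        pending.put(msgId, tuple);
        emitter.emit(tuple, msgId);
    }

    // Would be called from ack(msgId): the tuple is fully processed, so forget it.
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    // Would be called from fail(msgId): emit the same tuple again under the same ID.
    void fail(Object msgId) {
        List<Object> tuple = pending.get(msgId);
        if (tuple != null) {
            emitter.emit(tuple, msgId);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}

final class ReplayBufferDemo {
    public static void main(String[] args) {
        List<Object> sentIds = new ArrayList<>();
        ReplayBuffer buffer = new ReplayBuffer((tuple, id) -> sentIds.add(id));

        buffer.emit("m1", Arrays.<Object>asList("hello"));
        buffer.emit("m2", Arrays.<Object>asList("world"));
        buffer.ack("m1");   // m1 succeeded and is dropped from the buffer
        buffer.fail("m2");  // m2 failed and is emitted a second time

        System.out.println(sentIds);               // [m1, m2, m2]
        System.out.println(buffer.pendingCount()); // 1
    }
}
```

Keeping the tuple in memory until it is acked is the simplest choice. A spout reading from a durable queue could instead leave the message in the queue and only delete it on ack, as the ISpout Javadoc above suggests.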

2.1 How it works

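Storm has a special kind of task called the acker; the source code shows that it is itself a bolt. For each spout tuple the acker keeps a checksum called ack-val, initially 0. Every time a tuple is emitted or acked, that tuple's ID is XORed into ack-val and the result becomes the new ack-val. Each tuple ID is therefore XORed in exactly twice, once on emit and once on ack, and because x XOR x = 0, ack-val ends at 0 once every emitted tuple has been acked. The acker uses this as its test: when ack-val is 0, the spout tuple is considered fully processed.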
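The XOR bookkeeping can be tried out with a toy model. This is only a sketch of the idea described above, not Storm's acker code: AckerModel and its xor method are illustrative names, and the small hexadecimal IDs are chosen for readability where Storm would use random 64-bit values.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the acker's checksum table (illustrative, not a Storm class).
final class AckerModel {
    // ID of the spout tuple (root of the tree) -> current ack-val
    private final Map<Long, Long> ackVals = new HashMap<>();

    // XOR a tuple ID into the ack-val of its root. Used for "emitted" and "acked" alike.
    // Returns true when ack-val is back at 0, i.e. the tree counts as fully processed.
    boolean xor(long rootId, long tupleId) {
        long updated = ackVals.getOrDefault(rootId, 0L) ^ tupleId;
        if (updated == 0L) {
            ackVals.remove(rootId);
            return true;
        }
        ackVals.put(rootId, updated);
        return false;
    }
}

final class AckerModelDemo {
    public static void main(String[] args) {
        AckerModel acker = new AckerModel();
        long root = 1L;
        long a = 0x1AL, b = 0x2BL, c = 0x3CL;

        acker.xor(root, a);  // spout emits A
        acker.xor(root, b);  // first bolt emits B anchored to A
        acker.xor(root, c);  // first bolt emits C anchored to A
        acker.xor(root, a);  // first bolt acks A; ack-val is now B ^ C
        acker.xor(root, b);  // second bolt acks B; ack-val is now C
        boolean done = acker.xor(root, c);  // second bolt acks C; ack-val is 0

        System.out.println(done);  // true
    }
}
```

The order of the XOR operations does not matter, which is why the acker needs only one fixed-size value per spout tuple no matter how large the tuple tree grows.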

3 Bolt reliability

3.1 Anchoring

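As mentioned above, a tuple can have derived tuples. IBolt's prepare() method takes a parameter similar to the one in ISpout's open(): OutputCollector collector. Unlike SpoutOutputCollector, this is the collector a bolt uses to emit tuples. One of its emit() overloads is List<Integer> emit(Tuple anchor, List<Object> tuple), where anchor is the tuple the bolt received in execute() and the list holds the values of the tuple to emit. Emitting this way links the anchor to the tuple sent on to the next bolt. This link is called anchoring.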

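If a tuple is emitted without an anchor, the ack mechanism does not track it: a failure further downstream will not cause the original spout tuple to be replayed, so reliability cannot be guaranteed.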

So every bolt should emit through the anchored emit() method above and then call collector.ack(tuple), or call collector.fail(tuple) when it catches an exception.
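The shape of such an execute() method can be sketched as follows. To keep the example runnable without Storm, BoltCollector is a hypothetical stand-in for the three OutputCollector calls named in the text, and a plain String plays the role of Storm's Tuple. UppercaseStep and RecordingCollector are illustrative names as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for OutputCollector's emit(Tuple, List<Object>), ack(Tuple), fail(Tuple).
// A String takes the place of Storm's Tuple type.
interface BoltCollector {
    void emit(String anchor, List<Object> tuple);
    void ack(String input);
    void fail(String input);
}

// Illustrative bolt logic: upper-cases its input and reports the outcome.
final class UppercaseStep {
    private final BoltCollector collector;

    UppercaseStep(BoltCollector collector) {
        this.collector = collector;
    }

    // Mirrors the shape of IBolt.execute(Tuple input).
    void execute(String input) {
        try {
            if (input.isEmpty()) {
                throw new IllegalArgumentException("empty input");
            }
            // Anchored emit: the new tuple is linked to the input that produced it.
            collector.emit(input, Arrays.<Object>asList(input.toUpperCase()));
            collector.ack(input);
        } catch (RuntimeException e) {
            // Any processing error fails the input so the spout can replay it.
            collector.fail(input);
        }
    }
}

// Test double that records the calls it receives.
final class RecordingCollector implements BoltCollector {
    final List<String> log = new ArrayList<>();

    public void emit(String anchor, List<Object> tuple) {
        log.add("emit " + tuple + " anchored to " + anchor);
    }

    public void ack(String input) {
        log.add("ack " + input);
    }

    public void fail(String input) {
        log.add("fail " + input);
    }
}

final class UppercaseStepDemo {
    public static void main(String[] args) {
        RecordingCollector collector = new RecordingCollector();
        UppercaseStep step = new UppercaseStep(collector);
        step.execute("storm");  // emits an anchored tuple, then acks
        step.execute("");       // throws internally, so the input is failed
        collector.log.forEach(System.out::println);
    }
}
```

Calling ack only after the anchored emit matters: the input tuple is marked done only once its derived tuples have been registered.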
