Spark Streaming Dynamic Resource Allocation

Problem Statement

DRA has already been implemented since Spark 1.2 . However the existing Spark DRA on Yarn implementation does not embody the specific property of Spark Streaming.

Spark DRA works when there are some executors being idle for removeExecutorInterval time,then they will be removed or when there is a backlog of pending tasks waiting to be rescheduled,then new executors will be added。This mechanism works fine for Long-Time stage。 Spark Streaming is known as micro batch processing,hence most of time the duration of batch is small. It will cause churn when removing executors at the end of the batch and adding executors at the beginning of next batch.

In real-time data processing, data slowly increase or slowly down,so the adjacent batches have almost the same data to process.

The perfect result of spark streaming is processing time equal to duration. So the key concept is reducing/increasing resource until processing time infinitely close to duration .

Goals

The goal is to make processing time infinitely close to duration by reducing/increasing resource in spark streaming . And we also hope having a reasonable stable time after short-time adjustment which means resource adjustment rarely happens

Design Summary

The same as spark DRA, Spark Streaming DRA is also disabled,but can be enabled by a new Spark conf property spark.streaming.dynamicAllocation.enabled. All related Spark Streaming DRA properties introduced will bepackaged in the spark.streaming.dynamicAllocation.* namespace

Adding Executors

The job in Spark Streaming is time-sensitive . Once there is a batch delayed it triggers Adding-Executors action. This Action will request the number of spark.streaming.dynamicAllocation.maxExecutors executors from Yarn immediately in greedy way since we hope we can eliminate the delay as soon as possible ,at the same time,we also should take care of the situation when requesting resource with multi rounds may have resource scarcity in shared cluster.

Removing Executors

As mentioned before, Spark Streaming, time-sensitive , so the action removing executors is not allowed to consume too much time of duration. So executors marked as removed will be added to pendingToRemove set so new tasks will not be launched in them ,then we acknowledge yarn to kill them asynchronously.

Algorithm of DRA

To calculate the number of executor should be removed every round,the formula is used as following

val totalRemoveExecutorNum = Math.round(
currentExecutors * (
(duration.toDouble - processDuration) / duration - reserveRate)
)
val actualShouldRemoveExecutorNumInThisRound = totalRemoveExecutorNum / releaseRounds 

In this formula, we suppose processing time has a strong relationship with executor number,however,they are not linear relationship. To fix this, we add some new parameters like reserveRate and releaseRounds to make sure we fit this non-linear relationship .

We also provide a heuristic strategy to reduce executor number. This formula will be calculate in every round, so every round the number will be readjusted circularly ,and finally actualShouldRemoveExecutorNumInThisRound will converge to zero .

Implementation of DRA

New classes should be added in spark streaming module:

package org.apache.spark.streaming
private[spark] class StreamingExecutorAllocationManager(client: ExecutorAllocationClient,
                                                        duration: Long,
                                                        steamingListenerBus: StreamingListenerBus,
                                                        listenerBus: LiveListenerBus,
                                                        conf: SparkConf) extends Logging {
  allocationManager =>
......
}

private class StreamingExecutorAllocationListener extends SparkListener {
......
}

private class StreamingSchedulerListener extends StreamingListener {
.....
}

StreamingExecutorAllocationManager is initialed in StreamingContext

Class affected in spark core module:

package org.apache.spark
private[spark] trait ExecutorAllocationClient {
   //blackList who can remove executors immediately make Yarn remove them asynchronously possible
  def addExecutorToPendingStatus(executorId: String): Unit = {}
  
  //cause spark streaming is initial late then spark core module,
  // we need to know how many executors we already have when 
  // spark streaming is initaled
  def executors(): List[String] = { List() }
}

SparkContext and CoarseGrainedSchedulerBackend both are affected because they are extend from ExecutorAllocationClient.

JobScheduler is also should be modified so before batch job submitted we can adjust resource.

package org.apache.spark.streaming.scheduler
private[streaming]class JobScheduler(val ssc: StreamingContext) extends Logging {

def submitJobSet(jobSet: JobSet) {
    if (jobSet.jobs.isEmpty) {
      logInfo("No jobs added for time " + jobSet.time)
    } else {
      listenerBus.post(StreamingListenerBatchSubmitted(jobSet.toBatchInfo))
      jobSets.put(jobSet.time, jobSet)

      //before job submitted,we should compute resource actually required
      ssc.executorAllocationManager match {
        case Some(eam)=>  eam.run()
        case None => //do nothing
      }
   .........
    }
  }
......}

DRA Available Properties

Enable DRA:

spark.streaming.dynamicAllocation.enabled=true

Upper/Lower bound for the number of executors if dynamic allocation is enabled.

spark.streaming.dynamicAllocation.minExecutors=0
spark.streaming.dynamicAllocation.maxExecutors=50

More message will be printed in log if DRA is enabled. Default is false

spark.streaming.dynamicAllocation.debug=true

Rounds(Batches) will be used to release the resource calculated in current round. Default is 5

spark.streaming.dynamicAllocation.releaseRounds=5

The number of rounds(Batches) should be remembered. We can increase this number if batch processing time is unstable . this number affects processDuration in (duration.toDouble - processDuration) .Default is 1

spark.streaming.dynamicAllocation.rememberBatchSize=1

DRA delays to work when the specific number of rounds has been submitted. Default is 10

spark.streaming.dynamicAllocation.delay.rounds=10

Make sure the resource is more than reserveRate * current number of Executors. More useful than minExecutorNumber. Default is 0.2

spark.streaming.dynamicAllocation.reserveRate=0.2
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 158,847评论 4 362
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 67,208评论 1 292
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 108,587评论 0 243
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 43,942评论 0 205
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 52,332评论 3 287
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 40,587评论 1 218
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 31,853评论 2 312
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 30,568评论 0 198
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 34,273评论 1 242
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 30,542评论 2 246
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 32,033评论 1 260
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 28,373评论 2 253
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 33,031评论 3 236
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 26,073评论 0 8
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 26,830评论 0 195
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 35,628评论 2 274
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 35,537评论 2 269

推荐阅读更多精彩内容

  • PLEASE READ THE FOLLOWING APPLE DEVELOPER PROGRAM LICENSE...
    念念不忘的阅读 13,304评论 5 6
  • 惯有思维.. 今日训练有点成效,原来重新学习弯曲膝盖是这么困难的一件事情 一直还蛮喜欢汤唯曾经唱过的一首歌《我曾经...
    Shawn_阅读 272评论 0 0
  • 这两天娱乐新闻也好,微博也好,都在大力狂刷一部现象级的大剧《我的前半生》,在好奇心的驱使下,我立紧随潮流的下载...
    我和你们阅读 298评论 0 1