Spark 源码分析（四）: Application 的注册

在前面一篇文章中分析到了 SparkContext 中的 TaskScheduler 创建及启动。

在 StandaloneSchedulerBackend start 代码里除了创建了一个 DriverEndpoint 用于 standalone 模式下用来和 Executor 通信之外还会创建一个 AppClient。

这个 AppClient 会向 Master 注册 Application，然后 Master 会通过 Application 的信息为它分配 Worker。

创建这个 AppClient 对象之前会去获取一些必要的参数。

        // 拿到 Driver RpcEndpoint 的地址
        val driverUrl = RpcEndpointAddress(
      sc.conf.get("spark.driver.host"),
      sc.conf.get("spark.driver.port").toInt,
      CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
    // 一些启动参数
    val args = Seq(
      "--driver-url", driverUrl,
      "--executor-id", "{{EXECUTOR_ID}}",
      "--hostname", "{{HOSTNAME}}",
      "--cores", "{{CORES}}",
      "--app-id", "{{APP_ID}}",
      "--worker-url", "{{WORKER_URL}}")
    // executor 额外的 java 选项
    val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
      .map(Utils.splitCommandString).getOrElse(Seq.empty)
    // executor 额外的环境变量
    val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
      .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
    // executor 额外的依赖
    val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
      .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
      
    val testingClassPath =
      if (sys.props.contains("spark.testing")) {
        sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
      } else {
        Nil
      }

    // Start executors with a few necessary configs for registering with the scheduler
    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
    val javaOpts = sparkJavaOpts ++ extraJavaOpts
    // 将上面的那些信息封装成一个 command 对象
    val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
      args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
    // 获取 application UI 的地址
    val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
    // 获取 executor 配置的 core 数量
    val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)

然后用上面的这些参数结合 SparkContext 中的一些数据封装一个 ApplicationDescription 对象，这对象里封装了一些信息，可以看看。

private[spark] case class ApplicationDescription(
    name: String,// 名字
    maxCores: Option[Int],// 最多使用的 core 数量
    memoryPerExecutorMB: Int,// 每个 Executor 分配的内存
    command: Command,// 启动命令
    appUiUrl: String,// application 的 UI Url 地址
    eventLogDir: Option[URI] = None,// event 日志目录
    // short name of compression codec used when writing event logs, if any (e.g. lzf)
    eventLogCodec: Option[String] = None,
    coresPerExecutor: Option[Int] = None,
    // number of executors this application wants to start with,
    // only used if dynamic allocation is enabled
    initialExecutorLimit: Option[Int] = None,
    user: String = System.getProperty("user.name", "<unknown>")) {

  override def toString: String = "ApplicationDescription(" + name + ")"
}

封装好 ApplicationDescription 对象之后，根据这个对象创建一个 StandaloneAppClient 对象，然后调用 StandaloneAppClient 对象的 start 方法。

// 封装成一个 AppclientDescription 对象    
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
      appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
  // 创建 StandaloneAppClient 对象
    client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  // 调用 StandaloneAppClient 的 start 方法
    client.start()
    launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
  // 等待注册状态的更新
    waitForRegistration()
    launcherBackend.setState(SparkAppHandle.State.RUNNING)

StandaloneAppClient 的 start 方法会去创建并注册一个 ClientEndpoint 用来向 master 注册 Application。

def start() {
    // Just launch an rpcEndpoint; it will call back into the listener.
    endpoint.set(rpcEnv.setupEndpoint("AppClient", new ClientEndpoint(rpcEnv)))
  }

ClientEndPoint 是一个 RpcEndpoint，在初始化的时候会去调用其 onstart 方法。

override def onStart(): Unit = {
      try {
        // 向 master 注册
        registerWithMaster(1)
      } catch {
        case e: Exception =>
          logWarning("Failed to connect to master", e)
          markDisconnected()
          stop()
      }
    }

registerWithMaster 方法实际上会去调用 tryRegisterAllMasters 方法，向所有的 Master 去注册。

在 Spark 中，Master 可能会是高可靠的 (HA)，这种情况会有可能有多个 Master，不过只有一个 Master 处于 alive 状态，其它处于 standby 状态。

/**
* 向所有的 master 进行一步注册，将会一直调用 tryRegisterAllMasters 方法进行注册，知道超出注册时间
* 当成功连接到 master ，所有调度的工作和 Futures 都会被取消
*/
private def registerWithMaster(nthRetry: Int) {
        // 实际上调用了 tryRegisterAllMasters ，想所有 master 进行注册
      registerMasterFutures.set(tryRegisterAllMasters())
      registrationRetryTimer.set(registrationRetryThread.schedule(new Runnable {
        override def run(): Unit = {
          if (registered.get) {
            registerMasterFutures.get.foreach(_.cancel(true))
            registerMasterThreadPool.shutdownNow()
          } else if (nthRetry >= REGISTRATION_RETRIES) {
            markDead("All masters are unresponsive! Giving up.")
          } else {
            registerMasterFutures.get.foreach(_.cancel(true))
            registerWithMaster(nthRetry + 1)
          }
        }
      }, REGISTRATION_TIMEOUT_SECONDS, TimeUnit.SECONDS))
    }

tryRegisterAllMasters 方法的调用，向所有 master 注册。

// 异步向所有 master 注册，返回一个 [Future] 的数组用来以后取消
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
      for (masterAddress <- masterRpcAddresses) yield {
        registerMasterThreadPool.submit(new Runnable {
          override def run(): Unit = try {
            if (registered.get) {
              return
            }
            logInfo("Connecting to master " + masterAddress.toSparkURL + "...")
            val masterRef = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
            // 向 master 发送注册消息
            masterRef.send(RegisterApplication(appDescription, self))
          } catch {
            case ie: InterruptedException => // Cancelled
            case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
          }
        })
      }
    }

向 master 发送注册消息后，master 收到消息后注册完 application 之后会回复一条消息。

case RegisterApplication(description, driver) =>
      // TODO Prevent repeated registrations from some driver
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
      } else {
        logInfo("Registering app " + description.name)
        val app = createApplication(description, driver)
        // 将 master 中内存中等待调度的 app 队列更新，
        registerApplication(app)
        logInfo("Registered app " + description.name + " with ID " + app.id)
        persistenceEngine.addApplication(app)
        // 向 driver 回复一条注册 Application 的处理结果的消息
        driver.send(RegisteredApplication(app.id, self))
        // 调度资源
        schedule()
      }

master 调用 shedule 方法，这个方法做两件事，一个是给等待调度的 driver 分配资源，一个是给等待调度的 application 分配资源去启动 executor。

在给等待调度的 application 分配资源的时候最后会走到 launchExecutor 方法，这个方法会通过给符合要求的 worker 发送启动 executor 的消息去启动 executor。

private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
    logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
    worker.addExecutor(exec)
    // 给 worker 发送 LaunchExecutor 消息
    worker.endpoint.send(LaunchExecutor(masterUrl,
      exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
    exec.application.driver.send(
      ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
  }

在 worker 收到启动 Executor 的消息后会去根据消息去启动对应的 Executor。

至此，Application 的注册就完成了。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 159,117评论 4赞 362
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 67,328评论 1赞 293
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 108,839评论 0赞 243
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 44,007评论 0赞 206
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 52,384评论 3赞 287
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 40,629评论 1赞 219
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 31,880评论 2赞 313
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 30,593评论 0赞 198
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 34,313评论 1赞 243
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 30,575评论 2赞 246
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 32,066评论 1赞 260
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 28,392评论 2赞 253
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 33,052评论 3赞 236
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 26,082评论 0赞 8
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 26,844评论 0赞 195
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 35,662评论 2赞 274
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 35,575评论 2赞 270

Spark 源码分析（四）: Application 的注册

推荐阅读更多精彩内容