Failed to get broadcast_10_piece0 of broadcast_10

spark-submit yarn-client提交任务时,出现如下错误

18/12/16 15:19:22 WARN scheduler.TaskSetManager: Lost task 10.0 in stage 1.0 (TID 4, iksenode3, executor 3): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_10_piece0 of broadcast_10
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1214)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:167)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_10_piece0 of broadcast_10
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:140)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:140)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:139)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:121)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:121)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:121)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1207)
    ... 11 more

原因:将sparkcontext定义在了object体内,而不是object的方法内,这就导致方法在执行时,sparkcontext初始化多次。在spark中,上一个sparkcontext没有关闭,则会出错。

解决方法:

  1. 一开始在网上查,以为是spark.cleaner.ttl的锅,spark.cleaner.ttl设置一个清除时间,使spark清除超过这个时间的所有RDD数据,以便腾出空间给后来的RDD使用。延长了这个时间发现并没有用。
  2. 从“原因”中得知,应该把sparkcontext初始化在方法中。

最好额外写一个spark初始化类:

import org.apache.spark.{SparkConf, SparkContext}

class Spark extends Serializable {
  def getContext: SparkContext = {
    @transient lazy val conf: SparkConf = 
          new SparkConf()
          .setMaster("local")
          .setAppName("test")

    @transient lazy val sc: SparkContext = new SparkContext(conf)
    sc.setLogLevel("OFF")

   sc
  }
 }

在main中调用

object Test extends Spark{

  def main(args: Array[String]): Unit = {
  val sc = getContext
  val irisRDD: RDD[String] = sc.textFile("...")
...
}

推荐阅读更多精彩内容