Spark 1.4.0 Single-Machine Deployment: Testing

Continuing from the previous post on deployment, this post tests that setup.

Testing

  • Spark shell test
    ./spark-shell
    ...
    scala> val days = List("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
    days: List[String] = List(Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday)
    scala> val daysRDD = sc.parallelize(days)
    daysRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:14
    scala> daysRDD.count()
    res0: Long = 7
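The count above can be sanity-checked outside Spark with plain shell. This is just a hypothetical cross-check on the same seven day names, not part of the Spark workflow:

```shell
# Count the same seven day names without Spark.
days="Sunday Monday Tuesday Wednesday Thursday Friday Saturday"
echo "$days" | tr ' ' '\n' | wc -l   # → 7, matching res0: Long = 7
```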
    
  • Script tests
    • Local mode

      • ./bin/run-example org.apache.spark.examples.SparkPi 2 spark://localhost:7077
      • ./bin/run-example SparkPi 10 --master local[2]
    • Standalone mode
      [Note] Check the 127.0.0.1 address and the path to the *.jar file.

      • ./spark-submit --class org.apache.spark.examples.SparkPi --master spark://127.0.0.1:7077 ../lib/spark-examples-1.4.0-hadoop2.6.0.jar 100
    • YARN test (cluster mode and client mode)
      [Note] Check the path to the *.jar file.

      • ./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster ../lib/spark-examples*.jar 10
        http://localhost:8088/ (localhost may be replaced with the server's address)
      • ./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client ../lib/spark-examples*.jar 10
      • Visit localhost:8088 to see the result.
  • Data test
    • Shell script (generates 30 files of 5,000,000 random numbers each, then merges them into num.txt)
    getNum(){
        # Emit 5,000,000 random integers; $RANDOM is 0..32767, so
        # $RANDOM/500 (integer division) falls in the range 0..65.
        c=1
        while [[ $c -le 5000000 ]]
        do
            echo $(($RANDOM/500))
            ((c++))
        done
    }
    # Run 30 generators in parallel, one output file each.
    for i in `seq 30`
    do
        getNum >> ${i}.txt &
        # getNum
    done
    wait
    echo "------------------DONE-----------------"
    # Merge the 30 files into a single input file.
    cat [0-9]*.txt > num.txt
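Before committing to the full 30 × 5,000,000-line run, the generator logic can be verified with a smaller dry run. Since bash's $RANDOM ranges over 0..32767, integer division by 500 yields values in 0..65. This sketch assumes bash and uses a hypothetical sample size of 1000:

```shell
# Dry run of the generator logic with a small count (1000 instead of 5000000).
getNum() {
    c=1
    while [[ $c -le 1000 ]]; do
        echo $((RANDOM / 500))   # $RANDOM is 0..32767, so values fall in 0..65
        ((c++))
    done
}
getNum > sample.txt
wc -l < sample.txt                   # 1000 lines
sort -n sample.txt | uniq -c | head  # counts for the smallest values seen
```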
    
    • Create the HDFS directory (the hdfs executable lives under hadoop/bin/; the HDFS root is hdfs://localhost:9000)
      Command: ./bin/hdfs dfs -mkdir -p /user/hadoop/datatest
    • Upload the generated data (num.txt from the script above) into the new HDFS directory
      Command: ./bin/hdfs dfs -put /root/num.txt /user/hadoop/datatest
    • Scala test code:
      Command: spark/bin/spark-shell
    scala> val file = sc.textFile("hdfs://localhost:9000/user/hadoop/datatest/num.txt")
    scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
    scala> count.sortBy(_._2).map(x => x._1 + "\t" + x._2).saveAsTextFile("hdfs://localhost:9000/user/hadoop/datatest/numCount")
    
    Run the Hadoop shell command to inspect the result (from hadoop/bin/):
    ./hadoop fs -cat hdfs://localhost:9000/user/hadoop/datatest/numCount/p*|sort -k2n
    This prints each value and its count, sorted by count in ascending order.
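On a small sample, the Spark pipeline above (map each value to (value, 1), reduceByKey, sort by count) can be reproduced with plain coreutils; the file name and sample values here are hypothetical:

```shell
# Hypothetical cross-check of the Spark count on a tiny sample.
printf '3\n7\n3\n7\n7\n' > check.txt
# sort | uniq -c plays the role of reduceByKey(_+_);
# the final sort -k2n orders by count, like sortBy(_._2).
sort check.txt | uniq -c | awk '{print $2 "\t" $1}' | sort -k2n
# Output:
# 3	2
# 7	3
```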
