flink基础——安装和demo

清华开源软件镜像
https://mirrors.tuna.tsinghua.edu.cn/apache/

1. 安装

1) Mac上安装

参考:
https://segmentfault.com/a/1190000016901469
https://www.jianshu.com/p/17676d34dd35
启动位置:
/usr/local/Cellar/apache-flink/1.6.2/libexec/bin
启动与停止:

$ ./start-cluster.sh
$ ./stop-cluster.sh

2)Windows上安装

参考
https://ci.apache.org/projects/flink/flink-docs-stable/start/flink_on_windows.html
启动位置
D:\flink-1.6.2-bin-scala_2\flink-1.6.2\bin
启动

$ start-cluster.bat 
# Starting a local cluster with one JobManager process and one TaskManager process. 
# You can terminate the processes via CTRL-C in the spawned shell windows.
# Web interface by default on [http://localhost:8081/.]

启动一个job

进入目录,运行

flink.bat run -c wikiedits.WikipediaAnalysis D:\Flink_Project\target\original-wiki-edits-1.0-SNAPSHOT.jar 127.0.0.1 9000

2. 运行第一个job

1) Java程序

  1. maven 模版
$ mvn archetype:generate \  
-DarchetypeGroupId=org.apache.flink \  
-DarchetypeArtifactId=flink-quickstart-java \  
-DarchetypeVersion=1.6.2
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.6.1 \
    -DgroupId=wiki-edits \
    -DartifactId=wiki-edits \
    -Dversion=0.1 \
    -Dpackage=wikiedits \
    -DinteractiveMode=false
  1. 自定义任务
package FlinkTest;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class SocketTextStreamWordCount {
    public static void main(String[] args) throws Exception {
        //参数检查
        if (args.length != 2) {
            System.err.println("USAGE:\nSocketTextStreamWordCount <hostname> <port>");
            return;
        }

        String hostname = args[0];
        Integer port = Integer.parseInt(args[1]);


        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        //获取数据
        DataStreamSource<String> stream = env.socketTextStream(hostname, port);

        //计数
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream.flatMap(new LineSplitter())
                .keyBy(0)
                .sum(1);

        sum.print();

        env.execute("Java WordCount from SocketTextStream Example");
    }

    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) {
            String[] tokens = s.toLowerCase().split("\\W+");

            for (String token: tokens) {
                if (token.length() > 0) {
                    collector.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}
  1. 打包
    进入工程目录(pom.xml所在目录),使用以下命令打包。
    $ maven clean package -Dmaven.test.skip=true

  2. 执行

flink run -c FlinkTest.SocketTextStreamWordCount \
/Users/dodoyuan/IdeaProjects/flink-quickstart/target/\
original-flink-quickstart-1.8-SNAPSHOT.jar 127.0.0.1 9000

完整参考来源:
https://www.jianshu.com/p/17676d34dd35

2) python程序

  1. 自定义任务
from flink.plan.Environment import get_environment
from flink.functions.GroupReduceFunction import GroupReduceFunction

class Adder(GroupReduceFunction):
  def reduce(self, iterator, collector):
    count, word = iterator.next()
    count += sum([x[0] for x in iterator])
    collector.collect((count, word))

env = get_environment()
data = env.from_elements("Who's there?",
 "I think I hear them. Stand, ho! Who's there?")

data \
  .flat_map(lambda x, c: [(1, word) for word in x.lower().split()]) \
  .group_by(1) \
  .reduce_group(Adder(), combinable=True) \
  .output()

env.execute(local=True)
  1. 运行
    /usr/local/Cellar/apache-flink/1.6.2/libexec/bin/pyflink.sh wordcount.py
    参考:
    http://www.willmcginnis.com/2015/11/08/getting-started-with-python-and-apache-flink/

  2. 查看日志文件


    image.png

其他参考:
https://ci.apache.org/projects/flink/flink-docs-stable/quickstart/java_api_quickstart.html

推荐阅读更多精彩内容