Big Data Development Environment Setup: Installing and Deploying Flink

I. Standalone Mode Installation

1. Download Flink

Official website

Download address for the archive provided on the official website

flink-1.10.1-bin-scala_2.11.gz

2. Extract Flink

On the bigdata03 server:

cd /home/bigdata/soft/

tar -zxvf flink-1.10.1-bin-scala_2.11.gz
mv flink-1.10.1/ /home/bigdata/apps/
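The extract-and-move step can also be wrapped in a small function that checks the archive exists and extracts straight into the target directory. This is only a sketch: the function name extract_flink is made up for illustration, and the example call is commented out so nothing runs by accident.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flink): extract an archive into a target directory.
# Usage: extract_flink <archive> <target-dir>
extract_flink() {
  archive="$1"
  target="$2"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$target" || return 1
  # -C extracts directly into the target directory, so no separate mv is needed
  tar -zxf "$archive" -C "$target"
}

# Example call with the paths used in this article:
# extract_flink /home/bigdata/soft/flink-1.10.1-bin-scala_2.11.gz /home/bigdata/apps
```

Extracting with -C avoids leaving a half-moved directory behind if the mv step is interrupted.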

3. Set environment variables

Command:

vim ~/.bashrc

Append the following two lines to the end of the file:

export FLINK_HOME=/home/bigdata/apps/flink-1.10.1/
export PATH=$PATH:$FLINK_HOME/bin

After saving and exiting, run source so the changes take effect:

source ~/.bashrc
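If you would rather not edit the file by hand, the two lines can be appended from the shell. The sketch below uses a made-up helper, append_once, which adds a line only if it is not already in the file, so running it twice does not duplicate the exports.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line is not already present.
append_once() {
  file="$1"
  line="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: quiet
  grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example calls (single quotes keep $PATH and $FLINK_HOME unexpanded in the file):
# append_once ~/.bashrc 'export FLINK_HOME=/home/bigdata/apps/flink-1.10.1/'
# append_once ~/.bashrc 'export PATH=$PATH:$FLINK_HOME/bin'
```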

4. Local mode installation (single-node Flink)

Start the service:

cd /home/bigdata/apps/flink-1.10.1/

./bin/start-cluster.sh

Stop the service:

./bin/stop-cluster.sh

5. Open the web UI

http://bigdata03:8081/
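The same port also serves Flink's REST API, so you can check from the command line whether the cluster is up. The sketch below queries the /overview endpoint with curl. The function names are invented for illustration, and it assumes curl is installed and the default port 8081.

```shell
#!/bin/sh
# Hypothetical helpers for probing the Flink web/REST port.

# Build the REST URL; port defaults to 8081 and path to /overview.
flink_rest_url() {
  host="$1"
  port="${2:-8081}"
  path="${3:-/overview}"
  printf 'http://%s:%s%s\n' "$host" "$port" "$path"
}

# Return 0 if the REST endpoint answers with a success status within 5 seconds.
flink_is_up() {
  curl -sf --max-time 5 -o /dev/null "$(flink_rest_url "$1" "${2:-8081}")"
}

# Example call:
# flink_is_up bigdata03 && echo "Flink is running" || echo "Flink is not reachable"
```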

6. Standalone mode installation

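Cluster plan: bigdata03 runs the JobManager and one TaskManager; bigdata05 runs one TaskManager (this follows from the configuration in step 7).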

7. Configure the cluster

Edit conf/flink-conf.yaml:

cd /home/bigdata/apps/flink-1.10.1/conf/

vim flink-conf.yaml

Change the following setting:

jobmanager.rpc.address: bigdata03

Edit conf/slaves:

vim slaves

Set its contents to:

bigdata03
bigdata05
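The two edits above can also be scripted instead of done in vim. The sketch below defines a made-up helper, set_flink_conf, which replaces an existing "key: value" line in flink-conf.yaml or appends one if the key is missing. It assumes keys sit at the start of the line, as they do in the stock file.

```shell
#!/bin/sh
# Hypothetical helper: set "key: value" in a flat YAML file such as flink-conf.yaml.
set_flink_conf() {
  file="$1"
  key="$2"
  value="$3"
  tmp="$file.tmp.$$"
  awk -v k="$key" -v v="$value" '
    {
      # Compare the text before the first ":" literally, so dots in the key are not wildcards
      pos = index($0, ":")
      if (pos > 0 && substr($0, 1, pos - 1) == k) { print k ": " v; found = 1 }
      else print
    }
    END { if (!found) print k ": " v }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example calls for this step:
# cd /home/bigdata/apps/flink-1.10.1/conf/
# set_flink_conf flink-conf.yaml jobmanager.rpc.address bigdata03
# printf 'bigdata03\nbigdata05\n' > slaves
```

Commented-out lines such as "#jobmanager.rpc.port: 6123" are left untouched, because their text before the colon does not equal the key.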

8. Copy the flink-1.10.1 directory from bigdata03 to bigdata05

scp -r /home/bigdata/apps/flink-1.10.1/ bigdata@bigdata05:~/apps

9. Start the cluster on the bigdata03 (JobManager) node

start-cluster.sh

10. Open the web UI

http://bigdata03:8081/

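11. Parameters to consider in Standalone mode

jobmanager.heap.mb: memory available to the JobManager. This is a legacy key; Flink 1.10 uses jobmanager.heap.size, as in the flink-conf.yaml in section II.

taskmanager.heap.mb: memory available to each TaskManager. This is a legacy key; in Flink 1.10 the TaskManager memory is set with taskmanager.memory.process.size, as in section II.

taskmanager.numberOfTaskSlots: number of task slots per TaskManager, usually set to the number of CPU cores available on each machine

parallelism.default: default parallelism of a job when none is specified

taskmanager.tmp.dirs: directory where the TaskManager stores temporary data. This is a legacy key; Flink 1.10 uses io.tmp.dirs, as in section II.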
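Slots and parallelism are linked: in a standalone cluster each host listed in conf/slaves starts one TaskManager, so the total number of slots is the number of hosts times taskmanager.numberOfTaskSlots, and a job cannot run with a parallelism higher than that. The sketch below computes that total from a conf directory. The function name total_slots is invented, and it assumes the flat "key: value" layout of the stock flink-conf.yaml.

```shell
#!/bin/sh
# Hypothetical helper: total slots = (hosts in slaves) x (taskmanager.numberOfTaskSlots).
total_slots() {
  conf_dir="$1"
  # Take the last occurrence of the key; an empty result means the key is not set
  slots=$(awk -F': *' '$1 == "taskmanager.numberOfTaskSlots" { v = $2 } END { print v }' \
    "$conf_dir/flink-conf.yaml")
  # Count lines in slaves that are neither blank nor comments
  workers=$(grep -cv -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$conf_dir/slaves" || true)
  # Flink's default is 1 slot per TaskManager when the key is absent
  echo $(( ${slots:-1} * workers ))
}

# Example call:
# total_slots /home/bigdata/apps/flink-1.10.1/conf
```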

II. On YARN Mode

Example provided on the official website

1. Edit the configuration files

flink-conf.yaml

jobmanager.rpc.address: bigdata03
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 1
high-availability: zookeeper
high-availability.storageDir: hdfs://bigdata02:9000/flink/ha/
high-availability.zookeeper.quorum: bigdata02:2181,bigdata03:2181,bigdata04:2181
# ACL options are based on https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
state.backend: filesystem
state.checkpoints.dir: hdfs://bigdata02:9000/flink-checkpoints
state.savepoints.dir: hdfs://bigdata02:9000/flink-checkpoints
jobmanager.execution.failover-strategy: region
io.tmp.dirs: /home/bigdata/data/flink/tmp
env.log.dir: /home/bigdata/data/flink/log

masters

bigdata03:8081
bigdata05:8081

slaves

bigdata03
bigdata04
bigdata05

zoo.cfg

# The number of milliseconds of each tick
tickTime=2000

# The number of ticks that the initial  synchronization phase can take
initLimit=10

# The number of ticks that can pass between  sending a request and getting an acknowledgement
syncLimit=5

# The directory where the snapshot is stored.
# dataDir=/tmp/zookeeper

# The port at which the clients will connect
clientPort=2181

# ZooKeeper quorum peers
server.0=bigdata02:2888:3888
server.1=bigdata03:2888:3888
server.2=bigdata04:2888:3888

2. Copy to the other machines

scp -r /home/bigdata/apps/flink-1.10.1/ bigdata@bigdata04:~/apps
scp -r /home/bigdata/apps/flink-1.10.1/ bigdata@bigdata05:~/apps
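With more than a couple of workers, the scp commands can be generated from the slaves file instead of typed one by one. The sketch below only prints the commands (a dry run), so you can review them and then pipe the output to sh. The function name print_scp_cmds is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one scp command per host in the slaves file,
# skipping the machine we are copying from.
print_scp_cmds() {
  slaves_file="$1"
  local_host="$2"
  # "read host rest" also strips stray whitespace around the host name
  while read -r host rest || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    [ "$host" = "$local_host" ] && continue
    printf 'scp -r /home/bigdata/apps/flink-1.10.1/ bigdata@%s:~/apps\n' "$host"
  done < "$slaves_file"
}

# Example: review the commands first, then run them
# print_scp_cmds /home/bigdata/apps/flink-1.10.1/conf/slaves bigdata03
# print_scp_cmds /home/bigdata/apps/flink-1.10.1/conf/slaves bigdata03 | sh
```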

3. Add the required jar

Without the Hadoop jar on the classpath, errors like the following appear:

1. while creating FileSystem when initializing the state of the BucketingSink.

2. Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.

3. org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.

To access HDFS with Flink 1.10.1, you must add an extra jar to the lib directory under the Flink installation directory.

Download address: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/

My Hadoop version is 2.7.7.
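Before starting the cluster it is worth confirming that the jar really is in lib on every node. The sketch below checks a lib directory for any flink-shaded-hadoop-2-uber jar. The function name is invented, and no particular jar version is assumed; pick the build that matches your Hadoop release from the directory above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a flink-shaded-hadoop-2-uber jar exists in the given lib directory.
has_hadoop_uber_jar() {
  lib_dir="$1"
  for jar in "$lib_dir"/flink-shaded-hadoop-2-uber-*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -f fails
    [ -f "$jar" ] && return 0
  done
  return 1
}

# Example call:
# has_hadoop_uber_jar /home/bigdata/apps/flink-1.10.1/lib || echo "Hadoop uber jar is missing" >&2
```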

4. Start the cluster

start-cluster.sh

5. Submit a job

cd /home/bigdata/apps/flink-1.10.1

flink run -m yarn-cluster ./examples/batch/WordCount.jar
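Submitting to YARN only works if the Flink client can find the Hadoop/YARN configuration, normally through the YARN_CONF_DIR or HADOOP_CONF_DIR environment variable. The sketch below checks for that before submitting. The function name yarn_conf_dir is invented, and the memory values in the commented example simply mirror the flink-conf.yaml above; treat them as assumptions, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: print the first usable Hadoop/YARN config directory, or fail if none is set.
yarn_conf_dir() {
  for dir in "${YARN_CONF_DIR:-}" "${HADOOP_CONF_DIR:-}"; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Example: submit only when the configuration directory is available
# if yarn_conf_dir >/dev/null; then
#   flink run -m yarn-cluster -yjm 1024m -ytm 1728m ./examples/batch/WordCount.jar
# else
#   echo "set HADOOP_CONF_DIR or YARN_CONF_DIR first" >&2
# fi
```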
