Hadoop environment setup (JDK, HDFS, YARN, Hive, ZooKeeper, Kafka, HBase, Kylin)

Add hosts entries

vim /etc/hosts

123.123.123.123 kylin1
123.123.123.124 kylin2
123.123.123.125 kylin3
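A duplicated or missing hosts entry is a common source of cluster problems later on, so it is worth sanity-checking the file. The sketch below works on a local copy named `hosts.sample` (a stand-in chosen here); on a real node you would point the `awk` at `/etc/hosts` instead:

```shell
# Write a stand-in copy of the cluster lines from /etc/hosts.
cat > hosts.sample <<'EOF'
123.123.123.123 kylin1
123.123.123.124 kylin2
123.123.123.125 kylin3
EOF

# Each hostname should map to exactly one entry.
for h in kylin1 kylin2 kylin3; do
  n=$(awk -v h="$h" '$2 == h' hosts.sample | wc -l)
  echo "$h: $n entry"
done
```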

Change the hostname

#change it via command; takes effect after reconnecting, but may be reset on reboot
hostnamectl set-hostname <hostname>
#also edit /etc/hostname so the change survives a reboot
vim /etc/hostname

Generate an SSH key pair

ssh-keygen -t rsa

Copy the public key to the other hosts (repeat for kylin2 and kylin3)

ssh-copy-id kylin1
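Since the copy step has to run once per cluster host, a small loop keeps it consistent. As a sketch, the loop below only builds and prints the commands (using the hostnames from the hosts file above) so they can be reviewed before running; execute them with `eval "$cmds"` once they look right:

```shell
# Build one ssh-copy-id command per cluster host.
nodes="kylin1 kylin2 kylin3"
cmds=""
for h in $nodes; do
  cmds="$cmds
ssh-copy-id -i ~/.ssh/id_rsa.pub $h"
done
# Review the commands; then run them with: eval "$cmds"
echo "$cmds"
```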

Install the JDK [on every machine]

JDK download page: https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

All software in this guide is installed under /usr/local.

Unpack the JDK archive

and place it under /usr/local/java

mkdir -p /usr/local/java
tar -zxvf jdk-8u191-linux-x64.tar.gz
mv jdk1.8.0_191/ /usr/local/java/

Edit the environment variables

vim /etc/profile
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
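Because /etc/profile gets edited again for every component below, it is easy to append the same export twice. A guard like the following keeps the edit idempotent; the sketch writes to a scratch file `./profile.test` (a stand-in chosen here) so it can be tried safely before touching the real /etc/profile:

```shell
profile=./profile.test   # stand-in for /etc/profile while experimenting
: > "$profile"

add_export() {
  # Append the line only if it is not already present verbatim.
  grep -qxF "$1" "$profile" || echo "$1" >> "$profile"
}

add_export 'export JAVA_HOME=/usr/local/java/jdk1.8.0_191'
add_export 'export PATH=$PATH:$JAVA_HOME/bin'
add_export 'export JAVA_HOME=/usr/local/java/jdk1.8.0_191'  # no-op: already there
```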

Verify the installation

java -version

On success it prints:

java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

Install Hadoop

Hadoop package: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-3.1.4/hadoop-3.1.4.tar.gz
Other versions: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core
See also the official Hadoop documentation


Unpack the archive and move it to the chosen install directory

tar -zxvf hadoop-3.1.4.tar.gz
mkdir -p /usr/local/hadoop
mv hadoop-3.1.4/ /usr/local/hadoop/

Edit the configuration

The config files live in /usr/local/hadoop/hadoop-3.1.4/etc/hadoop/

core-site.xml

<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://kylin1:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/data</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/usr/local/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/usr/local/data/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
</configuration>

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>kylin1</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
</configuration>

hadoop-env.sh

#find the export JAVA_HOME= line and set it to your own JDK path
export JAVA_HOME=/usr/local/java/jdk1.8.0_191

workers

#worker nodes of the cluster
kylin2
kylin3
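The workers file (named slaves before Hadoop 3) is just a plain list of hostnames, one per line, so it can be written with a heredoc. The sketch writes into the current directory; on the cluster the target would be the etc/hadoop path given above:

```shell
# Write the worker list. Real target:
# /usr/local/hadoop/hadoop-3.1.4/etc/hadoop/workers
cat > workers <<'EOF'
kylin2
kylin3
EOF
```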

Add environment variables

vim /etc/profile


#java
export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

#hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop-3.1.4
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

#path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the environment

source /etc/profile

Copy the configured hadoop directory to the same location on the other two nodes

scp -r hadoop root@kylin2:/usr/local/
scp -r hadoop root@kylin3:/usr/local/

Format HDFS

hdfs namenode -format

Go to the Hadoop install root

cd /usr/local/hadoop/hadoop-3.1.4

and run

./sbin/start-dfs.sh

The following errors may appear:

Starting namenodes on [kylin1]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [iZ8vbdysgs8j4ptv1x9tyaZ]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

Add the following parameters at the top of start-dfs.sh and stop-dfs.sh under /usr/local/hadoop/hadoop-3.1.4/sbin

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Likewise, add the following at the top of start-yarn.sh and stop-yarn.sh:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
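Rather than editing the four scripts by hand, the header lines can be injected with sed right after the shebang. The sketch below patches stand-in copies created in the current directory (an assumption, so it is safe to run anywhere); on the cluster you would operate on the real files in $HADOOP_HOME/sbin:

```shell
# Create stand-in scripts to patch (the real ones live in $HADOOP_HOME/sbin).
for f in start-dfs.sh stop-dfs.sh; do
  printf '#!/usr/bin/env bash\necho original body\n' > "$f"
  # Insert the user variables directly after line 1 (the shebang).
  sed -i '1a HDFS_NAMENODE_USER=root\nHDFS_DATANODE_USER=root\nHDFS_SECONDARYNAMENODE_USER=root' "$f"
done
```

The same pattern works for start-yarn.sh/stop-yarn.sh with the YARN_* variables.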

Start DFS again; output like the following means it came up successfully

WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [kylin1]
Last login: Fri Mar  5 18:02:06 CST 2021 on pts/0
Last failed login: Fri Mar  5 18:03:46 CST 2021 from 172.26.76.123 on ssh:notty
There were 2 failed login attempts since the last successful login.
Starting datanodes
Last login: Fri Mar  5 18:03:57 CST 2021 on pts/0
Starting secondary namenodes [kylin1]
Last login: Fri Mar  5 18:03:59 CST 2021 on pts/0

Start YARN

sbin/start-yarn.sh
#output on success:
Starting resourcemanager
Last login: Fri Mar  5 18:04:02 CST 2021 on pts/0
Starting nodemanagers
Last login: Fri Mar  5 18:07:06 CST 2021 on pts/0

Check with jps

On the master, jps should list NameNode, SecondaryNameNode, and ResourceManager; on the worker nodes, DataNode and NodeManager.

After changing node configuration, refresh with

bin/hdfs dfsadmin -refreshNodes
bin/yarn rmadmin -refreshNodes

Done.

Install Hive

Install MySQL first

wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
#(for MySQL 5.7, use http://repo.mysql.com/mysql57-community-release-el7.rpm instead)

sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm

sudo yum update

sudo yum install mysql-server

sudo systemctl start mysqld

Hive package downloads

Note: pick a release that matches your Hadoop version; see https://hive.apache.org/downloads.html
https://mirrors.bfsu.edu.cn/apache/hive/

As before, create the hive install directory, unpack the archive, and move it into place

mkdir /usr/local/hive
tar -zxvf apache-hive-2.3.4-bin.tar.gz 
mv apache-hive-2.3.4-bin /usr/local/hive/hive-2.3.4

Add environment variables

vim /etc/profile

#hive
export HIVE_HOME=/usr/local/hive/hive-2.3.4


#path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin

source /etc/profile

Edit the config file

cd hive-2.3.4/conf
mv hive-default.xml.template   hive-site.xml

vim hive-site.xml 


 <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://kylin1:3306/hive_metadata?createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
 </property>
 
 <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
 </property>
  
 <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
 </property>
    
 <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
 </property>

 <property>
    <name>hive.querylog.location</name>
    <value>/usr/local/hive/hive-2.3.4/tmp/hadoop</value>
    <description>Location of Hive run time structured log file</description>
  </property>
 
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/usr/local/hive/hive-2.3.4/tmp/hadoop/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>
  
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive/hive-2.3.4/tmp/hadoop</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/usr/local/hive/hive-2.3.4/tmp/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>

Download link: https://pan.baidu.com/s/1K4O7K00khqlo9yi6VVS3jA  password: kkpp

Copy the downloaded mysql-connector-java.jar into /usr/local/hive/hive-2.3.4/lib/

Open a MySQL shell

mysql

create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
grant all privileges on hive_metadata.* to 'hive'@'localhost' identified by 'hive';
grant all privileges on hive_metadata.* to 'hive'@'kylin1' identified by 'hive';
flush privileges;
use hive_metadata;
exit;
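The same setup can be scripted by generating the SQL and piping it into the mysql client (`mysql < init_hive_metastore.sql`). The sketch below only writes the file, so it can be reviewed first; the filename is an arbitrary choice for this example:

```shell
# Generate the metastore bootstrap SQL; review, then feed it to mysql.
cat > init_hive_metastore.sql <<'EOF'
create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
grant all privileges on hive_metadata.* to 'hive'@'localhost' identified by 'hive';
flush privileges;
EOF
```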

Initialize the metastore schema

schematool -dbType mysql -initSchema  

This step may fail with a Guava-related error.

Cause:
Hadoop and Hive ship different versions of guava.jar, located in:

  • /usr/local/hive/hive-2.3.4/lib
  • /usr/local/hadoop/hadoop-3.1.4/share/hadoop/common/lib

Fix:
Delete the lower-version jar and copy the higher-version one into its directory.
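Picking "the higher version" can be automated with `sort -V`, which orders version strings numerically. The sketch simulates the two lib directories with empty dummy jars (the version numbers here are illustrative, not necessarily what your installation ships); on the cluster, point the two variables at the real paths listed above:

```shell
# Simulated lib dirs (real ones: $HIVE_HOME/lib and
# $HADOOP_HOME/share/hadoop/common/lib).
mkdir -p hive_lib hadoop_lib
touch hive_lib/guava-14.0.1.jar hadoop_lib/guava-27.0-jre.jar

hive_jar=$(ls hive_lib/guava-*.jar)
hadoop_jar=$(ls hadoop_lib/guava-*.jar)

# sort -V orders versions numerically; tail -n1 picks the higher one.
newest=$(printf '%s\n%s\n' "$(basename "$hive_jar")" "$(basename "$hadoop_jar")" \
         | sort -V | tail -n1)

if [ "$(basename "$hadoop_jar")" = "$newest" ]; then
  rm "$hive_jar"  && cp "$hadoop_jar" hive_lib/
else
  rm "$hadoop_jar" && cp "$hive_jar" hadoop_lib/
fi
```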

Test Hive

#create a txt file with a few rows to load into Hive in a moment
vim users.txt
1,浙江工商大学
2,杭州
3,I love
4,ZJGSU
5,加油哦

Start hive; a "hive>" prompt means it is working.

# create the users table; row format delimited fields terminated by ','
# tells Hive that fields in the loaded file are separated by commas,
# which is why users.txt above has commas between fields
create table users(id int, name string) row format delimited fields terminated by ',';
# load the data
load data local inpath '/usr/local/src/users.txt' into table users;
#query
select * from users;
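Before loading, it is worth confirming that every row really has two comma-separated fields, since malformed rows end up as NULLs in the table. A quick local check (recreating the sample file in the current directory):

```shell
# Recreate the sample file and count lines that do NOT split into
# exactly 2 fields on ','.
cat > users.txt <<'EOF'
1,浙江工商大学
2,杭州
3,I love
4,ZJGSU
5,加油哦
EOF
bad=$(awk -F',' 'NF != 2' users.txt | wc -l)
echo "malformed rows: $bad"
```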

OK, Hive is done.

ZooKeeper

Package link: https://pan.baidu.com/s/1wkBVwD6qh_WLi8zPs5RbEg  password: vo2j
Official site: https://zookeeper.apache.org/

Create the zookeeper install directory under /usr/local

mkdir zookeeper
tar -zxvf zookeeper-3.4.10.tar.gz 
mv zookeeper-3.4.10 /usr/local/zookeeper/

Configure the zookeeper environment variables

#zookeeper
export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.10

#path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin

source /etc/profile

Edit the config file

cd zookeeper-3.4.10/conf
mv zoo_sample.cfg zoo.cfg

vim zoo.cfg

dataDir=/usr/local/zookeeper/zookeeper-3.4.10/data

server.0=kylin1:2888:3888
server.1=kylin2:2888:3888
server.2=kylin3:2888:3888

Create the data directory

cd ..
mkdir data

cd data
vim myid
0

Copy the configured zookeeper directory straight to the two worker nodes

scp -r zookeeper root@kylin2:/usr/local/
scp -r zookeeper root@kylin3:/usr/local/

# on the two worker nodes, change the 0 in myid to 1 and 2 respectively

# on the two worker nodes, also add the zookeeper environment variables to /etc/profile as in step 1, and remember to source it
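The per-node myid values mirror the server.N lines in zoo.cfg, so the mapping can be scripted. The sketch writes the three files into local per-node directories (stand-ins created here, assuming the kylin1..kylin3 hostnames from the hosts file) so the idea can be checked before doing it over ssh; the real path is /usr/local/zookeeper/zookeeper-3.4.10/data/myid on each host:

```shell
# server.0 -> first host, server.1 -> second, server.2 -> third (from zoo.cfg).
id=0
for h in kylin1 kylin2 kylin3; do
  mkdir -p "$h/data"          # local stand-in for the remote data dir
  echo "$id" > "$h/data/myid"
  id=$((id + 1))
done
```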

Start zookeeper

#on every machine
zkServer.sh start

Expected result: one of the three nodes is the leader and the other two are followers


jps
# every node should show a QuorumPeerMain process

ZooKeeper setup is complete.

Kafka

Kafka is written in Scala and Java, so install Scala first.
Package link: https://pan.baidu.com/s/1iz6AbKr7CP3CjsqaWaZo_g  password: 8l4u

Unpack it under /usr/local/scala
Edit the environment variables

vim /etc/profile

#scala
export SCALA_HOME=/usr/local/scala/scala-2.11.8

#path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$KYLIN_HOME/bin:$SCALA_HOME/bin

source /etc/profile

Verify

scala -version

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

Then repeat the steps above on the other two nodes!

Create the kafka install directory /usr/local/kafka
Package link: https://pan.baidu.com/s/1Xb2plF4GVNCq9csDepH5Jg  password: wr7a
Unpack and move it to /usr/local/kafka
Environment variables

#kafka
export KAFKA_HOME=/usr/local/kafka/kafka-2.1.0

#path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$KYLIN_HOME/bin:$SCALA_HOME/bin:$KAFKA_HOME/bin

Edit the config file

vim kafka-2.1.0/config/server.properties
broker.id=0
listeners=PLAINTEXT://kylin1:9092
advertised.listeners=PLAINTEXT://kylin1:9092
zookeeper.connect=kylin1:2181,kylin2:2181,kylin3:2181

Copy the whole configured kafka directory from the master node to the other two nodes

scp -r kafka root@kylin2:/usr/local/
scp -r kafka root@kylin3:/usr/local/



# on the other two nodes, server.properties needs a few changes:
# broker.id: change to 1 and 2 respectively
# listeners: change the host to the node's own, i.e. PLAINTEXT://kylin2:9092 and PLAINTEXT://kylin3:9092
# advertised.listeners: same change, PLAINTEXT://kylin2:9092 and PLAINTEXT://kylin3:9092
# zookeeper.connect stays the same
# don't forget the kafka environment variables on the other two nodes either
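The three per-node edits follow one pattern (broker N gets broker.id=N and its own hostname), so they can be applied with sed. The sketch patches a trimmed local stand-in of server.properties, generated here, for broker 1 / kylin2; on a real node you would run the sed against the actual config path:

```shell
# Trimmed stand-in for config/server.properties as shipped from kylin1.
cat > server.properties <<'EOF'
broker.id=0
listeners=PLAINTEXT://kylin1:9092
advertised.listeners=PLAINTEXT://kylin1:9092
zookeeper.connect=kylin1:2181,kylin2:2181,kylin3:2181
EOF

node=kylin2; id=1
sed -i \
  -e "s/^broker.id=.*/broker.id=$id/" \
  -e "s#^\(listeners=PLAINTEXT\)://[^:]*:#\1://$node:#" \
  -e "s#^\(advertised.listeners=PLAINTEXT\)://[^:]*:#\1://$node:#" \
  server.properties
```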

# start kafka on all three nodes
[root@master local]# cd kafka/kafka-2.1.0/
[root@master kafka-2.1.0]# nohup kafka-server-start.sh /usr/local/kafka/kafka-2.1.0/config/server.properties &

# create topic TestTopic on the master node
[root@master kafka-2.1.0]# kafka-topics.sh --zookeeper kylin1:2181,kylin2:2181,kylin3:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create

# start a producer on the master node
[root@master kafka-2.1.0]# kafka-console-producer.sh --broker-list kylin1:9092,kylin2:9092,kylin3:9092 --topic TestTopic

# start a consumer on each of the other two nodes
[root@slave1 kafka-2.1.0]# kafka-console-consumer.sh --bootstrap-server kylin2:9092 --topic TestTopic --from-beginning
[root@slave2 kafka-2.1.0]# kafka-console-consumer.sh --bootstrap-server kylin3:9092 --topic TestTopic --from-beginning

# type anything at the producer prompt on the master:
> hello world

# the line then appears at both consumers, i.e. the message was consumed

Kafka is done.

HBase

Package link: https://pan.baidu.com/s/1IVkmzyAqd9zFSW_Cts7t2Q  password: 7io2
Docs: https://www.w3cschool.cn/hbase_doc/hbase_doc-m3y62k51.html

As before, create the hbase install directory under /usr/local, unpack the archive, and move it into the hbase directory

#environment variables
vim /etc/profile


#hbase 
export HBASE_HOME=/usr/local/hbase/hbase-2.1.1


#path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin

source /etc/profile

Edit the config files

cd hbase-2.1.1/conf
vim hbase-env.sh 

export JAVA_HOME=/usr/local/java/jdk1.8.0_191
export HBASE_LOG_DIR=${HBASE_HOME}/logs 
export HBASE_MANAGES_ZK=false

vim hbase-site.xml 


<configuration>
<property> 
    <name>hbase.rootdir</name> 
    <value>hdfs://kylin1:9000/hbase</value> 
  </property> 
  <property> 
    <name>hbase.cluster.distributed</name> 
    <value>true</value> 
  </property> 
  <property> 
    <name>hbase.zookeeper.quorum</name> 
    <value>kylin1,kylin2,kylin3</value> 
  </property> 
  <property> 
    <name>hbase.zookeeper.property.dataDir</name> 
    <value>/usr/local/zookeeper/zookeeper-3.4.10/data</value> 
  </property> 
  <property>
    <name>hbase.tmp.dir</name>
    <value>/usr/local/hbase/data/tmp</value>
  </property>
  <property> 
    <name>hbase.master</name> 
    <value>kylin1:60000</value> 
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>16030</value>
  </property>
</configuration>

vim regionservers
kylin1
kylin2
kylin3

Copy the files to the other nodes

# copy the whole configured hbase directory over
cd ../../..
scp -r hbase root@kylin2:/usr/local/
scp -r hbase root@kylin3:/usr/local/

# don't forget to add the environment variables to /etc/profile on the other two nodes and source it!
# on every node, manually create /usr/local/hbase/data/tmp (the hbase.tmp.dir value above), used for temporary files.

Note: zookeeper and hadoop must be running before HBase is started.
Start hbase

cd hbase/hbase-2.1.1
bin/start-hbase.sh
jps
# expected: HMaster on the master node / HRegionServer on the worker nodes

After starting, HMaster may be missing.
If so, run

cp $HBASE_HOME/lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar $HBASE_HOME/lib/

and restart.
During the restart, ./bin/stop-hbase.sh may fail to stop HRegionServer; stop it with

./bin/hbase-daemon.sh stop regionserver

HBase is done.

Kylin

Official docs: http://kylin.apache.org/cn/docs/install/index.html

Create the kylin install directory under /usr/local, unpack the archive, and move it to /usr/local/kylin

Add environment variables

vim /etc/profile

#kylin
export KYLIN_HOME=/usr/local/kylin/kylin-3.1.1
#path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$KYLIN_HOME/bin

source /etc/profile

Enter the kylin-3.1.1 directory and run the environment check

./bin/check-env.sh

If every check passes, the dependencies are in place.

Start kylin

./bin/kylin.sh start

If it fails with an HBase classpath error:

add /opt/cloudera/parcels/CDH/lib/hbase/lib/* to the classpath inside $HBASE_HOME/bin/hbase (that path is for CDH; adjust it to your own HBase lib directory otherwise)
Source: http://92072234.wiz03.com/share/s/2i1O8Q1L1k042IDoOy3h7BgH2K4G6J2SoQv42Xc4b01xpCrj

Error: KeeperErrorCode = NoNode for /hbase/master

Fix: https://www.cnblogs.com/xyzai/p/12695116.html

When startup succeeds, kylin.sh prints a message saying Kylin is running; the web UI then listens on port 7070 by default.

References:
https://blog.csdn.net/pig2guang/article/details/85313410
https://blog.csdn.net/weixin_40521823/article/details/86666139
https://blog.csdn.net/k393393/article/details/92078626
