Setting Up a Fully Distributed Hadoop 2.7.3 Cluster on Ubuntu 14

至极L
2017.07.06 12:05

About this example: 1. Two PCs are used as the test machines.
2. The master's IP is 192.168.1.103; slave2's IP is 192.168.1.102.

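I. Create a User

1. Create a new Linux user:

sudo useradd -m hadoop -s /bin/bash

2. Set a password for the new user:

sudo passwd hadoop

3. Grant administrator (sudo) rights:

sudo adduser hadoop sudo

4. Update the apt package index:

sudo apt-get update

5. Install vim:

sudo apt-get install vim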

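II. Network Configuration

1. Check the IP address:

ifconfig

2. Change the hostname:

sudo vim /etc/hostname

3. Update the IP-to-hostname mapping on every node (slave2 included):

ifconfig  # look up master's IP
sudo vim /etc/hosts  # add one line per node, mapping its IP to its hostname

4. Test the configuration:

ping slave2 -c 3   # ping only 3 times; pinging the IP directly also works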
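The mapping in step 3 can also be scripted. The following is a minimal sketch, not the author's method: add_host is a hypothetical helper that appends an "IP hostname" line to a hosts-format file only if that exact pair is not already there. The IPs and hostnames are the ones given at the top of this post.

```shell
#!/bin/sh
# Sketch only: add_host is a hypothetical helper, not part of any tool.
add_host() {
  # $1 = hosts-format file, $2 = IP address, $3 = hostname
  # awk exits 0 only when a line with exactly this IP and hostname exists.
  if ! awk -v ip="$2" -v host="$3" \
      '$1 == ip && $2 == host { found = 1 } END { exit !found }' "$1" 2>/dev/null
  then
    printf '%s\t%s\n' "$2" "$3" >> "$1"
  fi
}

# Example against a scratch file (on a real node the target is /etc/hosts,
# which needs root to edit):
# add_host /tmp/hosts.test 192.168.1.103 master
# add_host /tmp/hosts.test 192.168.1.102 slave2
```

Running the helper twice with the same arguments leaves a single entry, so it is safe to repeat on every node.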

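III. Install SSH and Configure Passwordless Login

1. Install the SSH server:

sudo apt-get install openssh-server

2. Log in to the local machine:

ssh localhost

3. Exit the SSH session on localhost, generate a key pair with ssh-keygen, and add the public key to the authorized keys:

exit
cd ~/.ssh/
ssh-keygen -t rsa  # if this fails, the author retries with sudo (the key files then belong to root)
cat ./id_rsa.pub >> ./authorized_keys  # on a permission error, reset the ownership and permissions of ~/.ssh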

4. Copy the public key to the slave2 node:

scp ~/.ssh/id_rsa.pub hadoop@slave2:/home/hadoop/

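5. On the slave2 node, add the public key to the authorized keys:

mkdir -p ~/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys  # append the key
rm ~/id_rsa.pub  # optional

6. Log out of the desktop session (choose Log Out), then log in again.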

7. On master, log in to the slave2 node:

ssh slave2

If a problem really cannot be resolved, SSH can be reinstalled. To uninstall it:
sudo apt-get --purge remove openssh-server
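The permission errors mentioned in step 3 often come down to file modes: with its default StrictModes setting, sshd ignores authorized_keys when the file or the ~/.ssh directory is writable by other users. The following is a sketch of one way to reset them. fix_ssh_perms is a hypothetical helper name, and 700/600 are the commonly used modes rather than something this post specifies.

```shell
#!/bin/sh
# Sketch only: fix_ssh_perms is a hypothetical helper.
fix_ssh_perms() {
  # $1 = path to the .ssh directory (defaults to ~/.ssh)
  dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir"
  touch "$dir/authorized_keys"
  chmod 700 "$dir"                  # owner-only access to the directory
  chmod 600 "$dir/authorized_keys"  # owner-only read/write on the key list
}

# Example, run as the hadoop user on each node:
# fix_ssh_perms
```

If the files were created with sudo, their ownership also has to be returned to the hadoop user (for example with chown) before these modes help.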

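IV. Install and Configure Java

Do not install OpenJDK; the author learned this the hard way.

1. Download jdk-8u131-linux-x64.tar.gz and extract it:

sudo tar -zxf ~/Downloads/jdk-8u131-linux-x64.tar.gz -C /usr/local  # extract into /usr/local

2. Add the Java environment variables:

vim ~/.bashrc

Add the following variables: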

export JAVA_HOME=/usr/local/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/rt.jar

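3. Apply the environment variables:

source ~/.bashrc

4. Verify that the Java environment is configured correctly:

echo $JAVA_HOME
java -version
$JAVA_HOME/bin/java -version  # should print the same result as the line above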
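The check in step 4 can be wrapped in a small function so that it is easy to repeat on slave2. This is a sketch under the assumption that a usable JAVA_HOME is simply a non-empty path containing an executable bin/java; check_java_home is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: check_java_home is a hypothetical helper.
check_java_home() {
  # $1 = candidate JAVA_HOME
  if [ -z "$1" ]; then
    echo "JAVA_HOME is empty"
    return 1
  fi
  if [ ! -x "$1/bin/java" ]; then
    echo "no executable at $1/bin/java"
    return 1
  fi
  echo "ok: $1/bin/java"
}

# Example:
# check_java_home "$JAVA_HOME"
```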

V. Install Hadoop

1. Download Hadoop and install it under /usr/local:

sudo tar -zxf ~/Downloads/hadoop-2.7.3.tar.gz -C /usr/local
cd /usr/local/
sudo mv ./hadoop-2.7.3/ ./hadoop  # rename the directory
sudo chown -R hadoop ./hadoop  # give the hadoop user ownership

2. Check that Hadoop works:

cd /usr/local/hadoop
./bin/hadoop version

3. Configure the Hadoop environment:

vim ~/.bashrc
Add the following variables:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Then run:

source ~/.bashrc 
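Later steps call hdfs, start-dfs.sh and start-yarn.sh without a path, which only works if the bin and sbin directories really ended up on PATH. The following sketch checks for that; on_path is a hypothetical helper, and the directories are the ones exported above.

```shell
#!/bin/sh
# Sketch only: on_path is a hypothetical helper.
on_path() {
  # $1 = directory to look for as an exact PATH entry
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# on_path /usr/local/hadoop/bin  || echo "bin is not on PATH"
# on_path /usr/local/hadoop/sbin || echo "sbin is not on PATH"
```

Wrapping PATH in colons lets the pattern match whole entries only, so /usr/local/hadoop/bin does not match /usr/local/hadoop/bin2.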

VI. Hadoop Cluster Configuration

cd /usr/local/hadoop/etc/hadoop

1. Edit hadoop-env.sh and set the following:

sudo vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_131
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin

2. Edit slaves: delete localhost and add slave2:

sudo vim slaves

3. Edit core-site.xml:

<configuration>
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://Master:9000</value>
      </property>
      <property>
         <name>io.file.buffer.size</name>
         <value>131072</value>
     </property>
     <property>
          <name>hadoop.tmp.dir</name>
          <value>/usr/local/hadoop/tmp</value>
     </property>
</configuration>

4. Edit hdfs-site.xml:

<configuration>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>Master:50090</value>
    </property>
    <property>
      <name>dfs.replication</name>
<!-- number of HDFS replicas -->
      <value>1</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>
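After editing these files by hand it helps to read a value back. The following sketch pulls one property out of a Hadoop-style XML file with awk. get_prop is a hypothetical helper and assumes the layout used in this post, with <name> and <value> on separate lines; it is not a general XML parser.

```shell
#!/bin/sh
# Sketch only: get_prop is a hypothetical helper that relies on <name> and
# <value> being on separate lines, as in the files shown in this post.
get_prop() {
  # $1 = config file, $2 = property name
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Example:
# get_prop /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS
# get_prop /usr/local/hadoop/etc/hadoop/hdfs-site.xml dfs.replication
```

On a node where Hadoop is already on the PATH, hdfs getconf -confKey followed by the property name reports the value Hadoop actually loads.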

5. Create mapred-site.xml from the template:

cp mapred-site.xml.template mapred-site.xml

Then edit mapred-site.xml:

<configuration>
 <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
          <name>mapreduce.jobhistory.address</name>
          <value>Master:10020</value>
  </property>
  <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>Master:19888</value>
  </property>
</configuration>
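When property blocks are copied and pasted, it is easy to leave the same <name> in a file twice, and the mistake is hard to spot by eye. The following sketch lists any property name that occurs more than once. dup_props is a hypothetical helper and assumes each <name> element sits on a single line.

```shell
#!/bin/sh
# Sketch only: dup_props is a hypothetical helper.
dup_props() {
  # $1 = config file; prints each <name>...</name> that appears more than once
  grep -o '<name>[^<]*</name>' "$1" | sort | uniq -d
}

# Example:
# dup_props /usr/local/hadoop/etc/hadoop/mapred-site.xml
```

An empty result means every property name in the file is unique.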

6. Edit yarn-site.xml:

<configuration>
     <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
     <property>
         <name>yarn.resourcemanager.address</name>
         <value>Master:8032</value>
     </property>
     <property>
         <name>yarn.resourcemanager.scheduler.address</name>
         <value>Master:8030</value>
     </property>
     <property>
         <name>yarn.resourcemanager.resource-tracker.address</name>
         <value>Master:8031</value>
     </property>
     <property>
         <name>yarn.resourcemanager.admin.address</name>
         <value>Master:8033</value>
     </property>
     <property>
         <name>yarn.resourcemanager.webapp.address</name>
         <value>Master:8088</value>
     </property>
</configuration>

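7. Copy the Hadoop directory from the master node to slave2:

scp -r /usr/local/hadoop hadoop@slave2:/usr/local

If this fails, usually because the hadoop user cannot write to /usr/local on slave2, copy to the home directory first and move it from there:

scp -r /usr/local/hadoop hadoop@slave2:/home/hadoop
ssh slave2  # the remaining commands run on slave2
sudo cp -r ~/hadoop /usr/local
sudo chown -R hadoop /usr/local/hadoop

8. On slave2, install Java and configure the Java and Hadoop environment variables in ~/.bashrc (as described above).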

9. Before the first start, format the NameNode on the Master node:

hdfs namenode -format 
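Formatting is meant for the first start only. Running it again gives the NameNode a new cluster ID, and DataNodes that still hold data from the old one will refuse to register. The following sketch guards against that by checking for the current/VERSION file that a formatted NameNode directory contains. needs_format is a hypothetical helper, and the directory in the example is the dfs.namenode.name.dir value configured above.

```shell
#!/bin/sh
# Sketch only: needs_format is a hypothetical helper.
needs_format() {
  # $1 = NameNode metadata directory (dfs.namenode.name.dir without the file: prefix)
  # Succeeds when the directory has not been formatted yet.
  [ ! -f "$1/current/VERSION" ]
}

# Example:
# if needs_format /usr/local/hadoop/hdfs/name; then
#   hdfs namenode -format
# fi
```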

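10. Test whether Hadoop was installed successfully:

start-dfs.sh
start-yarn.sh
To check that the cluster started, run jps on master; it should show:
SecondaryNameNode
ResourceManager
NameNode

Running jps on slave2 should show:
NodeManager
DataNode

In addition, confirm on the Master node that the DataNode has started:
hdfs dfsadmin -report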
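Comparing the jps output with the expected daemon list can be automated. The following sketch takes jps-style text ("<pid> <name>" per line) and prints every expected daemon that is absent. missing_daemons is a hypothetical helper. It compares whole names, so NameNode is not mistaken for SecondaryNameNode.

```shell
#!/bin/sh
# Sketch only: missing_daemons is a hypothetical helper.
missing_daemons() {
  # $1 = text in jps format; remaining arguments = expected daemon names
  out="$1"
  shift
  for name in "$@"; do
    # Take the second column (the process name) and require an exact match.
    if ! printf '%s\n' "$out" | awk '{ print $2 }' | grep -qx "$name"; then
      echo "$name"
    fi
  done
}

# Example on master:
# missing_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager
# Example on slave2:
# missing_daemons "$(jps)" DataNode NodeManager
```

No output means every expected daemon is running.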

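Problems encountered during installation:

1. With OpenJDK installed, Hadoop would not start. OpenJDK had to be removed and the Sun (Oracle) JDK installed instead.
2. The DataNode would not start because the user did not have sufficient permissions on /usr/local/hadoop and could not read it. Ownership of the hadoop directory must be set on the slave machine with sudo chown -R hadoop /usr/local/hadoop. A more privileged user such as root does not run into this.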
3. Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

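Solution:
Leave safe mode:
/usr/local/hadoop/bin/hdfs dfsadmin -safemode leave
Then edit Hadoop's core-site.xml to use the master's IP address, because the DataNode could not resolve the hostname of master: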

 <property>
          <name>fs.defaultFS</name>
          <value>hdfs://192.168.1.103:9000</value>
    </property>