Installing Hadoop/HBase/Spark

Installing Linux on the host machine

Download CentOS-6.5-x86_64-bin-minimal.iso and install it from a USB drive.

Installing the KVM virtualization software

yum install kvm libvirt python-virtinst qemu-kvm virt-viewer bridge-utils  # install
/etc/init.d/libvirtd start # start the libvirt daemon

Alternatively, search for kvm in the software manager and install it from there.
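Before installing, it can help to confirm that the CPU exposes hardware virtualization, which KVM relies on (the vmx flag on Intel, svm on AMD). A minimal sketch; the function name has_virt_flags is made up for illustration:

```shell
# Hypothetical helper: succeed if the given cpuinfo file lists
# vmx (Intel VT-x) or svm (AMD-V) among the CPU flags.
has_virt_flags() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    # -w matches whole words only, -q suppresses output
    grep -Eqw 'vmx|svm' "$cpuinfo_file"
}

# Usage:
#   has_virt_flags && echo "hardware virtualization available"
```

If the check fails on hardware that should support it, virtualization may simply be switched off in the BIOS.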

Virtualizing the cluster machines

Create a virtual machine in bridged mode:

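# --name: VM name; --ram: memory in MB; --vcpus: number of CPU cores
# -f: disk image file; --cdrom: installation ISO image
# --graphics: expose the console over VNC on port 5920
# --network bridge=br0: bridged networking on br0; --autostart: start with the host
virt-install \
  --name=gateway \
  --ram 4096 \
  --vcpus=4 \
  -f /home/kvm/gateway.img \
  --cdrom /root/CentOS-6.5-x86_64-bin-minimal.iso \
  --graphics vnc,listen=0.0.0.0,port=5920 \
  --network bridge=br0 --force --autostart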

You can also create the VM from the graphical interface. A minimal Linux install needs a desktop environment installed first.

yum -y groupinstall Desktop
yum -y groupinstall "X Window System"
startx  # start the graphical session
To boot into the graphical interface by default, edit /etc/inittab

id:5:initdefault: # 3 boots to the command line, 5 boots to the graphical interface; other runlevels are rarely used

Preparation before building the environment

1. Network settings
Edit /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
HWADDR=52:54:00:00:9d:f1
IPADDR=192.168.0.231
PREFIX=24
GATEWAY=192.168.0.1
DNS1=192.168.0.1
DNS2=8.8.8.8
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"

2. Disable SELinux
Edit /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
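
The same change can be scripted instead of edited by hand. A minimal sketch, assuming GNU sed (as shipped with CentOS); set_selinux_disabled is a hypothetical helper name:

```shell
# Hypothetical helper: set SELINUX=disabled in the given config file.
set_selinux_disabled() {
    config_file="$1"
    # Rewrites only the SELINUX= line; SELINUXTYPE= and comments are untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config_file"
}

# Usage (as root):
#   set_selinux_disabled /etc/selinux/config
#   setenforce 0    # switch to permissive mode until the next reboot
```

The config file is only read at boot, so the change takes full effect after a restart; setenforce 0 relaxes enforcement in the meantime.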

3. Raise the file handle limits
Add the following to /etc/security/limits.conf, replacing root with the user you want to grant the limits to

root soft nofile 65535
root hard nofile 65535
root soft nproc 32000
root hard nproc 32000

4. Disable the firewall

service iptables stop   # stop iptables now
chkconfig --level 35 iptables off   # keep iptables from starting on later reboots

5. Set the hostname and hosts file

cat > /etc/sysconfig/network << EOF
NETWORKING=yes
HOSTNAME=spark-1
GATEWAY=192.168.0.1
EOF
 
cat >> /etc/hosts << EOF
192.168.0.231 spark-1
192.168.0.232 spark-2
192.168.0.233 spark-3
192.168.0.234 spark-4
EOF
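
Since the addresses above are consecutive, the same entries can be generated with a loop, which is handy when the cluster grows. A sketch; print_cluster_hosts is a hypothetical helper, and the spark-N naming follows the list above:

```shell
# Hypothetical helper: print "IP hostname" lines for COUNT nodes.
#   $1 = first three octets, $2 = last octet of the first node, $3 = node count
print_cluster_hosts() {
    subnet="$1"
    first="$2"
    count="$3"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "${subnet}.$((first + i - 1)) spark-${i}"
        i=$((i + 1))
    done
}

# Usage (as root):
#   print_cluster_hosts 192.168.0 231 4 >> /etc/hosts
```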

Building the environment

  • Passwordless SSH between nodes
    Run the following commands on every machine:
ssh-keygen
touch ~/.ssh/authorized_keys

Copy the contents of every machine's id_rsa.pub into the authorized_keys file on every machine; the ssh-copy-id command does this for you.
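
The key distribution can be looped over the host list. A minimal sketch, assuming the root account and the default key path written by ssh-keygen; copy_key_to_hosts and the SSH_COPY_ID override variable are hypothetical names introduced here:

```shell
# Hypothetical helper: push this machine's public key to each host given
# as an argument. SSH_COPY_ID can be overridden for a dry run; by default
# the real ssh-copy-id is used. Assumes the root user on every node.
copy_key_to_hosts() {
    for host in "$@"; do
        "${SSH_COPY_ID:-ssh-copy-id}" -i "$HOME/.ssh/id_rsa.pub" "root@${host}"
    done
}

# Usage (run on every machine; prompts once per host for the password):
#   copy_key_to_hosts spark-1 spark-2 spark-3 spark-4
```

Including the local host in the list matters here, because the Hadoop start scripts also connect to the local machine over SSH.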

  • Install, configure, and start Hadoop
    Download hadoop-2.7.3.tar.gz from the official site to ~
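tar zxvf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 /usr/local
# Single quotes keep $PATH and $HADOOP_HOME unexpanded until /etc/profile is sourced
echo 'export HADOOP_HOME=/usr/local/hadoop-2.7.3' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile
cd $HADOOP_HOME
# Edit core-site.xml (under etc/hadoop)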
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.7.3/var</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://spark-1:9000</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>2880</value>
    </property>
# Edit hdfs-site.xml
<property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>spark-1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>spark-2:50090</value>
    </property>
# Edit mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>spark-1:8021</value>
</property>
# Edit slaves
spark-1
spark-2
spark-3
spark-4
# Start Hadoop
$HADOOP_HOME/bin/hadoop namenode -format
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
$HADOOP_HOME/bin/hadoop dfsadmin -safemode leave
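
Once the start scripts return, one way to check that the daemons came up is to compare the output of jps (shipped with the JDK) against the process names you expect. A sketch; missing_daemons is a hypothetical helper, and the expected list in the usage line assumes spark-1 runs both master and worker roles, as the slaves file above suggests:

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon name (given as arguments) that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for name in "$@"; do
        # -w matches whole words, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$jps_output" | grep -qw "$name" || echo "$name"
    done
}

# Usage on spark-1 (prints nothing when all four are running):
#   jps | missing_daemons NameNode DataNode ResourceManager NodeManager
```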

Installing ZooKeeper

As with Hadoop above, extract zookeeper-3.4.6.tar.gz to /usr/local, add ZOOKEEPER_HOME to /etc/profile, and update PATH.
Create the data directory, the myid file, and the logs directory

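mkdir -p $ZOOKEEPER_HOME/data
touch $ZOOKEEPER_HOME/data/myid
echo 1 > $ZOOKEEPER_HOME/data/myid # myid lives in dataDir; every machine gets its own id, matching its server.N entry in zoo.cfg. An odd number of machines is typical; ids usually start at 0 or 1 and must be unique
mkdir -p $ZOOKEEPER_HOME/logs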

Configure ZooKeeper

cd $ZOOKEEPER_HOME/conf
cp zoo_sample.cfg zoo.cfg
cat > zoo.cfg << EOF
tickTime=2000
dataDir=/usr/local/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper-3.4.6/logs
clientPort=2181
initLimit=10
syncLimit=5
server.0=spark-1:2888:3888
server.1=spark-2:2888:3888
server.2=spark-3:2888:3888
EOF
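
ZooKeeper reads the myid file from dataDir, and the number in it has to match the N of that host's server.N line. Rather than typing the id on each machine, it can be looked up from zoo.cfg. A sketch; myid_for_host is a hypothetical helper and assumes hostnames without regex metacharacters:

```shell
# Hypothetical helper: print the N from the "server.N=<host>:..." line
# of a zoo.cfg file for the given hostname (prints nothing if absent).
myid_for_host() {
    cfg_file="$1"
    host="$2"
    sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg_file"
}

# Usage on each ZooKeeper machine:
#   myid_for_host $ZOOKEEPER_HOME/conf/zoo.cfg "$(hostname)" > $ZOOKEEPER_HOME/data/myid
```

With the configuration above this yields 0 on spark-1, 1 on spark-2, and 2 on spark-3.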

Start ZooKeeper by running the following on every machine where it is installed:

zkServer.sh start
zkServer.sh status # check the status

Installing HBase

Extract the archive and set the environment variables as above.
Edit hbase-site.xml

    <property>
            <name>hbase.rootdir</name>
            <value>hdfs://spark-1:9000/hbase</value>
    </property>
    <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
    </property>
    <property>
            <name>hbase.master</name>
            <value>spark-1:60000</value>
   </property>
   <property>
            <name>hbase.zookeeper.quorum</name>
            <value>spark-1,spark-2,spark-3</value>
   </property>

Edit hbase-env.sh and add the following

export JAVA_HOME=/usr/local/jdk  # Java installation directory
export HBASE_LOG_DIR=/usr/local/hbase-1.2.1/logs # HBase log directory
export HBASE_MANAGES_ZK=false # true to use the ZooKeeper bundled with HBase; false to use a separately installed ZooKeeper

Edit regionservers: add the hostnames of the machines running HBase and remove localhost.

Start HBase:
$HBASE_HOME/bin/start-hbase.sh

Installing Spark

Extract the archive and set the environment variables as above.
Copy Hadoop's core-site.xml and hdfs-site.xml and HBase's hbase-site.xml into Spark's conf directory.
Edit spark-defaults.conf

spark.executor.memory       6g
spark.eventLog.enabled      true
spark.eventLog.dir      hdfs://spark-1:9000/spark-history
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.eventLog.compress     true
spark.scheduler.mode        FAIR

Edit spark-env.sh

export HBASE_HOME=/usr/local/hbase-1.2.1
export HIVE_HOME=/usr/local/hive-1.2.1
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/local/spark-2.0.1-hadoop2.7/jars/hbase/*
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk
export SPARK_MASTER_IP=spark-1
export SPARK_WORKER_MEMORY=11g
export SPARK_WORKER_CORES=4
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=6g
export SPARK_DAEMON_MEMORY=11g
export HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop
# export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=s1:2181,s2:2181,s3:2181 -Dspark.deploy.zookeeper.dir=/spark" # ZooKeeper-managed recovery mode
export SPARK_LOG_DIR=/usr/local/spark-2.0.1-hadoop2.7/logs
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://spark-1:9000/spark-history"

Copy the following jars from HBase's lib directory into Spark's jars/hbase directory

guava-12.0.1.jar  hbase-client-1.2.1.jar  hbase-common-1.2.1.jar  hbase-protocol-1.2.1.jar  hbase-server-1.2.1.jar  htrace-core-3.1.0-incubating.jar  metrics-core-2.2.0.jar  protobuf-java-2.5.0.jar

Start Spark

$SPARK_HOME/sbin/start-all.sh
$SPARK_HOME/sbin/start-history-server.sh

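Ideas for a faster install:

  • Make good use of rsync:
    Since nearly all of the configuration is identical across machines, do the setup above on one machine, sync the directories with rsync -avz, and then adjust whatever differs on each machine.
  • Use Fabric:
    Write Fabric scripts to install the cluster; this requires solid shell skills.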
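
The rsync approach can be sketched as a small loop. This assumes the root account and passwordless SSH from the earlier step; sync_to_nodes and the DRY_RUN switch are hypothetical names introduced here:

```shell
# Hypothetical helper: copy a directory to the same path on each host.
#   $1 = directory to sync, remaining arguments = target hosts
# With DRY_RUN=1 the rsync commands are printed instead of executed.
sync_to_nodes() {
    dir="$1"
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "rsync -avz ${dir}/ root@${host}:${dir}/"
        else
            rsync -avz "${dir}/" "root@${host}:${dir}/"
        fi
    done
}

# Usage from spark-1:
#   sync_to_nodes /usr/local/hadoop-2.7.3 spark-2 spark-3 spark-4
```

Per-machine files such as ZooKeeper's myid still need to be set on each node after the sync.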
