Setting Up a Hive Environment


Introduction


Hive is a data warehouse that runs on top of Hadoop. It maps structured data files onto database tables and provides a simple SQL-like query language called HQL, compiling those statements into MapReduce jobs. This makes it convenient to query and analyze data with SQL, and it is best suited to data that changes infrequently. Under the hood, Hive can work with files stored in HDFS or with HBase.
Recommended reading: What are the differences between HBase and Hive, and which scenarios does each suit?

Author: yuan daisy
Link: https://www.zhihu.com/question/21677041/answer/78289309
Source: Zhihu
Copyright belongs to the author; contact the author for permission before republishing.

  1. Tables in Hive are purely logical: they are just table definitions, i.e. metadata. Hive itself stores no data; it relies entirely on HDFS and MapReduce. This lets it map structured data files onto database tables, provide full SQL query functionality, and ultimately translate SQL statements into MapReduce jobs. HBase tables, by contrast, are physical tables, suitable for unstructured data.
  2. Hive processes data through MapReduce, which operates in a row-oriented fashion; HBase is column-oriented rather than row-oriented, which suits random access over massive data sets.
  3. HBase tables are sparsely stored, so users can define different columns for different rows; Hive tables are dense: however many columns are defined, every row stores that fixed number of columns.
  4. Hive analyzes data with Hadoop, a batch-processing system, so it cannot guarantee low latency; HBase is a near-real-time system and supports real-time queries.
  5. Hive offers no row-level updates; it suits batch jobs over large append-only data sets such as logs. Queries against HBase, on the other hand, do support row-level updates.
  6. Hive provides a fairly complete SQL implementation and is typically used for mining and analysis over historical data. HBase is not suited to applications with joins, multi-level indexes, or complex table relationships.
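
To make the contrast concrete, here is a minimal HiveQL sketch of the "logical table" idea from point 1. The table name, columns, and HDFS path are invented for illustration:

```
-- Map an existing HDFS directory of comma-separated log files onto a logical table.
-- EXTERNAL means Hive stores only metadata; dropping the table leaves the files alone.
CREATE EXTERNAL TABLE IF NOT EXISTS access_log (
  ip  STRING,
  ts  STRING,
  url STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/logs/access';

-- An HQL query like this is compiled into one or more MapReduce jobs.
SELECT url, COUNT(*) AS hits FROM access_log GROUP BY url;
```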

Environment Setup

Environment Overview

Two machines are used. The Hive server runs on the master (172.16.252.128, host: master), which is also the Hadoop master and hosts MySQL.
The Hive client runs on slave02 (172.16.252.128, host: slave02), which is also a Hadoop slave.

Dependencies

MySQL and Hadoop must already be installed.
Installation guides are in my earlier posts:
[Installing MySQL 5.6.29 from RPM on Linux (CentOS 7)](http://www.jianshu.com/p/e23d22022c53)
[Building a Hadoop 2.7.3 Cluster on CentOS 7](http://www.jianshu.com/p/26e857d7aca8)

Installation and Debugging

Server (master)

  1. Download the latest Hive binary package, apache-hive-2.1.0-bin.tar.gz, from the official site.
    Unpacking it completes the installation; I keep it under /data:
    $ tar -zxvf /data/apache-hive-2.1.0-bin.tar.gz
  2. Configure the environment variables
$ vi /etc/profile

#Hive Env
HIVE_HOME=/data/apache-hive-2.1.0-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH

$ source /etc/profile
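
A quick sanity check that the new variables took effect:

```
$ echo $HIVE_HOME
$ hive --version
```

The first should print the install directory, the second the Hive 2.1.0 version banner.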
  3. Log in to MySQL and create a hive user and a hive database to hold Hive's metadata
$ mysql -uroot -p
mysql> CREATE USER 'hive' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'master' WITH GRANT OPTION;
mysql> flush privileges;
mysql> exit;
$ mysql -h master -uhive
mysql> set password = password('hive');
  4. Download mysql-connector-java-5.1.40.tar.gz and install the JDBC driver
tar -zvxf mysql-connector-java-5.1.40.tar.gz
cd mysql-connector-java-5.1.40
cp mysql-connector-java-5.1.40-bin.jar $HIVE_HOME/lib/
  5. Edit the Hive configuration file
$ cp /data/apache-hive-2.1.0-bin/conf/hive-default.xml.template /data/apache-hive-2.1.0-bin/conf/hive-site.xml
$ vi /data/apache-hive-2.1.0-bin/conf/hive-site.xml

<property>
  <!-- JDBC driver class for the MySQL metastore -->
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <!-- JDBC connection string for MySQL -->
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <!-- MySQL user name -->
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <!-- MySQL password -->
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
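
One step worth calling out before the first start: with Hive 2.x the metastore schema usually has to be initialized in MySQL once, or the metastore may refuse to come up. A sketch, assuming $HIVE_HOME points at the install directory above:

```
$ $HIVE_HOME/bin/schematool -dbType mysql -initSchema
```

This creates the metastore tables (DBS, TBLS, and so on) in the `hive` database configured above.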

Client (slave02)

Steps 1 and 2 are the same as on the server.
Step 3: edit the configuration file

$ cp /data/apache-hive-2.1.0-bin/conf/hive-default.xml.template /data/apache-hive-2.1.0-bin/conf/hive-site.xml
$ vi /data/apache-hive-2.1.0-bin/conf/hive-site.xml

<property>
 <name>hive.metastore.uris</name>
 <value>thrift://master:9083</value>
 <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

Startup (if you hit errors, see the analysis in the next section)

  1. Start the server
[root@master bin]# cd $HIVE_HOME/bin/
[root@master bin]# ./hive --service metastore &
[1] 32436
[root@master bin]# which: no hbase in (/data/spark-2.0.1-bin-hadoop2.7/bin:/data/spark-2.0.1-bin-hadoop2.7/sbin:/data/soft/scala-2.11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.7.0_79/bin:/usr/java/jdk1.7.0_79/jre/bin:/data/hadoop-2.7.3/bin:/data/hadoop-2.7.3/sbin:/date/apache-hive-2.1.0-bin/bin:/root/bin)
Starting Hive Metastore Server
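
To verify the metastore really is up, check that its Thrift server is listening on the default port 9083 (a quick sketch; use `ss -lnt` on systems without netstat):

```
$ netstat -lnt | grep 9083
```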
  2. Log in to Hive on the server
[root@master bin]# ./hive
which: no hbase in (/data/spark-2.0.1-bin-hadoop2.7/bin:/data/spark-2.0.1-bin-hadoop2.7/sbin:/data/soft/scala-2.11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.7.0_79/bin:/usr/java/jdk1.7.0_79/jre/bin:/data/hadoop-2.7.3/bin:/data/hadoop-2.7.3/sbin:/date/apache-hive-2.1.0-bin/bin:/root/bin)

Logging initialized using configuration in jar:file:/data/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive> show databases;
OK
default
wentao
Time taken: 0.85 seconds, Fetched: 2 row(s)
  3. Log in to Hive on the client
[root@slave02 bin]# ./hive
which: no hbase in (/data/spark-2.0.1-bin-hadoop2.7/bin:/data/spark-2.0.1-bin-hadoop2.7/sbin:/data/soft/scala-2.11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.7.0_79/bin:/usr/java/jdk1.7.0_79/jre/bin:/data/hadoop-2.7.3/bin:/data/hadoop-2.7.3/sbin:/root/bin)

Logging initialized using configuration in jar:file:/data/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive> show databases;
OK
default
wentao
Time taken: 0.933 seconds, Fetched: 2 row(s)
hive>

Both sides see the same databases. Meanwhile, log in to MySQL on master and inspect the hive tables:

mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| AUX_TABLE                 |
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_COMPACTIONS     |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| KEY_CONSTRAINTS           |
| MASTER_KEYS               |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
| WRITE_SET                 |
+---------------------------+
57 rows in set (0.00 sec)

mysql>

Hive's table structures and other metadata are stored in these tables; the table data itself stays in HDFS (or HBase).

Errors and Fixes

  1. Multiple SLF4J bindings
```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
```
Fix:
Both jars above bind a logger implementation; deleting the older one resolves the conflict.
``rm -rf /data/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar``
  2. Hadoop stuck in safe mode, so Hive cannot start
```
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1327)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3895)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:984)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:578)
        at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
```
Fix:
Force Hadoop out of safe mode. Note that the message says NameNode resources are low; if disk space really is short, free some up first, otherwise the NameNode will drop straight back into safe mode.
``$HADOOP_HOME/bin/hdfs dfsadmin -safemode leave``
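
You can query the current safe-mode state before and after, to confirm the change took effect:

```
$ $HADOOP_HOME/bin/hdfs dfsadmin -safemode get
```

It reports either `Safe mode is ON` or `Safe mode is OFF`.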
  3. iotmp: the ${system:java.io.tmpdir} placeholder cannot be resolved
```
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.<init>(Path.java:171)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:631)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
        at java.net.URI.checkPath(URI.java:1804)
        at java.net.URI.<init>(URI.java:752)
        at org.apache.hadoop.fs.Path.initialize(Path.java:202)
        ... 12 more
```
Fix:
Pin `java.io.tmpdir` to a fixed directory
```
$ mkdir $HIVE_HOME/tmpdir
$ vi $HIVE_HOME/conf/hive-site.xml

Replace every value that contains ${system:java.io.tmpdir} with $HIVE_HOME/tmpdir
```
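
Note that hive-site.xml does not expand shell variables, so write the literal path (e.g. /data/apache-hive-2.1.0-bin/tmpdir) into the file. For one of the affected properties, `hive.exec.local.scratchdir`, the edited entry would look roughly like this:

```
<property>
  <name>hive.exec.local.scratchdir</name>
  <!-- was: ${system:java.io.tmpdir}/${system:user.name} -->
  <value>/data/apache-hive-2.1.0-bin/tmpdir</value>
  <description>Local scratch space for Hive jobs</description>
</property>
```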
**Note: $HIVE_HOME and similar placeholders above stand for your own installation directory**

References

Installing and Configuring Hive on a Hadoop Cluster
hive shell not opening when I have hive-site.xml
Releasing Hadoop from "Name node is in safe mode"
SLF4J: Class path contains multiple SLF4J bindings
What are the differences between HBase and Hive, and which scenarios does each suit?
