Apache Atlas standalone deployment (Hadoop, Hive, Kafka, HBase, Solr, ZooKeeper)


VM (CentOS 7): 192.168.198.131

Java 1.8

I. Hadoop installation

1. Set the hostname to master

vim /etc/sysconfig/network

NETWORKING=yes 
HOSTNAME=master

vim /etc/hosts

192.168.198.131 master

Reboot for this to take effect: reboot

2. Disable the firewall

systemctl stop firewalld
firewall-cmd --state
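Optionally, keep the firewall from starting again after a reboot:

systemctl disable firewalld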

3. Set up passwordless SSH login. It might seem unnecessary, but it is needed later and is configured in step 10.

4. Extract Hadoop 2.7.4

[root@master tools]# tar -zxvf hadoop-2.7.4.tar.gz

5. Check the JDK

[root@master hadoop-2.7.4]# java -version

java version "1.8.0_161"

Java(TM) SE Runtime Environment (build 1.8.0_161-b12)

Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

6. Check the Hadoop version

[root@master bin]# ./hadoop version

Error: JAVA_HOME is not set and could not be found.

Edit the Hadoop environment configuration to fix this:

vim hadoop-env.sh

export JAVA_HOME=/usr/local/tools/jdk1.8.0_161 
export HADOOP_LOG_DIR=/data/hadoop_repo/logs/hadoop

Check the version again:

[root@master bin]# ./hadoop version

Hadoop 2.7.4

Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r cd915e1e8d9d0131462a0b7301586c175728a282

Compiled by kshvachk on 2017-08-01T00:29Z

Compiled with protoc 2.5.0

From source with checksum 50b0468318b4ce9bd24dc467b7ce1148

This command was run using /usr/local/tools/hadoop-2.7.4/share/hadoop/common/hadoop-common-2.7.4.jar

7. Edit the configuration files

[root@master hadoop]# pwd

/usr/local/tools/hadoop-2.7.4/etc/hadoop

vim core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop_repo</value>
    </property>
</configuration>

vim hdfs-site.xml
Set the replica count:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml

vim mapred-site.xml

This makes MapReduce jobs run on the YARN engine:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

vim yarn-site.xml

Configure the auxiliary shuffle service for YARN and the environment-variable whitelist:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

8. HDFS must be formatted before first use (similar to formatting a disk). Do not run this repeatedly; if it goes wrong, delete the hadoop_repo directory and format again.

Make sure the path /data/hadoop_repo exists.

bin/hdfs namenode -format

20/05/05 19:44:45 INFO common.Storage: Storage directory /data/hadoop_repo/dfs/name has been successfully formatted.
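If a re-format is ever needed, a minimal clean-up sketch (assumption: the daemons are stopped first and nothing else is stored under /data/hadoop_repo):

sbin/stop-all.sh
rm -rf /data/hadoop_repo
bin/hdfs namenode -format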

9. Environment variables

vim /etc/profile

HADOOP_HOME=/usr/local/tools/hadoop-2.7.4
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}

Apply the change: source /etc/profile

10. Set up passwordless SSH login: ssh-keygen -t rsa

If this is not set up, running start-all.sh keeps prompting for passwords:

[root@master sbin]# ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [master]

The authenticity of host 'master (192.168.198.131)' can't be established.

ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.

ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.

Are you sure you want to continue connecting (yes/no)? yes

master: Warning: Permanently added 'master,192.168.198.131' (ECDSA) to the list of known hosts.

root@master's password:

master: starting namenode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-namenode-master.out

The authenticity of host 'localhost (::1)' can't be established.

ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.

ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.

Are you sure you want to continue connecting (yes/no)? yes

localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.

root@localhost's password:

root@localhost's password: localhost: Permission denied, please try again.

localhost: starting datanode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-datanode-master.out

Starting secondary namenodes [0.0.0.0]

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.

ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.

ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.

Are you sure you want to continue connecting (yes/no)? yes

0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.

root@0.0.0.0's password:

0.0.0.0: starting secondarynamenode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-secondarynamenode-master.out

starting yarn daemons

starting resourcemanager, logging to /usr/local/tools/hadoop-2.7.4/logs/yarn-root-resourcemanager-master.out

root@localhost's password:

localhost: starting nodemanager, logging to /usr/local/tools/hadoop-2.7.4/logs/yarn-root-nodemanager-master.out

Without passwordless SSH configured:

[root@master sbin]# ssh 192.168.198.131

root@192.168.198.131's password:

Last failed login: Tue May 5 19:51:21 PDT 2020 from localhost on ssh:notty

There was 1 failed login attempt since the last successful login.

Last login: Tue May 5 19:28:21 2020 from 192.168.198.1

Set up passwordless SSH:

[root@master sbin]# ssh-keygen -t rsa

Press Enter three times to accept the defaults, then run:

[root@master ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

After that, ssh to the host works without a password; use exit to leave the session.
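If key-based login still asks for a password, permissions on the key files are the usual culprit; ssh-copy-id (part of the standard OpenSSH client tools) is an alternative that appends the key and fixes permissions in one step:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh-copy-id root@master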

11. Start Hadoop

[root@master sbin]# ./start-all.sh
[root@master sbin]# jps
7202 Jps
6836 NodeManager
6709 ResourceManager
6536 SecondaryNameNode
6201 NameNode
6347 DataNode

12. Web UIs

YARN ResourceManager UI: localhost:8088; HDFS NameNode UI: localhost:50070

II. Hive installation

1. Extract the archive

2. Configure environment variables

[root@master apache-hive-2.3.7]# hive --version

Hive 2.3.7

Git git://Alans-MacBook-Air.local/Users/gates/git/hive -r cb213d88304034393d68cc31a95be24f5aac62b6

Compiled by gates on Tue Apr 7 12:42:45 PDT 2020

From source with checksum 9da14e8ac4737126b00a1a47f662657e

3. Configure hive-site.xml

[root@master conf]# cp hive-default.xml.template hive-site.xml

[root@master conf]# vim hive-site.xml

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.198.131:3306/hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>

4. Copy the MySQL JDBC driver JAR into hive/lib
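For example (the connector file name and version here are assumptions; use the driver JAR you actually downloaded):

cp mysql-connector-java-5.1.47.jar /usr/local/tools/apache-hive-2.3.7/lib/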

5. Create the hive database in MySQL, then initialize the metastore schema

Start MySQL and create the hive database (MySQL is used instead of Hive's built-in metastore database):

[root@master mysql]# service mysqld start

/etc/init.d/mysqld: line 239: my_print_defaults: command not found

/etc/init.d/mysqld: line 259: cd: /usr/local/mysql: No such file or directory

Starting MySQL ERROR! Couldn't find MySQL server (/usr/local/mysql/bin/mysqld_safe)

CREATE DATABASE hive;

[root@master bin]# schematool -dbType mysql -initSchema

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Metastore connection URL: jdbc:mysql://192.168.198.131:3306/hive

Metastore Connection Driver : com.mysql.jdbc.Driver

Metastore connection User: root

Starting metastore schema initialization to 2.3.0

Initialization script hive-schema-2.3.0.mysql.sql

Initialization script completed

schemaTool completed

[root@master bin]#

6. Run the hive command

[root@master apache-hive-2.3.7]# hive

which: no hbase in (/usr/local/tools/apache-hive-2.3.7/bin:/usr/local/tools/hadoop-2.7.4/bin:/usr/local/tools/hadoop-2.7.4/sbin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/tools/apache-hive-2.3.7/lib/hive-common-2.3.7.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive>

7. Check the database created in step 5; it now contains many metastore tables.

mysql -uroot -p

The two tables to look at are TBLS and COLUMNS_V2.

8. Test

[root@master apache-hive-2.3.7]# hive

which: no hbase in (/usr/local/tools/apache-hive-2.3.7/bin:/usr/local/tools/hadoop-2.7.4/bin:/usr/local/tools/hadoop-2.7.4/sbin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/tools/apache-hive-2.3.7/lib/hive-common-2.3.7.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive> show tables;

OK

Time taken: 4.426 seconds

hive> create database hive_1;

OK

Time taken: 0.198 seconds

hive> show databases;

OK

default

hive_1

Time taken: 0.03 seconds, Fetched: 2 row(s)

hive>

Check what has been stored in HDFS:

[root@master ~]# hadoop fs -lsr /

lsr: DEPRECATED: Please use 'ls -R' instead.

drwx-wx-wx - root supergroup 0 2020-05-05 22:29 /tmp

drwx-wx-wx - root supergroup 0 2020-05-05 22:29 /tmp/hive

drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root

drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root/1fed50ca-d9f6-4b5c-b80b-a81a66679812

drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root/1fed50ca-d9f6-4b5c-b80b-a81a66679812/_tmp_space.db

drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user

drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive

drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive/warehouse

drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive/warehouse/hive_1.db

III. Kafka installation (pseudo-distributed); ZooKeeper must be installed first

1. Install

# tar zxvf kafka_2.11-2.2.1.tgz
# mv kafka_2.11-2.2.1 kafka
# cd kafka

Start the Kafka service:

# nohup bin/kafka-server-start.sh config/server.properties &

Create a topic:

# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List topics:

# bin/kafka-topics.sh --list --zookeeper localhost:2181

2. Test

Send messages with kafka-console-producer.sh:

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Consume messages with kafka-console-consumer.sh:

# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

3. Kafka cluster

Create multiple server property files under config, each with a different broker.id (a sketch of the overrides follows below).

bin/kafka-server-start.sh config/server-1.properties &

ZooKeeper must be started first.
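A minimal sketch of what server-1.properties might override relative to the default server.properties (the port and log directory values are assumptions; each broker needs its own unique values):

broker.id=1
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181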

IV. HBase installation

1. Extract and configure environment variables

[root@master hbase-1.4.13]# vim /etc/profile

HBASE_HOME=/usr/local/tools/hbase-1.4.13
export PATH=${HBASE_HOME}/bin:${PATH}

[root@master hbase-1.4.13]# source /etc/profile

2. Edit the configuration files

Add the following to hbase-env.sh:

export JAVA_HOME=/usr/local/tools/jdk1.8.0_161
export HBASE_MANAGES_ZK=false

Change hbase-site.xml to:

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
        <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.
        </description>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/usr/local/tools/hbase-1.4.13/data</value>
    </property>
    <!-- ZooKeeper location. For a pseudo-cluster the value is a single host; for a real cluster, list the hostnames separated by commas. -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master</value>
    </property>
    <!--
    false means standalone mode, true means distributed mode.
    This must be true here; otherwise HBase keeps using its bundled ZooKeeper, which conflicts
    with the external ZooKeeper already running and prevents HBase from starting.
    -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
        <description>The mode the cluster will be in. Possible values are
          false: standalone and pseudo-distributed setups with managed Zookeeper
          true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
        </description>
    </property>
</configuration>

3. Start HBase

[root@master bin]# ./start-hbase.sh

Open the HBase web UI on port 16010.
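A quick sanity check, using the standard status and list commands inside the HBase shell:

hbase shell
status
list
exit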

4. Troubleshooting

1)

running master, logging to /usr/local/tools/hbase-1.4.13/bin/../logs/hbase-root-master-master.out

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

Solution:

As the warnings indicate, hbase-env.sh contains JVM options that no longer exist in JDK 8. The configuration file has the following lines:

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+

export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"

Comment out these two export lines.

2)

HMaster and HRegionServer are HBase's two daemon processes, but jps showed they had not started, so check the configured log directory for the error. It reported:

Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.

However, hbase-env.sh already had export HBASE_MANAGES_ZK=false

That setting tells HBase not to use its bundled ZooKeeper, and in theory the external ZooKeeper should then be used, yet the error persisted. The conflict comes from the bundled ZooKeeper being started anyway: when hbase.cluster.distributed is false, i.e. HBase runs in standalone mode, it still launches its own ZooKeeper.

So set the value to true:

vim conf/hbase-site.xml

<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>

3)

2020-05-07 02:57:17,302 INFO [main-SendThread(192.168.181.131:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.181.131/192.168.181.131:2181. Will not attempt to authenticate using SASL (unknown error)

In hbase-site.xml, the ZooKeeper host is configured as the hostname (master); previously it was the IP address.

V. Solr cluster (solr-7.5.0); prerequisite: the ZooKeeper pseudo-cluster is already configured

1. Extract the archive

2. Start it as a quick test

solr start

3. Configure the ZooKeeper cluster and SOLR_PORT. The ZooKeeper pseudo-cluster is already set up; Solr needs ZK_HOST and SOLR_PORT configured (a SOLR_PORT sketch follows the ZK_HOST line below).

[root@master bin]# vim solr.in.sh

ZK_HOST="192.168.198.131:2181,192.168.198.131:2182,192.168.198.131:2183"
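SOLR_PORT is set in each instance's bin/solr.in.sh; the values below are assumptions for a four-node pseudo-cluster on one host and only need to be unique per instance:

SOLR_PORT=8983   # solr1
SOLR_PORT=8984   # solr2
SOLR_PORT=8985   # solr3
SOLR_PORT=8986   # solr4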

4. Create the Solr collections for Atlas

bash $SOLR_HOME/bin/solr create -c vertex_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2

bash $SOLR_HOME/bin/solr create -c edge_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2

bash $SOLR_HOME/bin/solr create -c fulltext_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2
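To confirm the collections were created, the Solr Collections API can be queried (port 8983 is an assumption; use whichever node is up):

curl "http://192.168.198.131:8983/solr/admin/collections?action=LIST&wt=json"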

5. Start the Solr cluster

Start the ZooKeeper pseudo-cluster first, then the Solr pseudo-cluster:

/usr/local/tools/solr-cloud/solr1/bin/solr start -force 
/usr/local/tools/solr-cloud/solr2/bin/solr start -force 
/usr/local/tools/solr-cloud/solr3/bin/solr start -force 
/usr/local/tools/solr-cloud/solr4/bin/solr start -force 

/usr/local/tools/solr-cloud/solr1/bin/solr stop 
/usr/local/tools/solr-cloud/solr2/bin/solr stop 
/usr/local/tools/solr-cloud/solr3/bin/solr stop 
/usr/local/tools/solr-cloud/solr4/bin/solr stop

[root@master bin]# ./solr create_collection -c test_collection -shards 2 -replicationFactor 2 -force

-c specifies the collection name

-shards specifies the number of shards (can be abbreviated to -s); index data is distributed across these shards

-replicationFactor is the number of replicas per shard

-force is required because Solr refuses to run these operations as root; non-root accounts can omit it

The Solr cluster setup is complete.

Reference: https://blog.csdn.net/qq_37936542/article/details/83113083

VI. Apache Atlas standalone deployment

Build with Atlas's embedded HBase and Solr:

/usr/local/project/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server

Build without Atlas's embedded HBase and Solr:

/usr/local/project/apache-atlas-sources-2.0.0-alone

[root@master apache-atlas-sources-2.0.0-alone]# mvn clean -DskipTests package -Pdist

When the build completes, use distro/target/apache-atlas-2.0.0-server

Integrate Solr with Apache Atlas:

cd /usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf

[root@master conf]# cp -r solr/ /usr/local/tools/solr-7.5.0/apache-atlas-conf

Standalone deployment: the main work is modifying two configuration files.

atlas-env.sh

export HBASE_CONF_DIR=/usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf

atlas-application.properties

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#########  Graph Database Configs  #########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=master:2181,master:2182,master:2183
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1


# Graph Search Index
atlas.graph.index.search.backend=solr

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=master:2181,master:2182,master:2183
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: http://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Notification Configs  #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=master:2181/kafka,master:2182/kafka,master:2183/kafka
atlas.kafka.bootstrap.servers=master:9092,master:9093,master:9094
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas

atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>


######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>

#########  JAAS Configuration ########

#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM

#########  Server Properties  #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=master:2181,master:2182,master:2183

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>



######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0


#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false


########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>

######### Hive Hook Configs #######
 
atlas.hook.hive.synchronous=false
 
atlas.hook.hive.numRetries=3
 
atlas.hook.hive.queueSize=10000
 
######### Sqoop Hook Configs #######
 
atlas.hook.sqoop.synchronous=false
 
atlas.hook.sqoop.numRetries=3
 
atlas.hook.sqoop.queueSize=10000

storage.cql.protocol-version=3
storage.cql.local-core-connections-per-host=10
storage.cql.local-max-connections-per-host=20
storage.cql.local-max-requests-per-connection=2000
storage.buffer-size=1024
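With the configuration in place, a sanity check is to start Atlas and call its admin API; atlas_start.py ships in the server package's bin directory, the default login is admin/admin (from users-credentials.properties), and the first start can take several minutes while HBase tables and Solr indexes are created:

cd /usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0
bin/atlas_start.py
curl -u admin:admin http://localhost:21000/api/atlas/admin/version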


VII. Atlas standalone deployment: troubleshooting

1)

Could not find hbase-site.xml in %s. Please set env var HBASE_CONF_DIR to the hbase client conf dir

Is the symlink wrong? The HBase client conf dir can be symlinked into the Atlas conf tree, for example:

cd /usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0

[root@cdh632-worker03 atlas]# ln -s /etc/hbase/conf /opt/module/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf

[root@cdh632-worker03 atlas]# pwd

Set HBASE_CONF_DIR in atlas-env.sh:

vim atlas-env.sh

export HBASE_CONF_DIR=/usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf

2)

Startup error:

2020-05-23 10:30:46,794 WARN - [main:] ~ Unexpected exception during getDeployment() (HBaseStoreManager:399)

java.lang.RuntimeException: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend

Fix: edit the configuration file so the graph storage ZooKeeper address is the full quorum:

master:2181,master:2182,master:2183

Reference: https://blog.csdn.net/qq_34024275/article/details/105393745

The graph database is set up as follows: the configuration file specifies where the graph data is stored and where the search index is stored, for example:

atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage

atlas.graph.storage.backend=hbase

atlas.graph.storage.port=2181

atlas.graph.storage.hbase.table=atlas-test

atlas.graph.storage.hostname=docker2,docker3,docker4

# Graph Search Index Backend

atlas.graph.index.search.backend=elasticsearch

atlas.graph.index.search.hostname=127.0.0.1

atlas.graph.index.search.index-name=atlas_test

3)

at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily

Atlas's bundled HBase client is 2.0, while the HBase running locally was 1.4.13.

Fix (HBase was switched to 2.2.4; note the hbase-2.2.4 path below):

<property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.
    </description>
</property>
<property>
    <name>hbase.tmp.dir</name>
    <value>/usr/local/tools/hbase-2.2.4/data</value>
</property>
<!-- ZooKeeper location -->
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>master</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>

4)

2020-05-23 23:52:00,415 WARN - [main:] ~ JanusGraphException: Could not open global configuration (AtlasJanusGraphDatabase:167)

2020-05-23 23:52:00,432 WARN - [main:] ~ Unexpected exception during getDeployment() (HBaseStoreManager:399)

java.lang.RuntimeException: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend

Add the following to the configuration file:

storage.cql.protocol-version=3

storage.cql.local-core-connections-per-host=10

storage.cql.local-max-connections-per-host=20

storage.cql.local-max-requests-per-connection=2000

storage.buffer-size=1024

5)

Caused by: org.apache.solr.common.SolrException: Cannot connect to cluster at master:2181,master:2182,master:2183/solr: cluster not found/not ready

at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:385)

at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:141)

at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:383)

at org.janusgraph.diskstorage.solr.Solr6Index.<init>(Solr6Index.java:218)

The Solr ZooKeeper URL was set to master:2181,master:2182,master:2183/solr

Changing it to master:2181,master:2182,master:2183 (without the /solr chroot) fixed it.

Supplementary notes:

1. Java environment variables

vim /etc/profile

Add the following lines:

export JAVA_HOME=/usr/local/tools/jdk1.8.0_161 
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar 
export PATH=$JAVA_HOME/bin:$PATH

Save and exit, then apply the configuration:

source /etc/profile

2. ZooKeeper pseudo-distributed setup

[root@master zookeeper-01]# cd data/

[root@master data]# touch myid

[root@master data]# echo 1 >> myid
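The other two instances need their own myid values matching server.2 and server.3 in zoo.cfg (the zookeeper02/zookeeper03 paths are assumptions mirroring this layout):

echo 2 > /usr/local/tools/zk-cloud/zookeeper02/data/myid
echo 3 > /usr/local/tools/zk-cloud/zookeeper03/data/myid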

Edit the configuration: rename zoo_sample.cfg under conf to zoo.cfg (replace the IP with your own) and add the server list; a fuller zoo.cfg sketch follows the server lines below.

server.1=192.168.198.131:2881:3881

server.2=192.168.198.131:2882:3882

server.3=192.168.198.131:2883:3883
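A minimal zoo.cfg sketch for the first instance (dataDir matches this setup; the per-instance clientPort values 2181/2182/2183 are an assumption consistent with the rest of this guide and must be unique on one host):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/tools/zk-cloud/zookeeper01/data
clientPort=2181
server.1=192.168.198.131:2881:3881
server.2=192.168.198.131:2882:3882
server.3=192.168.198.131:2883:3883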

ZooKeeper cluster startup error:

org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException

In zoo.cfg, dataDir must point to the data directory you created, because that is where the myid file lives:

dataDir=/usr/local/tools/zk-cloud/zookeeper01/data

Create a start script so the instances don't have to be started one by one:

vim zk-start.sh

cd zookeeper01/bin
./zkServer.sh start
cd ../../
cd zookeeper02/bin
./zkServer.sh start
cd ../../
cd zookeeper03/bin
./zkServer.sh start
cd ../../

chmod -R 755 zk-start.sh

After ZooKeeper starts successfully, check the status: zkServer.sh status
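To check all three instances at once, a small sketch in the same spirit as zk-start.sh (paths assumed as above):

for d in zookeeper01 zookeeper02 zookeeper03; do $d/bin/zkServer.sh status; done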
