ClickHouse Deployment Procedure

I. Overview

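ClickHouse is a database management system with its own design choices for reading, writing, storing, querying, updating, and replicating data, as well as for transactions and performance. It is column-oriented: a column is the smallest unit of storage, so a query reads only the columns it needs, which reduces the amount of I/O and improves efficiency. It is best suited to analyzing structured, clean, immutable streaming data, which makes it a good choice for OLAP.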

II. Installation and Deployment

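ClickHouse implements distribution through shards plus replicas, and relies on ZooKeeper to coordinate data consistency, high availability, and fault tolerance. The production environment therefore uses four machines to build the ClickHouse cluster (2 shards × 2 replicas). The detailed deployment steps are as follows: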

1. Check whether the machine supports SSE 4.2

grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"

2. Install clickhouse-server and clickhouse-client with yum (on every machine). glibc may need to be upgraded during installation; for a workaround see https://cloud.tencent.com/developer/article/1463094.

yum install yum-utils
rpm --import https://repo.yandex.ru/clickhouse/CLICKHOUSE-KEY.GPG
yum-config-manager --add-repo https://repo.yandex.ru/clickhouse/rpm/stable/x86_64
yum install clickhouse-server clickhouse-client

3. Modify the default configuration file config.xml

<!-- Logging -->
<logger>
 <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
 <level>trace</level>
 <log>/home/hadoop/logs/clickhouse-server/clickhouse-server.log</log>
 <errorlog>/home/hadoop/logs/clickhouse-server/clickhouse-server.err.log</errorlog>
 <size>1000M</size>
 <count>10</count>
 <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
 </logger>

<!-- Ports -->
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<!-- Port used for communication between replicas -->
<interserver_http_port>9009</interserver_http_port>
<interserver_http_host>192.168.16.1</interserver_http_host>
<listen_host>0.0.0.0</listen_host>

<!-- Data paths and access control -->
 <path>/home/hadoop/data/clickhouse/</path>
 <tmp_path>/home/hadoop/data/clickhouse/tmp/</tmp_path>
 <user_files_path>/home/hadoop/data/clickhouse/user_files/</user_files_path>
 <users_config>users.xml</users_config>
 <!-- Default profile of settings. -->
 <default_profile>default</default_profile>

 <!-- Cluster configuration -->
 <remote_servers incl="clickhouse_remote_servers" />
 <zookeeper incl="zookeeper-servers" optional="true" />
 <macros incl="macros" optional="true" />
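
Because the paths above live under /home/hadoop rather than the package defaults, they need to exist and be writable by the account that runs the server before it starts. A minimal sketch, assuming the service runs as a `clickhouse` user (the account the RPM packages normally create); `prepare_clickhouse_dirs` is a hypothetical helper name, not part of ClickHouse:

```shell
# Hypothetical helper: create the directories referenced in config.xml.
# $1 = data root (<path>), $2 = log directory (parent of <log> and <errorlog>)
prepare_clickhouse_dirs() {
    data_root=$1
    log_dir=$2
    # <tmp_path> and <user_files_path> sit under the data root in this config
    mkdir -p "$data_root/tmp" "$data_root/user_files" "$log_dir"
}

# Usage (as root); the chown assumes the service account is named "clickhouse":
#   prepare_clickhouse_dirs /home/hadoop/data/clickhouse /home/hadoop/logs/clickhouse-server
#   chown -R clickhouse:clickhouse /home/hadoop/data/clickhouse /home/hadoop/logs/clickhouse-server
```

If the server runs under a different account in your environment, substitute that account in the chown step.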

4. Add the cluster configuration file metrika.xml

<yandex>
    <!-- ClickHouse cluster nodes -->
    <clickhouse_remote_servers>
        <hadooptest_clusters>
            <!-- Shard 1 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.16.1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <!-- Shard 2 -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>192.168.16.2</host>
                    <port>9010</port>
                </replica>
            </shard>
        </hadooptest_clusters>
    </clickhouse_remote_servers>
    <!-- ZooKeeper configuration -->
    <zookeeper-servers>
        <node index="1">
            <host>192.168.16.1</host>
            <port>2281</port>
        </node>
        <node index="2">
            <host>192.168.16.2</host>
            <port>2281</port>
        </node>
        <node index="3">
            <host>192.168.16.24</host>
            <port>2281</port>
        </node>
    </zookeeper-servers>

    <macros>
        <layer>hadooptest_clusters</layer>
        <shard>01</shard> <!-- Shard number -->
        <replica>192.168.16.1</replica> <!-- IP of the current node -->
    </macros>
    <networks>
        <ip>::/0</ip>
    </networks>
    <!-- Compression configuration -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method> <!-- Compression method: lz4 is faster than zstd but uses more disk space -->
        </case>
    </clickhouse_compression>
</yandex>
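
Once the servers are running, the cluster definition can be compared with what ClickHouse actually loaded by querying the `system.clusters` table. A sketch, assuming clickhouse-client can reach a node with whatever host, port, and credentials apply in your environment; `cluster_layout_sql` is a hypothetical helper that only prints the query:

```shell
# Hypothetical helper: print a query listing the shards and replicas of one cluster.
# $1 = cluster name as written in metrika.xml
cluster_layout_sql() {
    printf "SELECT cluster, shard_num, replica_num, host_name, port FROM system.clusters WHERE cluster = '%s' ORDER BY shard_num, replica_num\n" "$1"
}

# Usage (connection flags are placeholders for your environment):
#   cluster_layout_sql hadooptest_clusters | clickhouse-client -h 192.168.16.1 --port 9000
```

The result has one row per replica entry, so a 2 shard × 2 replica layout would show four rows, which in turn requires two replica blocks inside each shard.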

5. Add the data disk configuration to separate hot and cold data

<yandex>
<storage_configuration>
    <disks>
        <default>
            <keep_free_space_bytes>1024</keep_free_space_bytes>
        </default>
        <fast>
            <path>/home/hadoop/data/clickhouse/</path>
        </fast>
        <normal>
            <path>/home/hadoop/data/clickhouse/</path>
            <keep_free_space_bytes>10485760</keep_free_space_bytes>
        </normal>
    </disks>

    <policies>
        <ssd_and_hdd>
            <volumes>
                <hot>
                    <disk>fast</disk>
                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
                </hot>
                <cold>
                    <disk>normal</disk>
                </cold>
            </volumes>
            <move_factor>0.2</move_factor>
        </ssd_and_hdd>
    </policies>
</storage_configuration>
</yandex>
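
A storage policy only affects tables that name it. The following sketch shows how a table might opt in to the ssd_and_hdd policy; the table and column names are made up for illustration, and `policy_table_ddl` is a hypothetical helper that just prints the statement:

```shell
# Hypothetical helper: print DDL for a MergeTree table bound to a storage policy.
# $1 = table name, $2 = storage policy name from <policies>
policy_table_ddl() {
    cat <<SQL
CREATE TABLE $1
(
    event_date Date,
    user_id UInt64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
SETTINGS storage_policy = '$2';
SQL
}

# Usage:
#   policy_table_ddl default.events_tiered ssd_and_hdd | clickhouse-client --multiquery
#   echo "SELECT policy_name, volume_name, disks FROM system.storage_policies" | clickhouse-client
```

With the policy above, parts larger than max_data_part_size_bytes (1 GiB) go straight to the cold volume, and parts start moving from hot to cold once free space on the hot volume drops below the move_factor of 20%.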

6. Configure user permissions

<?xml version="1.0"?>
<yandex>
 <!-- Profiles of settings. -->
 <profiles>
 <!-- Default settings. -->
 <default>
 <!-- Maximum memory usage for processing single query, in bytes. -->
 <max_memory_usage>101310968832</max_memory_usage>
 <max_memory_usage_for_all_queries>101310968832</max_memory_usage_for_all_queries>
 <max_bytes_before_external_group_by>50655484416</max_bytes_before_external_group_by>
 <distributed_aggregation_memory_efficient>1</distributed_aggregation_memory_efficient>
 <use_uncompressed_cache>0</use_uncompressed_cache>
 <load_balancing>random</load_balancing>
 </default>
 <readonly>
 <readonly>1</readonly>
 </readonly>
 </profiles>

 <quotas>
 <default>
 <interval>
 <duration>3600</duration>
 <queries>0</queries>
 <errors>0</errors>
 <result_rows>0</result_rows>
 <read_rows>0</read_rows>
 <execution_time>0</execution_time>
 </interval>
 </default>
 </quotas>

 <users>
 <!-- If user name was not specified, 'default' user is used. -->
 <default>
 <password>MVZqc4ne</password>
 <networks incl="networks" replace="replace">
 <ip>::/0</ip>
 </networks>

 <!-- Settings profile for user. -->
 <profile>default</profile>

 <!-- Quota for user. -->
 <quota>default</quota>

 <!-- For testing the table filters -->
 <databases>
 <test>
 <!-- Simple expression filter -->
 <filtered_table1>
 <filter>a = 1</filter>
 </filtered_table1>

 <!-- Complex expression filter -->
 <filtered_table2>
 <filter>a + b &lt; 1 or c - d &gt; 5</filter>
 </filtered_table2>
 <filtered_table3>
 <filter>c = 1</filter>
 </filtered_table3>
 </test>
 </databases>
 </default>
 <bi_test>
 <password_sha256_hex>645cb15583a65c5b7d89f02b37a97fe162e79dafdacb7450de2a679ff602c9ea</password_sha256_hex>
 <networks incl="networks" replace="replace">
 <ip>::/0</ip>
 </networks>
 <profile>default</profile>
 <quota>default</quota>
 </bi_test>
 </users>
</yandex>
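
The bi_test user stores a SHA-256 digest rather than a plaintext password. A small sketch of how such a digest could be produced with coreutils; `password_sha256_hex` is a hypothetical helper name:

```shell
# Hypothetical helper: print the SHA-256 hex digest of a password,
# suitable for the <password_sha256_hex> element.
# printf '%s' is used so that no trailing newline is hashed.
password_sha256_hex() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Usage:
#   password_sha256_hex 'my-secret'
```

The same approach could replace the plaintext password of the default user.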

7. Because ZooKeeper is needed for coordination, it is installed separately here.

7.1. Copy the Kafka installation package from the production Tencent Cloud environment to the new machines.

7.2. Configure the zoo.cfg file

# The number of milliseconds of each tick
tickTime=2000
initLimit=10
dataDir=/home/hadoop2/zookeeper
clientPort=2181
minSessionTimeout=6000
maxSessionTimeout=180000
autopurge.snapRetainCount=10
autopurge.purgeInterval=1
server.21=hadoop1:2888:3888
server.22=hadoop2:2888:3888
server.23=hadoop3:2888:3888
maxClientCnxns=500
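
Each server.N line needs a matching myid file on the corresponding host: ZooKeeper reads a file named myid in dataDir to learn which server it is. A minimal sketch; `write_myid` is a hypothetical helper name:

```shell
# Hypothetical helper: write the myid file that ZooKeeper reads from dataDir.
# $1 = dataDir from zoo.cfg, $2 = the N of this host's server.N line
write_myid() {
    mkdir -p "$1"
    printf '%s\n' "$2" > "$1/myid"
}

# Usage on hadoop1 (server.21 above):
#   write_myid /home/hadoop2/zookeeper 21
```

Run it once per host with that host's own id before starting ZooKeeper.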

7.3. Start ZooKeeper

bin/zkServer.sh start

8. Start the ClickHouse server

service clickhouse-server start

9. Check the logs for errors, then connect with clickhouse-client to test table creation, data loading, queries, and so on.

clickhouse-client -h 192.168.16.1 --port 9000 -u bi_test --password Ny3jTUoTQUlQAb4i

hadoop2 :) select count(*) from dws_sb_olap_user_basic_1d;

SELECT count(*)
FROM dws_sb_olap_user_basic_1d

┌─count()─┐
│  475835 │
└─────────┘

1 rows in set. Elapsed: 0.005 sec. Processed 475.83 thousand rows, 475.83 KB (92.74 million rows/s., 92.74 MB/s.)

III. Introduction to ClickHouse Table Engines

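A ClickHouse table engine determines how data is stored and read, and therefore the I/O efficiency. The choice of table engine mainly determines the following: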

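Where data is stored and where it is read from
Which kinds of queries are supported
Whether data can be accessed concurrently
Whether indexes can be used
Whether requests can be executed with multiple threads
The parameters used for data replication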

This section mainly covers three table engines: MergeTree, ReplicatedMergeTree, and Distributed.

MergeTree
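
As a rough illustration of how the three engines relate, the sketch below prints one CREATE TABLE statement per engine. Table and column names are invented for the example, the ZooKeeper path is just one common convention, and `engine_examples_ddl` is a hypothetical helper; the {layer}, {shard} and {replica} placeholders are expected to come from the macros section of metrika.xml.

```shell
# Hypothetical helper: print example DDL for the three engines.
# Table and column names are illustrative only.
engine_examples_ddl() {
    cat <<'SQL'
-- MergeTree: local table, data sorted by the ORDER BY key
CREATE TABLE default.events_mt
(
    event_date Date,
    user_id UInt64,
    event String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);

-- ReplicatedMergeTree: same storage, replicated through ZooKeeper.
-- {layer}, {shard} and {replica} are substituted from <macros>.
CREATE TABLE default.events_local
(
    event_date Date,
    user_id UInt64,
    event String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/events_local', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);

-- Distributed: stores no data itself; routes reads and writes
-- to events_local on every shard of the cluster
CREATE TABLE default.events_all AS default.events_local
ENGINE = Distributed(hadooptest_clusters, default, events_local, rand());
SQL
}

# Usage: run on every node of the cluster
#   engine_examples_ddl | clickhouse-client --multiquery
```

Queries and inserts would normally go to events_all, while events_local holds the data of each shard.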

IV. Monitoring and Ongoing Maintenance

1. Monitoring

Monitoring is built on clickhouse exporter + prometheus + grafana.

1.1. Install Docker on a CentOS 7 machine

# Install
yum -y install docker
# Start
service docker start
# Verify
docker version
docker run hello-world

1.2. Rebuild the clickhouse exporter image (because the relevant environment variables need to be changed)

# Write a Dockerfile that passes the parameters in
# Dockerfile
FROM docker.io/f1yegor/clickhouse-exporter
ADD clickhouse_exporter_start.sh /opt/clickhouse_exporter_start.sh
ENTRYPOINT ["/opt/clickhouse_exporter_start.sh"]
# clickhouse_exporter_start.sh
#!/bin/sh
export CLICKHOUSE_USER=default
export CLICKHOUSE_PASSWORD=**********
# Forward all container arguments to the exporter unchanged
exec /usr/local/bin/clickhouse_exporter "$@"
# Build the image
docker build -t ck_clickhouse_exporter .
# List images
docker images
REPOSITORY                              TAG                 IMAGE ID            CREATED             SIZE
ck_clickhouse_exporter               latest              3384e729d116        23 hours ago        19.6 MB
docker.io/f1yegor/clickhouse-exporter   latest              9d9bfc1c7cb2        9 months ago        19.6 MB
docker.io/hello-world                   latest              fce289e99eb9        11 months ago       1.84 kB

1.3. Start one exporter container per ClickHouse node

docker run -d -p 9116:9116 ck_clickhouse_exporter -scrape_uri=http://ip1:8123/
docker run -d -p 9117:9116 ck_clickhouse_exporter -scrape_uri=http://ip2:8123/
docker run -d -p 9118:9116 ck_clickhouse_exporter -scrape_uri=http://ip3:8123/
docker run -d -p 9119:9116 ck_clickhouse_exporter -scrape_uri=http://ip4:8123/
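
Before wiring the exporters into Prometheus, it can help to confirm that each one returns metrics. A sketch, assuming the exporter serves the Prometheus text format at /metrics on the mapped port and that curl is available; `metric_value` is a hypothetical helper that extracts a single sample from that output:

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the value of the metric named in $1. Matches the bare metric name or the name
# followed by a {label} set; assumes label values contain no spaces.
metric_value() {
    awk -v name="$1" '$1 == name || index($1, name "{") == 1 { print $2 }'
}

# Usage against the first exporter started above (host name is a placeholder):
#   curl -s http://ck1:9116/metrics | metric_value clickhouse_memory_tracking
```

An empty result suggests that the exporter cannot reach ClickHouse or that the metric name differs in your exporter version.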

1.4. Edit the Prometheus configuration file and add the exporters that were just started.

  - job_name: 'clickhouse'
    scrape_interval: 30s
    static_configs:
      - targets: ['ck1:9116','ck1:9117','ck1:9118','ck1:9119']
        labels:
          env: 'tx_clickhouse'

1.5. Import a dashboard into Grafana

First import an open-source dashboard into Grafana, then adjust it as needed.

1.6. Set up alerting with Prometheus Alertmanager.

Configure the rules

groups:
- name: ClickHouse monitoring rules
  rules:
  - alert: "ClickHouse instance status alert"
    expr: clickhouse_version_integer != 19016003
    for: 3m
    labels:
      severity: critical

    annotations:
      summary: "ClickHouse instance abnormal"
      description: "ClickHouse instance {{$labels.instance}} is in an abnormal state, current value: {{ $value }}"

  - alert: "ClickHouse query alert"
    expr: clickhouse_memory_tracking > 32212254720
    for: 3m
    labels:
      severity: critical

    annotations:
      summary: "ClickHouse query memory abnormal"
      description: "ClickHouse instance {{$labels.exported_instance}} query memory is above 30G, current value: {{ $value }}"

More alerts will be added later.
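
The memory rule compares against a raw byte count, which is easy to mistype. A small sketch showing where 32212254720 comes from (30 × 1024³ bytes); `gib_to_bytes` is a hypothetical helper name:

```shell
# Hypothetical helper: convert a whole number of GiB to bytes
# for use in an alert expression.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Usage:
#   gib_to_bytes 30
```

Before reloading Prometheus, the rule file can also be validated with `promtool check rules`, which ships with Prometheus.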

1.7. Write the alertmanager_webhook service that sends the email or SMS notifications.

2. Scaling

2.1. CPU and memory: because these are cloud hosts, an instance upgrade can be requested directly.

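2.2. Disk: after adding a disk, configure a different data layout by following the hot/cold data separation configuration above; the disk placement of a table's partitions can also be changed manually.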

ALTER TABLE table_name MOVE PARTITION|PART partition_expr TO DISK|VOLUME 'disk_name'
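
A concrete instance of the statement above might look like the following sketch. The table name and partition value are illustrative, the volume name is taken from the storage policy in step 5, and `move_partition_sql` is a hypothetical helper that only prints the statement:

```shell
# Hypothetical helper: print an ALTER statement that moves one partition to a volume.
# $1 = table, $2 = partition expression, $3 = volume name from the storage policy
move_partition_sql() {
    printf "ALTER TABLE %s MOVE PARTITION %s TO VOLUME '%s'\n" "$1" "$2" "$3"
}

# Usage (table and partition are illustrative):
#   move_partition_sql default.events_tiered 202104 cold | clickhouse-client
#   echo "SELECT partition, name, disk_name FROM system.parts WHERE table = 'events_tiered' AND active" | clickhouse-client
```

The second query reads the disk_name column of system.parts to confirm where each active part ended up.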

Last edited: 2021.04.18 22:05:11
