×

学习Consul

96
卓一抗
2016.07.25 22:45* 字数 1880

Consul是由HashiCorp开发的服务发现和配置管理框架,这公司还有一个耳熟能详的工具:Vargrant,是不是想说,“哦,原来是这个公司。”

本文是 consul getting started学习笔记。

Consul是什么?


Consul主要有以下几个组件。

  • 服务发现(Service Discovery):服务提供者可以注册自己到Consul,服务使用者可以通过Consul查询到提供者,有http和dns两种方式。
  • 健康检查(Health Checking):Consul客户端提供了几中不同的健康检查方案,比如:服务本身是不是500了?;服务所在机器的内存使用率是不是超过90%了,等等,这些信息提供给运维监控整个集群的健康状态,也可以被服务发现组件使用一次来将流量导入到健康的服务中。
  • KV存储(Key/Value Store):应用可以使用Consul的分层kv存储干任何事情,比如:动态配置,特征标记,协调,leader选举等。KV存储的API是基于http的。
  • 多数据中心(Multi Datacenter):Consul创造性的支持多数据中心,Consul的使用者无需多做任何工作。

Consul是分布式,高可用系统,基本架构如下:

  • Consul agent:所有提供服务的节点都需要运行Consul agent(使用和查询服务的不需要),这个Agent主要负责服务本身和这个服务所在节点的健康检查
  • Consul servers:负责数据的存储和复制,servers之间选举出一个leader,虽然consul一个也可以工作,但是一般需要提供3至5个servers组成的servers集群。

查询服务的时候可以通过Agent也可以通过Server,agent会将查询路由给server然后返回结果。

安装


Mac Os X

$ brew cask install consul

如果你没有cast插件,这样安装:

$ brew install caskroom/cask/brew-cask

验证是否安装成功

➜  consul
usage: consul [--version] [--help] <command> [<args>]

Available commands are:
agent          Runs a Consul agent
configtest     Validate config file
event          Fire a new event
exec           Executes a command on Consul nodes
force-leave    Forces a member of the cluster to enter the "left" state
info           Provides debugging information for operators
join           Tell Consul agent to join cluster
keygen         Generates a new encryption key
keyring        Manages gossip layer encryption keys
leave          Gracefully leaves the Consul cluster and shuts down
lock           Execute a command holding a lock
maint          Controls node or service maintenance mode
members        Lists the members of a Consul cluster
monitor        Stream logs from a Consul agent
reload         Triggers the agent to reload configuration files
rtt            Estimates network round trip time between nodes
version        Prints the Consul version
watch          Watch for changes in Consul

运行Agent


每一个consul都可以运行在客户端(client)和服务器(server)模式,生成环境建议部署3-5台consul服务器,否则数据丢失是不可恢复的。agent 客户端模式则非常轻量级,主要用来:注册服务,健康检查,路由查询到服务器。Agent必须在集群的每一台机器上部署。

可以用dev模式快速启动一个consul,主要用做本地测试,千万不要用于生产环境。

$ consul agent -dev
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'zhuoyikangdeMBP.lan'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 192.168.199.171 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2016/07/25 21:38:02 [INFO] raft: Node at 192.168.199.171:8300 [Follower] entering Follower state
    2016/07/25 21:38:02 [INFO] serf: EventMemberJoin: zhuoyikangdeMBP.lan 192.168.199.171
    2016/07/25 21:38:02 [INFO] serf: EventMemberJoin: zhuoyikangdeMBP.lan.dc1 192.168.199.171
    2016/07/25 21:38:02 [INFO] consul: adding LAN server zhuoyikangdeMBP.lan (Addr: 192.168.199.171:8300) (DC: dc1)
    2016/07/25 21:38:02 [INFO] consul: adding WAN server zhuoyikangdeMBP.lan.dc1 (Addr: 192.168.199.171:8300) (DC: dc1)
    2016/07/25 21:38:02 [ERR] agent: failed to sync remote state: No cluster leader
    2016/07/25 21:38:03 [WARN] raft: Heartbeat timeout reached, starting election
    2016/07/25 21:38:03 [INFO] raft: Node at 192.168.199.171:8300 [Candidate] entering Candidate state
    2016/07/25 21:38:03 [DEBUG] raft: Votes needed: 1
    2016/07/25 21:38:03 [DEBUG] raft: Vote granted from 192.168.199.171:8300. Tally: 1
    2016/07/25 21:38:03 [INFO] raft: Election won. Tally: 1
    2016/07/25 21:38:03 [INFO] raft: Node at 192.168.199.171:8300 [Leader] entering Leader state
    2016/07/25 21:38:03 [INFO] raft: Disabling EnableSingleNode (bootstrap)
    2016/07/25 21:38:03 [DEBUG] raft: Node 192.168.199.171:8300 updated peer set (2): [192.168.199.171:8300]
    2016/07/25 21:38:03 [INFO] consul: cluster leadership acquired
    2016/07/25 21:38:03 [INFO] consul: New leader elected: zhuoyikangdeMBP.lan
    2016/07/25 21:38:03 [DEBUG] consul: reset tombstone GC to index 2
    2016/07/25 21:38:03 [INFO] consul: member 'zhuoyikangdeMBP.lan' joined, marking health alive
    2016/07/25 21:38:04 [INFO] agent: Synced service 'consul'
==> Failed to check for updates: Get https://checkpoint-api.hashicorp.com/v1/check/consul?arch=amd64&os=darwin&signature=&version=0.6.4: dial tcp: lookup checkpoint-api.hashicorp.com on 192.168.199.1:53: read udp 192.168.199.171:65100->192.168.199.1:53: i/o timeout

从日志可以看出两件事:

  1. consul运行在服务器模式。
  2. consul把自己选择为leader节点,进入leader state.

集群成员查看

✗ consul members
Node                 Address               Status  Type    Build  Protocol  DC
zhuoyikangdeMBP.lan  192.168.199.171:8301  alive   server  0.6.4  2         dc1

还可以加上-detailed获取更详细的信息,

使用consl memebers是基于gossip协议,并且是最终一致的,也就是说输出的结果可能和实际的网络环境并不一样,如果你需要强一致性的结果,用http接口:

curl localhost:8500/v1/catalog/nodes
[{"Node":"zhuoyikangdeMBP.lan","Address":"192.168.199.171","TaggedAddresses":{"wan":"192.168.199.171"},"CreateIndex":3,"ModifyIndex":4}]

另外consul还可以使用dns接口,注意你必须确定把你的dns服务器指向为consul agent的dns服务,端口默认为8600

# dns的玩意儿没怎么看懂
# 
dig @127.0.0.1 -p 8600 zhuoyikangdeMBP.lan

; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 zhuoyikangdeMBP.lan
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 45459
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;zhuoyikangdeMBP.lan.       IN  A

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Jul 25 21:48:01 2016
;; MSG SIZE  rcvd: 37

安全停止

Ctrl-C 可以让Consul平稳的退出,退出的时候该结点会通知集群中的其他节点它离开了。如果你粗暴的kill掉consul agent,集群中的其他节点会检查到该节点fail了。

  • 当一个节点平稳退出:从它登记的服务和安全检查将从目录中移除。
  • 当一个节点fail:它的健康状态被标记为critial,但是并不从目录中移除。consul会自动尝试重连fail的的节点,并且允许其恢复。

另外,当一个节点是服务节点时,平稳退出将可以有效的避免潜在问题 CONSENSUS PROTOCOL

安全运维请查看:CONSUL GUIDES/Adding/Removing Servers -

注册服务


服务定义和注册

服务可以通过两种形式注册:

  • 提供一个服务定义
  • 通过HTTP APi。

实际上服务定义这种方式比较常用,操作如下:

# 新建一个服务定义文件夹
sudo mkdir /etc/consul.d

# 新建一个服务 
echo '{"service": {"name": "web", "tags": ["rails"], "port": 80}}' \
    >/etc/consul.d/web.json
    
# 启动
consul agent -dev -config-dir /etc/consul.d

# 从日志可以看出同步启动了一个web服务。
....
2016/07/25 22:09:14 [INFO] agent: Synced service 'web'
...

查询服务

dns 方法

$ dig @127.0.0.1 -p 8600 web.service.consul
...

;; QUESTION SECTION:
;web.service.consul.        IN  A

;; ANSWER SECTION:
web.service.consul. 0   IN  A   172.20.20.11

可以看出找到一条A记录,可以通过SRV获取到主机信息

dig @127.0.0.1 -p 8600 web.service.consul SRV

; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 web.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19045
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;web.service.consul.        IN  SRV

;; ANSWER SECTION:
web.service.consul. 0   IN  SRV 1 1 80 209-160-124-64.fwd.paradisenetworks.net.node.dc1.consul.

;; ADDITIONAL SECTION:
209-160-124-64.fwd.paradisenetworks.net.node.dc1.consul. 0 IN A 192.168.199.171

;; Query time: 2 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Jul 25 22:12:40 2016
;; MSG SIZE  rcvd: 200

可以看出输出了该服务运行在209-160-124-64.fwd.paradisenetworks.net.node.dc1.consul.节点。

最后,可以用tags来过滤记录:TAG.NAME.service.consul

 dig @127.0.0.1 -p 8600 rails.web.service.consul

; <<>> DiG 9.8.3-P1 <<>> @127.0.0.1 -p 8600 rails.web.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41235
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;rails.web.service.consul.  IN  A

;; ANSWER SECTION:
rails.web.service.consul. 0 IN  A   192.168.199.171

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Jul 25 22:15:06 2016
;; MSG SIZE  rcvd: 82

可以用http api查询。

curl http://localhost:8500/v1/catalog/service/web
[{"Node":"209-160-124-64.fwd.paradisenetworks.net","Address":"192.168.199.171","ServiceID":"web","ServiceName":"web","ServiceTags":["rails"],"ServiceAddress":"","ServicePort":80,"ServiceEnableTagOverride":false,"CreateIndex":5,"ModifyIndex":5}]%

查询其健康状况:

curl 'http://localhost:8500/v1/health/service/web?passing'
[{"Node":{"Node":"209-160-124-64.fwd.paradisenetworks.net","Address":"192.168.199.171","TaggedAddresses":{"wan":"192.168.199.171"},"CreateIndex":3,"ModifyIndex":5},"Service":{"ID":"web","Service":"web","Tags":["rails"],"Address":"","Port":80,"EnableTagOverride":false,"CreateIndex":5,"ModifyIndex":5},"Checks":[{"Node":"209-160-124-64.fwd.paradisenetworks.net","CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":"","CreateIndex":3,"ModifyIndex":3}]}]%

更新服务信息

修改配置文件后发送SIGHUP消息给Agent,Agent会自动更新配置。

consul集群


本章将通过Vagrant说明下consul集群的搭建。

首先你要去下载Vagrant配置

启动两个节点:

➜  vagrant up

.......
......

➜  vagrant status
Current machine states:

n1                        running (virtualbox)
n2                        running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
➜  vagrant

启动成功后,ssh登录到节点1。

vagrant ssh n1

上文中使用-dev用于测试,这个参数在集群环境中直接干掉。

  • 任何consul集群中的都必须有个名字,默认使用主机名,可以使用-node command-line option覆盖。
  • -bootstrap-expect 这个参数用来提示consul需要加入的剩余节点个数,直到所有的节点都加入后,consul才会开始进行某些集群工作。
  • -bind address:指定监听地址,如果不指定,consul会默认监听系统中的第一个私有IP,但是你最好指定一个。
vagrant@n1:~$ consul agent -server -bootstrap-expect 1 \
    -data-dir /tmp/consul -node=agent-one -bind=172.20.20.10 \
    -config-dir /etc/consul.d
...

然后

vagrant ssh n2
 
consul agent -data-dir /tmp/consul -node=agent-two \
    -bind=172.20.20.11 -config-dir /etc/consul.d

但是这个时候两个节点彼此不认识,必须把它们连在一起:

$ vagrant ssh n1
...
vagrant@n1:~$ consul join 172.20.20.11
Successfully joined cluster by contacting 1 nodes.

vagrant@n2:~$ consul members
Node       Address            Status  Type    Build  Protocol
agent-two  172.20.20.11:8301  alive   client  0.5.0  2
agent-one  172.20.20.10:8301  alive   server  0.5.0  2

To join a cluster, a Consul agent only needs to learn about one existing member. After joining the cluster, the agents gossip with each other to propagate full membership information.

启动时自动加入

上面是手动加入,比较麻烦,使用Atlas by HashiCorp可以自动加入:

$ consul agent -atlas-join \
  -atlas=ATLAS_USERNAME/infrastructure \
  -atlas-token="YOUR_ATLAS_TOKEN"

你需要创建一个atlas用户,并且替换上面的参数。

查询节点

可以像查询服务一样查询节点:

vagrant@n1:~$ dig @127.0.0.1 -p 8600 agent-two.node.consul
...

;; QUESTION SECTION:
;agent-two.node.consul. IN  A

;; ANSWER SECTION:
agent-two.node.consul.  0 IN    A   172.20.20.11

健康检查


定义健康检查

两种方式,定义文件和http.

vagrant@n2:~$ echo '{"check": {"name": "ping",
  "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' \
  >/etc/consul.d/ping.json

vagrant@n2:~$ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
  "check": {"script": "curl localhost >/dev/null 2>&1", "interval": "10s"}}}' \
  >/etc/consul.d/web.json

获取到健康状态

vagrant@n1:~$ curl http://localhost:8500/v1/health/state/critical
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]

Key/Value存储


Consul还提供了简单的Key/Value存储,这个功能可以用来存储动态配置,帮助服务协调,建立leader选举等。

本地需要先启动一个agent哟。

consul agent -dev -config-dir /etc/consul.d -bind 127.0.0.1
 curl -v http://127.0.0.1:8500/v1/kv/\?recurse
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8500 (#0)
> GET /v1/kv/?recurse HTTP/1.1
> Host: 127.0.0.1:8500
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< X-Consul-Index: 1
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Tue, 26 Jul 2016 01:06:35 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 127.0.0.1 left intact

Put 一个数据

$ curl -X PUT -d 'test' http://127.0.0.1:8500/v1/kv/web/key1
true
$ curl -X PUT -d 'test' http://127.0.0.1:8500/v1/kv/web/key2?flags=42
true
$ curl -X PUT -d 'test'  http://127.0.0.1:8500/v1/kv/web/sub/key3
true
$ curl http://127.0.0.1:8500/v1/kv/?recurse
[{"CreateIndex":97,"ModifyIndex":97,"Key":"web/key1","Flags":0,"Value":"dGVzdA=="},
 {"CreateIndex":98,"ModifyIndex":98,"Key":"web/key2","Flags":42,"Value":"dGVzdA=="},
 {"CreateIndex":99,"ModifyIndex":99,"Key":"web/sub/key3","Flags":0,"Value":"dGVzdA=="}]

上面建立了3个Key,所有的数据都是test,注意数据是Base64编码的。可以通过Get来获取。

curl http://127.0.0.1:8500/v1/kv/web/key1
[{"LockIndex":0,"Key":"web/key1","Flags":0,"Value":"dGVzdA==","CreateIndex":8,"ModifyIndex":8}]%

也可以通过Delete删除

$ curl -X DELETE http://127.0.0.1:8500/v1/kv/web/sub?recurse
$ curl http://127.0.0.1:8500/v1/kv/web?recurse
[{"CreateIndex":97,"ModifyIndex":97,"Key":"web/key1","Flags":0,"Value":"dGVzdA=="},
 {"CreateIndex":98,"ModifyIndex":98,"Key":"web/key2","Flags":42,"Value":"dGVzdA=="}]

还可以通过Put修改:

$ curl -X PUT -d 'newval' http://localhost:8500/v1/kv/web/key1?cas=8
true
$ curl -X PUT -d 'newval' http://localhost:8500/v1/kv/web/key1?cas=8
false

?cas=表示使用consul的Check-And-Set操作,这个操作是原子的。cas的值为这条数据的ModifyIndex

第1条操作更新成功,因为ModifyIndex为8,第2条操作更新失败,因为ModifyIndex已经不是8了。

也可以通过index参数指定到ModifyIndex大于某个值为止:

$ curl "http://localhost:8500/v1/kv/web/key2?index=101&wait=5s"
[{"CreateIndex":98,"ModifyIndex":101,"Key":"web/key2","Flags":42,"Value":"dGVzdA=="}]

上面的数据会等待ModifyIndex大于101,最多等待5s,这对于等待Key修改非常有用。

完整文档

Web UI


Consul提供漂亮的UI,可以用来观察所有的服务和节点,UI也自动支持'多数据中心'。

有两种方式使用UI:

Atlas By HashiCorp

启动时候要指定Atls相关参数

consul agent -atlas=ATLAS_USERNAME/demo -atlas-token="ATLAS_TOKEN"

注册链接

Self-hosting The open-source UI

consul agent -dev -config-dir /etc/consul.d -bind 127.0.0.1 -ui
Paste_Image.png
微服务
Web note ad 1