Openstack计算节点高可用功能安装部署

UMHA

UMHA(UMCloud High Availability)--计算节点高可用处理程序,其目的是在某计算节点发生故障时,能够 自动将在该计算节点上的虚机疏散到其他可用计算节点,进而保证生产环境业务的正常运行。另外,真正运行在 环境下的程序名称则是umha-processing。

1. 环境准备

umha是基于ceilometer的扩展来实现的,目前阶段只增加storage_connectivity(存储网连通性,后续 可根据实际需求再拓展其他meter项)一个meter项,所以我们首先需要安装、配置及部署ceilomter。

安装

安装分两块:计算节点和控制节点安装。

计算节点

计算节点安装的ceilometer程序用来采集数据(这里指的是上面提到的存储网的连通性),然后通过publisher把数据上报,注意此次ceilometer扩展采用的publisher是udp。

安装

需要安装的包:python-ceilometer,python-ceilometerclient,ceilometer-common,ceilometer-polling,如果是直接通过dpkg安装,因为包间的依赖关系注意它们的安装顺序,当然也可以制作apt源通过apt-get安装。安装特别是首次安装时,往往会有ceilometer配置文件的交互界面的出现,直接跳过或者一路按enter即可,因为总归安装完之后我们仍要编辑ceilometer配置文件,关于这一点下边会讲到。

配置

涉及到两个文件:/etc/ceilometer/ceilometer.conf和/etc/ceilometer/pipeline_ha.yaml,下面给出这两个配置文件的sample。

/etc/ceilometer/ceilometer.conf:

[DEAFULT]
# Polling namespace(s) to be used while resource polling (list value)
# Allowed values: compute, central, ipmi
polling_namespaces = compute

# List of metadata keys reserved for metering use. And these keys are additional to the ones included in the namespace. (list value)
reserved_metadata_keys = cluster

# Configuration file for pipeline definition. (string value)
pipeline_cfg_file = pipeline_ha.yaml

# If set to true, the logging level will be set to DEBUG instead of the default INFO level. (boolean value)
debug = false

# If set to false, the logging level will be set to WARNING instead of the default INFO level. (boolean value)
# This option is deprecated for removal.
# Its value may be silently ignored in the future.
verbose = true

# Log output to standard error. This option is ignored if log_config_append is set. (boolean value)
use_stderr = false

# The messaging driver to use, defaults to rabbit. Other drivers include amqp and zmq. (string value)
rpc_backend = rabbit

[compute]
# Some storage address used to check storage-connectivity. (string value)
storage_server_address = 192.168.0.10, 192.168.0.20

# Count of ping storage gateway. (integer value)
ping_count = 4

# Ping loss ratio that can be accepted. (floating point value)
ping_loss_ratio = 0.5

[keystone_authtoken]
# Complete public Identity API endpoint. (string value)
auth_uri = http://192.168.0.2:5000/v2.0

[oslo_messaging_rabbit]
# The RabbitMQ broker address where a single node is used. (string value)
# Deprecated group/name - [DEFAULT]/rabbit_host
rabbit_host = 192.168.0.6

# The RabbitMQ broker port where a single node is used. (port value)
# Minimum value: 0
# Maximum value: 65535
# Deprecated group/name - [DEFAULT]/rabbit_port
rabbit_port = 5673

# Connect over SSL for RabbitMQ. (boolean value)
# Deprecated group/name - [DEFAULT]/rabbit_use_ssl
rabbit_use_ssl = false

# The RabbitMQ userid. (string value)
# Deprecated group/name - [DEFAULT]/rabbit_userid
rabbit_userid = nova

# The RabbitMQ password. (string value)
# Deprecated group/name - [DEFAULT]/rabbit_password
rabbit_password = 0edA4bYmVlJHDuzkI93mYBs1

# The RabbitMQ virtual host. (string value)
# Deprecated group/name - [DEFAULT]/rabbit_virtual_host
rabbit_virtual_host = /

# Try to use HA queues in RabbitMQ (x-ha-policy: all). If you change this option, you must wipe the RabbitMQ database. In RabbitMQ 3.0,
# queue mirroring is no longer controlled by the x-ha-policy argument when declaring a queue. If you just want to make sure that all queues
# (except  those with auto-generated names) are mirrored across all nodes, run: "rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode":
# "all"}' " (boolean value)
# Deprecated group/name - [DEFAULT]/rabbit_ha_queues
rabbit_ha_queues = True

# Number of seconds after which the Rabbit broker is considered down if heartbeat's keep-alive fails (0 disable the heartbeat). EXPERIMENTAL
# (integer value)
heartbeat_timeout_threshold = 60

[publisher]
# Secret value for signing messages. Set value empty if signing is not required to avoid computational overhead. (string value)
# Deprecated group/name - [DEFAULT]/metering_secret
# Deprecated group/name - [publisher_rpc]/metering_secret
# Deprecated group/name - [publisher]/metering_secret
telemetry_secret = nmWpu7Pu8DDWhQ3oPaBpVQjn

[service_credentials]
# Region name to use for OpenStack service endpoints. (string value)
# Deprecated group/name - [DEFAULT]/os_region_name
region_name = RegionOne

# Type of endpoint in Identity service catalog to use for communication with OpenStack services. (string value)
# Allowed values: public, internal, admin, auth, publicURL, internalURL, adminURL
# Deprecated group/name - [DEFAULT]/os_endpoint_type
interface = internalURL

# Authentication URL (unknown value)
auth_url = http://192.168.0.2:5000/v2.0/

# Project name to scope to (unknown value)
# Deprecated group/name - [DEFAULT]/tenant-name
project_name = services

# Username (unknown value)
# Deprecated group/name - [DEFAULT]/user-name
username = ceilometer

# User's password (unknown value)
password = a1b2c3d4e5

/etc/ceilometer/pipeline_ha.yaml:

---
sources:
    - name: ha_source
      interval: 30
      meters:
            - "storage_connectivity"
      sinks:
          - ha_sink
sinks:
    - name: ha_sink
      transformers:
      publishers:
          - udp://192.168.0.6:4953  #存储网ip及端口
#         - udp://192.168.1.2:4953  #管理网ip及端口

启动/重启服务

service ceilometer-polling start/restart

控制节点

控制节点安装的ceilometer程序用来收集计算节点上报的数据,然后入库保存,这里涉及数据库核心表有三个:sample, meter, resource

安装

需要安装的包:python-ceilometer, python-ceilometerclient, ceilometer-common, ceilometer-polling, ceilometer-collector, ceilometer-api, ceilometer-agent-notification,至于安装过程上文已经提到,在此不做赘述。

配置

涉及到一个配置文件:/etc/ceilometer/ceilometer.conf,下面给出这个配置文件的sample。

/etc/ceilometer/ceilomter.conf

[DEFAULT]
# Polling namespace(s) to be used while resource polling (list value)
# Allowed values: compute, central, ipmi
polling_namespaces = central

# Dispatchers to process metering data. (multi valued)
# Deprecated group/name - [DEFAULT]/dispatcher
meter_dispatchers = database

# If set to true, the logging level will be set to DEBUG instead of the default INFO level. (boolean value)
debug = false

# If set to false, the logging level will be set to WARNING instead of the default INFO level. (boolean value)
# This option is deprecated for removal.
# Its value may be silently ignored in the future.
verbose = true

[collector]
# Address to which the UDP socket is bound. Set to an empty string to disable. (string value)
udp_address = 0.0.0.0

# Port to which the UDP socket is bound. (port value)
# Minimum value: 0
# Maximum value: 65535
udp_port = 4953

[database]
# Number of seconds that samples are kept in the database for (<= 0 means forever). (integer value)
# Deprecated group/name - [database]/time_to_live
metering_time_to_live = 1800

# The SQLAlchemy connection string to use to connect to the database. (string value)
# Deprecated group/name - [DEFAULT]/sql_connection
# Deprecated group/name - [DATABASE]/sql_connection
# Deprecated group/name - [sql]/connection
connection = mysql://ceilometer:ceilometer123@192.168.0.2/ceilometer?charset=utf8

[oslo_messaging_rabbit]
# The RabbitMQ broker address where a single node is used. (string value)
# Deprecated group/name - [DEFAULT]/rabbit_host
rabbit_host = 192.168.0.6

# The RabbitMQ broker port where a single node is used. (port value)
# Minimum value: 0
# Maximum value: 65535
# Deprecated group/name - [DEFAULT]/rabbit_port
rabbit_port = 5673

# Connect over SSL for RabbitMQ. (boolean value)
# Deprecated group/name - [DEFAULT]/rabbit_use_ssl
rabbit_use_ssl = false

# The RabbitMQ userid. (string value)
# Deprecated group/name - [DEFAULT]/rabbit_userid
rabbit_userid = nova

# The RabbitMQ password. (string value)
# Deprecated group/name - [DEFAULT]/rabbit_password
rabbit_password = 0edA4bYmVlJHDuzkI93mYBs1

# The RabbitMQ virtual host. (string value)
# Deprecated group/name - [DEFAULT]/rabbit_virtual_host
rabbit_virtual_host = /

# Try to use HA queues in RabbitMQ (x-ha-policy: all). If you change this option, you must wipe the RabbitMQ database. In RabbitMQ 3.0,
# queue mirroring is no longer controlled by the x-ha-policy argument when declaring a queue. If you just want to make sure that all queues
# (except  those with auto-generated names) are mirrored across all nodes, run: "rabbitmqctl set_policy HA '^(?!amq\.).*' '{"ha-mode":
# "all"}' " (boolean value)
# Deprecated group/name - [DEFAULT]/rabbit_ha_queues
rabbit_ha_queues = True

# Number of seconds after which the Rabbit broker is considered down if heartbeat's keep-alive fails (0 disable the heartbeat). EXPERIMENTAL
# (integer value)
heartbeat_timeout_threshold = 60

[publisher]
# Secret value for signing messages. Set value empty if signing is not required to avoid computational overhead. (string value)
# Deprecated group/name - [DEFAULT]/metering_secret
# Deprecated group/name - [publisher_rpc]/metering_secret
# Deprecated group/name - [publisher]/metering_secret
telemetry_secret = nmWpu7Pu8DDWhQ3oPaBpVQjn

创建ceilometer数据库

首先,进入到mysql交互命令行,依次执行如下命令即可:

create database ceilometer CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON ceilometer.* TO 'ceilometer'@'localhost' IDENTIFIED BY 'ceilometer123';
GRANT ALL PRIVILEGES ON ceilometer.* TO 'ceilometer'@'%' IDENTIFIED BY 'ceilometer123';

注意:上面命令中ceilometer123是访问ceilometer数据库的密码,要跟/etc/ceilometer/ceilometer.confconnection的配置保持一致。

其次,对ceilometer数据库进行初始化:

ceilometer-dbsync --config-file /etc/ceilometer/ceilometer.conf

最后,收集到的数据是用来做服务判断的,所以没必要做长时间保留,可以通过metering_time_to_live配置项设置sample数据保留时间,然后通过crontab定期清理过期数据,比如2个小时清理一次:

0 */2 * * * ceilometer-expirer --config-file /etc/ceilometer/ceilometer.conf

keystone相关

首先,创建ceilometer用户:

openstack user create --domain default --password-prompt ceilometer

注意:根据提示设置ceilometer用户密码,这个要跟配置保持一致。

其次,添加admin角色到ceilometer用户上:

openstack role add --project service --user ceilometer admin

然后,创建ceilometer服务实体:

openstack service create --name ceilometer --description "Telemetry" metering

最后,创建endpoint:

# 命令中ip要根据实际情况更改
openstack endpoint create --region RegionOne metering public http://172.31.11.204:8777
openstack endpoint create --region RegionOne metering admin http://192.168.0.2:8777
openstack endpoint create --region RegionOne metering internal http://192.168.0.2:8777

启动/重启服务
service ceilometer-api start/restart
service ceilometer-agent-notification start/restart
service ceilometer-polling start/restart
service ceilometer-collector start/restart

至此,ceilometer的安装及部署已经完成,下面开始umha的安装及部署。

2. umha安装

这部分程序是安装在控制节点上的,安装umha之前,我们有必要重新明确umha的功能:evacuate和邮件告警:

  • evacuate:是umha的核心功能,在某计算节点被检测到故障时,自动将其上的虚机疏散到其他可用节点;
  • 邮件告警:这个功能是从用户体验的角度考虑的,可以让用户及时地获知节点或者是虚机的状态。

安装

安装包的名称是umha_xxx_amd64.deb,直接用dpkg命令安装即可,至于安装过程中的配置问题,跟上面处理类似。

3. umha配置

umha相关的配置文件都会统一放在/etc/umha/目录下,具体文件可参见如下:

文件路径
/etc/umha/umha.conf
/etc/umha/ipmi_info
/etc/umha/node_storage_ip_info

umha.conf

这个是umha的核心配置文件,安装完unha之后我们一般都要重新配置,它会涉及到几个section:DEFAULT,认证、数据库、邮件、时间和阈值,具体可参照下面的sample:

[DEFAULT]

#
# From umha.conf
#

# ipmi have configured (boolean value)
has_IPMI = true

# filename store ipmi info (string value)
ipmi_info = /etc/umha/ipmi_info

# number of ping to check connectivity (integer value)
ping_count = 3

# if allow evacuate local vm (boolean value)
allow_evacuate_local_vm = false

# (boolean value)
on_share_storage = true

# (integer value)
ipmi_check_max_count = 3

# (integer value)
ipmi_check_interval = 10

# (boolean value)
force_evacuate_when_ipmi_unknown = false

# (dict value)
business_connectivity_cfg = enabled:0,severity:0

# (dict value)
path_access_ok_cfg = enabled:0,severity:0

# filename store node storage ip info (string value)
node_storage_ip_info = /etc/umha/node_storage_ip_info

# (boolean value)
really_evacuate = true

# (integer value)
interval_check_nova_service = 10

# (integer value)
max_second_wait_for_service_down = 60

# (integer value)
seconds_before_check_vm_status = 20

[auth]

#
# From umha.conf
#

# (string value)
auth_url = http://192.168.0.2:35357/v2.0

# (string value)
username = admin

# (string value)
password = admin

# (string value)
tenant_name = admin

# (string value)
nova_version = 2

# (string value)
cinder_version = 2

# (string value)
region_name = RegionOne

[db]

#
# From umha.conf
#

# db connection (string value)
connection = mysql://ceilometer:ceilometer123@192.168.0.2/ceilometer?charset=utf8

[email]

#
# From umha.conf
#

# (boolean value)
enable_send_mail = true

# (string value)
send_server = smtp.126.com

# (integer value)
send_port = 25

# (boolean value)
need_auth = true

# (string value)
sender_username = zynf183913831@126.com

# (string value)
sender_pwd = Fch19880810

# (list value)
receiver_list = fuchunhui@umcloud.com

# (string value)
mail_subject = UMHA高可用告警

# (integer value)
send_interval_hours = 3

[time]

#
# From umha.conf
#

# (integer value)
sample_interval = 60

# interval of running check (integer value)
ha_check_interval = 180

# timeout of evacuate (integer value)
evacuate_timeout = 60

[valve]

#
# From umha.conf
#

# (integer value)
storage_number = 6

# (floating point value)
storage_ratio = 0.6

# (integer value)
power_off_number = 6

# (floating point value)
power_off_ratio = 0.6

# (integer value)
panic_number = 6

# (floating point value)
panic_ratio = 0.6

# (integer value)
error_count = 3

# (integer value)
continue_count = 5

数据库同步

umha新增了几个表:

Table Name
node_heartbeat
ipmi_info
umha_log
evacuate_history

首次安装umha程序或者存在表更新时,在安装完umha之后要手动做数据库同步:执行umha_db_sync

导入存储网信息

把计算节点存储网的信息记录到/etc/umha/node_storage_ip_info,文件内容格式:hostname:storage_ip

比如:

node-1.domain.tld:192.168.0.6

node-3.domain.tld:192.168.0.10

编辑完毕后,然后执行load_node_storage_ip

导入IPMI信息

把计算节点IPMI信息记录到/etc/umha/ipmi_info,文件内容格式:hostname:ip:username:password

比如:

node-1.domain.tld:172.31.11.56:admin:admin

node-3.domain.tld:172.31.11.168:root:root123

编辑完毕后,执行命令load_ipmi_info

启动/重启服务

service umha-processing start/restart

4. umha部署到生产环境

以上ceilometer和umha的安装和部署一般是针对开发或测试环境(通常一个计算节点和一个控制节点即可),如果是部署到生产环境,必须要避免单节点发生故障时服务不可用的情况。鉴于生产环境的这种需求,可以采用corosync和pacemaker来对节点集群进行管理,且一般使用主备模式。

部署准备

一般情况下,openstack集群下的控制节点都已安装corosync、pacemaker和crmsh,并且也已经把节点纳管到pacemaker集群管理中,所以我们只需要做资源的添加及配置。

注:以下所有操作均在控制节点上完成。

属性设置

crm configure property no-quorum-policy=ignore stonith-enabled=false

添加资源

需要将ceilometer-collector和umha-processing加入到pacemaker集群资源管理中。

另外,对于新添加的这两个资源对应的服务要检查各自的upstart文件是不是配置了禁用开机启动,如果没有,需要配置禁用服务开机自启动。

新增ceilometer-collector-ha资源

crm configure primitive ceilometer-collector-ha upstart:ceilometer-collector-ha op monitor interval=30s timeout=60s

新增umha-processing-ha资源

crm configure primitive umha-processing-ha upstart:umha-processing-ha op monitor interval=30s timeout=60s

资源设置节点倾向

随意选择一个控制节点,设置资源对每个节点倾向,节点域名可通过执行uname -n获取

# ceilometer-collector-ha
crm configure location ceilometer-collector-ha-on-node-7.domain.tld ceilometer-collector-ha 100: node-7.domain.tld
crm configure location ceilometer-collector-ha-on-node-8.domain.tld ceilometer-collector-ha 100: node-8.domain.tld
crm configure location ceilometer-collector-ha-on-node-6.domain.tld ceilometer-collector-ha 100: node-6.domain.tld

# umha-processing-ha
crm configure location umha-processing-ha-on-node-7.domain.tld umha-processing-ha 100: node-7.domain.tld
crm configure location umha-processing-ha-on-node-8.domain.tld umha-processing-ha 100: node-8.domain.tld
crm configure location umha-processing-ha-on-node-8.domain.tld umha-processing-ha 100: node-8.domain.tld

设置协同约束

# 强约束ceilometer-collector和umha-processing运行同一个节点
crm configure colocation ceilometer-collector-ha_with_umha-processing-ha inf: ceilometer-collector-ha umha-processing-ha

防火墙问题

如上所述,ceilometer采集和收集数据是通过udp,这里设置了新端口4953,所以控制节点iptables要开放udp 4953端口,否则无法收集数据,具体操作如下:

# 在/etc/iptables/rules.v4的chain input末尾追加:
-A INPUT -p udp -s 192.168.0.0/24 --dport 4953 -m comment --comment "Allow mgnt traffic to umha" -j ACCEPT
# 保存生效
iptables-restore < /etc/iptables/rules.v4

<完>

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 158,847评论 4 362
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 67,208评论 1 292
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 108,587评论 0 243
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 43,942评论 0 205
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 52,332评论 3 287
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 40,587评论 1 218
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 31,853评论 2 312
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 30,568评论 0 198
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 34,273评论 1 242
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 30,542评论 2 246
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 32,033评论 1 260
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 28,373评论 2 253
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 33,031评论 3 236
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 26,073评论 0 8
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 26,830评论 0 195
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 35,628评论 2 274
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 35,537评论 2 269