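Ceph in Practice: Storage Pools

Ceph manages all storage resources in a cluster as pools. A pool is a logical concept: it expresses a set of constraints on how data is stored, such as the redundancy policy, the CRUSH rule, and the PG count. This article records the practical operations on storage pools.

Creating a replicated pool

1. Create a replicated CRUSH rule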

Command form:
ceph osd crush rule create-replicated {crushrule_name} {root_bucket_name} {failure_domain}

Example:
Create a replicated CRUSH rule named data_ruleset that uses data as its root bucket and host as its failure domain:
ceph osd crush rule create-replicated data_ruleset data host

Sample output:

[root@node81 ~]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                 STATUS REWEIGHT PRI-AFF 
 -9       0.05389 root data        ## name of the root bucket
-10       0.01797     host data_host_node81                         
  0   hdd 0.00899         osd.0                 up  1.00000 1.00000 
  1   hdd 0.00899         osd.1                 up  1.00000 1.00000 
-13       0.01797     host data_host_node82                         
  2   hdd 0.00899         osd.2                 up  1.00000 1.00000 
  3   hdd 0.00899         osd.3                 up  1.00000 1.00000 
-15       0.01794     host data_host_node85                         
  4   hdd 0.00897         osd.4                 up  1.00000 1.00000 
  5   hdd 0.00897         osd.5                 up  1.00000 1.00000 
 -1       0.05392 root default                                      
 -3       0.01797     host node81                                   
  0   hdd 0.00899         osd.0                 up  1.00000 1.00000 
  1   hdd 0.00899         osd.1                 up  1.00000 1.00000 
 -5       0.01797     host node82                                   
  2   hdd 0.00899         osd.2                 up  1.00000 1.00000 
  3   hdd 0.00899         osd.3                 up  1.00000 1.00000 
 -7       0.01797     host node85                                   
  4   hdd 0.00899         osd.4                 up  1.00000 1.00000 
  5   hdd 0.00899         osd.5                 up  1.00000 1.00000 
[root@node81 ~]# 
[root@node81 ~]# 
[root@node81 ~]# ceph osd crush rule create-replicated data_ruleset data host
[root@node81 ~]# ceph osd crush rule dump data_ruleset  //show the details of the newly created data_ruleset
{
    "rule_id": 1,
    "rule_name": "data_ruleset",
    "ruleset": 1,   //crush rule id
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -9,
            "item_name": "data"  //root bucket
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"   //failure domain: host
        },
        {
            "op": "emit"
        }
    ]
}

[root@node81 ~]# 
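
To check a rule from a script rather than by eye, the dump can be parsed as JSON. The following is a minimal sketch, assuming the output of `ceph osd crush rule dump` has been captured as text without the `//` annotations added in the transcript above (they are not valid JSON). `summarize_rule` is a hypothetical helper written for this article, not part of Ceph.

```python
import json


def summarize_rule(dump_text):
    """Return the root bucket, failure domain and choose mode of one CRUSH rule."""
    rule = json.loads(dump_text)
    summary = {"name": rule["rule_name"], "root": None,
               "failure_domain": None, "mode": None}
    for step in rule["steps"]:
        op = step["op"]
        if op == "take":
            # The "take" step names the bucket the rule starts from.
            summary["root"] = step["item_name"]
        elif op.startswith("choose"):
            # e.g. "chooseleaf_firstn" -> failure domain "host", mode "firstn"
            summary["failure_domain"] = step["type"]
            summary["mode"] = op.rsplit("_", 1)[1]
    return summary


# Copied from the data_ruleset dump above, annotations removed.
SAMPLE_DUMP = """
{
    "rule_id": 1,
    "rule_name": "data_ruleset",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {"op": "take", "item": -9, "item_name": "data"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
        {"op": "emit"}
    ]
}
"""

print(summarize_rule(SAMPLE_DUMP))
```

For data_ruleset this reports root data and failure domain host, which is what the create-replicated command asked for.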


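2. Create a replicated pool


Command form:
ceph osd pool create {pool_name} {pg_num} {pgp_num} replicated {crushrule_name}
Choosing pg_num: the official documentation suggests 100 * {osd_count} / {size}, where {size} is the number of data copies, and then taking the nearest power of two (the official guidance rounds up). With the 6 OSDs and size 3 used here this gives 200, hence 256.

Example: create a replicated pool named data
ceph osd pool create data 256 256 replicated data_ruleset

Sample output: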

[root@node81 ~]# ceph osd pool create data 256 256 replicated data_ruleset
pool 'data' created
[root@node81 ~]# 
[root@node81 ~]# ceph osd pool ls 
data
[root@node81 ~]# 
[root@node81 ~]# ceph osd pool ls  detail  //show pool details; size is 3 at this point
pool 1 'data' replicated size 3 min_size 1 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 35 flags hashpspool stripe_width 0

[root@node81 ~]# 
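
The pg_num rule of thumb can be written down as a few lines of Python. This is only a sketch of the arithmetic, assuming the result is rounded up to the next power of two; `suggested_pg_num` and its `pgs_per_osd` parameter are names invented for this example.

```python
def suggested_pg_num(osd_count, size, pgs_per_osd=100):
    """Rule of thumb: pgs_per_osd * osd_count / size, rounded up to a power of two."""
    if osd_count < 1 or size < 1:
        raise ValueError("osd_count and size must be positive")
    target = pgs_per_osd * osd_count / size
    pg_num = 1
    # Double until the target is reached or exceeded (assumption: round up).
    while pg_num < target:
        pg_num *= 2
    return pg_num


# 6 OSDs under the data root, 3 replicas: target 200, next power of two 256.
print(suggested_pg_num(6, 3))
```

The value 256 matches the pg_num passed to `ceph osd pool create` above. Note that the rule of thumb budgets PGs for the whole cluster, so a cluster with several pools would normally split this number between them.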


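3. Change the pool's replica count (size)


If size is not adjusted after the pool is created, the pool keeps the cluster default, which is set by the osd_pool_default_size option.
To adjust a pool's size manually:
Command form:
ceph osd pool set {pool_name} size {number}
Example:
Change data to two replicas:
ceph osd pool set data size 2

Sample output:

[root@node81 ~]# ceph osd pool ls  detail  //the pool has 3 replicas at this point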
pool 1 'data' replicated size 3 min_size 1 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 35 flags hashpspool stripe_width 0

[root@node81 ~]# 
//check the default osd_pool_default_size in this environment
[root@node81 ~]# cat /etc/ceph/ceph.conf | grep osd_pool_default_size
osd_pool_default_size = 3
[root@node81 ~]# 
[root@node81 ~]# ceph osd pool set data size 2  //change the pool to two replicas
set pool 1 size to 2
[root@node81 ~]# 
[root@node81 ~]# ceph osd pool ls  detail  //size is now 2
pool 1 'data' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 38 flags hashpspool stripe_width 0

[root@node81 ~]# 
[root@node81 ~]# 
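
When the size change has to be verified from a script, one line of the `ceph osd pool ls detail` output can be picked apart as below. This is a rough sketch that assumes the plain-text layout shown in the transcript; `parse_pool_line` is a hypothetical helper. In a real script it is probably more robust to ask the CLI for `--format json` instead of parsing text.

```python
import re

# Matches the fixed prefix: pool <id> '<name>' <type> <everything else>
POOL_LINE = re.compile(r"^pool (?P<id>\d+) '(?P<name>[^']+)' (?P<type>\w+) (?P<rest>.*)$")

NUMERIC_KEYS = ("size", "min_size", "crush_rule", "pg_num", "pgp_num")


def parse_pool_line(line):
    """Extract id, name, type and a few numeric attributes from one pool line."""
    match = POOL_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a pool line: %r" % line)
    info = {"id": int(match.group("id")),
            "name": match.group("name"),
            "type": match.group("type")}
    tokens = match.group("rest").split()
    # The remainder is mostly "key value" pairs; walk over adjacent tokens.
    for key, value in zip(tokens, tokens[1:]):
        if key in NUMERIC_KEYS and value.isdigit():
            info[key] = int(value)
    return info


LINE = ("pool 1 'data' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins "
        "pg_num 256 pgp_num 256 last_change 38 flags hashpspool stripe_width 0")
print(parse_pool_line(LINE)["size"])
```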


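Creating an erasure-coded pool

1. Create an erasure-code-profile


Command form:
ceph osd erasure-code-profile set {profile_name} k={knum} m={mnum} crush-failure-domain={failure_domain} crush-root={root_bucket_name}
or, on releases before Luminous, where these keys were still named ruleset-*:
ceph osd erasure-code-profile set {profile_name} k={knum} m={mnum} ruleset-failure-domain={failure_domain} ruleset-root={root_bucket_name}

Example: create a 2+1 profile named h_profile with host as the failure domain. The key is crush-root, with a hyphen; a misspelled key such as crush_root is stored in the profile but has no effect, and crush-root then stays at default.
ceph osd erasure-code-profile set h_profile k=2 m=1 crush-failure-domain=host crush-root=data
ceph osd erasure-code-profile set h1_profile k=2 m=1 ruleset-failure-domain=host ruleset-root=data

Sample output (this session was recorded with the misspelled key crush_root, so the profile shows crush-root=default):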

[root@node81 ~]# ceph osd erasure-code-profile set h_profile k=2 m=1 crush-failure-domain=host crush_root=data
[root@node81 ~]# 
[root@node81 ~]# ceph osd erasure-code-profile ls  //list all profiles
default
h_profile
[root@node81 ~]# 
[root@node81 ~]# ceph osd erasure-code-profile get h_profile //show the details of one profile
crush-device-class=
crush-failure-domain=host
crush-root=default
crush_root=data
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
technique=reed_sol_van
w=8
[root@node81 ~]# 
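
The profile output is a list of key=value lines, so a few properties can be derived from it mechanically. The sketch below is illustrative only: `parse_profile` and `describe_profile` are hypothetical helpers, and the "suspect key" check is a simple heuristic (an underscore key that shadows a hyphenated key of the same name), not something Ceph itself reports.

```python
def parse_profile(text):
    """Turn the key=value lines of `erasure-code-profile get` into a dict."""
    profile = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            key, _, value = line.partition("=")
            profile[key] = value
    return profile


def describe_profile(profile):
    k, m = int(profile["k"]), int(profile["m"])
    return {
        "chunks": k + m,                # data + coding chunks, i.e. the pool size
        "tolerated_failures": m,        # chunks that may be lost without data loss
        "raw_per_usable": (k + m) / k,  # raw bytes stored per byte of user data
        # Heuristic: an underscore key that shadows a hyphenated key is likely a typo.
        "suspect_keys": sorted(key for key in profile
                               if "_" in key and key.replace("_", "-") in profile),
    }


# Copied from the h_profile output above.
SAMPLE_PROFILE = """
crush-device-class=
crush-failure-domain=host
crush-root=default
crush_root=data
jerasure-per-chunk-alignment=false
k=2
m=1
plugin=jerasure
technique=reed_sol_van
w=8
"""

print(describe_profile(parse_profile(SAMPLE_PROFILE)))
```

For the 2+1 profile this gives 3 chunks, one tolerated failure, and 1.5 bytes of raw space per byte stored, and it flags crush_root as a probable misspelling of crush-root.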

2. Create an erasure-coded CRUSH rule


Command form:
ceph osd crush rule create-erasure {crushrule_name} {erasure-code-profile}

Example:
ceph osd crush rule create-erasure h_ruleset h_profile

Sample output:

[root@node81 ~]# ceph osd crush rule create-erasure h_ruleset h_profile
created rule h_ruleset at 2
[root@node81 ~]# 
[root@node81 ~]# 
[root@node81 ~]# ceph osd crush rule ls  //list all CRUSH rules
replicated_rule
data_ruleset
h_ruleset
[root@node81 ~]# ceph osd crush rule dump h_ruleset //show one CRUSH rule
{
    "rule_id": 2,
    "rule_name": "h_ruleset",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 3,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 10
        },
        {
            "op": "set_choose_tries",
            "num": 200
        },
        {
            "op": "take",
            "item": -1,
            "item_name": "default"  //default, not data: the profile was created with the misspelled key crush_root
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

[root@node81 ~]# 
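
Because a rule silently falls back to the default root when the profile key is misspelled, it can be worth comparing the generated rule against what was intended. A small sketch under that assumption follows; `check_rule` is a hypothetical helper, and the rule is given as an already parsed dict holding only the fields the check needs.

```python
def check_rule(rule, expected_root, expected_domain):
    """Return a list of mismatches; the list is empty when the rule is as intended."""
    problems = []
    roots = [s["item_name"] for s in rule["steps"] if s["op"] == "take"]
    domains = [s["type"] for s in rule["steps"] if s["op"].startswith("choose")]
    if roots != [expected_root]:
        problems.append("takes from %s, expected %s" % (", ".join(roots), expected_root))
    if domains != [expected_domain]:
        problems.append("failure domain %s, expected %s"
                        % (", ".join(domains), expected_domain))
    return problems


# The steps of h_ruleset as dumped above.
H_RULESET = {
    "rule_name": "h_ruleset",
    "steps": [
        {"op": "set_chooseleaf_tries", "num": 10},
        {"op": "set_choose_tries", "num": 200},
        {"op": "take", "item": -1, "item_name": "default"},
        {"op": "chooseleaf_indep", "num": 0, "type": "host"},
        {"op": "emit"},
    ],
}

print(check_rule(H_RULESET, "data", "host"))
```

Run against the dump above, the check reports that the rule takes from default although data was intended, while the failure domain is correct.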


3. Create the erasure-coded pool

Command form:
ceph osd pool create {pool_name} {pg_num} {pgp_num} erasure {erasure-code-profile} {crushrule_name}

Example:
ceph osd pool create h 256 256 erasure h_profile h_ruleset

Sample output:

[root@node81 ~]# ceph osd pool create h 256 256 erasure h_profile h_ruleset
pool 'h' created
[root@node81 ~]# 
[root@node81 ~]# ceph osd pool ls
data
h
[root@node81 ~]# 
[root@node81 ~]# ceph osd pool ls detail
pool 1 'data' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 38 flags hashpspool stripe_width 0
pool 2 'h' erasure size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 256 pgp_num 256 last_change 42 flags hashpspool stripe_width 131072

[root@node81 ~]# 
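
The two `ceph osd pool create` forms used in this article differ only in the trailing arguments, which a script can assemble as follows. This is a sketch, not an official wrapper: `pool_create_argv` is a made-up name, and requiring an explicit profile for erasure-coded pools is a choice of this example, made because the rule name is positional and follows the profile.

```python
def pool_create_argv(name, pg_num, pool_type="replicated",
                     rule=None, profile=None, pgp_num=None):
    """Build the argument list for `ceph osd pool create` without running it."""
    if pool_type not in ("replicated", "erasure"):
        raise ValueError("pool_type must be 'replicated' or 'erasure'")
    if pool_type == "erasure" and profile is None:
        raise ValueError("this sketch requires an explicit erasure-code-profile")
    argv = ["ceph", "osd", "pool", "create", name, str(pg_num),
            str(pg_num if pgp_num is None else pgp_num), pool_type]
    if pool_type == "erasure":
        argv.append(profile)  # the profile comes before the rule name
    if rule is not None:
        argv.append(rule)
    return argv


print(" ".join(pool_create_argv("data", 256, rule="data_ruleset")))
print(" ".join(pool_create_argv("h", 256, "erasure", rule="h_ruleset", profile="h_profile")))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` on a node that has the admin keyring.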


Deleting a storage pool

  • Delete the pool
    ceph osd pool rm {pool_name} {pool_name} --yes-i-really-really-mean-it
  • Delete the CRUSH rule
    ceph osd crush rule rm {crushrule_name}
  • Delete the erasure-code-profile (erasure-coded pools only)
    ceph osd erasure-code-profile rm {profile_name}

Example:
ceph osd pool rm h h --yes-i-really-really-mean-it
ceph osd crush rule ls
ceph osd crush rule rm h_ruleset
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile rm h_profile

Sample output:

[root@node81 ~]# ceph osd pool ls 
data
h
[root@node81 ~]# ceph osd pool rm h h --yes-i-really-really-mean-it //delete the pool
pool 'h' removed 
[root@node81 ~]# 
[root@node81 ~]# ceph osd pool ls  //check that the pool is gone
data
[root@node81 ~]# 
[root@node81 ~]# ceph osd crush rule ls
replicated_rule
data_ruleset
h_ruleset
[root@node81 ~]# 
[root@node81 ~]# ceph osd crush rule rm h_ruleset  //delete the CRUSH rule
[root@node81 ~]# 
[root@node81 ~]# ceph osd crush rule ls //check that the CRUSH rule is gone
replicated_rule
data_ruleset
[root@node81 ~]# 
[root@node81 ~]# ceph osd erasure-code-profile ls
default
h_profile
[root@node81 ~]# 
[root@node81 ~]# ceph osd erasure-code-profile rm h_profile //delete the profile
[root@node81 ~]# 
[root@node81 ~]# ceph osd erasure-code-profile ls //check that the profile is gone
default
[root@node81 ~]# 
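
The teardown above follows a fixed order: pool first, then its CRUSH rule, then its profile. The sketch below just generates the commands in that order; `teardown_commands` is a hypothetical helper. The ordering is kept on the assumption that Ceph refuses to remove a rule or profile that a pool still references.

```python
def teardown_commands(pool, rule=None, profile=None):
    """Return the removal commands in the order used in this article."""
    commands = [["ceph", "osd", "pool", "rm", pool, pool,
                 "--yes-i-really-really-mean-it"]]
    if rule is not None:
        commands.append(["ceph", "osd", "crush", "rule", "rm", rule])
    if profile is not None:
        # Only erasure-coded pools have a profile to clean up.
        commands.append(["ceph", "osd", "erasure-code-profile", "rm", profile])
    return commands


for argv in teardown_commands("h", rule="h_ruleset", profile="h_profile"):
    print(" ".join(argv))
```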


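Adjusting pool properties

  • General form
    ceph osd pool set {pool_name} {key} {value}
  • Special case
    Pool quotas are set with a separate subcommand:
    max_objects: limit on the number of objects
    max_bytes: limit on capacity
    ceph osd pool set-quota {pool_name} max_objects|max_bytes {value}
    Example: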
    ceph osd pool set-quota h max_objects 1000
    ceph osd pool set-quota h max_bytes 1000M
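
A value such as 1000M leaves open which multiplier the suffix stands for, so a script may prefer to pass plain bytes to max_bytes. The helper below is a sketch under the assumption that K, M, G and T mean binary multiples (1024-based); `quota_bytes` is a name invented for this example, and the assumption should be checked against the release in use.

```python
# Assumption: suffixes are binary multiples (1K = 1024 bytes).
_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def quota_bytes(text):
    """Convert a size such as '1000M' or '4096' into a plain byte count."""
    text = text.strip().upper()
    if text and text[-1] in _MULTIPLIERS:
        number, factor = text[:-1], _MULTIPLIERS[text[-1]]
    else:
        number, factor = text, 1
    if not number.isdigit():
        raise ValueError("cannot parse size: %r" % text)
    return int(number) * factor


print(quota_bytes("1000M"))
```

Setting a quota value to 0 removes that quota again.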

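New in Luminous

Ceph Luminous allows an erasure-coded pool to serve as the data pool for a file system, for object storage, or for block devices. Before it is used this way, allow_ec_overwrites must be set to true on the erasure-coded pool.
Command:
ceph osd pool set {erasure_pool_name} allow_ec_overwrites true
Reference: New in Luminous: Erasure Coding for RBD and CephFS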