PG Distribution Tuning on Ceph Nautilus


Based on an article by Mr. Qin:
https://mp.weixin.qq.com/s/WtozWOzNWlIJnWmcVUpYJQ

The tuning procedure on Nautilus is slightly different.

1: Before tuning, the cluster had ten 1 TB SSDs and two HDDs. The PG distribution looked like this:

image.png

Leaving the HDDs aside, the SSD OSDs with the most and the fewest PGs differed by 34 PGs, which is still badly unbalanced.
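
To put a number on the imbalance, the spread can be computed from per-OSD PG counts. The following is a minimal sketch, assuming the input is one `osd-id pg-count` pair per line (for example, copied by hand from the PGS column of `ceph osd df`). The helper name `pg_spread` and the sample counts are made up for illustration and are not taken from the cluster in this post.

```shell
#!/bin/sh
# pg_spread: read "osd_id pg_count" pairs on stdin and print the smallest
# count, the largest count, and the difference between them.
# The input format is an assumption, not the raw output of any ceph command.
pg_spread() {
    awk '
        NF >= 2 {
            v = $2 + 0                      # force numeric comparison
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            n++
        }
        END {
            if (n > 0) printf "min=%d max=%d spread=%d\n", min, max, max - min
        }
    '
}

# Illustrative counts only.
printf '0 100\n1 134\n2 120\n' | pg_spread
```

Running the same helper before and after tuning gives a quick way to check whether the spread actually shrank.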

2: Tuning

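Nautilus changed the balancer module: it is now an always-on module, so there is no need to enable it. Trying to enable it anyway just reports that it is already on:

  $ ceph mgr module enable balancer
  module 'balancer' is already enabled (always-on)

   Turn on automatic balancing
  $ ceph balancer on
   Set the mode to crush-compat
  $ ceph balancer mode crush-compat
   (The max_misplaced option from Luminous no longer exists in Nautilus; it was replaced by target_max_misplaced_ratio)
   Check the status
  $ ceph balancer status
    {
        "active": true,
        "plans": [],
        "mode": "crush-compat"
    }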
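   Score the current distribution across all pools in the cluster
  $ ceph balancer eval
  current cluster score 0.081403 (lower is better)
   Generate an optimization plan; on Nautilus I had to name the pool explicitly (here, ssd)
  $ ceph balancer optimize plan1 ssd
   Evaluate the score the plan is expected to reach (nothing is applied yet)
  $ ceph balancer eval plan1
  plan plan1 final score 0.010106 (lower is better)
   Inspect the changes the plan would make
  $ ceph balancer show plan1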

         # starting osdmap epoch 19047                          
         # starting crush version 113                           
         # mode crush-compat                                    
         ceph osd crush weight-set reweight-compat 0 0.959158   
         ceph osd crush weight-set reweight-compat 1 0.886730   
         ceph osd crush weight-set reweight-compat 2 0.924439   
         ceph osd crush weight-set reweight-compat 3 0.830849   
         ceph osd crush weight-set reweight-compat 4 0.981423   
         ceph osd crush weight-set reweight-compat 5 0.454987   
         ceph osd crush weight-set reweight-compat 6 0.454987   
         ceph osd crush weight-set reweight-compat 7 0.968349   
         ceph osd crush weight-set reweight-compat 8 0.910090   
         ceph osd crush weight-set reweight-compat 9 0.963822   
         ceph osd crush weight-set reweight-compat 10 0.943591  
         ceph osd crush weight-set reweight-compat 11 0.941545  
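
The plan is plain text, so it is easy to pull out the proposed weights and see which OSDs are being pushed down the most. The following is a small sketch, assuming the `ceph balancer show` output has the shape shown above. `plan_weights` is a hypothetical helper name, and the sample lines are a subset of the plan in this post.

```shell
#!/bin/sh
# plan_weights: read `ceph balancer show <plan>` output on stdin and print
# "osd_id weight" pairs sorted by weight, lowest first.
# Comment lines (starting with '#') and anything else are ignored.
plan_weights() {
    awk '$1 == "ceph" && $5 == "reweight-compat" { print $6, $7 }' |
        LC_ALL=C sort -k2,2n
}

# Sample input taken from the plan above; with a live cluster one would run
#   ceph balancer show plan1 | plan_weights
printf '%s\n' \
    '# mode crush-compat' \
    'ceph osd crush weight-set reweight-compat 3 0.830849' \
    'ceph osd crush weight-set reweight-compat 5 0.454987' \
    'ceph osd crush weight-set reweight-compat 4 0.981423' | plan_weights
```

Sorting by weight puts the OSDs that the plan wants to relieve the most at the top, which is a quick sanity check before executing anything.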

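 Apply the plan
 $ ceph balancer execute plan1
 If the result is not satisfactory, clear the stored plans and go through the steps again (my first run did not work well, so I repeated it). Note that reset only discards plans; it does not undo weights that have already been applied.
 $ ceph balancer reset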
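
Because a plan is only worth executing if it lowers the score, the two `eval` outputs can be compared before running `execute`. The following is a minimal sketch that parses score lines in the format shown earlier. `extract_score` and `plan_improves` are hypothetical helper names, and the `ceph` calls in the trailing comment need a live cluster.

```shell
#!/bin/sh
# extract_score: print the number that follows the word "score" in a
# `ceph balancer eval` line, for example:
#   current cluster score 0.081403 (lower is better)
#   plan plan1 final score 0.010106 (lower is better)
extract_score() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "score") { print $(i + 1); exit } }'
}

# plan_improves CURRENT PLANNED: succeed only if PLANNED is strictly lower.
plan_improves() {
    awk -v cur="$1" -v new="$2" 'BEGIN { exit !((new + 0) < (cur + 0)) }'
}

# Example with a live cluster (not run here):
#   cur=$(ceph balancer eval | extract_score)
#   new=$(ceph balancer eval plan1 | extract_score)
#   plan_improves "$cur" "$new" && ceph balancer execute plan1
```

With the scores from this post, `plan_improves 0.081403 0.010106` succeeds, so the plan would be executed.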

After several attempts I got a well-balanced result:

image.png
    Turn automatic balancing off again (the module itself stays loaded)
    $ ceph balancer off

3: Afterword

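I won't go into why PG distribution matters. I recommend always doing this before a cluster goes into production; it does improve overall cluster performance.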