k8s-scheduler

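The scheduler fetches from the api-server the pods that are pending and have no nodeName set, then creates a binding for each one that records which node the pod is scheduled to.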
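As a rough illustration of the two halves of that sentence, the sketch below checks whether a pod still needs scheduling and builds the body of a Binding object. The helper names `needs_scheduling` and `make_binding` are hypothetical, and pods are modelled as plain dictionaries rather than through a client library. In a real cluster the binding would typically be POSTed to the pod's `binding` subresource; that step is left out here.

```python
from typing import Any, Dict


def needs_scheduling(pod: Dict[str, Any]) -> bool:
    """Return True for a pending pod that has no nodeName assigned yet."""
    phase = pod.get("status", {}).get("phase")
    node_name = pod.get("spec", {}).get("nodeName")
    return phase == "Pending" and not node_name


def make_binding(pod_name: str, node_name: str, namespace: str = "default") -> Dict[str, Any]:
    """Build a Binding body that assigns pod_name to node_name.

    The function only constructs the dictionary; sending it to the
    api-server is outside the scope of this sketch.
    """
    return {
        "apiVersion": "v1",
        "kind": "Binding",
        "metadata": {"name": pod_name, "namespace": namespace},
        "target": {"apiVersion": "v1", "kind": "Node", "name": node_name},
    }


if __name__ == "__main__":
    pod = {"metadata": {"name": "web-1"}, "spec": {}, "status": {"phase": "Pending"}}
    if needs_scheduling(pod):
        print(make_binding("web-1", "node-2"))
```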

Scheduling algorithm

For given pod:

    +---------------------------------------------+
    |               Schedulable nodes:            |
    |                                             |
    | +--------+    +--------+      +--------+    |
    | | node 1 |    | node 2 |      | node 3 |    |
    | +--------+    +--------+      +--------+    |
    |                                             |
    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+

    Predicate (filtering) step: node 3 doesn't have enough resources

    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+
    |             remaining nodes:                |
    |   +--------+                 +--------+     |
    |   | node 1 |                 | node 2 |     |
    |   +--------+                 +--------+     |
    |                                             |
    +-------------------+-------------------------+
                        |
                        |
                        v
    +-------------------+-------------------------+

    Priority (scoring) function:    node 1: p=2
                                    node 2: p=5

    +-------------------+-------------------------+
                        |
                        |
                        v
            select max{node priority} = node 2

A given scheduler schedules only one pod at a time; when a scheduler runs as multiple replicas, one of them is elected leader.

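  • Predicates (filtering): filter out the nodes that do not satisfy the pod's requirements;
  • Priorities (scoring): score the remaining nodes with the priority functions and select the node with the highest score; if several nodes share the highest score, one of them is chosen at random.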
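The two phases can be sketched in a few lines. This is a minimal illustration, not the kube-scheduler implementation: the node and pod records, the `free_cpu` and `cpu` fields, and the functions `fits_resources` and `least_requested` are all hypothetical stand-ins chosen so that the numbers match the diagram above (node 3 is filtered out, node 1 scores 2, node 2 scores 5).

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

Node = Dict[str, Any]
Pod = Dict[str, Any]
Predicate = Callable[[Pod, Node], bool]
Priority = Callable[[Pod, Node], int]


def fits_resources(pod: Pod, node: Node) -> bool:
    """Hypothetical predicate: the node must have enough free CPU."""
    return node["free_cpu"] >= pod["cpu"]


def least_requested(pod: Pod, node: Node) -> int:
    """Hypothetical priority: more CPU left after placement scores higher."""
    return node["free_cpu"] - pod["cpu"]


def schedule(
    pod: Pod,
    nodes: List[Node],
    predicates: List[Predicate],
    priorities: List[Tuple[Priority, int]],
) -> Optional[str]:
    """Return the name of the chosen node, or None if no node fits."""
    # Phase 1: keep only nodes that pass every predicate.
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None
    # Phase 2: weighted sum of all priority functions per node.
    scores = {
        n["name"]: sum(weight * fn(pod, n) for fn, weight in priorities)
        for n in feasible
    }
    best = max(scores.values())
    # Ties on the highest score are broken at random.
    return random.choice([name for name, s in scores.items() if s == best])


if __name__ == "__main__":
    nodes = [
        {"name": "node 1", "free_cpu": 3},
        {"name": "node 2", "free_cpu": 6},
        {"name": "node 3", "free_cpu": 0},
    ]
    print(schedule({"cpu": 1}, nodes, [fits_resources], [(least_requested, 1)]))
```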

Extending the scheduler

policy-config

Pass a policy file with the --policy-config-file command-line flag to change the scheduling policy.

{
"kind" : "Policy",
"apiVersion" : "v1",
"predicates" : [
    {"name" : "PodFitsHostPorts"},
    {"name" : "PodFitsResources"},
    {"name" : "NoDiskConflict"},
    {"name" : "NoVolumeZoneConflict"},
    {"name" : "MatchNodeSelector"},
    {"name" : "HostName"}
    ],
"priorities" : [
    {"name" : "LeastRequestedPriority", "weight" : 1},
    {"name" : "BalancedResourceAllocation", "weight" : 1},
    {"name" : "ServiceSpreadingPriority", "weight" : 1},
    {"name" : "EqualPriority", "weight" : 1}
    ],
"hardPodAffinitySymmetricWeight" : 10,
"alwaysCheckAllPredicates" : false
}
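To make the structure of the policy file concrete, the sketch below parses a shortened copy of it and shows how the `weight` values could combine per-priority scores into one number. The helpers `load_policy` and `weighted_score` are hypothetical and only illustrate the idea of a weighted sum; they are not how kube-scheduler itself reads the file.

```python
import json
from typing import Dict, List, Tuple

# Shortened copy of the policy above, kept inline so the sketch is self-contained.
POLICY_TEXT = """
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsHostPorts"},
    {"name": "PodFitsResources"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "BalancedResourceAllocation", "weight": 1}
  ]
}
"""


def load_policy(text: str) -> Tuple[List[str], Dict[str, int]]:
    """Return (predicate names, priority name -> weight) from a policy document."""
    policy = json.loads(text)
    if policy.get("kind") != "Policy":
        raise ValueError("not a scheduler Policy document")
    predicates = [p["name"] for p in policy.get("predicates", [])]
    weights = {p["name"]: p["weight"] for p in policy.get("priorities", [])}
    return predicates, weights


def weighted_score(raw_scores: Dict[str, int], weights: Dict[str, int]) -> int:
    """Combine per-priority scores for one node into a weighted sum."""
    return sum(w * raw_scores.get(name, 0) for name, w in weights.items())


if __name__ == "__main__":
    predicates, weights = load_policy(POLICY_TEXT)
    print(predicates)
    print(weighted_score({"LeastRequestedPriority": 3, "BalancedResourceAllocation": 4}, weights))
```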

Multiple schedulers

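Develop your own scheduler based on the kube-scheduler source code. This is a very large amount of work and requires a complete understanding of the scheduler's source-code logic. For details, see "Configure Multiple Schedulers".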

Scheduler extender

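Use the scheduler extender feature to enhance kube-scheduler. This approach is very elegant and is the recommended one. The rough idea is to develop your own web scheduler service that exposes filter, prioritize, bind, and similar endpoints. For details, see "Scheduler extender" and "How to implement your own k8s scheduler".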
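A web service of that shape might look like the following sketch, which uses only the Python standard library. Several things here are assumptions rather than facts from the article: the URL paths `/filter` and `/prioritize`, the JSON field names (`NodeNames`, `FailedNodes`, `Error`, `Host`, `Score`), which can differ between Kubernetes versions, and the toy rules based on node-name prefixes and suffixes. The bind endpoint is omitted for brevity.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List


def filter_nodes(args: Dict[str, Any]) -> Dict[str, Any]:
    """Toy filter: reject nodes whose name starts with 'gpu-' (hypothetical rule)."""
    names: List[str] = args.get("NodeNames") or []
    keep = [n for n in names if not n.startswith("gpu-")]
    failed = {n: "reserved for GPU workloads" for n in names if n.startswith("gpu-")}
    return {"NodeNames": keep, "FailedNodes": failed, "Error": ""}


def prioritize_nodes(args: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Toy priority: prefer nodes whose name ends with '-ssd' (hypothetical rule)."""
    names: List[str] = args.get("NodeNames") or []
    return [{"Host": n, "Score": 10 if n.endswith("-ssd") else 0} for n in names]


# Assumed URL layout; the real paths are whatever the extender config declares.
ROUTES = {"/filter": filter_nodes, "/prioritize": prioritize_nodes}


class ExtenderHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        handler = ROUTES.get(self.path)
        if handler is None:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        args = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handler(args)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8888) -> None:
    """Start the extender service; blocks until interrupted."""
    HTTPServer(("", port), ExtenderHandler).serve_forever()
```

Keeping the filter and prioritize logic in plain functions, separate from the HTTP handler, makes the scheduling rules easy to unit-test without starting a server; calling `serve()` then exposes them to kube-scheduler.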
