k8s网络系列学习笔记之二-----Service IPVS代理

前文我们知道Service有多种代理模式,因为IPVS模式的高性能和对大规模几区的支撑能力的优势,最新的版本已经把IPVS作为缺省的代理层模式,所以本文首先来学习一下Service IPVS代理模式。

ipvs vs iptables

IPVS模式在Kubernetes v1.8中引入,并在v1.9中进入了beta。 IPTABLES模式在v1.1中添加,并成为自v1.2以来的默认操作模式。 IPVS和IPTABLES都基于netfilter。 IPVS模式和IPTABLES模式之间的差异如下:

  • IPVS为大型集群提供了更好的可扩展性和性能。
  • IPVS支持比iptables更复杂的负载平衡算法(最小负载,最少连接,位置,加权等)。
  • IPVS支持服务器健康检查和连接重试等。

kube-proxy开启ipvs的前置条件

由于ipvs已经加入到了内核的主干,所以为kube-proxy开启ipvs的前提需要加载以下的内核模块:

ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
在所有的Kubernetes节点node1和node2上执行以下脚本:

mkdir -p  /etc/sysconfig/modules/
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

上面脚本创建了的/etc/sysconfig/modules/ipvs.modules文件,保证在节点重启后能自动加载所需模块。 使用lsmod | grep -e ip_vs -e nf_conntrack_ipv4命令查看是否已经正确加载所需的内核模块。

接下来还需要确保各个节点上已经安装了ipset软件包:apt install ipset。

为了便于查看ipvs的代理规则,还需要安装一下管理工具ipvsadm: apt install ipvsadm。

如果以上前提条件如果不满足,则即使kube-proxy的配置开启了ipvs模式,也会退回到iptables模式。

kube-proxy开启ipvs

修改ConfigMap的kube-system/kube-proxy中的config.conf,mode: “ipvs”:

kubectl edit cm kube-proxy -n kube-system

之后重启各个节点上的kube-proxy pod:

kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'

确定一下ipvs的工作状态

等kube-proxy重启以后,运行kubectl apply -f nginx.yaml文件,创建一个pod与对应的service。

apiVersion: v1
kind: Pod
metadata:
 name: nginx
 labels:
   app: web
spec:
 hostNetwork: false
 subdomain:  nginx
 hostname: nginx-80
 containers:
   - name: nginx
     image: nginx
     ports:
     - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: web
  clusterIP: 10.96.0.11
  ports:
  - name: foo # Actually, no port is needed.
    port: 80
    targetPort: 80

检查一下ipvs的规则:

root@bogon2:~/k8s-yml-tests# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 172.25.39.6:6443             Masq    1      0          0         
TCP  10.96.0.10:53 rr
  -> 10.244.0.6:53                Masq    1      0          0         
  -> 10.244.0.7:53                Masq    1      0          0         
TCP  10.96.0.11:80 rr
  -> 10.244.1.4:80                Masq    1      0          0         
UDP  10.96.0.10:53 rr
  -> 10.244.0.6:53                Masq    1      0          0         
  -> 10.244.0.7:53                Masq    1      0          0 

由于我创建了一个Pod和一个Service

  • POD: ip为10.244.1.4,端口号为80
  • Service:clusterIP为10.96.0.11,端口号为80

从上面的ipvs格则中可以看到:

TCP  10.96.0.11:80 rr
  -> 10.244.1.4:80                Masq    1      0          0    

这条规则,实现了访问到10.96.0.11:80的TCP请求,转发到10.244.1.4:80中,如果有多个endpoints,则会负载均衡的在多个endpoints中访问。

ipvs对iptables的依赖

ipvs 会使用 iptables 进行包过滤、SNAT、masquared(伪装)。具体来说,ipvs 将使用ipset来存储需要DROP或masquared的流量的源或目标地址,以确保 iptables 规则的数量是恒定的,这样我们就不需要关心我们有多少服务了。

下表就是 ipvs 使用的 ipset 集合:

set name members usage
KUBE-CLUSTER-IP All service IP + port Mark-Masq for cases that masquerade-all=true or clusterCIDR specified
KUBE-LOOP-BACK All service IP + port + IP masquerade for solving hairpin purpose
KUBE-EXTERNAL-IP service external IP + port masquerade for packages to external IPs
KUBE-LOAD-BALANCER load balancer ingress IP + port masquerade for packages to load balancer type service
KUBE-LOAD-BALANCER-LOCAL LB ingress IP + port with externalTrafficPolicy=local accept packages to load balancer with externalTrafficPolicy=local
KUBE-LOAD-BALANCER-FW load balancer ingress IP + port with loadBalancerSourceRanges package filter for load balancer with loadBalancerSourceRanges specified
KUBE-LOAD-BALANCER-SOURCE-CIDR load balancer ingress IP + port + source CIDR package filter for load balancer with loadBalancerSourceRanges specified
KUBE-NODE-PORT-TCP nodeport type service TCP port masquerade for packets to nodePort(TCP)
KUBE-NODE-PORT-LOCAL-TCP nodeport type service TCP port with externalTrafficPolicy=local accept packages to nodeport service with externalTrafficPolicy=local
KUBE-NODE-PORT-UDP nodeport type service UDP port masquerade for packets to nodePort(UDP)
KUBE-NODE-PORT-LOCAL-UDP nodeport type service UDP port with externalTrafficPolicy=local accept packages to nodeport service with externalTrafficPolicy=local

在以下情况下,ipvs 将依赖于 iptables:

    1. kube-proxy 配置参数 –masquerade-all=true

如果 kube-proxy 配置了--masquerade-all=true参数,则 ipvs 将伪装所有访问 Service 的 Cluster IP 的流量,此时的行为和 iptables 是一致的,由 ipvs 添加的 iptables 规则如下:

# iptables -t nat -nL

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Chain KUBE-MARK-MASQ (2 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst
    1. 在 kube-proxy 启动时指定集群 CIDR

如果 kube-proxy 配置了--cluster-cidr=<cidr>参数,则 ipvs 会伪装所有访问 Service Cluster IP 的外部流量,其行为和 iptables 相同,假设 kube-proxy 提供的集群 CIDR 值为:10.244.16.0/24,那么 ipvs 添加的 iptables 规则应该如下所示:

# iptables -t nat -nL

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Chain KUBE-MARK-MASQ (3 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  -- !10.244.16.0/24       0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-CLUSTER-IP dst,dst
    1. Load Balancer 类型的 Service

对于loadBalancer类型的服务,ipvs 将安装匹配 KUBE-LOAD-BALANCER 的 ipset 的 iptables 规则。特别当服务的 LoadBalancerSourceRanges 被指定或指定 externalTrafficPolicy=local 的时候,ipvs 将创建 ipset 集合KUBE-LOAD-BALANCER-LOCAL/KUBE-LOAD-BALANCER-FW/KUBE-LOAD-BALANCER-SOURCE-CIDR,并添加相应的 iptables 规则,如下所示规则:

# iptables -t nat -nL

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Chain KUBE-FIREWALL (1 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER-SOURCE-CIDR dst,dst,src
KUBE-MARK-DROP  all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-LOAD-BALANCER (1 references)
target     prot opt source               destination
KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER-FW dst,dst
RETURN     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER-LOCAL dst,dst
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-MARK-DROP (1 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000

Chain KUBE-MARK-MASQ (2 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-LOAD-BALANCER  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER dst,dst
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOAD-BALANCER dst,dst
    1. NodePort 类型的 Service

对于 NodePort 类型的服务,ipvs 将添加匹配KUBE-NODE-PORT-TCP/KUBE-NODE-PORT-UDP的 ipset 的iptables 规则。当指定externalTrafficPolicy=local时,ipvs 将创建 ipset 集KUBE-NODE-PORT-LOCAL-TC/KUBE-NODE-PORT-LOCAL-UDP并安装相应的 iptables 规则,如下所示:(假设服务使用 TCP 类型 nodePort)

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Chain KUBE-MARK-MASQ (2 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-NODE-PORT (1 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-NODE-PORT-LOCAL-TCP dst
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-NODE-PORT  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-NODE-PORT-TCP dst
    1. 指定 externalIPs 的 Service

对于指定了externalIPs的 Service,ipvs 会安装匹配KUBE-EXTERNAL-IP ipset 集的 iptables 规则,假设我们有指定了 externalIPs 的 Service,则 iptables 规则应该如下所示:

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Chain KUBE-MARK-MASQ (2 references)
target     prot opt source               destination
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-LOOP-BACK dst,dst,src

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
KUBE-MARK-MASQ  all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-EXTERNAL-IP dst,dst
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-EXTERNAL-IP dst,dst PHYSDEV match ! --physdev-is-in ADDRTYPE match src-type !LOCAL
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            match-set KUBE-EXTERNAL-IP dst,dst A

访问流量分析

推荐阅读更多精彩内容