Kubernetes - A Logging System Based on Grafana Loki

System Architecture

Kubernetes Logs

By default, container logs are stored under /var/log/pods.

$ ls /var/log/pods

kube-system_kube-flannel-ds-amd64-9x66j_28e71490-d614-4cd8-9ea7-af23cc7b9bff           
lab_job-employee-cronjob-1615078800-n2rxh_134ce637-c2a7-47b8-896f-348931125acb
kube-system_kube-proxy-lfzmx_90605182-ae56-4085-801e-fc4a83531945
...

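Each directory corresponds to one Pod. Inside it there is one subdirectory per container, named after the container, and the container's log files sit one level below that.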

$ tree kube-system_kube-flannel-ds-amd64-9x66j_28e71490-d614-4cd8-9ea7-af23cc7b9bff/

kube-system_kube-flannel-ds-amd64-9x66j_28e71490-d614-4cd8-9ea7-af23cc7b9bff/
├── install-cni
│   └── 3.log -> /data/docker/containers/6accaa2d6890df8ca05d1f40aaa9b8da69ea0a00a8e4b07a0949cdc067843e37/6accaa2d6890df8ca05d1f40aaa9b8da69ea0a00a8e4b07a0949cdc067843e37-json.log
└── kube-flannel
    ├── 2.log -> /data/docker/containers/9e8eea717cc3efd0804900a53244a32286d9e04767f76d9c8a8cc3701c83ece5/9e8eea717cc3efd0804900a53244a32286d9e04767f76d9c8a8cc3701c83ece5-json.log
    └── 3.log -> /data/docker/containers/06389981d26cbe60328cd5a46af7b003c8d687d1c411704784aa12d4d82672b8/06389981d26cbe60328cd5a46af7b003c8d687d1c411704784aa12d4d82672b8-json.log

The log file kube-flannel/3.log is only a symbolic link to /data/docker/containers/***/***.log. Docker still owns and writes the log; Kubernetes merely references it.
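
The directory layout shown above follows the pattern /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container>/<n>.log. As a minimal sketch of how such a path can be taken apart, the helper below (parse_pod_log_path and PodLogPath are illustrative names, not part of any Kubernetes tooling) splits one path from the listing into its components:

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class PodLogPath(NamedTuple):
    namespace: str
    pod: str
    uid: str
    container: str
    filename: str


def parse_pod_log_path(path: str) -> PodLogPath:
    """Split <namespace>_<pod>_<uid>/<container>/<n>.log into its parts."""
    parts = PurePosixPath(path).parts
    # The last three components are the pod directory, the container
    # directory and the log file itself.
    pod_dir, container, filename = parts[-3:]
    # Namespace and pod names cannot contain "_", so splitting on the
    # first two underscores is unambiguous.
    namespace, pod, uid = pod_dir.split("_", 2)
    return PodLogPath(namespace, pod, uid, container, filename)


example = parse_pod_log_path(
    "/var/log/pods/kube-system_kube-flannel-ds-amd64-9x66j_"
    "28e71490-d614-4cd8-9ea7-af23cc7b9bff/kube-flannel/3.log"
)
print(example)
```

The numeric file name (2.log, 3.log) is kept as-is; the sketch makes no assumption about what the number means.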

$ tail -n 2 kube-system_kube-flannel-ds-amd64-9x66j_28e71490-d614-4cd8-9ea7-af23cc7b9bff/kube-flannel/3.log

{"log":"E0210 03:09:16.016563       1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: Get https://**.**.**.**:443/api/v1/nodes?resourceVersion=0: dial tcp **.**.**.**:443: connect: connection refused\n","stream":"stderr","time":"2021-02-10T03:09:16.016698205Z"}
{"log":"E0210 03:12:11.710762       1 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to watch *v1.Node: Get https://**.**.**.**:443/api/v1/nodes?resourceVersion=113277271\u0026timeoutSeconds=569\u0026watch=true: dial tcp **.**.**.**:443: connect: connection refused\n","stream":"stderr","time":"2021-02-10T03:12:11.711020233Z"}

The logs are in JSON format. Each line carries three fields:

  • log: the log message
  • stream: stderr (error output) or stdout (standard output)
  • time: the timestamp
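
To make the three fields concrete, here is a small sketch that decodes one such line with the Python standard library. The sample line is a shortened copy of the flannel output above; parse_docker_log_line is an illustrative helper name. Docker writes nanosecond timestamps, which the sketch truncates to microseconds because Python's datetime cannot hold more.

```python
import json
from datetime import datetime, timezone


def parse_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, truncating nanoseconds to microseconds."""
    seconds, _, fraction = value.rstrip("Z").partition(".")
    micros = (fraction + "000000")[:6]
    parsed = datetime.strptime(f"{seconds}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def parse_docker_log_line(raw: str) -> dict:
    """Decode one line of a Docker json-file log into log, stream and time."""
    record = json.loads(raw)
    return {
        "log": record["log"].rstrip("\n"),  # the message keeps its trailing newline in the file
        "stream": record["stream"],         # "stdout" or "stderr"
        "time": parse_time(record["time"]),
    }


sample = (
    r'{"log":"E0210 03:09:16.016563       1 reflector.go:201] '
    r'Failed to list *v1.Node: connect: connection refused\n",'
    r'"stream":"stderr","time":"2021-02-10T03:09:16.016698205Z"}'
)
entry = parse_docker_log_line(sample)
print(entry["stream"], entry["time"].isoformat(), entry["log"])
```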

Note: /data/docker/containers is not Docker's default data path (the default is /var/lib/docker/containers); it is configured in /etc/docker/daemon.json.

Promtail & Loki

1. Deploy Loki

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  namespace: default
data:
  loki-config.yml: |
    auth_enabled: false

    server:
      http_listen_port: 3100

    ingester:
      lifecycler:
        address: 127.0.0.1
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
        final_sleep: 0s
      chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
      max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
      chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1MB (1048576 bytes), flushing earlier if chunk_idle_period or max_chunk_age is reached first
      chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
      max_transfer_retries: 0     # Chunk transfers disabled

    schema_config:
      configs:
        - from: 2021-01-01
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /tmp/loki/boltdb-shipper-active
        cache_location: /tmp/loki/boltdb-shipper-cache
        cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: filesystem
      filesystem:
        directory: /tmp/loki/chunks

    compactor:
      working_directory: /tmp/loki/boltdb-shipper-compactor
      shared_store: filesystem

    limits_config:
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      ingestion_rate_mb: 64

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s

    ruler:
      storage:
        type: local
        local:
          directory: /tmp/loki/rules
      rule_path: /tmp/loki/rules-temp
      alertmanager_url: http://localhost:9093
      ring:
        kvstore:
          store: inmemory
      enable_api: true

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      nodeSelector:
        deviceType: cpu
      containers:
        - name: loki
          image: grafana/loki:2.0.0
          imagePullPolicy: Always
          args:
            - -config.file=/mnt/config/loki-config.yml
          ports:
            - containerPort: 3100
          volumeMounts:
            - mountPath: /tmp/loki
              name: storage-volume
            - mountPath: /mnt/config
              name: config-volume
          securityContext:
            runAsUser: 0
            runAsGroup: 0
      volumes:
        - name: storage-volume
          hostPath:
            path: /data/loki
        - name: config-volume
          configMap:
            name: loki-config
            items:
              - key: loki-config.yml
                path: loki-config.yml

---

kind: Service
apiVersion: v1
metadata:
  name: loki-service
  namespace: default
spec:
  ports:
    - port: 3100
      targetPort: 3100
  selector:
    app: loki
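
Once the Service is up, one way to smoke-test it is to push a single log line through Loki's HTTP push endpoint, the same /loki/api/v1/push path that Promtail is pointed at later. The sketch below only builds the JSON body and leaves the actual request commented out; the label job="smoke-test" and the helper names are made up for illustration, and the host name loki-service resolves only from inside the cluster (from another namespace it would need the namespace suffix).

```python
import json
import time
import urllib.request

# Service name and port taken from the manifests above.
LOKI_PUSH_URL = "http://loki-service:3100/loki/api/v1/push"


def build_push_payload(labels: dict, line: str, ts_ns: int) -> dict:
    """Build a push body: one stream holding one [timestamp, line] pair."""
    # Loki expects the timestamp as a string of nanoseconds since the epoch.
    return {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}


def push(payload: dict, url: str = LOKI_PUSH_URL) -> int:
    """POST the payload and return the HTTP status (204 is expected on success)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_push_payload({"job": "smoke-test"}, "hello loki", time.time_ns())
print(json.dumps(payload))
# push(payload)  # uncomment when running from a pod inside the cluster
```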

2. Deploy Promtail

apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: default
data:
  promtail-config.yml: |
    server:
      http_listen_port: 9080
      grpc_listen_port: 0

    positions:
      filename: /tmp/positions.yaml

    # clients:
    # - url: http://loki-service:3100/loki/api/v1/push

    scrape_configs:
    - job_name: containers
      static_configs:
      - targets:
        - localhost
        labels:
          log_from: static_pods
          __path__: /var/log/pods/*/*/*.log
      pipeline_stages:
      - docker: {}
      - match:
          selector: '{log_from="static_pods"}'
          stages:
          - regex:
              source: filename
              expression: "(?:pods)/(?P<namespace>\\S+?)_(?P<pod>\\S+)-\\S+?-\\S+?_\\S+?/(?P<container>\\S+?)/"
          - labels:
              namespace:
              pod:
              container:
      - match:
          selector: '{namespace!~"(default|kube-system)"}'
          action: drop
          drop_counter_reason: no_use

---

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail-deployment
  namespace: default
spec:
  selector:
    matchLabels:
      app: promtail
  template:
    metadata:
      labels:
        app: promtail
    spec:
      containers:
        - name: promtail
          image: grafana/promtail:2.0.0
          imagePullPolicy: Always
          args:
            - -config.file=/mnt/config/promtail-config.yml
            - -client.url=http://loki-service:3100/loki/api/v1/push
            - -client.external-labels=hostname=$(NODE_NAME)
          ports:
            - containerPort: 9080
          volumeMounts:
            - mountPath: /data/docker/containers
              name: containers-volume
            - mountPath: /var/log/pods
              name: pods-volume
            - mountPath: /mnt/config
              name: config-volume
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            runAsUser: 0
            runAsGroup: 0
      volumes:
        - name: containers-volume
          hostPath:
            path: /data/docker/containers
        - name: pods-volume
          hostPath:
            path: /var/log/pods
        - name: config-volume
          configMap:
            name: promtail-config
            items:
              - key: promtail-config.yml
                path: promtail-config.yml
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
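
The regex stage in promtail-config.yml is what turns the file path into the namespace, pod and container labels. The sketch below replays the same expression in Python to show what it captures. Promtail itself uses Go's regexp engine, so this is only an approximation, but the syntax used here is common to both. The second path is a hypothetical Deployment pod invented for illustration; its hash suffixes and UID are not from the original.

```python
import re

# Same expression as in promtail-config.yml, with the YAML escaping ("\\S") undone.
PATH_PATTERN = re.compile(
    r"(?:pods)/(?P<namespace>\S+?)_(?P<pod>\S+)-\S+?-\S+?_\S+?/(?P<container>\S+?)/"
)


def extract_labels(filename: str) -> dict:
    """Return the named groups of the first match, or {} if the path does not match."""
    match = PATH_PATTERN.search(filename)
    return match.groupdict() if match else {}


# Path taken from the directory listing earlier in the article (a DaemonSet pod).
flannel = extract_labels(
    "/var/log/pods/kube-system_kube-flannel-ds-amd64-9x66j_"
    "28e71490-d614-4cd8-9ea7-af23cc7b9bff/kube-flannel/3.log"
)

# Hypothetical Deployment pod: <deployment>-<replicaset-hash>-<pod-hash>.
websocket = extract_labels(
    "/var/log/pods/default_lab-websocket-deployment-5d8c7b9f6d-x2k4q_"
    "11111111-2222-3333-4444-555555555555/lab-websocket/0.log"
)

print(flannel)
print(websocket)
```

The expression always strips the last two hyphen-separated segments of the pod name. For a Deployment pod that leaves the Deployment name, which is why a query such as {pod="lab-websocket-deployment"} matches every replica. For a DaemonSet pod, which carries only one generated suffix, one real name segment is lost as well: kube-flannel-ds-amd64-9x66j yields kube-flannel-ds. A pod name with fewer than two hyphens does not match at all.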

Note: as mentioned above, the logs under /var/log/pods are only symbolic links to the logs under /data/docker/containers, so Promtail must mount both directories.
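
The reason is that a symbolic link stores only a path. If that path does not exist in the filesystem view of the process reading it, the link dangles and the log cannot be opened. The sketch below simulates this in a temporary directory; the directory and file names are made up, and deleting the target stands in for "the target directory is not mounted into the container".

```python
import os
import tempfile


def resolve_log(path: str):
    """Return the link target and whether it is reachable from this filesystem view."""
    target = os.path.realpath(path)
    return target, os.path.exists(target)


with tempfile.TemporaryDirectory() as root:
    docker_dir = os.path.join(root, "docker-containers")  # stands in for /data/docker/containers
    pods_dir = os.path.join(root, "pods")                 # stands in for /var/log/pods
    os.makedirs(docker_dir)
    os.makedirs(pods_dir)

    real_log = os.path.join(docker_dir, "abc-json.log")
    with open(real_log, "w") as handle:
        handle.write("{}\n")

    link = os.path.join(pods_dir, "3.log")
    os.symlink(real_log, link)

    # Both directories visible: the link resolves to a readable file.
    _, readable_when_mounted = resolve_log(link)

    # Simulate the target directory being absent from the container.
    os.remove(real_log)
    _, readable_when_missing = resolve_log(link)
    still_a_link = os.path.islink(link)

print(readable_when_mounted, readable_when_missing, still_a_link)
```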

Grafana Dashboard

1. Add a data source

2. Configure log visualization

Use the labels defined by Promtail to filter here and display the logs of a specific application. Example query: {pod="lab-websocket-deployment"}

3. Configure a log search box

Add a dashboard variable (named search, as referenced by the query below).

Change the query from step 2 to {pod="lab-websocket-deployment"} |~ "(?i)$search" to enable log search.
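
The |~ operator keeps only lines that match a regular expression anywhere in the line, and (?i) makes the match case-insensitive. A rough Python equivalent is sketched below. It assumes the variable value is pasted into the pattern as plain text, and it uses Python's re module instead of the Go regexp engine Loki runs, so only simple patterns are guaranteed to behave the same. The sample lines are invented.

```python
import re


def filter_lines(lines, search):
    """Keep lines matching the case-insensitive pattern built from the search text."""
    # Mirrors |~ "(?i)$search": the search text is used as a regex, not a literal.
    pattern = re.compile("(?i)" + search)
    return [line for line in lines if pattern.search(line)]


lines = [
    "Connection REFUSED by peer",
    "request ok",
    "dial tcp: connect: connection refused",
]
print(filter_lines(lines, "connection refused"))
print(len(filter_lines(lines, "")))
```

Because the input is treated as a regular expression, characters such as . or ( keep their regex meaning. An empty search box yields an empty pattern, which matches every line, so the panel then shows all logs.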

4. Configure log statistics by type

Example query: sum(count_over_time(({pod="lab-websocket-deployment", stream="stdout"})[60s]))
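
In words, the query counts the log lines of the selected stream that arrived during the last 60 seconds and sums the counts of all matching series into one number. The sketch below imitates that on an in-memory list of (timestamp, stream) pairs. The data is invented, and the handling of entries lying exactly on the window edge is an assumption of this sketch, not a statement about Loki.

```python
def count_over_time(entries, stream, now, window_seconds=60.0):
    """Count entries of one stream whose timestamp falls in the last window_seconds."""
    return sum(
        1
        for timestamp, entry_stream in entries
        if entry_stream == stream and now - window_seconds < timestamp <= now
    )


# (seconds, stream) pairs; evaluated at now=160 with a 60 s window.
entries = [
    (30.0, "stdout"),   # too old, outside the window
    (101.0, "stdout"),
    (130.0, "stderr"),
    (150.0, "stdout"),
]
stdout_count = count_over_time(entries, "stdout", now=160.0)
stderr_count = count_over_time(entries, "stderr", now=160.0)
print(stdout_count, stderr_count)
```

Running the same count once for stdout and once for stderr gives the two values a "log type" panel would plot side by side.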

5. Final result
