System Architecture
Kubernetes Logs
By default, container logs are stored under /var/log/pods.
$ ls /var/log/pods
kube-system_kube-flannel-ds-amd64-9x66j_28e71490-d614-4cd8-9ea7-af23cc7b9bff
lab_job-employee-cronjob-1615078800-n2rxh_134ce637-c2a7-47b8-896f-348931125acb
kube-system_kube-proxy-lfzmx_90605182-ae56-4085-801e-fc4a83531945
...
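Each directory corresponds to one Pod. Inside it there is one subdirectory per container, and each of those holds that container's log files.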
$ tree kube-system_kube-flannel-ds-amd64-9x66j_28e71490-d614-4cd8-9ea7-af23cc7b9bff/
kube-system_kube-flannel-ds-amd64-9x66j_28e71490-d614-4cd8-9ea7-af23cc7b9bff/
├── install-cni
│   └── 3.log -> /data/docker/containers/6accaa2d6890df8ca05d1f40aaa9b8da69ea0a00a8e4b07a0949cdc067843e37/6accaa2d6890df8ca05d1f40aaa9b8da69ea0a00a8e4b07a0949cdc067843e37-json.log
└── kube-flannel
    ├── 2.log -> /data/docker/containers/9e8eea717cc3efd0804900a53244a32286d9e04767f76d9c8a8cc3701c83ece5/9e8eea717cc3efd0804900a53244a32286d9e04767f76d9c8a8cc3701c83ece5-json.log
    └── 3.log -> /data/docker/containers/06389981d26cbe60328cd5a46af7b003c8d687d1c411704784aa12d4d82672b8/06389981d26cbe60328cd5a46af7b003c8d687d1c411704784aa12d4d82672b8-json.log
The log file kube-flannel/3.log is only a symlink to /data/docker/containers/***/***.log. Docker still writes and maintains the log files; Kubernetes merely references them.
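The symlink relationship can be checked programmatically. The sketch below is only an illustration: it builds a throwaway directory tree that mimics the layout above (the container ID `abc123` and the pod directory name are hypothetical) and resolves the link with `os.path.realpath`.

```python
import os
import tempfile


def resolve_log(path: str) -> str:
    """Return the real file behind a (possibly symlinked) log path."""
    return os.path.realpath(path)


# Build a temporary tree that imitates the two directories discussed above.
# All names below are hypothetical placeholders, not real pods or containers.
root = os.path.realpath(tempfile.mkdtemp())
docker_dir = os.path.join(root, "data", "docker", "containers", "abc123")
pod_dir = os.path.join(root, "var", "log", "pods", "kube-system_demo_uid", "kube-flannel")
os.makedirs(docker_dir)
os.makedirs(pod_dir)

# The "real" log file, as Docker would write it.
target = os.path.join(docker_dir, "abc123-json.log")
with open(target, "w") as f:
    f.write("{}\n")

# The entry under /var/log/pods is just a symlink to that file.
link = os.path.join(pod_dir, "3.log")
os.symlink(target, link)

print(os.path.islink(link))         # the pod-level entry is a link
print(resolve_log(link) == target)  # and it resolves to the Docker file
```

On a real node, calling `resolve_log` on a file under /var/log/pods should return the corresponding path under the Docker data directory.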
$ tail -n 2 kube-system_kube-flannel-ds-amd64-9x66j_28e71490-d614-4cd8-9ea7-af23cc7b9bff/kube-flannel/3.log
{"log":"E0210 03:09:16.016563 1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: Get https://**.**.**.**:443/api/v1/nodes?resourceVersion=0: dial tcp **.**.**.**:443: connect: connection refused\n","stream":"stderr","time":"2021-02-10T03:09:16.016698205Z"}
{"log":"E0210 03:12:11.710762 1 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to watch *v1.Node: Get https://**.**.**.**:443/api/v1/nodes?resourceVersion=113277271\u0026timeoutSeconds=569\u0026watch=true: dial tcp **.**.**.**:443: connect: connection refused\n","stream":"stderr","time":"2021-02-10T03:12:11.711020233Z"}
The log is in JSON format. Each line is one JSON object with three fields:
- log: the log message
- stream: stderr (error output) or stdout (normal output)
- time: the timestamp
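As a minimal sketch of how such a line can be decoded, the snippet below parses the first sample line shown above with Python's standard `json` module. The helper name `parse_docker_line` is made up for this illustration; it is not part of Docker or Promtail.

```python
import json


def parse_docker_line(line: str) -> dict:
    """Decode one line of a Docker json-file log into its three fields."""
    record = json.loads(line)
    return {
        "log": record["log"].rstrip("\n"),  # the message, without the trailing newline
        "stream": record["stream"],         # "stdout" or "stderr"
        "time": record["time"],             # RFC 3339 timestamp, kept as a string
    }


# The first sample line from the tail output above, copied verbatim.
sample = r'{"log":"E0210 03:09:16.016563 1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: Get https://**.**.**.**:443/api/v1/nodes?resourceVersion=0: dial tcp **.**.**.**:443: connect: connection refused\n","stream":"stderr","time":"2021-02-10T03:09:16.016698205Z"}'

entry = parse_docker_line(sample)
print(entry["stream"], entry["time"])
```

The timestamp is left as a string on purpose, because it carries nanosecond precision.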
Note: /data/docker/containers is not Docker's default data path; it was configured in /etc/docker/daemon.json.
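To find out where a node keeps its container logs, the daemon configuration can be inspected. The sketch below assumes the path was changed through the `data-root` key of daemon.json (older Docker releases used `graph` instead, which this sketch does not handle) and falls back to Docker's default of /var/lib/docker. The function name `containers_dir` is hypothetical.

```python
import json
import posixpath

DEFAULT_DATA_ROOT = "/var/lib/docker"  # Docker's default data directory on Linux


def containers_dir(daemon_json_path: str = "/etc/docker/daemon.json") -> str:
    """Return the directory that holds per-container log files.

    Assumption: the data directory is set via the "data-root" key.
    """
    try:
        with open(daemon_json_path) as f:
            config = json.load(f)
    except FileNotFoundError:
        config = {}  # no daemon.json means all defaults apply
    data_root = config.get("data-root", DEFAULT_DATA_ROOT)
    return posixpath.join(data_root, "containers")
```

With a daemon.json containing `{"data-root": "/data/docker"}`, this returns /data/docker/containers, the path seen in the symlinks above.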
Promtail & Loki
1. Deploy Loki
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  namespace: default
data:
  loki-config.yml: |
    auth_enabled: false
    server:
      http_listen_port: 3100
    ingester:
      lifecycler:
        address: 127.0.0.1
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
        final_sleep: 0s
      chunk_idle_period: 1h # Any chunk not receiving new logs in this time will be flushed
      max_chunk_age: 1h # All chunks will be flushed when they hit this age, default is 1h
      chunk_target_size: 1048576 # Loki will attempt to build chunks up to about 1MB (1048576 bytes), flushing first if chunk_idle_period or max_chunk_age is reached first
      chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
      max_transfer_retries: 0 # Chunk transfers disabled
    schema_config:
      configs:
        - from: 2021-01-01
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /tmp/loki/boltdb-shipper-active
        cache_location: /tmp/loki/boltdb-shipper-cache
        cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: filesystem
      filesystem:
        directory: /tmp/loki/chunks
    compactor:
      working_directory: /tmp/loki/boltdb-shipper-compactor
      shared_store: filesystem
    limits_config:
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      ingestion_rate_mb: 64
    chunk_store_config:
      max_look_back_period: 0s
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s
    ruler:
      storage:
        type: local
        local:
          directory: /tmp/loki/rules
      rule_path: /tmp/loki/rules-temp
      alertmanager_url: http://localhost:9093
      ring:
        kvstore:
          store: inmemory
      enable_api: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      nodeSelector:
        deviceType: cpu
      containers:
        - name: loki
          image: grafana/loki:2.0.0
          imagePullPolicy: Always
          args:
            - -config.file=/mnt/config/loki-config.yml
          ports:
            - containerPort: 3100
          volumeMounts:
            - mountPath: /tmp/loki
              name: storage-volume
            - mountPath: /mnt/config
              name: config-volume
          securityContext:
            runAsUser: 0
            runAsGroup: 0
      volumes:
        - name: storage-volume
          hostPath:
            path: /data/loki
        - name: config-volume
          configMap:
            name: loki-config
            items:
              - key: loki-config.yml
                path: loki-config.yml
---
kind: Service
apiVersion: v1
metadata:
  name: loki-service
  namespace: default
spec:
  ports:
    - port: 3100
      targetPort: 3100
  selector:
    app: loki
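Once the Service is up, a quick smoke test is to push one log line by hand through Loki's HTTP push endpoint, the same /loki/api/v1/push path that Promtail is pointed at later. The sketch below uses only the standard library. The label `job="smoke-test"` and the helper names are hypothetical, and the base URL assumes the code runs inside the cluster where `loki-service` resolves.

```python
import json
import time
import urllib.request


def build_push_payload(labels: dict, line: str, ts_ns=None) -> dict:
    """Build the JSON body expected by POST /loki/api/v1/push."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Each value is a [timestamp, line] pair; the timestamp is a string
    # holding nanoseconds since the Unix epoch.
    return {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}


def push(base_url: str, labels: dict, line: str) -> int:
    """Send one log line to Loki and return the HTTP status code."""
    body = json.dumps(build_push_payload(labels, line)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/loki/api/v1/push",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Example call (needs network access to the Service, so it is left commented out):
# push("http://loki-service:3100", {"job": "smoke-test"}, "hello from a manual test")
```

A 2xx status from `push` suggests that the Deployment, the Service, and the config file are wired together correctly.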
2. Deploy Promtail
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: default
data:
  promtail-config.yml: |
    server:
      http_listen_port: 9080
      grpc_listen_port: 0
    positions:
      filename: /tmp/positions.yaml
    # clients:
    #   - url: http://loki-service:3100/loki/api/v1/push
    scrape_configs:
      - job_name: containers
        static_configs:
          - targets:
              - localhost
            labels:
              log_from: static_pods
              __path__: /var/log/pods/*/*/*.log
        pipeline_stages:
          - docker: {}
          - match:
              selector: '{log_from="static_pods"}'
              stages:
                - regex:
                    source: filename
                    expression: "(?:pods)/(?P<namespace>\\S+?)_(?P<pod>\\S+)-\\S+?-\\S+?_\\S+?/(?P<container>\\S+?)/"
                - labels:
                    namespace:
                    pod:
                    container:
          - match:
              selector: '{namespace!~"(default|kube-system)"}'
              action: drop
              drop_counter_reason: no_use
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: promtail-deployment
  namespace: default
spec:
  selector:
    matchLabels:
      app: promtail
  template:
    metadata:
      labels:
        app: promtail
    spec:
      containers:
        - name: loki
          image: grafana/promtail:2.0.0
          imagePullPolicy: Always
          args:
            - -config.file=/mnt/config/promtail-config.yml
            - -client.url=http://loki-service:3100/loki/api/v1/push
            - -client.external-labels=hostname=$(NODE_NAME)
          ports:
            - containerPort: 9080
          volumeMounts:
            - mountPath: /data/docker/containers
              name: containers-volume
            - mountPath: /var/log/pods
              name: pods-volume
            - mountPath: /mnt/config
              name: config-volume
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            runAsUser: 0
            runAsGroup: 0
      volumes:
        - name: containers-volume
          hostPath:
            path: /data/docker/containers
        - name: pods-volume
          hostPath:
            path: /var/log/pods
        - name: config-volume
          configMap:
            name: promtail-config
            items:
              - key: promtail-config.yml
                path: promtail-config.yml
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
Note: as mentioned above, the logs under /var/log/pods are only symlinks to the logs under /data/docker/containers, so the Promtail DaemonSet must mount both directories.
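The regex stage in the Promtail config is the least obvious part of the pipeline, so it can help to try the expression outside the cluster. The sketch below reuses the same pattern with Python's `re` module, which accepts the same named-group and lazy-quantifier syntax for this expression. The first path comes from the directory listing earlier; the second path (pod hash, UID, and container name `websocket`) is a hypothetical Deployment pod. The `is_dropped` helper imitates the drop stage under the assumption that label regex matchers are fully anchored.

```python
import re

# Same expression as in promtail-config.yml, with the YAML escaping removed.
POD_PATH = re.compile(
    r"(?:pods)/(?P<namespace>\S+?)_(?P<pod>\S+)-\S+?-\S+?_\S+?/(?P<container>\S+?)/"
)
# Namespaces kept by the selector {namespace!~"(default|kube-system)"} + action: drop.
KEPT_NAMESPACES = re.compile(r"(default|kube-system)")


def extract_labels(filename: str):
    """Return the namespace/pod/container labels for a log path, or None."""
    match = POD_PATH.search(filename)
    return match.groupdict() if match else None


def is_dropped(namespace: str) -> bool:
    """True if the drop stage would discard logs from this namespace."""
    return KEPT_NAMESPACES.fullmatch(namespace) is None


# Path taken from the listing earlier in this article.
flannel = ("/var/log/pods/kube-system_kube-flannel-ds-amd64-9x66j_"
           "28e71490-d614-4cd8-9ea7-af23cc7b9bff/kube-flannel/3.log")
# Hypothetical Deployment pod; the hash, UID, and container name are invented.
websocket = ("/var/log/pods/default_lab-websocket-deployment-7c9d8f6b5d-x2k4q_"
             "0a1b2c3d-1111-2222-3333-444455556666/websocket/0.log")

print(extract_labels(flannel))
print(extract_labels(websocket))
print(is_dropped("lab"))
```

Two things stand out. The pattern removes the last two hyphen-separated segments of the pod name, which suits Deployment pods (name, ReplicaSet hash, pod hash) but shortens the DaemonSet pod above to `kube-flannel-ds`. And with this selector, logs from the `lab` namespace seen in the listing are dropped.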
Grafana Dashboard
1. Add a data source
2. Configure the log visualization
Use the labels defined by Promtail to filter here and show the logs of a specific application. Example query: {pod="lab-websocket-deployment"}
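3. Configure the log search box
Change the query from step 2 to {pod="lab-websocket-deployment"} |~ "(?i)$search" to enable log search. Here $search is a dashboard variable that holds the text typed into the search box.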
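The `|~` operator keeps only lines that match a regular expression, and the `(?i)` prefix makes the match case-insensitive. The sketch below imitates that behaviour locally with Python's `re` module. It is an approximation for illustration only (Loki uses Go's regexp syntax), and the sample lines are invented.

```python
import re


def line_filter(lines, search: str):
    """Keep the lines that match the search text, ignoring case."""
    pattern = re.compile("(?i)" + search)
    return [line for line in lines if pattern.search(line)]


# Invented sample lines, only for demonstration.
lines = ["Connection REFUSED by peer", "request handled"]

print(line_filter(lines, "refused"))  # matches despite the different case
print(line_filter(lines, ""))         # an empty search matches every line
```

The second call shows why an empty search box leaves the panel unfiltered: an empty pattern matches every line.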
4. Configure log statistics by stream type
Example query: sum(count_over_time(({pod="lab-websocket-deployment", stream="stdout"})[60s]))
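The same query can be tried outside Grafana through Loki's /loki/api/v1/query_range HTTP endpoint. The sketch below only builds the request URL. The helper name `query_range_url` is hypothetical, the timestamps are arbitrary nanosecond epoch values, and the base URL assumes the `loki-service` Service defined earlier.

```python
from urllib.parse import urlencode


def query_range_url(base_url: str, query: str, start_ns: int, end_ns: int,
                    step: str = "60s") -> str:
    """Build a URL for Loki's range-query endpoint."""
    params = {
        "query": query,     # the LogQL expression
        "start": start_ns,  # start of the range, nanoseconds since the Unix epoch
        "end": end_ns,      # end of the range, nanoseconds since the Unix epoch
        "step": step,       # resolution of the returned series
    }
    return base_url + "/loki/api/v1/query_range?" + urlencode(params)


logql = 'sum(count_over_time(({pod="lab-websocket-deployment", stream="stdout"})[60s]))'
url = query_range_url("http://loki-service:3100", logql,
                      1612926000000000000, 1612929600000000000)
print(url)
```

Fetching that URL from inside the cluster (for example with `urllib.request.urlopen`) returns the series as JSON. Replacing `stdout` with `stderr` in the query gives the error-output count.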
5. Final result