argocd alert management: the notification service


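Introduction:

The notification service delivers the sync status of your apps through a channel of your choice, so you can react to alerts right away. Picture a team where several people collaborate and one colleague deploys a service through argocd: passing the news on by word of mouth is inefficient. Like traditional CD tools such as runner and jenkins, argocd supports a good range of alerting options. This post focuses on enabling the argocd notification service and sending alerts by email.
Other posts in this series, updated in parallel: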

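argocd secret management with SealedSecret: encrypting sensitive configuration in git
argocd alert management with the notification service: get the status of your argocd apps right away
argocd blue-green/canary releases with rollout: quickly enable gitops-based blue-green/canary releases
gitops with argocd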

I. Install argocd:

The argocd notification service

1. Installation

Official documentation

kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj-labs/argocd-notifications/stable/manifests/install.yaml

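Like other operators, it installs with a single command. The install mainly starts an argocd-notifications-controller, which watches the sync status of argocd apps and sends sync messages to the specified recipients through the configured channels.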

II. Configure the email service
apiVersion: v1
kind: Secret
metadata:
  name: argocd-notifications-secret  ## must be exactly this name, otherwise the argocd-notifications service cannot use it
  namespace: argocd
stringData:
  notifiers.yaml: |
    email:
      host: smtp.163.com
      port: 465
      from: <user>@163.com
      username: <user>@163.com
      password: <pass>
type: Opaque
III. Configure the alert templates
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  config.yaml: |
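    subscriptions:
    ## One subscription entry per trigger: a YAML mapping cannot repeat the `trigger` key.
    ## This address (the ops team) receives every alert defined below.
    - recipients:
      - email:<user>@163.com
      trigger: on-sync-status-unknown
    - recipients:
      - email:<user>@163.com
      trigger: on-sync-status-sync
    - recipients:
      - email:<user>@163.com
      trigger: on-sync-status-syncfailed
    - recipients:
      - email:<user>@163.com
      trigger: app-sync-running
    - recipients:
      - email:<user>@163.com
      trigger: app-health-degraded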
    triggers:
      - name: on-sync-status-unknown
        enabled: true
        condition: app.status.sync.status == 'Unknown'
        template: app-sync-status-unknown
        oncePer: app.status.sync.revision
      - name: on-sync-status-sync
        enabled: true
        condition: app.status.sync.status == 'Synced' and app.status.operationState.phase in ['Succeeded']
        template: app-sync-succeeded
      - name: on-sync-status-syncfailed
        enabled: true
        condition: app.status.operationState.phase in ['Error', 'Failed']
        template: app-sync-failed
      - name: app-sync-running
        enabled: true
        condition: app.status.operationState.phase in ['Running']
        template: app-sync-running
      - name: app-health-degraded
        enabled: true
        condition: app.status.health.status == 'Degraded'
        template: app-health-degraded
    templates:
      - name: app-sync-succeeded
        title: Application {{.app.metadata.name}} has been successfully synced.
        body: |
          {{if eq .context.notificationType "slack"}}:white_check_mark:{{end}} Application {{.app.metadata.name}} has been successfully synced at {{.app.status.operationState.finishedAt}}.
          Sync operation details are available at: https://<ArgoCD_IP>/applications/{{.app.metadata.name}}.
          ClusterName: {{.context.clusterName}}.
      - name: app-sync-status-unknown
        title: Application {{.app.metadata.name}} sync status is {{.app.status.sync.status}}
        body: |
          {{if eq .context.notificationType "slack"}}:exclamation:{{end}} Application {{.app.metadata.name}} sync is 'Unknown'.
          {{if ne .context.notificationType "slack"}}
          {{range $c := .app.status.conditions}}
               * {{$c.message}}
          {{end}}
          {{end}}
          Sync operation details are available at: https://<ArgoCD_IP>/applications/{{.app.metadata.name}}.
          ClusterName: {{.context.clusterName}}.
      - name: app-sync-failed
        title: Failed to sync application {{.app.metadata.name}}.
        body: |
          {{if eq .context.notificationType "slack"}}:exclamation:{{end}}  The sync operation of application {{.app.metadata.name}} has failed at {{.app.status.operationState.finishedAt}} with the following error: {{.app.status.operationState.message}}
          Sync operation details are available at: https://<ArgoCD_IP>/applications/{{.app.metadata.name}}?operation=true .
          ClusterName: {{.context.clusterName}}.
      - name: app-sync-running
        title: Start syncing application {{.app.metadata.name}}.
        body: |
          The sync operation of application {{.app.metadata.name}} has started at {{.app.status.operationState.startedAt}}.
          Sync operation details are available at: https://<ArgoCD_IP>/applications/{{.app.metadata.name}}?operation=true .
          ClusterName: {{.context.clusterName}}.
          App Version: {{.app.status.summary.images | join ", " }}
      - name: app-health-degraded
        title: Application {{.app.metadata.name}} has degraded.
        body: |
          {{if eq .context.notificationType "slack"}}:exclamation:{{end}} Application {{.app.metadata.name}} has degraded.
          Application details: https://<ArgoCD_IP>/applications/{{.app.metadata.name}}.
          ClusterName: {{.context.clusterName}}.
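IV. Customize recipients for an individual app

Add the corresponding annotations to the app's registration yaml.

Add the recipients as shown in the figure.

As the figure above shows, besides sending every status to a recipient, you can also subscribe per trigger: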

on-sync-failed.recipients.argocd-notifications.argoproj.io: email:<sample-email>
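
As a rough sketch of where these annotations sit, the manifest below shows an Application carrying both kinds of subscription. The application name, repository URL, path, and destination are hypothetical placeholders, and the unprefixed `recipients.argocd-notifications.argoproj.io` key is assumed from the annotation format of this version of argocd-notifications. The prefix before `.recipients` is expected to match a trigger name, so with the ConfigMap from section III it would be `on-sync-status-syncfailed`. The `on-sync-failed` prefix shown above assumes a trigger of that name is available.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sample-app                 # hypothetical application name
  namespace: argocd
  annotations:
    # Assumed form: this recipient gets every notification for the app
    recipients.argocd-notifications.argoproj.io: email:<sample-email>
    # This recipient is notified only when the named trigger fires
    on-sync-status-syncfailed.recipients.argocd-notifications.argoproj.io: email:<sample-email>
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # placeholder repository
    path: deploy                                # placeholder path
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: default
```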
V. Receiving the alert email

The sync operation of application <app-name> has started at 2020-06-22T06:12:31Z.
Sync operation details are available at: https://<ip addr>/applications/cfappsmonitoring?operation=true .
ClusterName: <no value>.
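
The `<no value>` in the last line means the template found nothing at `.context.clusterName`. A possible fix, sketched here on the assumption that this version of argocd-notifications reads a `context` map from `config.yaml` and exposes its keys to templates as `{{.context.<key>}}`, is to define the value in the ConfigMap. The cluster name is a placeholder.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  config.yaml: |
    # Assumption: keys under `context` become available as {{.context.<key>}}
    context:
      clusterName: <cluster-name>   # placeholder, e.g. the name your team uses for this cluster
    # subscriptions, triggers and templates from section III go here unchanged
```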