7.1 Tasks: Enable and configure Alertmanager

Task 7.1.1: Install Alertmanager and Thanos Ruler

In your monitoring application (charts/user-monitoring/values.yaml), set the alertmanager.enabled and ruler.enabled flags to true:

charts/user-monitoring/values.yaml:

user: <user> # Replace me
# prometheus
prometheus:
  enabled: true
# thanos-query
query:
  enabled: true
# grafana
grafana:
  enabled: true
  datasources:
  - name: prometheus
    access: proxy
    editable: false
    type: prometheus
    url: http://prometheus-operated:9090
# blackboxexporter
blackboxexporter:
  enabled: true
# pushgateway
pushgateway:
  enabled: true
# alertmanager
alertmanager:
  enabled: true
# thanos-ruler
ruler:
  enabled: true

Commit and push the changes.

Verify the installation and sync process in the ArgoCD UI, or execute the following command:

kubectl -n $USER-monitoring get pod

This installs two custom resources (CRs), an Alertmanager and a ThanosRuler:

...
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  labels:
    app.kubernetes.io/name: alertmanager
  name: apps-monitoring
  namespace: <user>-monitoring
spec:
  alertmanagerConfigNamespaceSelector:
    matchNames:
    - <user>-monitoring
  alertmanagerConfigSelector:
  image: quay.io/prometheus/alertmanager:v0.25.0
  replicas: 2
  resources:
    requests:
      cpu: 4m
      memory: 40Mi
  storage:
    volumeClaimTemplate:
      spec:
        resources:
          requests:
            storage: 100Mi
---
apiVersion: monitoring.coreos.com/v1
kind: ThanosRuler
metadata:
  labels:
    app.kubernetes.io/name: thanos-ruler
  name: thanos-ruler
spec:
  image: quay.io/thanos/thanos:v0.28.1
  evaluationInterval: 10s
  queryEndpoints:
  - dnssrv+_http._tcp.thanos-query:10902
  ruleSelector: {}
  ruleNamespaceSelector:
    matchLabels:
      user: {{ .Values.user }}
  alertmanagersConfig:
    key: alertmanager-configs.yaml
    name: thanosruler-alertmanager-config

Task 7.1.3: Enable Alertmanager in Thanos Ruler

The Thanos Ruler is connected to the Alertmanager instance through the following config, which the ThanosRuler resource references in its alertmanagersConfig field.

---
apiVersion: v1
kind: Secret
metadata:
  name: thanosruler-alertmanager-config
stringData:
  alertmanager-configs.yaml: |-
    alertmanagers:
    - static_configs:
      - "dnssrv+_web._tcp.alertmanager-operated.<user>-monitoring.svc.cluster.local"
      api_version: v2
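
If you want to check what the Thanos Ruler actually reads, you can decode the Secret from the cluster. This is a sketch under assumptions: the function name show_ruler_alertmanager_config is hypothetical, and $USER is assumed to match your <user> value.

```shell
# Illustrative helper (hypothetical name): print the decoded Alertmanager
# config stored in the Secret referenced by the ThanosRuler resource.
# The dot in the key name has to be escaped inside the jsonpath expression.
show_ruler_alertmanager_config() {
  kubectl -n "${USER}-monitoring" get secret thanosruler-alertmanager-config \
    -o 'jsonpath={.data.alertmanager-configs\.yaml}' | base64 --decode
}
```

The output should match the alertmanagers block shown above, with <user> replaced by your user name.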

Task 7.1.4: Add Alertmanager as monitoring target in Prometheus

As a reminder: the Alertmanager instances (alertmanager-operated.<user>-monitoring.svc:9093) also expose metrics, which Prometheus can scrape.

The ServiceMonitor that tells Prometheus where to scrape the Alertmanager metrics was already created when you enabled Alertmanager.

Check in the Prometheus user interface if the target can be scraped.
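
As an alternative to the user interface, the same check can be sketched against the Prometheus HTTP API. The helper name prom_query is hypothetical. The sketch assumes that Prometheus is reachable on localhost:9090 (for example through a port-forward to the prometheus-operated service) and that the scrape job is labelled alertmanager-operated.

```shell
# Illustrative helper (hypothetical name): run an instant PromQL query
# against the Prometheus HTTP API.
# Assumes Prometheus is reachable on localhost:9090, for example via
#   kubectl -n "${USER}-monitoring" port-forward svc/prometheus-operated 9090:9090
prom_query() {
  curl -s -G --data-urlencode "query=$1" http://localhost:9090/api/v1/query
}

# Usage: a sample value of 1 means the last scrape of that target succeeded.
#   prom_query 'up{job="alertmanager-operated"}'
```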

Task 7.1.5: Query an Alertmanager metric

Once the Alertmanager metrics endpoint is scraped, a large number of new metrics and labels become available.

Use the Querier UI to list all available metrics with the selector {job="alertmanager-operated"}.

Hints

The result lists all metrics as follows (shortened); pick whichever you are interested in.

# HELP alertmanager_alerts How many alerts by state.
# TYPE alertmanager_alerts gauge
alertmanager_alerts{state="active"} 0
alertmanager_alerts{state="suppressed"} 0
# HELP alertmanager_alerts_invalid_total The total number of received alerts that were invalid.
# TYPE alertmanager_alerts_invalid_total counter
alertmanager_alerts_invalid_total{version="v1"} 0
alertmanager_alerts_invalid_total{version="v2"} 0
# HELP alertmanager_alerts_received_total The total number of received alerts.
# TYPE alertmanager_alerts_received_total counter
alertmanager_alerts_received_total{status="firing",version="v1"} 0
alertmanager_alerts_received_total{status="firing",version="v2"} 0
alertmanager_alerts_received_total{status="resolved",version="v1"} 0
alertmanager_alerts_received_total{status="resolved",version="v2"} 0
...
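
Text in this format is also what the Alertmanager /metrics endpoint returns, so one way to pick a single metric family is to filter it on the command line. The sketch below is only an illustration: filter_metric is a hypothetical helper name, and the usage lines assume a port-forward to the alertmanager-operated service on port 9093.

```shell
# Illustrative helper (hypothetical name): keep only the sample lines of one
# metric family from Prometheus exposition text on stdin. Comment lines
# (# HELP, # TYPE) and longer metric names with the same prefix are dropped.
filter_metric() {
  grep -E "^$1(\{| )"
}

# Usage against a live Alertmanager (port-forward running in another terminal):
#   kubectl -n "${USER}-monitoring" port-forward svc/alertmanager-operated 9093:9093
#   curl -s http://localhost:9093/metrics | filter_metric alertmanager_alerts
```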

Task 7.1.6: Alertmanager UI

Open the Alertmanager UI and explore its capabilities.
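
With no alerts firing yet, the UI is mostly empty. One way to get something to look at is to post a synthetic alert to the Alertmanager v2 API. This is a sketch under assumptions: send_test_alert is a hypothetical helper name, the alert name and labels are made up for this example, and Alertmanager is assumed to be reachable on localhost:9093.

```shell
# Illustrative helper (hypothetical name): post one synthetic alert to the
# Alertmanager v2 API. The alert name and labels are invented for this example.
# Assumes Alertmanager is reachable on localhost:9093, for example via
#   kubectl -n "${USER}-monitoring" port-forward svc/alertmanager-operated 9093:9093
send_test_alert() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d '[{"labels":{"alertname":"LabTestAlert","severity":"info"}}]' \
    http://localhost:9093/api/v2/alerts
}
```

After calling send_test_alert, the alert should show up in the UI, where you can try filtering, grouping, and silencing it. Because no end time is set, it should disappear on its own once Alertmanager considers it resolved.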