7.1 Tasks: Enable and configure Alertmanager
Task 7.1.1: Install Alertmanager and Thanosruler
Update your monitoring application (charts/user-monitoring/values.yaml) and update the alertmanager.enabled and ruler.enabled flag to true:
charts/user-monitoring/values.yaml:
user: <user> # Replace me
# prometheus
prometheus:
enabled: true
# thanos-query
query:
enabled: true
# grafana
grafana:
enabled: true
datasources:
- name: prometheus
access: proxy
editable: false
type: prometheus
url: http://prometheus-operated:9090
# blackboxexporter
blackboxexporter:
enabled: true
# pushgateway
pushgateway:
enabled: true
# alertmanager
alertmanager:
enabled: true
# thanos-ruler
ruler:
enabled: true
Commit and push the changes.
Verify the installation and sync process in the ArgoCD UI . Or execute the following command:
kubectl -n $USER-monitoring get pod
This will install two Custom Resources (CR):
...
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
labels:
app.kubernetes.io/name: alertmanager
name: apps-monitoring
namespace: <user>-monitoring
spec:
alertmanagerConfigNamespaceSelector:
matchNames:
- <user>-monitoring
alertmanagerConfigSelector:
image: quay.io/prometheus/alertmanager:v0.25.0
replicas: 2
resources:
requests:
cpu: 4m
memory: 40Mi
storage:
volumeClaimTemplate:
spec:
resources:
requests:
storage: 100Mi
---
apiVersion: monitoring.coreos.com/v1
kind: ThanosRuler
metadata:
labels:
app.kubernetes.io/name: thanos-ruler
name: thanos-ruler
spec:
image: quay.io/thanos/thanos:v0.28.1
evaluationInterval: 10s
queryEndpoints:
- dnssrv+_http._tcp.thanos-query:10902
ruleSelector: {}
ruleNamespaceSelector:
matchLabels:
user: {{ .Values.user }}
alertmanagersConfig:
key: alertmanager-configs.yaml
name: thanosruler-alertmanager-config
Task 7.1.3: Enable Alertmanager in Thanos Ruler
We connceted the thanos ruler to the Alertmanager instance with the following config.
---
apiVersion: v1
kind: Secret
metadata:
name: thanosruler-alertmanager-config
stringData:
alertmanager-configs.yaml: |-
alertmanagers:
- static_configs:
- "dnssrv+_web._tcp.alertmanager-operated.<user>-monitoring.svc.cluster.local"
api_version: v2
Task 7.1.4: Add Alertmanager as monitoring target in Prometheus
Note
This setup is only suitable for our lab environment. In real life, you must consider how to monitor your monitoring infrastructure: Having an Alertmanager instance as an Alertmanager AND as a target only in the same Prometheus is a bad idea!This is repetition: The Alertmanagers (alertmanager-operated.<user>-monitoring.svc:9093) also exposes metrics, which can be scraped by Prometheus.
The ServiceMonitor telling Prometheus where to scrape the metrics of Alertmanager was already created by enabling the alertmanager.
Check in the Prometheus user interface if the target can be scraped.
Task 7.1.5: Query an Alertmanager metric
After you add the Alertmanager metrics endpoint, you will have huge bunch of different values and identifiers.
Use Querier UI
to get the list of all available metrics. {job="alertmanager-operated"}
Hints
Then you get all metrics as follows (shortened), and you can pick whatever you’re interested in.
# HELP alertmanager_alerts How many alerts by state.
# TYPE alertmanager_alerts gauge
alertmanager_alerts{state="active"} 0
alertmanager_alerts{state="suppressed"} 0
# HELP alertmanager_alerts_invalid_total The total number of received alerts that were invalid.
# TYPE alertmanager_alerts_invalid_total counter
alertmanager_alerts_invalid_total{version="v1"} 0
alertmanager_alerts_invalid_total{version="v2"} 0
# HELP alertmanager_alerts_received_total The total number of received alerts.
# TYPE alertmanager_alerts_received_total counter
alertmanager_alerts_received_total{status="firing",version="v1"} 0
alertmanager_alerts_received_total{status="firing",version="v2"} 0
alertmanager_alerts_received_total{status="resolved",version="v1"} 0
alertmanager_alerts_received_total{status="resolved",version="v2"} 0
...
Task 7.1.6: Alertmanager UI
Open the Alertmanager UI and explore its capabilities.