1. Setting up Prometheus
In this first section we are going to set up the first parts of the Prometheus stack. Each trainee will have their own stack installed and configured.
Working mode (GitOps)
During the labs you will deploy and update several resources on your Kubernetes environment. ArgoCD will be your primary interface to interact with the cluster and will simplify the GitOps process for you.
Note
Argo CD is a part of the Argo Project and affiliated under the Cloud Native Computing Foundation (CNCF). The project is just under three years old, completely open source, and primarily implemented in Go.
As the name suggests, Argo CD takes care of the continuous delivery aspect of CI/CD. The core of Argo CD consists of a Kubernetes controller, which continuously compares the live-state with the desired-state. The live-state is tapped from the Kubernetes API, and the desired-state is persisted in the form of manifests in YAML or JSON in a Git repository. Argo CD helps to point out deviations of the states, to display the deviations or to autonomously restore the desired state.
The configuration and deployments needed for you are already in a Git repository. Navigate to your Gitea and look for a project called ‘prometheus-training-lab-setup’. The repository consists of two Helm Charts you will further use in this lab. In this first section we will now set up your Prometheus instance step by step.
We’re going to use two main Namespaces for the lab:
- `<user>`: where the user workload (Demo application, Webshell) is deployed
- `<user>-monitoring`: where we deploy our monitoring stack
How do metrics end up in Prometheus?
Since Prometheus is a pull-based monitoring system, the Prometheus server maintains a set of targets to scrape. This set can be configured using the scrape_configs option in the Prometheus configuration file. The scrape_configs consist of a list of jobs defining the targets as well as additional parameters (path, port, authentication, etc.) which are required to scrape these targets. As we will be using the Prometheus Operator on Kubernetes, we will never actually touch this configuration file by ourselves. Instead, we rely on the abstractions provided by the Operator, which we will look at closer in the next section.
There are two basic types of targets that we can add to our Prometheus server:
Static targets
In this case, we define one or more targets statically. In order to make changes to the list, you need to change the configuration file. As the name implies, this way of defining targets is inflexible and not suited to monitor workloads inside of Kubernetes as these are highly dynamic.
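To illustrate, a static target definition in the Prometheus configuration file might look like this. This is a minimal sketch; the job name and target addresses are made up for illustration (in these labs the Operator manages the configuration for you):

```yaml
# prometheus.yml (fragment) -- static targets, purely illustrative
scrape_configs:
  - job_name: 'example-static'
    # scrape these fixed host:port pairs; any change requires
    # editing this file and reloading Prometheus
    static_configs:
      - targets:
          - 'app-1.example.com:9090'
          - 'app-2.example.com:9090'
```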
Dynamic configuration
Besides the static target configuration, Prometheus provides many ways to dynamically add/remove targets. There are builtin service discovery mechanisms for cloud providers such as AWS, GCP, Hetzner, and many more. In addition, there are more versatile discovery mechanisms available which allow you to implement Prometheus in your environment (e.g. DNS service discovery or file service discovery). Most importantly, the Prometheus Operator makes it very easy to let Prometheus discover targets dynamically using the Kubernetes API.
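Under the hood, a Kubernetes-based service discovery configuration looks roughly like this. This is a sketch of the kind of configuration the Operator generates for you; the job name and the opt-in label are assumptions for illustration, and you won’t write this by hand in these labs:

```yaml
# prometheus.yml (fragment) -- Kubernetes service discovery, illustrative only
scrape_configs:
  - job_name: 'kubernetes-endpoints'
    kubernetes_sd_configs:
      # discover scrape targets from Endpoints objects via the Kubernetes API
      - role: endpoints
    relabel_configs:
      # keep only targets whose Service carries a (hypothetical) opt-in label,
      # e.g. prometheus-monitoring: 'true'
      - source_labels: [__meta_kubernetes_service_label_prometheus_monitoring]
        action: keep
        regex: 'true'
```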
Prometheus Operator
The Prometheus Operator is the preferred way of running Prometheus inside of a Kubernetes Cluster. In the following labs you will get to know its CustomResources in more detail, which are the following:
- Prometheus : Manage the Prometheus instances
- Alertmanager : Manage the Alertmanager instances
- ServiceMonitor : Generate Kubernetes service discovery scrape configuration based on Kubernetes service definitions
- PrometheusRule : Manage the recording and alerting rules of your Prometheus instances
- AlertmanagerConfig : Add additional receivers and routes to your existing Alertmanager configuration
- PodMonitor : Generate Kubernetes service discovery scrape configuration based on Kubernetes pod definitions
- Probe : Manage Prometheus blackbox exporter targets
- ThanosRuler : Manage Thanos rulers
Service Discovery
When configuring Prometheus to scrape metrics from containers deployed in a Kubernetes Cluster, it doesn’t make sense to configure every single target (Pod) manually. A container platform is far too dynamic for that: Pods are scaled up and down, their names are random, and so on.
In fact, we tightly integrate Prometheus with Kubernetes and let Prometheus discover the targets, which need to be scraped, automatically via the Kubernetes API.
The tight integration between Prometheus and Kubernetes can be configured with the Kubernetes Service Discovery Config.
The way we instruct Prometheus to scrape metrics from an application running as a Pod is by creating a ServiceMonitor.
ServiceMonitors are Kubernetes custom resources, which look like this:
```yaml
# just an example
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: example-web-python
  name: example-web-python-monitor
spec:
  endpoints:
  - interval: 30s
    port: http
    scheme: http
    path: /metrics
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
```
How does it work?
The Prometheus Operator watches namespaces for ServiceMonitor custom resources. It then updates the Service Discovery configuration of the Prometheus server(s) accordingly.
The selector part in the ServiceMonitor defines which Kubernetes Services will be scraped. Here we are selecting the correct service by defining a selector on the label prometheus-monitoring: 'true'.
```yaml
# servicemonitor.yaml
...
  selector:
    matchLabels:
      prometheus-monitoring: 'true'
...
```
The corresponding Service needs to have this label set:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-web-python
  labels:
    prometheus-monitoring: 'true'
...
```
The Prometheus Operator then determines all Endpoints (which are basically the IPs of the Pods) that belong to this Service using the Kubernetes API. These Endpoints are then dynamically added as targets to the Prometheus server(s).
The spec section in the ServiceMonitor resource allows further configuration on how to scrape the targets.
In our case Prometheus will:
- Scrape every 30 seconds
- Look for a port with the name `http` (this must match the name in the `Service` resource)
- Scrape metrics from the path `/metrics` using `http`
Best practices
Use the common Kubernetes labels: https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
If possible, reduce the number of different ServiceMonitors for an application and thereby reduce the overall complexity:
- Use the same `matchLabels` on different `Services` for your application (e.g. Frontend Service, Backend Service, Database Service)
- Make sure the ports of different `Services` have the same name
- Expose your metrics under the same path
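Following these practices, two Services of a (hypothetical) application could look like this: same opt-in label, same port name, same metrics path, so a single ServiceMonitor covers both. The names and ports below are assumptions for illustration:

```yaml
# illustrative only: two Services sharing the same label and port name
apiVersion: v1
kind: Service
metadata:
  name: example-frontend
  labels:
    prometheus-monitoring: 'true'   # matched by the ServiceMonitor selector
spec:
  ports:
  - name: http        # same port name across all Services
    port: 8080
  selector:
    app: example-frontend
---
apiVersion: v1
kind: Service
metadata:
  name: example-backend
  labels:
    prometheus-monitoring: 'true'
spec:
  ports:
  - name: http        # identical name, so one ServiceMonitor endpoint suffices
    port: 8080
  selector:
    app: example-backend
```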