8.3 Tasks: Troubleshoot Kubernetes Service Discovery

Task 8.3.1: Troubleshooting Kubernetes Service Discovery

We will now deploy an application with an error in the monitoring configuration.

  • Deploy Loki in your namespace by adding the following files to your git repo in the user-demo/ folder:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: loki
  name: loki
spec:
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
      - image: mirror.gcr.io/grafana/loki:latest
        name: loki
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: loki
  name: loki
spec:
  ports:
    - name: http
      port: 3100
      protocol: TCP
      targetPort: 3100
  selector:
    app: loki
  type: NodePort
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: loki
  name: loki
spec:
  endpoints:
    - interval: 30s
      port: http
      scheme: http
      path: /metrics
  selector:
    matchLabels:
      prometheus-monitoring: 'true'

Add, commit and push changes to your git repository and let ArgoCD sync your app again.
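The add, commit and push step can be sketched as a small shell helper. This is only an illustration: the function name `publish_manifests` and the commit message are hypothetical, and it assumes you run it from the root of your git repository with the manifests stored under user-demo/.

```shell
# Hypothetical helper: stage, commit and push the manifests in one go.
# Each step only runs if the previous one succeeded.
publish_manifests() {
  dir="$1"       # folder holding the manifests, e.g. user-demo/
  message="$2"   # commit message (wording is up to you)
  git add "$dir" &&
    git commit -m "$message" &&
    git push
}

# Example call (not executed here):
#   publish_manifests user-demo/ "Add Loki Deployment, Service and ServiceMonitor"
```

Once the push has gone through, ArgoCD picks up the change on its next sync.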

  • When you visit the Prometheus user interface, you will notice that the Prometheus server does not scrape metrics from Loki. Try to find out why.
Hints

The quickest way to do this is to follow the instructions in the info box above. So let’s first find out which of the following statements applies to us:

  • The configuration defined in the ServiceMonitor does not appear in the Prometheus scrape configuration
    • Let’s check whether Prometheus reads the configuration defined in the ServiceMonitor resource. To do so, navigate to the Prometheus configuration and search for loki in scrape_configs. You should find a job with the name serviceMonitor/loki/loki/0, so this is not the issue in this case.
  • The Endpoint appears in the Prometheus configuration but not under targets.
    • The namespaceSelector in the ServiceMonitor does not match the namespace of your app
    • The label selector does not match the Service of your app
    • The port name does not match the Service of your app
  • The Endpoint appears as a Prometheus target, but no data gets scraped
    • The application does not provide metrics under the correct path and port
    • Networking issues
    • Authentication required, but not configured
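To check the selector and port-name causes listed above, it can help to print the relevant fields of the ServiceMonitor and the Service side by side. The following is a minimal sketch: the function name is hypothetical, and it assumes that kubectl points at your namespace and that the ServiceMonitor and the Service share the same name (as they do for loki here).

```shell
# Hypothetical helper: print the fields that Prometheus service discovery
# compares, so a mismatch in labels or port names is easy to spot.
compare_servicemonitor_and_service() {
  name="$1"   # shared name of the ServiceMonitor and the Service, e.g. loki
  echo "ServiceMonitor selector: $(kubectl get servicemonitor "$name" -o jsonpath='{.spec.selector.matchLabels}')"
  echo "ServiceMonitor ports:    $(kubectl get servicemonitor "$name" -o jsonpath='{.spec.endpoints[*].port}')"
  echo "Service labels:          $(kubectl get svc "$name" -o jsonpath='{.metadata.labels}')"
  echo "Service port names:      $(kubectl get svc "$name" -o jsonpath='{.spec.ports[*].name}')"
}

# Example call (not executed here):
#   compare_servicemonitor_and_service loki
```

Every label in the ServiceMonitor selector must also be present on the Service, and the ServiceMonitor port must match one of the Service port names.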
  • In our case, the Endpoint appears in the Prometheus configuration but not under targets, so let’s work through this case step by step.
    • Let’s check if the application is running:

      kubectl get pod
      

      You should see a loki Pod in the Running state:

      NAME                    READY   STATUS    RESTARTS   AGE
      loki-5846d87f4c-tthsr   1/1     Running   0          34m
      
    • Let’s check if the application is exposing metrics:

      PODNAME=$(kubectl get pod -l app=loki -o name)
      kubectl exec $PODNAME -it -- wget -O - localhost:3100/metrics
      ...
      
    • The application exposes metrics, and Prometheus generated the configuration according to the defined ServiceMonitor. Let’s verify whether the ServiceMonitor matches the Service.

      kubectl get svc loki -o yaml
      
      apiVersion: v1
      kind: Service
      metadata:
        ...
        labels:
          app: loki
        name: loki
      spec:
        ...
        ports:
        - name: http
          ...
      

      We see that the Service has a port named http and the label app: loki set. Let’s check the ServiceMonitor:

      kubectl get servicemonitor loki -o yaml
      
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      ...
      spec:
        endpoints:
        - interval: 30s
          ...
          port: http
          ...
        selector:
          matchLabels:
            prometheus-monitoring: "true"
      

      We see that the ServiceMonitor expects a port named http and the label prometheus-monitoring: "true". The Service has the port but not the label, so the culprit is the missing label. Let’s adjust the Service manifest, commit and push.

      apiVersion: v1
      kind: Service
      metadata:
        labels:
          app: loki
          prometheus-monitoring: "true"
        name: loki
      spec:
        ports:
          - name: http
            port: 3100
            protocol: TCP
            targetPort: 3100
        selector:
          app: loki
        type: NodePort
      

      Verify in the Prometheus user interface that the target now gets scraped.
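In addition to the user interface, the fix can be cross-checked from the command line by asking Kubernetes which Services carry the label the ServiceMonitor selects on. This is a minimal sketch with a hypothetical function name; it assumes kubectl points at your namespace and that ArgoCD has already synced the updated Service manifest.

```shell
# Hypothetical helper: succeed only if the named Service is returned
# when filtering Services by the given label selector.
selector_matches_service() {
  selector="$1"   # e.g. prometheus-monitoring=true
  service="$2"    # e.g. loki
  # -o name prints entries like "service/loki"; grep -x demands an exact line match.
  kubectl get svc -l "$selector" -o name | grep -qx "service/$service"
}

# Example call (not executed here):
#   selector_matches_service "prometheus-monitoring=true" loki && echo "selector matches"
```

Before the label was added, the same check would have failed because no Service matched the selector.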