
Observability part 3 - configuring OpenTelemetry components

Aleksandra Bielawa

15 Jul 2024 · 9 minutes read


Welcome to the third part of a series of articles about Observability.

In the previous article, we successfully created a Kubernetes cluster with Kind and deployed a local try-me environment with Pulumi.

In this article, we will focus on the main part of our Meerkat project - configuring the OpenTelemetry components: the OpenTelemetry Collector and Instrumentation. We will also configure exporters in the OpenTelemetry Collector to send logs, traces, and metrics to the backends: Loki, Tempo, and Mimir. We will explore dashboards and some cool Grafana features - like links from logs to traces and more.

Components Overview

The source code for this project, with documentation, can be found on our GitHub repository.

If you have followed the previous part of the tutorial to set up a local try-me environment, you should have all components up and running in the Kubernetes cluster. If you haven't gone through it yet, I highly encourage you to do so, as this article is a continuation. The pulumi up command installed the following:

  • LGTM stack - Loki, Grafana, Tempo and Mimir installed as Helm Charts
  • OpenTelemetry Operator
  • OpenTelemetry Collector and Instrumentation - installed through the Kustomization
  • Demo application

We’ve already covered the configuration details of the LGTM stack, so let's explore the remaining building blocks.

The OpenTelemetry Operator is a Kubernetes Operator that manages the OpenTelemetry Collector and Instrumentation Custom Resources. It requires cert-manager to run. Both the OpenTelemetry Operator and cert-manager are installed as Helm Charts.
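For reference - and purely as a sketch, since in this project Pulumi installs both for you - a manual installation with Helm would look roughly like this (chart versions omitted):

helm repo add jetstack https://charts.jetstack.io
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
# cert-manager must be in place first, including its CRDs
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true
# then the OpenTelemetry Operator
helm install opentelemetry-operator open-telemetry/opentelemetry-operator --namespace opentelemetry-operator-system --create-namespace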

Kustomize allows customizing and managing Kubernetes configurations declaratively. All customization specifications are contained within a kustomization.yaml file. The OpenTelemetry Collector, Instrumentation, and Grafana dashboards are installed through Kustomize.

In Kubernetes, there are two popular ways to apply multiple resources at once - Helm Charts and Kustomization. Both are well supported by Pulumi. In our setup, some components, like the LGTM stack, are installed with Helm Charts; however, for the OTEL Collector we decided to go with Kustomization. The reason is simple: we want the OTEL Collector configuration to be easily deployable on any new or existing cluster. Kustomization is well supported by IaC tools like Pulumi or Terraform and GitOps tools like FluxCD or ArgoCD. You can also use it with kubectl (via kubectl apply -k).
Just like we deploy Helm Charts with Pulumi, we also apply Kustomization using Pulumi code. For those interested in applying only the OTEL Collector configuration, check more details here.
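As a quick sketch, applying the configuration by hand from the cloned repository could look like this (assuming the observability namespace already exists, since the Kustomization only sets it on the resources):

# everything: OTEL manifests plus the Grafana dashboard ConfigMaps
kubectl apply -k .
# or only the OTEL Collector and Instrumentation manifests
kubectl apply -k otel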

As a demo application, we are using the Spring PetClinic Sample Application.

Kustomization

The kustomization.yaml file is located in the root folder of the meerkat repository you cloned. The Kustomization installs OpenTelemetry manifests from the otel folder and Grafana dashboards as ConfigMaps. We use configMapGenerator, which creates a ConfigMap directly from a JSON file.

kustomization.yaml file:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: observability
resources:
  - otel

configMapGenerator:
  - name: host-metrics-dashboard
    files:
      - ./try-me/observability/dashboards/host-metrics-dashboard.json
  - name: kubernetes-nodes-dashboard
    files:
      - ./try-me/observability/dashboards/kubernetes-nodes-dashboard.json
  - name: kubernetes-pods-dashboard
    files:
      - ./try-me/observability/dashboards/kubernetes-pods-dashboard.json
  - name: jvm-dashboard
    files:
      - ./try-me/observability/dashboards/jvm-dashboard.json
generatorOptions:
  disableNameSuffixHash: true
  labels:
    grafana_dashboard: "1"

Let's explore the content of Kustomization from the otel folder:

otel/kustomization.yaml file:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: observability
resources:
  - jvm-autoinstrumentation.yaml
  - jvm-collector.yaml
  - jvm-collector-targetallocator-clusterrole.yaml
  - jvm-collector-targetallocator-serviceaccount.yaml
  - jvm-collector-targetallocator-clusterrolebinding.yaml
  - jvm-collector-clusterrole.yaml
  - jvm-collector-clusterrolebinding.yaml
  - jvm-collector-serviceaccount.yaml

This Kustomization installs the OpenTelemetry Collector, Instrumentation, Cluster Roles, Cluster Role Bindings, and Service Accounts.

OpenTelemetry Collector

The OpenTelemetry Collector is a single service that receives, processes, and exports telemetry data. Instead of running multiple data collectors, you utilize one service. You can adjust the deployment to suit your needs - the Collector can run in DaemonSet or Deployment mode, among others.

In our project, we run the Collector as a DaemonSet - this way, we can collect metrics from Kubernetes nodes, pods, and containers. You can expect an article dedicated to a detailed description of the Collector configuration in this blog series. To send logs to Loki, metrics to Mimir, and traces to Tempo, the Collector's exporters section is configured this way:

Part of the otel/jvm-collector.yaml file:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: jvm-otel
spec:
  config:
    exporters:
      debug:
        verbosity: basic
      loki:
        endpoint: "http://loki-gateway.observability/loki/api/v1/push"
        default_labels_enabled:
          exporter: false
          job: true
      otlphttp/metrics:
        endpoint: "http://mimir-nginx.observability/otlp"
        tls:
          insecure: true
      otlphttp/traces:
        endpoint: "http://tempo-distributor:4318"
        tls:
          insecure: true
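The excerpt above omits the deployment mode. In the OpenTelemetryCollector resource, the DaemonSet mode is selected with the mode field - a minimal sketch, not the full manifest:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: jvm-otel
spec:
  mode: daemonset   # one Collector pod per Kubernetes node
  config:
    exporters:
      # ... exporters configured as shown above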

Instrumentation

Instrumentation in OpenTelemetry is the process of adding code to your applications to capture telemetry data. There are two types of instrumentation:

  • Auto-instrumentation is a feature that automatically collects telemetry data from your application without requiring manual code changes. It simplifies the process of monitoring and observing your applications. In Kubernetes, to start auto-instrumentation, you patch the deployment with specific annotations (see the sketch right after this list). The Operator then injects an agent into your application pods - for Java, via an init container that attaches the Java agent - which instruments the libraries you're using. For example, requests and responses, database calls, and message queue calls can be instrumented.
    It's a great option for DevOps or Ops Professionals who want to start with OpenTelemetry and Observability without touching the code.
  • Manual instrumentation is a code-based instrumentation, where you add code to your application to collect telemetry data using OpenTelemetry SDKs and APIs. This method gives you more control over what to track and monitor. However, it requires specific knowledge and experience, typically of a Developer Professional.
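The annotation mentioned above is the same one we apply later with kubectl patch (step 4 in the Demo application section). For illustration, this is the relevant fragment of a Deployment manifest with the annotation set directly in the pod template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: petclinic
spec:
  template:
    metadata:
      annotations:
        # value is <namespace>/<Instrumentation name>, pointing at our Instrumentation resource
        instrumentation.opentelemetry.io/inject-java: "observability/jvm-autoinstrumentation"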

In our project, we are using auto-instrumentation. Details of auto-instrumentation sections:

  • exporter - configures the OpenTelemetry Collector endpoint to which the telemetry data is sent
  • propagators - configures context propagation, which enables distributed tracing; signals can be correlated with each other, regardless of where they are produced.
  • sampler - configures sampling strategy for traces to reduce overhead
  • java - defines the configuration for Java auto-instrumentation. With the OTEL_RESOURCE_ATTRIBUTES variable, we add the same resource attributes (service name and namespace) to all signals.

otel/jvm-autoinstrumentation.yaml file:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: jvm-autoinstrumentation
spec:
  exporter:
    endpoint: http://jvm-otel-collector.observability.svc.cluster.local:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  java:
    env:
      - name: OTEL_METRICS_EXPORTER
        value: otlp
      - name: OTEL_LOGS_EXPORTER
        value: otlp
      - name: OTEL_TRACES_EXPORTER
        value: otlp
      - name: OTEL_RESOURCE_ATTRIBUTES
        value: service.name=petclinic,service.namespace=default

Demo application

Another component is the demo application, which has already been deployed with the pulumi up command ✨🪄.

  1. The application should be running in the default namespace. Check it with:
    kubectl get pods -l app=petclinic
  2. You can also use port-forwarding to access the PetClinic application UI:
    kubectl port-forward services/petclinic 8888:8080
  3. In your web browser enter http://localhost:8888/ and explore the demo app.
  4. Patch the deployment with the annotation to start the automatic instrumentation (a quick verification sketch follows after this list):
    kubectl patch deployment petclinic -n default -p '{"spec": {"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"observability/jvm-autoinstrumentation"}}}} }'
  5. The setup is ready - you can now analyze and visualize your logs, traces, and metrics in Grafana.
  6. Retrieve the password for Grafana:
    kubectl get secret --namespace observability grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
  7. Use port-forwarding to access Grafana:
    kubectl port-forward --namespace observability services/grafana 8000:80
  8. Open your web browser and enter http://localhost:8000.
  9. On the sign-in page, enter admin for the username and paste the password you retrieved from the secret.
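Before moving on to Grafana, you can verify that the patch from step 4 actually triggered the injection. A quick check - exact names may differ between Operator versions, so treat this as a sketch:

# the rolled-out pod should contain an init container injected by the Operator
kubectl get pods -l app=petclinic -o jsonpath='{.items[0].spec.initContainers[*].name}'; echo
# and the Java agent should be attached via JAVA_TOOL_OPTIONS
kubectl describe pod -l app=petclinic | grep JAVA_TOOL_OPTIONS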

Explore Grafana

In the previous article, we configured data sources for Loki, Mimir, and Tempo, enabling really useful functionalities like linking logs to traces and traces to metrics. Let's explore those features.

Logs

Follow these steps to analyze your logs:

  1. Navigate to Grafana and open the Explore page.
  2. Choose Loki as a data source.
  3. In Label filters choose app = petclinic.
  4. Add the JSON parser expression.
  5. Click the Run query button in the top-right corner.
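For reference, the query built in the steps above corresponds to this LogQL expression:

{app="petclinic"} | json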

Refer to the GIF below to see these steps in action:

Each log line has an expandable section called Log details that can be opened by clicking on the log line. In the Log details view, explore the JSON logs from the demo app and the labels in the Fields section:

Some labels were added by the OpenTelemetry Collector's processor.
Notice the traceid and spanid labels at the bottom of the Fields section in the screenshot below.

Integrating distributed tracing information such as traceid and spanid as labels in Loki logs greatly improves the ability to trace and analyze log data. To achieve this for the Spring PetClinic app, we've added a property logging.pattern.level = trace_id=%mdc{trace_id} span_id=%mdc{span_id} trace_flags=%mdc{trace_flags} %5p in the src/main/resources/application.properties file.
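For readability, here is the same property as it appears in src/main/resources/application.properties:

logging.pattern.level = trace_id=%mdc{trace_id} span_id=%mdc{span_id} trace_flags=%mdc{trace_flags} %5p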

Some Java logging libraries add traceid and spanid labels to logs by default. Check which logging libraries support this and find more details here.

Clicking the View Trace button will open a new tab, showing us the details about the trace directly related to the log.

Find more details about configuring the Loki data source in the Grafana documentation.

Traces

Now that we've covered logs, let's move on to describing traces with Tempo:

  1. In Grafana Explore page choose Tempo as a data source.
  2. Select TraceQL as the query type. Another very useful query type is Search, which lets you build your query with drop-down menus and text fields.
  3. Let's search for traces with the POST request method. In the query editor, enter the query { .http.method = "POST" } and run it.
  4. The list of traces will appear. Click on some Trace ID from the list.
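Beyond the query from step 3, TraceQL lets you combine conditions. A couple of illustrative examples - the exact attribute names depend on your instrumentation, so adjust them to what you see in your spans:

{ .http.method = "POST" && duration > 200ms }
{ resource.service.name = "petclinic" && .http.status_code >= 500 }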

The link will open a new Trace View tab, where you can explore the details of the traces and spans on the timeline. Next to each span, you can find a link icon 🔗. Click on it, and a small window with Related logs and Sample query fields will appear. From here, you can navigate to related logs and relevant metrics.

Click on the Related Logs field for the selected span. A new Loki View tab opens with a query that searches for logs containing the trace ID value. You can find more details here.

Clicking a Sample Query field opens a new Mimir View tab. Our Sample Query, which presents the JVM CPU Utilization graph in the GIF below, is just an example. In the Mimir data source configuration, you can add more custom queries and links. See the details here.

Metrics

Choose Mimir as a data source. From the available metrics, choose the one you're interested in. In this example, we're using the jvm_gc_memory_allocated metric. In Label filters, choose app = petclinic and run the query. A graph with the metric visualization will appear.
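In PromQL terms, the query built above boils down to a simple selector; a rate may be more useful if the metric is ingested as a cumulative counter in your setup:

jvm_gc_memory_allocated{app="petclinic"}
# per-second allocation rate over the last 5 minutes
rate(jvm_gc_memory_allocated{app="petclinic"}[5m])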

With JVM metrics, you can create robust and insightful dashboards to monitor application performance and resource usage effectively. Check out the dashboards we prepared below.

Grafana dashboards

Dashboards are defined in JSON files located in the ./try-me/observability/dashboards/ directory of the repository.
They are installed via the Kustomization from the ./kustomization.yaml file, which is applied when you run the pulumi up command.

We prepared several dashboards, though work is still in progress:

  • JVM dashboard - to monitor the demo application based on JVM metrics.
  • OpenTelemetry Collector HostMetrics dashboard - based on metrics about the host system from OpenTelemetry Host Metrics Receiver.
  • Kubernetes Nodes, Pods and Containers dashboard - based on metrics from kube-state-metrics service.

To give you a better idea of how these dashboards look, here is a screenshot of HostMetrics dashboard:

Check out the other parts of our Observability series on our blog.

Stay tuned for more articles from the Observability series.

Might interest you: Observability: we are Grafana technology partners

Reviewed by Paweł Maszota
