Observability part 2 - building a local try-me environment
This blog post is the second part of a series of articles about Observability.
If you haven’t heard about our Meerkat project yet - jump back to the article written by Adam, where you can read a short introduction to the project and this blog series as well as find out who can benefit from this tool.
Meerkat is an Observability Starter Kit that consists of several components. The project's goal is to provision a ready-to-deploy configuration of the OpenTelemetry Operator and Collector with instrumentation and observability tools like Grafana, Loki, Mimir, and Tempo. With these tools configured, you can easily implement basic observability standards into your projects.
In this article, I will cover how to build a local try-me environment. The next article will focus on installing OpenTelemetry tools and a demo Java application. We will also explore dashboards and telemetry signals in Grafana. If you already have an existing environment - jump over to the next article.
Installation tutorial
When trying new tools, it is nice to quickly install the whole setup locally and give it a try without diving deep into the documentation. That's why we prepared a local try-me environment configuration, which you can install with just a few commands.
The configuration is provisioned as code using Pulumi, which allows you to define infrastructure as code (IaC) in several programming languages. In this project, we are using JavaScript.
If you are a beginner with Pulumi - don't worry, we've got you covered - our next blog post will cover the basics. But even for a beginner, installing the whole stack is very easy, I promise.
The source code for this project with documentation can be found on our GitHub repository.
To start with, install the prerequisites:
Kind - Kubernetes in Docker
The core of the local environment is a Kubernetes cluster managed by Kind, which is a tool for running local Kubernetes clusters using Docker container "nodes". We prepared scripts to quickly bootstrap a Kind cluster with a control-plane node and three worker nodes, in which you will install all the other components along with the demo application.
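A cluster with this topology can be described declaratively in Kind's configuration format. The actual try-me/kind/kind-config.yaml in the repository may differ, but a minimal sketch of a one-control-plane, three-worker setup looks like this:

```yaml
# Minimal Kind cluster definition: one control-plane node and three workers.
# Illustrative sketch - the file shipped in the repository may contain more options.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
```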
Let’s set up the Kubernetes cluster. Clone the Git repository:
git clone https://github.com/softwaremill/meerkat.git
Once you've cloned the repository, navigate into the meerkat folder:
cd meerkat
If needed, adjust the configuration by modifying the try-me/kind/kind-config.yaml file. Then run the command to install the cluster:
try-me/kind/cluster_create.sh
To destroy the cluster, run:
try-me/kind/cluster_delete.sh
Initialize Pulumi code
Now, let's look at the Pulumi code.
The try-me/observability folder contains the Pulumi code that deploys the Observability components to the Kubernetes cluster. Make sure you are connected to the correct Kubernetes context. By default, Pulumi will use the local kubeconfig if available; after installing a Kind cluster, that should be your current context.
Inside the try-me/observability folder:
- Install libraries. Run:
npm install
- Initialize a new Pulumi stack:
pulumi stack init localstack --no-select
Now our cluster is ready and the Pulumi code is initialized. Let's deploy some cool stuff there!
- Deploy the necessary components. This command bootstraps the LGTM stack, the OpenTelemetry Operator, and the Kustomization:
pulumi up --stack localstack
While the stack is being installed, let's take a look at the configuration details.
LGTM stack
The LGTM stack from Grafana is an ideal choice to collect and store telemetry data emitted by running applications. It consists of Loki, Grafana, Tempo, and Mimir, with one backend for each of the three observability pillars: Mimir stores metrics, Tempo stores traces, and Loki stores logs. The "G" in LGTM stands for Grafana, which is well known and needs no introduction.
Deployment modes
Loki, Mimir, and Tempo Helm Charts can be deployed in different modes.
- Distributed Mode: This deployment method involves running each backend component as a microservice, allowing independent scaling of each component. It is the preferred method for production deployments but is also the most complex. In our configuration, Mimir and Tempo are installed in this way.
- Single Binary (Monolithic) Mode: This deployment method runs all backend components within a single process as a single binary or Docker image. It is the simplest deployment mode and you can utilize it in a testing environment.
- Simple Scalable Mode: Loki can be deployed in a mode that is an intermediary between distributed and single binary modes. This mode is preferred if your log volume is up to a few terabytes and that’s the one we chose. Beyond that, you might want to use the distributed mode.
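Which mode a chart runs in is selected through its values file. In recent versions of the Loki Helm Chart, the mode can be set explicitly; the exact key and accepted values depend on the chart version, so treat this as an illustrative sketch rather than the repository's configuration:

```yaml
# Illustrative - recent Loki Helm Chart versions accept an explicit mode value
deploymentMode: SimpleScalable   # other options include SingleBinary and Distributed
```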
Here are code snippets of the values for the Loki, Mimir, Tempo, and Grafana Helm Charts. Let's go through the most important parameters from each file.

The loki_values.yaml file:
minio: # enable MinIO Helm Chart installation
  enabled: true
read:
  replicas: 1
write:
  replicas: 2
backend:
  replicas: 1
chunksCache:
  enabled: false
loki:
  auth_enabled: false
  storage_config: # configure storage mode
    tsdb_shipper:
      active_index_directory: /var/loki
      cache_location: /var/loki
      cache_ttl: 24h
  limits_config:
    allow_structured_metadata: true
  schemaConfig:
    configs:
      - from: "2024-04-18"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
Starting from the top, the Loki Helm Chart has the MinIO subchart installation enabled. In our setup, MinIO is the common storage for all observability signals.
MinIO is an open-source, highly scalable object storage backend compatible with AWS S3. It provides high-performance object storage, which satisfies the low-latency requirements of Loki, Tempo, and Mimir. These tools use object storage as their storage layer, which can be a more cost-effective solution compared to the EFK stack (Elasticsearch, Fluent Bit, and Kibana) or AWS CloudWatch.
Our Pulumi code creates Kubernetes Jobs, which are responsible for setting up MinIO buckets for Loki, Tempo, and Mimir.
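The exact Job definitions live in the Pulumi code, but conceptually each one runs the MinIO client (mc) against the in-cluster MinIO endpoint. A hand-written sketch - the Job name and image tag here are illustrative, not taken from the repository - could look like this:

```yaml
# Illustrative Job that creates the Tempo bucket with the MinIO client (mc)
apiVersion: batch/v1
kind: Job
metadata:
  name: create-tempo-bucket      # illustrative name
  namespace: observability
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: mc
          image: minio/mc        # official MinIO client image
          command: ["/bin/sh", "-c"]
          args:
            - |
              mc alias set minio http://loki-minio.observability:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD" &&
              mc mb --ignore-existing minio/tempo-traces
          env:
            - name: MINIO_ROOT_USER
              valueFrom:
                secretKeyRef:
                  name: loki-minio
                  key: rootUser
            - name: MINIO_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: loki-minio
                  key: rootPassword
```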
Back to the loki_values.yaml file: Loki is deployed in simple scalable mode. Specify the number of replicas for the read, write, and backend pods.

Configure storage to use Single Store TSDB mode. Single Store refers to using object storage for persisting both Loki's index and data, and TSDB is the recommended mode for persisting data in Loki.
The mimir_values.yaml file:
minio:
  enabled: false
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          bucket_name: mimir-metrics
          endpoint: loki-minio.observability:9000
          insecure: true
          secret_access_key: "${MINIO_SECRET_KEY}"
          access_key_id: "${MINIO_ACCESS_KEY_ID}"
    blocks_storage:
      s3:
        bucket_name: mimir-tsdb
    alertmanager_storage:
      s3:
        bucket_name: mimir-ruler
global:
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
ingester:
  replicas: 1
query_scheduler:
  enabled: false
querier:
  replicas: 1
overrides_exporter:
  enabled: false
Configure MinIO as the storage backend for Mimir. However, the installation of the MinIO Helm Chart is disabled here, since we have already installed it as Loki's subchart.

Provide the MinIO endpoint and credentials: define the endpoint and bucket names in the S3 storage config block - MinIO is compatible with AWS S3. Credentials are retrieved from already existing secrets and passed as environment variables, and Kubernetes Jobs handle creating the buckets.

Mimir is deployed in distributed mode - specify the number of replicas for each component.
The tempo_values.yaml file:
traces: # enable otlp receivers (grpc and http) in distributor configuration
  otlp:
    grpc:
      enabled: true
    http:
      enabled: true
minio: # disable MinIO Helm Chart installation because it is installed with the Loki Helm Chart
  enabled: false
metricsGenerator: # enable metrics generator feature
  enabled: true
  remoteWriteUrl: "http://mimir-nginx.observability/api/v1/push"
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
storage: # configure storage backend for MinIO which is S3 compatible
  trace:
    backend: s3
    s3:
      bucket: 'tempo-traces'
      endpoint: 'loki-minio.observability:9000'
      insecure: true
      secret_key: "${MINIO_SECRET_KEY}"
      access_key: "${MINIO_ACCESS_KEY_ID}"
distributor:
  replicas: 1
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
compactor:
  replicas: 1
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
ingester:
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
querier:
  replicas: 1
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
queryFrontend:
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
Starting from the top, the otlp receiver is enabled over both the grpc and http protocols in the distributor configuration. The distributor receives spans and forwards them to the appropriate ingesters.

As before, the MinIO Helm Chart installation is disabled.
Next, the metrics generator feature is enabled. The metrics-generator is an optional Tempo feature that derives metrics from ingested traces, enhancing your monitoring capabilities. The generated metrics are sent in Prometheus format to Mimir's remote-write endpoint. The full list of exported metrics is documented at: https://grafana.com/docs/tempo/latest/metrics-generator/span_metrics/#metrics
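Once these metrics land in Mimir, they can be queried like any other Prometheus series. For example, assuming Tempo's default span-metrics naming, a per-service request-rate panel could use a query along these lines:

```promql
# Request rate per service, derived from trace spans (metric name assumes
# Tempo's default span-metrics output)
sum by (service) (rate(traces_spanmetrics_calls_total[5m]))
```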
Configure MinIO as the storage backend for Tempo. To retrieve the storage credentials from already existing secrets, first enable referencing environment variables with the -config.expand-env=true parameter, then add the extraEnv block. It's important to note that within the Tempo Distributed Helm Chart, extra environment variables must be added individually to each component section, rather than globally. This means including both the -config.expand-env=true parameter and the extraEnv block for each component.
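Since YAML anchors are resolved within a single file, one way to avoid copy-pasting the same credentials into every component section is to define the blocks once and reference them. This is a sketch of the idea, not the configuration used in the repository:

```yaml
# Define the shared blocks once (Helm ignores unknown top-level keys,
# but the YAML parser still resolves the anchors)...
commonArgs: &expand_env
  - "-config.expand-env=true"
commonEnv: &minio_env
  - name: MINIO_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: loki-minio
        key: rootUser
  - name: MINIO_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: loki-minio
        key: rootPassword

# ...and reference them in each component section
distributor:
  replicas: 1
  extraArgs: *expand_env
  extraEnv: *minio_env
compactor:
  replicas: 1
  extraArgs: *expand_env
  extraEnv: *minio_env
```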
The grafana_values.yaml file:
sidecar:
  datasources:
    enabled: true
  dashboards:
    enabled: true
grafana.ini:
  feature_toggles:
    enable: 'traceToMetrics = true'
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: "http://loki-gateway"
        version: 1
        uid: loki
        jsonData:
          maxLines: 1000
          derivedFields: # enable link from logs to traces
            - datasourceUid: tempo
              matcherType: label
              matcherRegex: traceid
              name: TraceID
              url: '$${__value.raw}'
              urlDisplayLabel: 'View Trace'
      - name: Tempo
        type: tempo
        url: "http://tempo-query-frontend:3100"
        access: proxy
        basicAuth: false
        uid: tempo
        jsonData:
          tracesToLogsV2: # enable link from traces to logs
            datasourceUid: 'loki'
            spanStartTimeShift: '-1s'
            spanEndTimeShift: '1s'
            filterByTraceID: false
            filterBySpanID: false
            customQuery: true
            query: '{namespace="$${__span.tags.namespace}", pod="$${__span.tags.pod}"} |="$${__trace.traceId}"'
          tracesToMetrics: # enable link from traces to metrics
            datasourceUid: 'mimir'
            spanStartTimeShift: '-1m'
            spanEndTimeShift: '1m'
            tags: [{ key: 'app' }]
            queries:
              - name: 'Sample query'
                query: 'sum(rate(process_runtime_jvm_cpu_utilization{$$__tags}[5m]))'
      - name: Mimir
        type: prometheus
        url: "http://mimir-nginx.observability.svc/prometheus"
        uid: mimir
The last component of our stack is Grafana. In the Grafana configuration, enable the datasources and dashboards sidecar containers, which inject configuration into Grafana from labeled ConfigMaps.
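By default, the Grafana Helm Chart sidecars watch for ConfigMaps carrying the grafana_dashboard and grafana_datasource labels (the label names are configurable in the chart values). A minimal example of a dashboard ConfigMap the sidecar would pick up - the name and dashboard JSON here are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard            # illustrative name
  labels:
    grafana_dashboard: "1"      # default label watched by the dashboards sidecar
data:
  my-dashboard.json: |
    { "title": "My dashboard", "panels": [] }
```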
Configure the Loki, Tempo, and Mimir datasource URLs:
- http://loki-gateway for Loki
- http://tempo-query-frontend:3100 for Tempo
- http://mimir-nginx.observability.svc/prometheus for Mimir
For the Loki datasource, add configuration for derived fields to add a link from logs to traces. For the Tempo datasource, add configuration to navigate from traces to logs and metrics.
Summary
Now, your Kubernetes cluster with the LGTM stack should be up and running.
In the next article, we will focus on the main part of our project - the configuration for OpenTelemetry:
- OpenTelemetry Operator
- OpenTelemetry Collector
- Instrumentation
I will show you how to configure metrics, logs and traces exporters in OpenTelemetry Collector with endpoints from Mimir, Loki and Tempo. We will also install the demo application and explore the Grafana dashboards and features like links between observability signals.
Check other parts of our Observability series:
Reviewed by Paweł Maszota