Observability part 2 - building a local try-me environment
This blog post is the second part of a series of articles about Observability.
If you haven’t heard about our Meerkat project yet - jump back to the article written by Adam, where you can read a short introduction to the project and this blog series as well as find out who can benefit from this tool.
Meerkat is an Observability Starter Kit that consists of several components. The project's goal is to provision a ready-to-deploy configuration of the OpenTelemetry Operator and Collector with instrumentation and observability tools like Grafana, Loki, Mimir, and Tempo. With these tools configured, you can easily implement basic observability standards into your projects.
In this article, I will cover how to build a local try-me environment. The next article will focus on installing OpenTelemetry tools and a demo Java application. We will also explore dashboards and telemetry signals in Grafana. If you already have an existing environment - jump over to the next article.
Installation tutorial
When trying new tools, it is nice to quickly install the whole setup locally and give it a try without diving deep into the documentation. That's why we prepared a local try-me environment configuration, which you can install with just a few commands.
The configuration is provisioned as code using Pulumi, which allows you to define infrastructure as code (IaC) in several programming languages. In this project, we are using JavaScript.
If you are a beginner with Pulumi - don't worry, we've got you covered - our next blog post will cover the basics. But even for a beginner, installing the whole stack is very easy, I promise.
The source code for this project with documentation can be found on our GitHub repository.
To start with, install the prerequisites:
Kind - Kubernetes in Docker
The core of the local environment is a Kubernetes cluster managed by Kind, which is a tool for running local Kubernetes clusters using Docker container "nodes". We prepared scripts to quickly bootstrap a Kind cluster with a control-plane node and three worker nodes, in which you will install all the other components along with the demo application.
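A cluster with this topology can be described declaratively in Kind's configuration format. The actual try-me/kind/kind-config.yaml in the repository may differ, but a minimal sketch of a one-control-plane, three-worker setup looks like this:

```yaml
# Minimal Kind cluster definition: one control-plane node and three workers.
# Illustrative sketch - the file shipped in the repository may contain more options.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
```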
Let’s set up the Kubernetes cluster. Clone the Git repository:
git clone https://github.com/softwaremill/meerkat.git
Once you've cloned the repository, navigate into the meerkat folder:
cd meerkat
If needed, adjust the configuration by modifying the try-me/kind/kind-config.yaml file. Then run the command to install the cluster:
try-me/kind/cluster_create.sh
To destroy the cluster, run:
try-me/kind/cluster_delete.sh
Initialize Pulumi code
Now, let's look at the Pulumi code.
The try-me/observability folder contains the Pulumi code that deploys the Observability components to the Kubernetes cluster. Make sure you are connected to the correct Kubernetes context. By default, Pulumi will use the local kubeconfig if available; after installing a Kind cluster, that should be your current context.
Inside the try-me/observability folder:
- Install libraries. Run:
npm install
- Initialize a new Pulumi stack:
pulumi stack init localstack --no-select
Now our cluster is ready and the Pulumi code is initialized. Let's deploy some cool stuff there!
- Deploy the necessary components. This command bootstraps the LGTM stack, the OpenTelemetry Operator, and the Kustomization:
pulumi up --stack localstack
While the stack is being installed, let's take a look at the configuration details.
LGTM stack
The LGTM stack from Grafana is an ideal choice to collect and store telemetry data emitted by running applications. It consists of Loki, Grafana, Tempo, and Mimir, with one backend for each of the three observability pillars: Mimir stores metrics, Tempo stores traces, and Loki stores logs. The "G" in LGTM stands for Grafana, which is well known and needs no introduction.
Deployment modes
Loki, Mimir, and Tempo Helm Charts can be deployed in different modes.
- Distributed Mode: This deployment method involves running each backend component as a microservice, allowing independent scaling of each component. It is the preferred method for production deployments but is also the most complex. In our configuration, Mimir and Tempo are installed in this way.
- Single Binary (Monolithic) Mode: This deployment method runs all backend components within a single process as a single binary or Docker image. It is the simplest deployment mode and you can utilize it in a testing environment.
- Simple Scalable Mode: Loki can be deployed in a mode that is an intermediary between distributed and single binary modes. This mode is preferred if your log volume is up to a few terabytes and that’s the one we chose. Beyond that, you might want to use the distributed mode.
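Which mode a chart runs in is selected through its values file. In recent versions of the Loki Helm Chart, the mode can be set explicitly; the exact key and accepted values depend on the chart version, so treat this as an illustrative sketch rather than the repository's configuration:

```yaml
# Illustrative - recent Loki Helm Chart versions accept an explicit mode value
deploymentMode: SimpleScalable   # other options include SingleBinary and Distributed
```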
Here are code snippets of the values for the Loki, Mimir, Tempo, and Grafana Helm Charts. Let's go through the most important parameters from each file.

The loki_values.yaml file:
minio: # enable MinIO Helm Chart installation
  enabled: true
read:
  replicas: 1
write:
  replicas: 2
backend:
  replicas: 1
chunksCache:
  enabled: false
loki:
  auth_enabled: false
  storage_config: # configure storage mode
    tsdb_shipper:
      active_index_directory: /var/loki
      cache_location: /var/loki
      cache_ttl: 24h
  limits_config:
    allow_structured_metadata: true
  schemaConfig:
    configs:
      - from: "2024-04-18"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
Starting from the top, the Loki Helm Chart has the MinIO subchart installation enabled. In our setup, MinIO is the common storage for all observability signals.
MinIO is an open-source, highly scalable object storage backend compatible with AWS S3. It provides high-performance object storage, which satisfies the low-latency requirements of Loki, Tempo, and Mimir. These tools use object storage as their storage layer, which can be a more cost-effective solution compared to the EFK stack (Elasticsearch, Fluent Bit, and Kibana) or AWS CloudWatch.
Our Pulumi code creates Kubernetes Jobs, which are responsible for setting up MinIO buckets for Loki, Tempo, and Mimir.
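The exact Job definitions live in the Pulumi code, but conceptually each one runs the MinIO client (mc) against the in-cluster MinIO endpoint. A hand-written sketch - the Job name and image tag here are illustrative, not taken from the repository - could look like this:

```yaml
# Illustrative Job that creates the Tempo bucket with the MinIO client (mc)
apiVersion: batch/v1
kind: Job
metadata:
  name: create-tempo-bucket      # illustrative name
  namespace: observability
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: mc
          image: minio/mc        # official MinIO client image
          command: ["/bin/sh", "-c"]
          args:
            - |
              mc alias set minio http://loki-minio.observability:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD" &&
              mc mb --ignore-existing minio/tempo-traces
          env:
            - name: MINIO_ROOT_USER
              valueFrom:
                secretKeyRef:
                  name: loki-minio
                  key: rootUser
            - name: MINIO_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: loki-minio
                  key: rootPassword
```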
Back to the loki_values.yaml file: Loki is deployed in simple scalable mode. Specify the number of replicas for the read, write, and backend pods.

Configure storage to use Single Store TSDB mode. Single Store refers to using object storage for persisting both Loki's index and data, and TSDB is the recommended mode for persisting data in Loki.
The mimir_values.yaml file:
minio:
  enabled: false
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          bucket_name: mimir-metrics
          endpoint: loki-minio.observability:9000
          insecure: true
          secret_access_key: "${MINIO_SECRET_KEY}"
          access_key_id: "${MINIO_ACCESS_KEY_ID}"
    blocks_storage:
      s3:
        bucket_name: mimir-tsdb
    alertmanager_storage:
      s3:
        bucket_name: mimir-ruler
global:
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
ingester:
  replicas: 1
query_scheduler:
  enabled: false
querier:
  replicas: 1
overrides_exporter:
  enabled: false
Configure MinIO as the storage backend for Mimir. However, the installation of the MinIO Helm Chart is disabled here, since we have already installed it as Loki's subchart.

Provide the MinIO endpoint and credentials: define the endpoint and bucket names in the S3 storage config block - MinIO is compatible with AWS S3. Credentials are retrieved from already existing secrets and passed as environment variables, and Kubernetes Jobs handle creating the buckets.

Mimir is deployed in distributed mode - specify the number of replicas for each component.
The tempo_values.yaml file:
traces: # enable otlp receivers (grpc and http) in distributor configuration
  otlp:
    grpc:
      enabled: true
    http:
      enabled: true
minio: # disable MinIO Helm Chart installation because it is installed with the Loki Helm Chart
  enabled: false
metricsGenerator: # enable metrics generator feature
  enabled: true
  remoteWriteUrl: "http://mimir-nginx.observability/api/v1/push"
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
storage: # configure storage backend for MinIO which is S3 compatible
  trace:
    backend: s3
    s3:
      bucket: 'tempo-traces'
      endpoint: 'loki-minio.observability:9000'
      insecure: true
      secret_key: "${MINIO_SECRET_KEY}"
      access_key: "${MINIO_ACCESS_KEY_ID}"
distributor:
  replicas: 1
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
compactor:
  replicas: 1
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
ingester:
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
querier:
  replicas: 1
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
queryFrontend:
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: MINIO_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootUser
    - name: MINIO_SECRET_KEY
      valueFrom:
        secretKeyRef:
          name: loki-minio
          key: rootPassword
Starting from the top, the otlp receiver is enabled over both the grpc and http protocols in the distributor configuration. The distributor receives spans and forwards them to the appropriate ingesters.

As before, the MinIO Helm Chart installation is disabled.
Next, the metrics generator feature is enabled. The metrics-generator is an optional Tempo feature that derives metrics from ingested traces, enhancing your monitoring capabilities. The generated metrics are sent in Prometheus format to Mimir's remote-write endpoint. The full list of exported metrics is documented at: https://grafana.com/docs/tempo/latest/metrics-generator/span_metrics/#metrics
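Once these metrics land in Mimir, they can be queried like any other Prometheus series. For example, assuming Tempo's default span-metrics naming, a per-service request-rate panel could use a query along these lines:

```promql
# Request rate per service, derived from trace spans (metric name assumes
# Tempo's default span-metrics output)
sum by (service) (rate(traces_spanmetrics_calls_total[5m]))
```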
Configure MinIO as the storage backend for Tempo. To retrieve the storage credentials from already existing secrets, first enable referencing environment variables with the -config.expand-env=true parameter, then add the extraEnv block. It's important to note that within the Tempo Distributed Helm Chart, extra environment variables must be added individually to each component section, rather than globally. This means including both the -config.expand-env=true parameter and the extraEnv block for each component.
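Since YAML anchors are resolved within a single file, one way to avoid copy-pasting the same credentials into every component section is to define the blocks once and reference them. This is a sketch of the idea, not the configuration used in the repository:

```yaml
# Define the shared blocks once (Helm ignores unknown top-level keys,
# but the YAML parser still resolves the anchors)...
commonArgs: &expand_env
  - "-config.expand-env=true"
commonEnv: &minio_env
  - name: MINIO_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: loki-minio
        key: rootUser
  - name: MINIO_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: loki-minio
        key: rootPassword

# ...and reference them in each component section
distributor:
  replicas: 1
  extraArgs: *expand_env
  extraEnv: *minio_env
compactor:
  replicas: 1
  extraArgs: *expand_env
  extraEnv: *minio_env
```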
The grafana_values.yaml file:
sidecar:
  datasources:
    enabled: true
  dashboards:
    enabled: true
grafana.ini:
  feature_toggles:
    enable: 'traceToMetrics = true'
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: "http://loki-gateway"
        version: 1
        uid: loki
        jsonData:
          maxLines: 1000
          derivedFields: # enable link from logs to traces
            - datasourceUid: tempo
              matcherType: label
              matcherRegex: traceid
              name: TraceID
              url: '$${__value.raw}'
              urlDisplayLabel: 'View Trace'
      - name: Tempo
        type: tempo
        url: "http://tempo-query-frontend:3100"
        access: proxy
        basicAuth: false
        uid: tempo
        jsonData:
          tracesToLogsV2: # enable link from traces to logs
            datasourceUid: 'loki'
            spanStartTimeShift: '-1s'
            spanEndTimeShift: '1s'
            filterByTraceID: false
            filterBySpanID: false
            customQuery: true
            query: '{namespace="$${__span.tags.namespace}", pod="$${__span.tags.pod}"} |="$${__trace.traceId}"'
          tracesToMetrics: # enable link from traces to metrics
            datasourceUid: 'mimir'
            spanStartTimeShift: '-1m'
            spanEndTimeShift: '1m'
            tags: [{ key: 'app' }]
            queries:
              - name: 'Sample query'
                query: 'sum(rate(process_runtime_jvm_cpu_utilization{$$__tags}[5m]))'
      - name: Mimir
        type: prometheus
        url: "http://mimir-nginx.observability.svc/prometheus"
        uid: mimir
The last component of our stack is Grafana. In the Grafana configuration, enable the datasources and dashboards sidecar containers, which inject configuration into Grafana from labeled ConfigMaps.
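By default, the Grafana Helm Chart sidecars watch for ConfigMaps carrying the grafana_dashboard and grafana_datasource labels (the label names are configurable in the chart values). A minimal example of a dashboard ConfigMap the sidecar would pick up - the name and dashboard JSON here are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard            # illustrative name
  labels:
    grafana_dashboard: "1"      # default label watched by the dashboards sidecar
data:
  my-dashboard.json: |
    { "title": "My dashboard", "panels": [] }
```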
Configure the Loki, Tempo, and Mimir datasource URLs:
- http://loki-gateway for Loki
- http://tempo-query-frontend:3100 for Tempo
- http://mimir-nginx.observability.svc/prometheus for Mimir
For the Loki datasource, add configuration for derived fields to add a link from logs to traces. For the Tempo datasource, add configuration to navigate from traces to logs and metrics.
Summary
Now, your Kubernetes cluster with the LGTM stack should be up and running.
In the next article, we will focus on the main part of our project - the configuration for OpenTelemetry:
- OpenTelemetry Operator
- OpenTelemetry Collector
- Instrumentation
I will show you how to configure metrics, logs and traces exporters in OpenTelemetry Collector with endpoints from Mimir, Loki and Tempo. We will also install the demo application and explore the Grafana dashboards and features like links between observability signals.
Check other parts of our Observability series:
Reviewed by Paweł Maszota