Akka monitoring with Kamon part 1
Let's say we are developing an application with the Akka stack: HTTP, Clustering, Persistence. Everything is asynchronous, non-blocking and event-oriented. Nice, but how do we monitor such an architecture? With the good old thread-per-request approach, everything was much simpler. Fortunately, more and more tools (not only commercial ones) are becoming available to cover this area. One of them is Kamon. In this blog post (the first of the series) I would like to show how to get started with Kamon and gain some insight into the behavior of Akka in a nice, readable way.
Kamon setup
We will start by configuring Kamon to monitor the HTTP endpoints. The required dependencies are:
val kamonVersion = "0.6.6"
lazy val monitoringDependencies = Seq(
  "io.kamon" %% "kamon-core" % kamonVersion,
  "io.kamon" %% "kamon-jmx" % kamonVersion,
  "io.kamon" %% "kamon-akka-2.4" % kamonVersion,
  "io.kamon" %% "kamon-akka-http" % kamonVersion,
  "io.kamon" %% "kamon-datadog" % kamonVersion
)
Because we are using Kamon modules that require AspectJ weaving, we need to start our application with the AspectJ Weaver. If you launch your Akka application by running the main class, the simplest way to add the weaver is to configure the sbt-aspectj-runner plugin in plugins.sbt:
resolvers ++= Seq(Resolver.bintrayIvyRepo("kamon-io", "sbt-plugins"))
addSbtPlugin("io.kamon" % "sbt-aspectj-runner" % "1.0.1")
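If you would rather not use the plugin, the weaver can also be attached as a plain Java agent. The fragment below is only a sketch in sbt 0.13 syntax: the jar path is a placeholder you would have to replace with the real location of aspectjweaver on your machine. It is an alternative to the plugin, not something to combine with it.

```scala
// build.sbt (sketch): run the application in a forked JVM with the AspectJ weaver attached.
// A forked JVM is needed so that javaOptions are actually applied to `sbt run`.
fork in run := true

// Hypothetical path: point this at the aspectjweaver jar available locally.
javaOptions in run += "-javaagent:/path/to/aspectjweaver.jar"
```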
Then we'll need some basic Kamon configuration, which can be placed in a separate kamon.conf file:
kamon {
  datadog {
    time-units = "ms"
    memory-units = "b"
  }
  metric {
    tick-interval = 5 seconds
    filters {
      akka-actor {
        includes = ["sandbox-actor-system/**"]
        excludes = ["sandbox-actor-system/system**", "sandbox-actor-system/user**"]
      }
    }
  }
  kamon-mxbeans {
    mbeans = [
    ],
    identify-delay-interval-ms = 1000,
    identify-interval-ms = 1000,
    value-check-interval-ms = 1000
  }
}
Most of the settings are well documented in the reference.conf of each artifact, so please refer to those files for more insight. The default values are usually sensible, but for the sake of this example I wanted to collect only specific metrics, with a smaller tick interval (explained later).
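To make the filters section more concrete, here is a minimal sketch of how include/exclude globs of this shape can be evaluated. It is not Kamon's implementation. It assumes that `**` matches any characters including `/`, that `*` and `?` stay within one path segment, and that a name passes when it matches at least one include and no exclude. `FilterSketch`, `globToRegex` and `accepts` are hypothetical names used only for illustration.

```scala
import java.util.regex.Pattern

object FilterSketch {
  // Translate a glob into a regex: "**" may cross '/' boundaries, "*" and "?" may not.
  def globToRegex(glob: String): Pattern = {
    val sb = new StringBuilder
    var i = 0
    while (i < glob.length) {
      glob.charAt(i) match {
        case '*' if i + 1 < glob.length && glob.charAt(i + 1) == '*' =>
          sb.append(".*")
          i += 1 // skip the second '*'
        case '*' => sb.append("[^/]*")
        case '?' => sb.append("[^/]")
        case c   => sb.append(Pattern.quote(c.toString)) // literal character
      }
      i += 1
    }
    Pattern.compile(sb.toString)
  }

  // A name is accepted when some include matches and no exclude matches.
  def accepts(includes: Seq[String], excludes: Seq[String])(name: String): Boolean =
    includes.exists(g => globToRegex(g).matcher(name).matches()) &&
      !excludes.exists(g => globToRegex(g).matcher(name).matches())
}
```

Under these assumptions, the patterns from kamon.conf reject every actor below sandbox-actor-system/user and sandbox-actor-system/system, which is one way to keep actor metrics out of the picture while looking at the HTTP endpoints only.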
The next step is to actually start Kamon:
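import kamon.Kamon
import scala.util.Try
// MainModule comes from the demo project and wires the application together.
object Main extends App with MainModule {
  // Start Kamon first, before the rest of the application is initialized.
  Kamon.start()
  def start(): Unit = {
    //....
  }
  // Try swallows any exception thrown while Kamon shuts down.
  def terminate(): Unit = {
    Try(Kamon.shutdown())
  }
  sys.addShutdownHook {
    terminate()
  }
  start()
}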
And, finally (with the sbt-aspectj-runner plugin), run the application as usual. Make sure that you can see the following lines in the console:
[run-main-0] INFO kamon.Kamon$Instance:35 - Initializing Kamon...
[run-main-0] [DatadogExtension(akka://kamon)] Starting the Kamon(Datadog) extension
[run-main-0] [JMXExtension(akka://kamon)] Starting the Kamon(JMX) extension
[run-main-0] [JMXMetricsExtension(akka://kamon)] Starting the Kamon(JMXMetrics) extension
You can also clone the complete demo from here, check out the part1 tag, run sbt run to start the application, and run sbt "perfTest/gatling:testOnly com.softwaremill.sandbox.GetUserSimulation" to simulate some load.
Kamon JMX module
OK, enough of this boring configuration, I want to see some results! The easiest way to do this locally is with a JMX MBeans reader. I recommend Java Mission Control (installed by default with the JDK, e.g. in /usr/bin/jmc). After choosing the relevant process, you can expand the Kamon tree:
Each HTTP status code has its own counter. We can also check some statistics about open connections and active requests. One of the nice Kamon features is tracing. However, we haven't configured it yet, so all interactions with our application are marked as UnnamedTrace. JMC provides a way to visualise it:
with a new chart:
Now run the Gatling scenario again and look at the response time percentiles.
As you can see, all the metrics are calculated in 5-second intervals (the tick-interval param).
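To illustrate what "a percentile per tick" means, here is a simplified sketch that groups recorded response times into fixed windows and computes a nearest-rank percentile for each window. It is a model for intuition only, not how Kamon records or aggregates metrics internally. `TickPercentiles`, `Sample`, `percentile` and `perTick` are hypothetical names.

```scala
object TickPercentiles {
  // One recorded request: when it finished (ms since start) and how long it took (ms).
  final case class Sample(timestampMs: Long, latencyMs: Long)

  // Nearest-rank percentile over one window; p must be in (0, 100].
  def percentile(latencies: Seq[Long], p: Double): Long = {
    require(latencies.nonEmpty && p > 0 && p <= 100)
    val sorted = latencies.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt
    sorted(rank - 1)
  }

  // Group samples into windows of tickMs and compute one percentile per window.
  // The map key is the window index (timestamp divided by the window length).
  def perTick(samples: Seq[Sample], tickMs: Long, p: Double): Map[Long, Long] =
    samples
      .groupBy(s => s.timestampMs / tickMs)
      .map { case (tick, ss) => tick -> percentile(ss.map(_.latencyMs), p) }
}
```

With a 5000 ms window, a single slow request only affects the value reported for the window it falls into, which is why a short tick interval makes spikes easier to spot.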
Kamon Datadog module
JMC is good for a start, but it's not a convenient way to monitor your application in the long term. The MBeans exposed by Kamon can also be processed by other tools like jmxtrans, which can feed many popular storage backends for further processing and visualization. Some storage backends are supported out of the box, e.g. by the Kamon InfluxDB extension. Still, it seems like a lot of additional work. Fortunately, cloud visualization services are becoming more popular nowadays. For me, the Kamon Datadog integration is the easiest way to get something (nearly) production ready.
After a successful Datadog agent configuration (explained during the registration process) make sure that your host is available in the infrastructure:
and some basic metrics are available:
Now, restart the app and run the Gatling simulation to feed Datadog. The next step will be to create a custom time dashboard with e.g. response status count and time:
Note that the extension reports only the 95th percentile of the response time (usually that is enough to detect problems). The line on the chart is also much smoother, because the default Datadog presentation resolution is 30 seconds.
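The smoothing effect can be shown with a small sketch. It assumes that the chart averages all points falling into one 30-second bucket; that is an assumption about the rollup made for illustration, not something taken from the Datadog documentation. `Rollup` and `rollupAvg` are hypothetical names.

```scala
object Rollup {
  // Average consecutive groups of `factor` points,
  // e.g. six 5-second ticks per 30-second chart point.
  def rollupAvg(points: Seq[Double], factor: Int): List[Double] = {
    require(factor > 0)
    points.grouped(factor).map(g => g.sum / g.size).toList
  }
}
```

For example, twelve 5-second values of 20 ms with a single 200 ms spike in the first half roll up to two points, 50 ms and 20 ms, so the spike is still visible but far less pronounced than in the 5-second view in JMC.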
Summary
This is just an introduction to monitoring Akka with Kamon, and so far we have only shown how to measure the overall response time. This is a use case that surely has simpler solutions, and we took it on here just to get familiar with new tools and get some meaningful results. Stay tuned for the next part with Akka Persistence and Clustering where you will be able to see the full power of Kamon.
This blog post is part 1 of a 3-part series; see part 2 | part 3.