How to run Apache Kafka in the Cloud?
Apache Kafka is a powerful technology used by many companies across a wide range of use cases and applications. Today, let’s focus on Kafka cluster setup. When you want to leverage Kafka for event-driven microservices, setting up brokers on bare VMs or on-premises can be overly complicated, especially without proper experience. Let’s see what simpler options are available across different cloud offerings.
If you’d like to learn more about what Apache Kafka is and what its use cases are, read our other publication.
Before diving into the typical cloud-specific offerings, let’s start with Kubernetes. Your company may already be using it. If that is the case, a reasonable option may be to set up a Kafka cluster on K8s using one of the available Kubernetes Operators. It may be the cheapest option; however, keep in mind that with this approach maintenance and monitoring are on your side, which adds to the final cost.
Strimzi is probably the best-known Kafka operator. It not only runs the brokers, but also supports Kafka Connect, MirrorMaker, Kafka Exporter (for metrics) and Kafka Bridge (HTTP API). It is even possible to manage topics automatically. We have used Strimzi in some of our projects and it worked quite well.
For an example of running Strimzi together with Kafka Connect, take a look at the Running Kafka Connect Cluster as a Native Kubernetes Application blog post.
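To give a feel for what working with an operator looks like, here is a minimal sketch that builds a Strimzi `Kafka` resource and a `KafkaTopic` resource as plain Python dictionaries and prints them as JSON, which `kubectl apply -f -` accepts. The field names follow the `kafka.strimzi.io/v1beta2` schema as we remember it for ZooKeeper-based clusters; newer Strimzi releases use KRaft and node pools instead, so verify against the version you install. The cluster name `demo`, the topic `orders` and the storage sizes are purely illustrative assumptions.

```python
import json


def kafka_cluster(name: str, brokers: int = 3) -> dict:
    """Build a Strimzi `Kafka` custom resource as a plain dict (sketch only)."""
    rf = min(brokers, 3)  # replication factor cannot exceed the broker count
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "Kafka",
        "metadata": {"name": name},
        "spec": {
            "kafka": {
                "replicas": brokers,
                "listeners": [
                    {"name": "plain", "port": 9092, "type": "internal", "tls": False}
                ],
                "config": {
                    "offsets.topic.replication.factor": rf,
                    "default.replication.factor": rf,
                    "min.insync.replicas": max(1, rf - 1),
                },
                # Illustrative size, not a recommendation.
                "storage": {"type": "persistent-claim", "size": "100Gi"},
            },
            "zookeeper": {
                "replicas": 3,
                "storage": {"type": "persistent-claim", "size": "10Gi"},
            },
            # The Topic Operator reconciles KafkaTopic resources into real topics.
            "entityOperator": {"topicOperator": {}, "userOperator": {}},
        },
    }


def kafka_topic(cluster: str, name: str, partitions: int, replicas: int) -> dict:
    """Build a `KafkaTopic` resource bound to a cluster via its label."""
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaTopic",
        "metadata": {"name": name, "labels": {"strimzi.io/cluster": cluster}},
        "spec": {"partitions": partitions, "replicas": replicas},
    }


if __name__ == "__main__":
    items = [kafka_cluster("demo"), kafka_topic("demo", "orders", 6, 3)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Assuming the script is saved as `manifests.py` (a hypothetical file name) and the Strimzi operator is already installed, piping its output into `kubectl apply -f -` should create both resources.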
As an alternative, Confluent has released its own Kubernetes Operator. It supports only the commercial Confluent Platform. On the other hand, it supports additional products from the Kafka family that Strimzi does not, such as Schema Registry and ksqlDB.
There is one more operator, less known but promising. Koperator, formerly known as Banzai Cloud Kafka Operator, is now part of Cisco. It lets you set up a Kafka cluster together with Cruise Control and Prometheus. The open source version has a limited set of features, but the enterprise variant also supports Kafka Connect and ksqlDB.
Quite often we see that business clients do not want to maintain their own cluster. It is complicated: you need to employ people with proper experience and maintain the infrastructure. Instead, they choose one of the “Kafka as a service” products. Let’s walk through a few of the top cloud providers and see what each of them has to offer.
Amazon Web Services (AWS)
Amazon Web Services offers a few products that can be used for data streaming or event passing. The most important ones are Amazon SQS and Amazon Kinesis. When you dive into how Kinesis works underneath, it looks quite similar to Kafka and shares some of its concepts. However, it is not a drop-in replacement: there are no compatible APIs. Fortunately, there is an alternative in the form of managed Kafka instances.
Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Amazon MSK is a managed Kafka cluster. You pay for broker instances and storage (not for the ZooKeeper service). It is highly available with automatic multi-AZ replication. Data is encrypted at rest and in transit. The SLA for this service is 99.9% uptime, but it covers only selected scenarios; for example, it does not include failures caused by bugs in Apache Kafka itself. Some people call MSK partially managed: you need to define the sizes of the Kafka server nodes, monitor performance and scale accordingly. There is quite a lengthy guide from AWS about best practices for sizing Kafka clusters on MSK.
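Since sizing is on your side, a back-of-the-envelope storage estimate is a useful starting point. The sketch below is a generic rule of thumb (ingress rate × retention × replication factor, plus headroom), not the method from the AWS guide; the function names and the 30% headroom default are our own assumptions.

```python
import math


def required_storage_gib(
    ingress_mib_per_s: float,
    retention_hours: float,
    replication_factor: int = 3,
    headroom: float = 0.3,
) -> float:
    """Estimate total cluster storage in GiB for a given write rate and retention."""
    retained_gib = ingress_mib_per_s * retention_hours * 3600 / 1024
    # Every message is stored once per replica; headroom covers bursts and rebalancing.
    return retained_gib * replication_factor * (1 + headroom)


def storage_per_broker_gib(total_gib: float, brokers: int) -> int:
    """Split the total evenly across brokers, rounding up to whole GiB."""
    return math.ceil(total_gib / brokers)


if __name__ == "__main__":
    total = required_storage_gib(ingress_mib_per_s=10, retention_hours=24)
    print(f"total: {total:.0f} GiB, per broker (3 brokers): {storage_per_broker_gib(total, 3)} GiB")
```

With 10 MiB/s of ingress, 24 hours of retention and a replication factor of 3, this comes to roughly 3.2 TiB in total, or about 1.1 TiB per broker on a three-broker cluster. Treat it as a first approximation only: throughput limits of the chosen instance type matter just as much as disk space.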
Amazon MSK Serverless
MSK Serverless is an attempt to make MSK fully managed. You are charged hourly for the cluster and the number of partitions, as well as for storage and data transfer in and out. As with most serverless offerings, it will be cheaper for some use cases and more expensive for others. The 99.9% SLA does not apply to MSK Serverless.
Amazon MSK Connect provides managed Kafka Connect services. It is charged hourly for connector usage, depending on the number of workers. You can use it with MSK, but also with other Apache Kafka clusters.
The AWS offering does not include Kafka Streams or the standard Schema Registry. However, it provides some alternatives: you can process MSK messages and stream data using Amazon Kinesis Data Analytics, and additionally use AWS Glue Schema Registry. MSK integrates with many other AWS products as well.
Google Cloud Platform (GCP)
GCP does not offer any managed Kafka. Its default products for messaging are Pub/Sub and Pub/Sub Lite. The concept behind the Lite variant sounds similar to Kafka and, interestingly, it offers a “Kafka API”.
Pub/Sub Lite with Apache Kafka-like API
It is possible to use the Kafka client library together with the Google Pub/Sub Lite Kafka Shim Client.
Unfortunately, it has some limitations:
- does not support transactions,
- messages can be produced or consumed from a single topic at a time,
- it is not possible to send messages to a specific partition.
Underneath it uses gRPC to communicate with Pub/Sub Lite services.
Pricing depends on reserved capacity: throughput, storage and egress. Uptime ranges from 99.5% to 99.95%, depending on the topic type. It is HIPAA and SOC2 compliant.
Apache Kafka on HDInsight architecture
Azure HDInsight is a product that allows running Apache Hadoop, Apache Spark and other big data systems. It is possible to leverage it for Apache Kafka as well. You are billed for the provisioned cluster (pricing depends on the number of nodes and their types). Data is encrypted at rest and the service has a 99.9% uptime. You have to monitor it yourself using Azure Monitor, much like with Amazon MSK.
Azure Event Hubs with Apache Kafka API
This approach is similar to GCP Pub/Sub Lite. It is possible to use Event Hubs through the Kafka client libraries. There are some limitations, e.g. it does not support log compaction. As an alternative to Kafka Streams, Azure proposes using one of its stream processing services, which work with Event Hubs out of the box. Pricing depends on provisioned capacity and the chosen plan. Uptime depends on the plan and ranges from 99.95% to 99.99%.
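In practice, pointing an existing Kafka client at Event Hubs is mostly a matter of configuration. The sketch below builds the client settings as a plain dictionary using librdkafka-style keys; the helper name `event_hubs_kafka_config` and the namespace and connection string values are placeholders of ours. The endpoint on port 9093 and the literal `$ConnectionString` user name follow the Azure documentation as we understand it, so double-check them for your setup.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Kafka client settings for an Event Hubs namespace (illustrative helper)."""
    return {
        # The Kafka-compatible endpoint of the namespace listens on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The user name is the literal string "$ConnectionString";
        # the namespace connection string itself goes in as the password.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


if __name__ == "__main__":
    # Placeholder values; in a real application read the secret from a vault or env var.
    config = event_hubs_kafka_config("my-namespace", "Endpoint=sb://<placeholder>")
    for key, value in config.items():
        shown = "***" if key == "sasl.password" else value
        print(f"{key}={shown}")
```

A dictionary like this can be passed to a librdkafka-based client such as `confluent_kafka.Producer(config)`, assuming that package is installed; the rest of the producer or consumer code stays the same as for a regular Kafka cluster.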
Apart from cloud-specific services, external companies offer to run and manage Kafka and related products in the cloud of your choice, sometimes even on your own cloud account.
Confluent Cloud is the product that offers the largest number of well-known Kafka-related services. It supports Connect (including Confluent connectors), Schema Registry and ksqlDB. It has a few custom solutions as well, such as Stream Designer and Stream Governance, which can be a nice base for an event streaming platform.
Pricing depends on a few factors. There are 3 plans you can choose from, each with different features, prices and compliance certifications. The highest one supports ISO 27001, SOC 3, HIPAA and PCI, and offers multi-AZ deployment with 99.99% uptime, data encryption and infinite storage. You can run it on AWS, Azure or GCP. The price differs by plan, cloud provider, amount of data in and out, storage and other factors. You do not choose node sizes here: it is a fully managed variant, a bit similar to MSK Serverless. However, given MSK’s limitations and the exclusions in its SLA, Confluent looks much better here.
Instaclustr is a company that offers various open source projects as managed services, e.g. Kafka, Kafka Connect and Schema Registry. It can be used with AWS, Azure and GCP, and also with IBM Cloud and DigitalOcean. It is PCI-DSS, SOC2 and HIPAA compliant. Interestingly, it offers a 99.999% SLA and can be run on your own cloud account. Pricing depends on the plan (Developer or Production) and on the number of nodes and their sizes (for both Kafka and Connect).
Aiven is quite similar to Instaclustr. It offers Kafka, Kafka Connect and Karapace. It can be used with AWS, Azure, GCP, DigitalOcean and UpCloud. They offer 3 pricing plans: Startup, Business and Premium. Only Premium can be run under your own cloud account. The plans have different maximum storage, and each of them offers different variants of CPU and RAM per VM.
Apache Kafka has become a de facto standard for event-driven and data streaming architectures, hence every major cloud provider offers Kafka-related services. They differ in pricing models, SLA terms and features. Due to licensing limitations, some of the projects cannot be offered as SaaS by anyone apart from Confluent. That is why providers create alternatives or integrate with similar services they already have.
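To put the SLA figures quoted throughout this article in perspective, it helps to convert uptime percentages into the downtime they actually permit. The short sketch below does that arithmetic; the function name is ours, and it assumes a 365-day measurement window, whereas providers typically measure SLAs per month and with their own exclusions.

```python
def allowed_downtime_minutes(uptime_percent: float, days: float = 365.0) -> float:
    """Downtime budget, in minutes, implied by an uptime percentage over a period."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Uptime levels mentioned in this article.
    for sla in (99.5, 99.9, 99.95, 99.99, 99.999):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Over a full year, 99.9% allows roughly 8.8 hours of downtime, 99.99% about 53 minutes and 99.999% a little over 5 minutes, which is why one extra "nine" can matter more than it seems.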
What to choose for your next project? That is not an easy question. It depends! It depends on SLA, pricing, security requirements - what certifications are needed, whether you need a 100% automatically managed environment or if a bit of maintenance is ok for you. It depends on the status of the project and what the performance requirements are. There are various factors that you need to consider before making a decision. Good luck!
Do you want to start using Apache Kafka in your company? Talk to us!
Reviewed by: Grzegorz Kocur and Michał Ostruszka