SoftwareMill

Kafka Visualization

Kafka
Apache Kafka is a distributed event streaming platform. Using the tool below you can simulate how data flows through a replicated Kafka topic, to gain a better understanding of the message processing model.
Choose the number of partitions between which data will be evenly distributed. Experiment with different numbers of brokers, turning them on and off and seeing how the system adapts. Make sure to store data in replicas, so that it is not lost! Simulate load by increasing the consume interval. Finally, verify how offsets are committed, and see how this affects redelivery when consumers or brokers are added or removed.
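To make the link between committed offsets and redelivery concrete, here is a minimal Python sketch. It is not Kafka client code: the `PartitionConsumer` class, its `commit_every` parameter, and the message names are hypothetical, and a plain list stands in for one partition. It assumes a consumer that commits its position only every few messages and then stops before its next commit.

```python
class PartitionConsumer:
    """Toy consumer of a single partition (hypothetical, not a Kafka API)."""

    def __init__(self, log, committed_offset=0, commit_every=3):
        self.log = log
        self.position = committed_offset   # offset of the next message to read
        self.committed = committed_offset  # last position that was committed
        self.commit_every = commit_every

    def poll_one(self):
        message = self.log[self.position]
        self.position += 1
        # Commit only once enough messages have piled up since the last commit.
        if self.position - self.committed >= self.commit_every:
            self.committed = self.position
        return message


log = [f"m{i}" for i in range(10)]

# The first consumer processes m0..m4, but its last commit happened at offset 3.
first = PartitionConsumer(log)
for _ in range(5):
    first.poll_one()

# A replacement consumer resumes from the committed offset, not from where
# the first one actually stopped, so m3 and m4 are delivered a second time.
second = PartitionConsumer(log, committed_offset=first.committed)
redelivered = [second.poll_one() for _ in range(2)]
print(first.committed, redelivered)
```

Committing more often narrows the window of redelivered messages, at the cost of more commit traffic.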
Note that our simulation assumes default producer & broker configuration (for Kafka version <2.8.0) which accepts new messages, even if a majority of brokers are down. See this blog for more details.
Created by SoftwareMill
SoftwareMill offers architecture, development and consulting services for projects leveraging (or considering) Apache Kafka as well as tailored training programs.
Apache Kafka is a distributed streaming platform for building real-time data pipelines. It allows you to send in-order, persistent messages between applications.
To process data with Kafka, you need to understand a few unique concepts.
Kafka consists of servers and clients that communicate via a high-performance TCP network protocol. Messages are grouped into topics, the primary Kafka abstraction. The sender (producer) sends messages to a specific topic. The recipient (consumer) receives all messages from a specific topic, possibly from many senders. In Kafka, producers and consumers are fully decoupled to achieve the high scalability that Kafka is known for.
Producers are the client applications that write messages to Kafka. Producers always publish to a category known as a topic. Any message sent to a given topic, by any producer, is available to every consumer group subscribed to that topic.
A topic is split, both logically and physically, into partitions. A producer partitioner maps each message to a topic partition, and the producer sends a request to the leader of that partition. All writes to the partition must go through the partition leader. If the leader fails, one of its replicas can take over the leader role and acknowledge writes. The number of replicas depends on the configuration.
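The mapping from message to partition can be sketched in a few lines of Python. This is an illustration under assumptions, not the real partitioner: `choose_partition` is a hypothetical helper, `zlib.crc32` stands in for the murmur2 hash used by the Java client for keyed messages, and keyless messages are simply spread round-robin (the actual keyless strategy depends on the client version).

```python
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int, counter: int = 0) -> int:
    """Pick a partition for a message (hypothetical helper, not a Kafka API)."""
    if key is None:
        # No key: spread messages over partitions in turn.
        return counter % num_partitions
    # Keyed: a stable hash means the same key always maps to the same
    # partition, which preserves per-key ordering.
    return zlib.crc32(key) % num_partitions


same_key = [choose_partition(b"user-42", 3) for _ in range(3)]
keyless = [choose_partition(None, 3, n) for n in range(4)]
print(same_key, keyless)
```

Because the keyed result depends on `num_partitions`, changing the partition count later changes where existing keys land.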
TL;DR
  • the Kafka producer is conceptually much simpler than the consumer since it has no need for group coordination;
  • producer configuration affects overall throughput, durability and delivery guarantees.
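As a rough illustration of that trade-off, the two dictionaries below use standard producer setting names (`acks`, `enable.idempotence`, `linger.ms`, `batch.size`, `compression.type`). The variable names and the specific values are assumptions chosen for the example, not recommendations, and how such settings are passed depends on the client library you use.

```python
# Favor durability: wait for all in-sync replicas to acknowledge each write
# and let the broker de-duplicate retried sends.
durable_producer = {
    "acks": "all",
    "enable.idempotence": True,
}

# Favor throughput: wait only for the partition leader, and send larger,
# compressed batches (illustrative values).
fast_producer = {
    "acks": "1",
    "linger.ms": 20,
    "batch.size": 65536,
    "compression.type": "lz4",
}
```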
PRO TIP
Although exactly-once semantics for message delivery are a Holy Grail in general, Kafka lets you achieve them in some specific cases. For more, see here.
A Kafka broker, in other words a Kafka node, is a Kafka server that runs in a Kafka cluster. A broker receives messages from producers, stores them on disk, and later allows consumers to fetch them by topic, partition and offset.
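The "topic, partition and offset" addressing can be pictured as one append-only list per partition. The sketch below is a deliberately simplified in-memory model: `ToyBroker` and the `orders` topic are hypothetical names, and nothing is written to disk or replicated.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker's storage (hypothetical, not a Kafka API)."""

    def __init__(self):
        # One append-only log per (topic, partition) pair.
        self.logs = defaultdict(list)

    def append(self, topic, partition, message):
        log = self.logs[(topic, partition)]
        log.append(message)
        return len(log) - 1  # the offset assigned to this message

    def fetch(self, topic, partition, offset, max_messages=10):
        log = self.logs.get((topic, partition), [])
        return log[offset:offset + max_messages]


broker = ToyBroker()
offsets = [broker.append("orders", 0, m) for m in ("a", "b", "c")]
tail = broker.fetch("orders", 0, offset=1)
print(offsets, tail)
```

Fetching never removes anything from the log, which is why several consumers can read the same partition at their own pace.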
A Kafka cluster usually consists of 3+ Kafka brokers. Why? Best practice is to have 3 copies of your data. If one broker fails, you still have 2 brokers holding the data, so you can potentially survive another failure without data loss.
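The reasoning behind three copies can be checked with a small sketch. It assumes a simplified model in which any live replica may become leader; a real cluster only elects from in-sync replicas by default. Both `assign_replicas` and `elect_leader` are hypothetical helpers written for this example.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (simplified)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def elect_leader(replicas, live_brokers):
    """Return the first replica that is still alive, or None if none is."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


assignment = assign_replicas(num_partitions=2, brokers=[1, 2, 3], replication_factor=3)
replicas = assignment[0]                    # [1, 2, 3]
after_one_failure = elect_leader(replicas, {2, 3})
after_two_failures = elect_leader(replicas, {3})
after_three_failures = elect_leader(replicas, set())
print(after_one_failure, after_two_failures, after_three_failures)
```

With three replicas the partition keeps a leader through two broker failures; only the third failure leaves it unavailable.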
If you don’t want to set it up by yourself, there are multiple companies offering Kafka as a Service.
When using Apache Kafka in your system’s architecture, you want to use data produced to Kafka topics in many scenarios. Here’s where the Kafka consumer concept comes into play. Consumers are the client applications that subscribe to one or multiple topics.
A single consumer reading and processing all the data may not keep up with the rate of incoming messages. Kafka consumers are therefore typically part of a “consumer group”, where each consumer in the group receives messages from a different subset of the partitions in the topic. You can scale the number of consumers up to the number of partitions in the topic.
TL;DR
  • a new consumer group can be created for each application that needs all the messages from one or more Kafka topics;
  • new consumers can be added to an existing consumer group to scale data consumption from a Kafka topic, so each additional consumer in a group will only get a subset of the messages.
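The first point above, that each consumer group sees all the messages, follows from offsets being tracked per group. A minimal sketch, assuming a single partition modeled as a list and two hypothetical groups named `billing` and `analytics`:

```python
log = ["m0", "m1", "m2", "m3"]

# Each group keeps its own position in the same log.
group_offsets = {"billing": 0, "analytics": 0}


def poll(group, max_messages):
    """Return the next batch for a group and advance only that group's offset."""
    start = group_offsets[group]
    batch = log[start:start + max_messages]
    group_offsets[group] = start + len(batch)
    return batch


billing_batch = poll("billing", 4)      # reads the whole log
analytics_batch = poll("analytics", 2)  # unaffected by billing's progress
print(billing_batch, analytics_batch, group_offsets)
```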
PRO TIP
When creating a topic, you have to think about the future. The number of partitions affects the maximum number of consumers actively processing messages in a single consumer group.
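The partition limit is easy to see in a sketch of how partitions might be handed out within one group. The round-robin `assign_partitions` helper below is hypothetical; real clients offer several assignment strategies that distribute partitions differently, but none can give one partition to two consumers of the same group.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over a group's consumers round-robin (simplified)."""
    if not consumers:
        return {}
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]
two_consumers = assign_partitions(partitions, ["c1", "c2"])
four_consumers = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])
print(two_consumers)
print(four_consumers)
```

With three partitions, the fourth consumer receives nothing and sits idle, so adding it does not increase throughput.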
Want to dive deeper into Apache Kafka? Go to SoftwareMill Tech Blog.
Reap the benefits of real-time data streaming and Apache Kafka with a trusted and experienced Confluent Partner.