
Dive into real-time data with Apache Kafka

Michał Matłoka

21 Dec 2020 · 6 minutes read

Apache Kafka is among the most often leveraged technologies in IT and one of the most popular big data tools. Kafka is used by over 12,000 companies around the world, enabling them to take up the challenge of efficient real-time data streaming where other message broker technologies have failed.

Kafka has many applications, ranging from simple message passing, through inter-service communication in microservices architectures, to complete stream processing platforms. The platform is a modern and slick solution that helps businesses reap the benefits of real-time data processing.

We use Kafka on a daily basis, and we also provide consulting & training services to our customers. We have come to know Kafka not only from the usage perspective but also from the open source side, having worked on the reactive-kafka library (later named akka-streams-kafka and alpakka-kafka).

"As a contributor to the akka-streams-kafka project, I wanted to understand how to optimally configure Kafka. Apart from that, I perceive Kafka as a very promising technology in projects. Customers often ask about it and we wanted to have experts on our team."
~ Andrzej Ludwikowski, Software Journeyman

Kafka is used heavily in the big data space as a reliable way to ingest and move large amounts of data very quickly. It allows us to build modern and scalable ETL (extract, transform, load), CDC (change data capture) and big data ingest systems. Usage of Kafka continues to grow across multiple industry segments. The technology fits perfectly into dynamically changing, data-driven markets where huge volumes of generated records need to be processed as they occur.

Due to our extensive Kafka usage, we have recently decided to join Confluent as a Consulting Partner at the Plus level.

This is a good opportunity to sum up our Kafka-related actions from the last few years.

Publications

During the last 5 years, we have blogged about Kafka over 50 times! Our top 5 most-read posts are:

  • Using Kafka as a message queue – a blog post written by Adam Warski in 2017. It is about our kmq project, which allows for individual message acknowledgments in Kafka. Adam described the design and presented performance benchmarks measuring the solution's overhead.
  • Event sourcing using Kafka – the next article from Adam Warski, this time about Event Sourcing (ES), Kafka Streams and KSQL. Can you implement ES using just Kafka, or do you need more complex frameworks? Read it to find out!
  • Does Kafka really guarantee the order of messages? – an article from Kamil Charłampowicz about Kafka-related ordering guarantees and settings. Quite important knowledge, since the default behaviour changed in one of the Kafka releases (see the producer-config sketch after this list).
  • Kafka pitfalls – Q&A with a Kafka Architect – a transcript of our Q&A session with Andrzej Ludwikowski. You can read it or watch the video below.
  • What is Apache Kafka and what are Kafka use cases? – an introduction to Kafka written by Maria Wachal. The article covers the technological & business benefits of Kafka, the most common mistakes, and who actually uses Kafka.
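
To illustrate the ordering topic mentioned above: within a single partition, ordering can break when the producer retries failed sends while allowing several in-flight requests per connection. Below is a minimal, hedged sketch of producer settings that keep per-partition ordering intact; the broker address and topic name are placeholders, and the configuration is illustrative rather than the exact one discussed in the linked article.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object OrderingAwareProducer extends App {
  val props = new Properties()
  props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder broker address

  // With retries enabled, more than one in-flight request per connection can
  // reorder messages within a partition. Enabling idempotence (or limiting
  // in-flight requests to 1) preserves per-partition ordering.
  props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
  props.put(ProducerConfig.ACKS_CONFIG, "all")
  props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5")

  val producer =
    new KafkaProducer[String, String](props, new StringSerializer, new StringSerializer)
  producer.send(new ProducerRecord("example-topic", "key", "value")) // placeholder topic
  producer.close()
}
```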

Outside of the top 5, you can find many other interesting Kafka-related blog posts on our blog.

What is more, last year we published an ebook providing an introduction to Kafka.

Open source

Open source projects are a great way of developing quality products by working together with other developers. At SoftwareMill we believe in this and participate in various OSS communities. Here we explained why contributing to open source projects is truly needed and beneficial in the software world.

Apache Kafka has been open source for many years and will remain open source. The value of the open-source software (OSS) community lies in fostering a mindset of innovation and true collaboration.

If you’re from the Java or Scala world, then you have probably heard about the Alpakka project. Its aim is to integrate Akka Streams with various technologies: databases, message brokers, cloud services and others. One of its modules is related to Kafka – it connects Akka Streams with consuming from and producing to Kafka. However, did you know that the Alpakka Kafka Connector originated at SoftwareMill? It was initially developed under the name Reactive Kafka by our colleague Krzysztof Ciesielski. Later, when the Alpakka project was created, we passed ownership of the code to Lightbend.
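
If you would like to see what working with the connector looks like, here is a minimal sketch of an Alpakka Kafka consumer (assuming Akka 2.6+ and the akka-stream-kafka dependency; the broker address, group id and topic name are placeholders):

```scala
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object AlpakkaKafkaSketch extends App {
  implicit val system: ActorSystem = ActorSystem("alpakka-kafka-sketch")

  // Consumer settings: placeholder broker address, group id and String deserializers.
  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("example-group")

  // A plain source emits Kafka records as Akka Streams elements,
  // here simply printed by a foreach sink.
  Consumer
    .plainSource(consumerSettings, Subscriptions.topics("example-topic"))
    .runWith(Sink.foreach(record => println(s"${record.key}: ${record.value}")))
}
```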

This is not our only Kafka-related contribution. Kafka Message Queue (KMQ) is a project focused on individual message acknowledgments when using Kafka (the standard behaviour is to commit all messages up to a given offset).
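
For context, the standard consumer API only lets you commit offsets, which implicitly acknowledges every earlier message in a partition. The sketch below shows that standard behaviour using the plain Kafka client (placeholder broker, group id and topic); it is not KMQ's own API, just the gap that KMQ fills:

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer
import scala.jdk.CollectionConverters._

object OffsetCommitSketch extends App {
  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder broker address
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group")
  props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")

  val consumer =
    new KafkaConsumer[String, String](props, new StringDeserializer, new StringDeserializer)
  consumer.subscribe(List("example-topic").asJava)

  val records = consumer.poll(Duration.ofSeconds(1))
  records.asScala.foreach(r => println(s"processing offset ${r.offset}: ${r.value}"))

  // commitSync acknowledges everything up to the latest polled offsets;
  // there is no built-in way to acknowledge a single message out of order.
  consumer.commitSync()
  consumer.close()
}
```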

What is more, Lech Głowiak and Krzysztof Ciesielski have also contributed to a newer project – zio-kafka.

Training & Certifications

We believe that it is important to grow continuously. That is why we support our developers by giving them space to develop their skills. During the last few years our engineers have completed 15 Confluent trainings, including Confluent Developer Training, Operations Training and Streams & KSQL Training. We have also obtained 5 Confluent certifications, and we continue to grow our team of Kafka experts.

That commitment and hands-on commercial experience have earned us the trust of the clients we have trained in the Apache Kafka ecosystem. Our Kafka training was rated as “very good” and helped them build development skills and get more value from their software systems.

Experience

Understanding your industry is one thing. Understanding the technology you are using is another. We are Kafka experts and consultants with battle-proven experience, and we have had the opportunity to develop various types of Kafka-based systems. If you’d like to learn more about what we do, take a look at our portfolio, case studies and clients’ testimonials.

Conclusions

Apache Kafka has been an important part of SoftwareMill for many years. We hope that the Confluent Plus partnership will allow us to take on new challenges and to provide better services for our current clients. If you’re considering introducing Kafka into your system, it is good to get to know the potential pitfalls first. However, there is a good chance it will allow you to solve your problems, achieve better performance and lower the coupling between services.

Want to implement Apache Kafka and benefit from real-time data and business intelligence?
We understand when and where real-time data streaming makes sense and how to avoid common pitfalls.
Get in touch and let’s discuss your ideas and needs!

