Leveraging Data Streaming in a Modern Data Architecture - Meetup by Confluent
Processing data on demand makes it constantly available to all kinds of applications. Modern businesses want analytics results in real time, which is why Big Data and stream processing are gaining popularity.
Apache Kafka is a big data tool used by over 28,000 companies worldwide. The number of such digital natives is also growing in our local community. This is why we were excited to join a Lunch and Learn session last week and dive deeper into Data in Motion. The meetup was organised by Confluent, the company behind Kafka.
On the agenda was:
• An introduction to the Kafka angle on the Data Mesh concept
• From Kafka to Websockets, by our CTO Adam Warski
• Live demo: Bringing Data in Motion with a fully managed service for Apache Kafka
The various approaches to using Kafka - from Kafka to websockets
Kafka has several applications, ranging from simple message passing, through inter-service communication in a microservices architecture, to complete stream processing platforms.
Our CTO, Adam Warski, presented a technical perspective on incorporating Kafka into your software architecture for a particular use case. Here's the challenge: we've got a Kafka topic where services publish messages to be delivered to browser-based clients through websockets. Sounds simple? It might, but we're faced with an increasing number of messages, as well as a growing count of websocket clients. How do we scale our solution? He shed light on the strengths and weaknesses of sending messages to websockets through Kafka, as well as the limitations created by the chosen technology.
Check out the slides "From Kafka to Websockets"
"While Kafka’s partition model provides high resilience and scalability, sometimes to adapt to a specific use-case, data reshuffling just needs to occur. That’s what is needed here: if there’s more than one edge node accepting web sockets, we can’t control which one users connect to. Hence we need to relay the messages from a relatively small number of partitions to a large number of websockets; but also, we need to maintain an eventually consistent view of which client is connected where. Luckily, using the tools that Kafka gives us, this is a relatively straightforward task."
~ Adam Warski, CTO SoftwareMill
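To make the relaying idea from the quote more concrete, here is a minimal in-memory sketch. It is not the design from the talk or the slides: the names EdgeNode, Relay, on_connection_event and on_message are hypothetical, and there is no real Kafka or websocket code in it. It only illustrates the two responsibilities mentioned above: keeping a view of which client is connected to which edge node, and forwarding each message to the node that currently holds that client's socket.

```python
from collections import defaultdict


class EdgeNode:
    """Hypothetical stand-in for a server holding open websocket connections."""

    def __init__(self, name):
        self.name = name
        # client_id -> messages that would have been written to its websocket
        self.delivered = defaultdict(list)

    def push(self, client_id, message):
        self.delivered[client_id].append(message)


class Relay:
    """Routes per-client messages to the edge node the client is connected to."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        # The "which client is connected where" view. In a real system this
        # would be rebuilt from connection events (assumption: e.g. a topic).
        self.connections = {}

    def on_connection_event(self, client_id, node_name):
        # node_name=None means the client disconnected.
        if node_name is None:
            self.connections.pop(client_id, None)
        else:
            self.connections[client_id] = node_name

    def on_message(self, client_id, message):
        # Returns False when no edge node is known for the client.
        node_name = self.connections.get(client_id)
        if node_name is None:
            return False
        self.nodes[node_name].push(client_id, message)
        return True


edge_a, edge_b = EdgeNode("edge-a"), EdgeNode("edge-b")
relay = Relay([edge_a, edge_b])

relay.on_connection_event("alice", "edge-a")
relay.on_message("alice", "hello")

# Alice reconnects and lands on a different edge node.
relay.on_connection_event("alice", "edge-b")
relay.on_message("alice", "again")
```

In this sketch the connection view is updated synchronously. In a distributed setup the view would lag slightly behind reality, which is where the "eventually consistent" part of the quote comes in: a message may briefly be routed to the node a client has just left.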
Feel the vibes of the Lunch and Learn session by Confluent
The event was a wonderful opportunity to finally meet our friends from Confluent in person, along with people from many different firms that have adopted Kafka, whom we knew online but rarely had the chance to mingle and exchange ideas with face to face.
It is always inspiring to find out what kind of problems can be solved with technology and - in particular - what Kafka and Confluent Platform are used for. Plus it is always great to relax a little bit over a slice of good pizza and gossip about the IT scene around Europe ;)
~ Tomasz Szymański, CEO SoftwareMill
The importance of Data in Motion
Our developers have been carrying out projects using Kafka as well as working on the akka-streams-kafka open source project.
One of my takeaways was the benefit of a decentralised approach to data processing architectures. In such a setup each category of data has its owner (e.g. a division of an enterprise) who is responsible for the correctness of the data and for broadcasting it to whoever might be interested in it. By adding Kafka as the data streaming platform, we end up with an architecture that is both scalable (thanks to the scalable nature of Kafka) and guarantees that the data is always up-to-date (due to each data owner being a single source of truth).
~ Jacek Kunicki, Senior Scala Engineer, SoftwareMill
Especially in the last decade, market demand shifted the focus: it's no longer that important how big your data is; it's much more important how fast you can analyse it and gain insights.
Processing data in a streaming fashion became more popular over the more "traditional" way of batch-processing big data sets available as a whole. Time became one of the main aspects and it needed first-class handling. We had to answer the question: “what's current?”. It might be the last 5 minutes or the last 2 hours. Or maybe the last 256 events?
Technology responded, and real-time stream processing has been gaining momentum in recent years.
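The "what's current?" question can be illustrated with a small sketch of the two kinds of window mentioned above: one defined by time and one defined by a number of events. This is plain Python over an in-memory list, not a stream processing API. The function names last_seconds and last_n_events and the sample data are made up for illustration.

```python
from collections import deque


def last_seconds(events, now, seconds):
    """Time-based window: events with a timestamp in (now - seconds, now]."""
    return [(ts, value) for ts, value in events if now - seconds < ts <= now]


def last_n_events(events, n):
    """Count-based window: the n most recent events, whatever their age."""
    return list(deque(events, maxlen=n))


# (timestamp in seconds, payload), ordered by timestamp
events = [(0, "a"), (100, "b"), (290, "c"), (400, "d"), (590, "e")]

recent_by_time = last_seconds(events, now=600, seconds=300)  # the last 5 minutes
recent_by_count = last_n_events(events, n=3)                 # the last 3 events
```

The two windows can disagree: here the event at timestamp 290 is older than five minutes, so it is excluded from the time-based window, but it is still among the last three events.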
We believe that when implementing a stream data platform at enterprise scale, the Confluent Platform truly extends the capabilities of Kafka - and our partnership helps our mutual clients succeed. SoftwareMill has one of the highest partner tiers, "Preferred", at Confluent, and by participating in such events we want to further build the value of our partnership.
~ Marcin Głasek, Business Developer, SoftwareMill
Need a storage solution for streaming data? We are Apache Kafka experts with a long history of developing with Kafka and using Confluent Platform in production. We are ready to help you leverage Confluent best practices and tools. Let’s talk.
We hope to see you soon at another exciting event for the Kafka community!