Want more than just batch processing? Apache Kafka - a tool reserved not only for big players
The level of complexity that surrounds us is greater than ever, and every business is constantly being challenged by competitors and technology. Moreover, customers are embracing new technologies rapidly and have become the biggest drivers of change. In the end, it’s mainly customer focus that dictates your strategy.
As a brand, you must create personalized experiences by understanding customers’ needs and their journey in a data-driven way. But what you learn about your customers cannot be translated into an effective strategy unless you can derive insights from the right data at the right time. Unfortunately, data can originate from a number of different sources, and it’s difficult to access it in a clean form when you need it.
The challenge remains: how to make decisions with a faster time-to-value?
In this post you’ll find out how combining a traditional dataset with Apache Kafka can serve as the single source of truth and in turn help you become customer focused.
How to put data into action
First, let’s uncover what data is the most significant for modern business analytics. A classic approach to data, which was the standard for years, is batch processing. It means that you copy database entries in periodic, duplicative batches and everything runs on a fixed schedule. This approach doesn’t scale anymore, because the business environment moves ever faster and you often cannot wait for the next scheduled run before you act.
Making informed decisions based on old data is not even an option. You can rely on solutions like Excel and CRM to quickly generate some useful metrics, but these tools are not enough to generate insights after capturing the most current changes in your business. In reality, most of what a business does can be described as streams of events and you should at least be able to process them incrementally as they happen to avoid bottlenecks.
Each software system generates events that, when processed in real time, give you a competitive advantage and confidence in business.
Requests, errors, logs, security metrics, and so on, combined with transactional data, play a pivotal role in generating revenue. To reliably move the data you want between systems, you need to implement real-time streaming data pipelines.
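To make the contrast between batch and incremental processing concrete, here is a minimal sketch in plain Python. It assumes a hypothetical list of order events (the field names `customer` and `amount` are illustrative, not taken from any real system) and shows the same totals computed once over the full dataset and then updated one event at a time.

```python
from collections import defaultdict

# Hypothetical order events; field names are illustrative assumptions.
events = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 12.5},
    {"customer": "alice", "amount": 7.5},
]


def batch_totals(all_events):
    """Batch style: recompute totals from the full dataset on each run."""
    totals = defaultdict(float)
    for event in all_events:
        totals[event["customer"]] += event["amount"]
    return dict(totals)


class RunningTotals:
    """Incremental style: update state once per event, as it arrives."""

    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, event):
        # Only the affected customer's total is touched.
        self.totals[event["customer"]] += event["amount"]
        return self.totals[event["customer"]]


running = RunningTotals()
# Each call returns the customer's up-to-date total right after the event.
latest = [running.on_event(e) for e in events]
```

Both approaches end with the same totals. The difference is that the incremental version has a current answer after every event, while the batch version is only as fresh as its last scheduled run.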
What are data streams and stream processing apps?
Let me explain further. Customers are interacting with your business on many platforms and channels. These footprints create volumes of transactional data in the form of various events. Page likes, recommendations, searches, social interactions, and in-app behaviour, all combined with operational data and the transactional data captured when a product is sold or purchased, inform multiple functions of your business.
In the ideal world relevant data is used for business analytics and the results are streamed into operational systems or business intelligence tools to improve the business processes and make the right decisions. There is no gap in your data infrastructure and you can easily tie interactions together to create a single digital profile of your customer.
But that’s the ideal world. In reality, many are far away from being truly data-driven and still search for a reliable way to ingest data for analysis and reporting. This is where solutions like Apache Kafka come in. They allow you to create data streaming platforms focused on a flow of real-time data streams.
When we take a look at IT systems, usually the same data is being used by multiple applications simultaneously. Applications can serve the data in different ways: they might be streaming apps, batch apps or micro-batch apps. They also consume data in different ways. In the various use cases that apply here, being real-time requires a messaging system that immediately “tells” your application that some data originated in one place and has to be processed.
An event streaming application has two primary uses:
1. Stream processing: the ability to continuously react to, process, or transform data streams.
2. Data Integration: the ability to capture streams of events or data changes and to feed these to other data systems such as relational databases, data lakes or data warehouses.
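The two uses listed above can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not Kafka code: the `click_stream` generator stands in for a real event stream, the 900 ms threshold and all field names are hypothetical, and an in-memory SQLite database plays the role of the downstream relational store.

```python
import sqlite3


def click_stream():
    """Stand-in for an event stream; a real system would read from a broker."""
    yield {"user_id": "u1", "page": "/home", "ms": 120}
    yield {"user_id": "u2", "page": "/cart", "ms": 950}
    yield {"user_id": "u1", "page": "/checkout", "ms": 1500}


def flag_slow(events, threshold_ms=900):
    """Stream processing: transform each event as it passes through."""
    for event in events:
        yield {**event, "slow": event["ms"] >= threshold_ms}


def sink_to_db(events, conn):
    """Data integration: feed the processed stream into a relational store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views "
        "(user_id TEXT, page TEXT, ms INTEGER, slow INTEGER)"
    )
    for event in events:
        conn.execute(
            "INSERT INTO page_views VALUES (?, ?, ?, ?)",
            (event["user_id"], event["page"], event["ms"], int(event["slow"])),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Events flow one at a time: source -> transformation -> database.
sink_to_db(flag_slow(click_stream()), conn)

slow_pages = [
    row[0]
    for row in conn.execute(
        "SELECT page FROM page_views WHERE slow = 1 ORDER BY ms"
    )
]
```

Because the stages are chained generators, each event is transformed and written as soon as it is produced, rather than waiting for the whole dataset to be collected first.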
Apache Kafka is a big data tool that has become an event streaming platform, combining messaging, storage, and data processing far beyond the pub-sub use cases it was initially designed for. Kafka is used to build core operations software in the form of various streaming applications.
How to leverage a streaming approach and big data tools like Kafka?
The big players set a good example of how to benefit from data-driven business analytics when designing software and operations. You’ve probably heard that stream processing is implemented in companies that process mountains of data (we have already written about the most well-known Kafka use cases). But being data-driven is not only the modus operandi of large organizations like Netflix or Uber.
According to Stackshare, almost 900 companies currently use Apache Kafka. For example, Netflix recently leveraged it to plan, determine spending, and account for all its content, which is estimated to be worth $15 billion.
But what about smaller businesses? Is there a place for Kafka in their infrastructure?
Let’s take a look at the ecommerce business, for example. Due to the nature of online retail, ecommerce platforms work 24 hours a day and therefore must be deployed on a system featuring self-healing as well as load balancing. When running an online store, you also need to monitor infrastructure health and performance and capture metrics to derive business value in real time. These data volumes demand not just effective storage capabilities, but also the ability to scale during peak hours. It’s an ideal use case for stream processing applications that are resilient and scalable (take a look at this case study).
Retail is not the only industry that produces and processes streams of events with transactional data. Other types of businesses can be seen this way too when designing software to streamline their operations. For logistics, the transactional data occurs as events when orders, shipments, and other changes of state in the logistics chain happen. For fintech companies, the events may include currency values and stock prices, and for any digital business these might be website clicks, visits, searches, etc.
Netflix, Spotify, or Uber may lead by example, but making the right data available at the right time is an objective that each modern organization should focus on.
How to design big data software?
How to design your data streaming and data integration software that leverages Kafka? Let’s start from a sketch and stick to the ecommerce example.
As I mentioned before, data is often consumed not by one application but by several. Besides the need to implement messaging tools, you also need a solution that stores data and decouples each consumer and producer from one another. Apache Kafka is a good solution for both these requirements. Now that your events can be processed in real time, what the system still lacks is an interface integrated into a standard BI tool, so that you can act upon relevant data and learn from actionable metrics.
In the picture above Kafka acts as the core foundation of a modern integration layer of the big data software. According to Confluent’s surveys, 66% of companies use it for stream processing and 60% use it for data integration, while the most common use cases for Kafka are data pipelines (81%) and microservices (51%).
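The idea of storing events and decoupling producers from consumers can be illustrated with a toy append-only log. This is a deliberately simplified model written for this post, not Kafka's API: the `EventLog` class, its `append` and `poll` methods, and the consumer names are all hypothetical, and real Kafka adds partitions, replication, and persistence on top of the same basic idea.

```python
class EventLog:
    """Toy append-only log; a simplified model, not Kafka's API."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next position to read

    def append(self, record):
        """Producer side: add a record and return its position in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Consumer side: read from this consumer's own position onward."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
log.append({"type": "order_placed", "order_id": 1})
log.append({"type": "order_shipped", "order_id": 1})

# Two independent consumers read the same stored events at their own pace.
billing_first = log.poll("billing", max_records=1)
analytics_all = log.poll("analytics")
billing_rest = log.poll("billing")
```

The producer never needs to know who reads the log, and a slow consumer does not hold back a fast one, because each consumer only advances its own offset.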
What are the business benefits of data streams with Kafka?
For most modern businesses, their core data is continuous, not batch. It's non-stop events that you have to process and analyze continuously and in real-time. Introducing data streams strengthens your business by enabling:
- Real-time response - adopting stream processing significantly reduces the time between when an event is recorded and when the system and data application react to it. You can gain speed and confidence in a dynamically changing, data-driven environment.
- Single source of truth - tools like Kafka enable your software to ingest and quickly move data in a reliable way between different applications. You can easily connect loosely coupled elements of your IT systems. If the entire application has full event tracking coverage, everything can be streamed and logged into the database, adding significant business value.
- Scalability - the shift towards event driven microservice architectures gives your application agility, not only from a development and operations point of view but especially from a business perspective.
The only sustainable strategy for any business is to learn things as fast as the world changes around you. Data streaming systems and tools like Kafka answer these modern data requirements.
As software developers and architects, we often implement Kafka because it comes with multiple tools that are highly attractive for data integration - a clear choice to handle distributed processing. For organisations of all sizes, not only the big players, improved business intelligence delivers business value by enabling data-driven decision making and customer focus.
Thinking of implementing big data into your software? Let's talk!