Real-time insights into customers' online behaviour

How we helped one of the biggest retailers in Poland keep up with their customers


About the project

LPP S.A. is one of the biggest retailers in Europe, with more than 1700 stores in 20 countries. The company aims to be recognisable worldwide, so its strategy includes developing a chain of online stores.

Running online stores for five brands, each targeting a different group of customers, generates an immense volume of data arriving at high velocity. LPP is aware of the challenges that multiple data sources and data silos create in the eCommerce market.

The company approached SoftwareMill to help streamline its data pipelines and, as a result, provide customers with a unique experience. Tailored interaction is the key to customers' hearts. The goal of the project was to enable custom-fit product recommendations in real time by tracking online customers' behaviour without delay. Such a mechanism, with metrics and graphs aggregating orders and customer behaviour in real time, can significantly increase sales by enabling the right marketing decisions at the right time. Companies that cannot access their data struggle to measure marketing ROI effectively.


  • 3–5 devs


  • 3 months

Team role

  • Senior DevOps Engineer
  • Apache Kafka Engineers


  • Retail
  • eCommerce


  • Apache Kafka
  • Kafka Connect
  • Kafka Streams
  • Apache Beam
  • Google Cloud Platform (Dataflow, BigQuery, CloudSQL)
  • Snowplow
  • Kubernetes
  • Strimzi


LPP S.A. collects online customers' data from multiple sources. The two biggest inputs are orders stored in databases and customers' online behaviour recorded with Google Analytics. Relying on Google Analytics alone to track customers' online behaviour is not enough, because the data can arrive with a delay of up to 72 hours before it can be processed. On an ordinary day, up to 600 events per second need to be processed.
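For a sense of scale: if the ordinary-day ceiling of 600 events per second were sustained for a full 24 hours (an upper bound derived from the figure above, not a measured value), the daily volume would be:

```latex
600\ \frac{\text{events}}{\text{s}} \times 86\,400\ \frac{\text{s}}{\text{day}} = 51\,840\,000\ \frac{\text{events}}{\text{day}}
```

That is roughly 52 million events per day at most, before any peak-hour surges.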

Big data can play a pivotal role in generating more sales for any eCommerce business. The value extracted from data lets retailers reap the rewards of better customer experiences and bigger profits. If a customer spends time browsing products on the site, the retailer must understand their needs and offer relevant recommendations before they leave.

The existing solution relied on batch processing that moved data between separate data silos. Although this approach offered many possibilities, it missed the chance to interact with customers during their current shopping session. Processing data arriving at high velocity required battle-proven stream-processing tools. Massive volumes demand not only huge storage capacity but also the ability to scale during peak hours.

Due to the nature of the online retail business, our solutions have to run 24 hours a day and therefore must be deployed on a platform that provides self-healing and load balancing.

Finally, infrastructure health and performance must be monitored, and the metrics exposed by the delivered applications must be captured to derive business value in real time.

The first milestone was to move data from the databases, transform it and push it into Google BigQuery for further analysis. We started with Kafka Connect and the Debezium connector, which pull customer and order data from the relational databases into Kafka.
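Debezium publishes each row change as an envelope that carries the row state `before` and `after` the change, an operation code `op` (`c` create, `u` update, `d` delete, `r` snapshot read) and a timestamp `ts_ms`. Before such events can land in an analytical table, the envelope is usually flattened into a plain row. The Python sketch below illustrates the idea under stated assumptions: it expects the JSON converter with schemas disabled (so each message value is the bare envelope), and `unwrap_change_event` plus the `_deleted`, `_op` and `_ts_ms` columns are hypothetical names, not part of the delivered project. In a Kafka Connect pipeline the same effect is typically achieved with Debezium's `ExtractNewRecordState` single message transform rather than custom code.

```python
from typing import Any, Optional


def unwrap_change_event(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Flatten a Debezium-style change envelope into one row (hypothetical helper)."""
    op = event["op"]
    if op in ("c", "u", "r"):
        # create, update or snapshot read: the new row state is in "after"
        row = dict(event["after"])
        row["_deleted"] = False
    elif op == "d":
        # delete: "after" is null, so keep the last known state from "before"
        row = dict(event["before"])
        row["_deleted"] = True
    else:
        # any other operation code carries no row state in this sketch
        return None
    row["_op"] = op
    row["_ts_ms"] = event["ts_ms"]
    return row


insert = {"before": None, "after": {"order_id": 1, "total": 99.5}, "op": "c", "ts_ms": 1000}
print(unwrap_change_event(insert))
```

Keeping deletes as flagged rows, instead of dropping them, lets an append-only warehouse table still reflect that an order was removed.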

The next step was to transform the data with Kafka Streams and push it to BigQuery via WePay's BigQuery connector for Kafka Connect. Google Analytics data was collected, processed and enriched with Snowplow, an event data collection platform, and finally pushed into BigQuery.
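A common Kafka Streams transformation at this stage enriches a stream of orders with the latest state of each customer, known as a stream-table join (`KStream#leftJoin` against a `KTable` in the Java API). The Python sketch below only mimics those semantics without any Kafka library; `CustomerTable`, `enrich_order` and the `customer_id` and `segment` fields are hypothetical and do not describe the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CustomerTable:
    """Latest value per customer key, like a KTable materialised from a changelog topic."""
    rows: dict[int, dict[str, Any]] = field(default_factory=dict)

    def upsert(self, customer_id: int, value: Optional[dict[str, Any]]) -> None:
        if value is None:
            # a null value (tombstone) removes the key from the table
            self.rows.pop(customer_id, None)
        else:
            self.rows[customer_id] = value


def enrich_order(order: dict[str, Any], customers: CustomerTable) -> dict[str, Any]:
    """Left join: the order is always emitted, with None when no customer row exists."""
    customer = customers.rows.get(order["customer_id"])
    segment = customer.get("segment") if customer is not None else None
    return {**order, "customer_segment": segment}


customers = CustomerTable()
customers.upsert(7, {"name": "Ada", "segment": "loyal"})
print(enrich_order({"order_id": 1, "customer_id": 7, "total": 120.0}, customers))
print(enrich_order({"order_id": 2, "customer_id": 8, "total": 35.0}, customers))
```

As in Kafka Streams, each order is joined with the customer state at the moment the order is processed; a later customer update changes the table but does not re-emit orders that were already enriched.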

We built the infrastructure on Google Cloud Platform using Google Kubernetes Engine and the underlying Compute Engine instances, and deployed the Kafka cluster on Kubernetes with the Strimzi operator. The monitoring and alerting stack was set up with the Prometheus Operator, Grafana and Alertmanager.
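Strimzi manages Kafka on Kubernetes through custom resources such as `Kafka`. As a hedged illustration only, the Python sketch below builds a minimal `kafka.strimzi.io/v1beta2` `Kafka` resource and prints it as JSON, which `kubectl apply -f -` accepts. The cluster name, replica counts, storage size, listener names and the `kafka-metrics` ConfigMap are assumptions, not the project's settings. The layout follows the ZooKeeper-based form of older Strimzi releases; newer releases run Kafka in KRaft mode with `KafkaNodePool` resources, so check the schema of the operator version you deploy.

```python
import json


def build_kafka_manifest(name: str, replicas: int = 3, storage_size: str = "100Gi") -> dict:
    """Build a Strimzi `Kafka` custom resource (ZooKeeper-based layout) as a dict."""
    storage = {"type": "persistent-claim", "size": storage_size, "deleteClaim": False}
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "Kafka",
        "metadata": {"name": name},
        "spec": {
            "kafka": {
                "replicas": replicas,
                "listeners": [
                    {"name": "plain", "port": 9092, "type": "internal", "tls": False},
                    {"name": "tls", "port": 9093, "type": "internal", "tls": True},
                ],
                # replication settings capped by the broker count so small clusters still start
                "config": {
                    "offsets.topic.replication.factor": min(3, replicas),
                    "transaction.state.log.replication.factor": min(3, replicas),
                    "transaction.state.log.min.isr": min(2, replicas),
                    "default.replication.factor": min(3, replicas),
                    "min.insync.replicas": min(2, replicas),
                },
                "storage": storage,
                # expose JMX metrics in Prometheus format; the ConfigMap is hypothetical
                "metricsConfig": {
                    "type": "jmxPrometheusExporter",
                    "valueFrom": {
                        "configMapKeyRef": {"name": "kafka-metrics", "key": "kafka-metrics-config.yml"}
                    },
                },
            },
            "zookeeper": {"replicas": 3, "storage": storage},
            "entityOperator": {"topicOperator": {}, "userOperator": {}},
        },
    }


if __name__ == "__main__":
    # e.g. python kafka_cr.py | kubectl apply -f -
    print(json.dumps(build_kafka_manifest("events-cluster"), indent=2))
```

Once the brokers expose metrics this way, a Prometheus Operator `PodMonitor` can scrape them and feed the Grafana dashboards and Alertmanager rules.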

The goal of the second milestone was to prepare data for a recommendation engine. Based on order history and buyers' paths, an in-house application proposes products to display on the page currently being visited. In addition, graphs and reports show the current state of orders and cumulative summaries of total orders, abandoned baskets, most-viewed products, etc. These reports no longer have to run expensive queries against Google BigQuery; instead, they are fed in real time by stream processing applications built on Kafka Streams and Apache Beam.
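Feeding reports from a stream means each event updates the summaries incrementally, instead of a report re-scanning the warehouse. The Python sketch below shows the idea for three of the metrics mentioned above (total orders, abandoned baskets and most-viewed products); it is not the delivered Kafka Streams or Beam code. The event shape (`session_id`, `type`, `product_id`, `ts_ms`), the 30-minute inactivity gap and the rule that a session ending with unpaid items counts as an abandoned basket are all assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

SESSION_GAP_MS = 30 * 60 * 1000  # hypothetical inactivity gap that closes a session


@dataclass
class LiveSummary:
    total_orders: int = 0
    abandoned_baskets: int = 0
    product_views: Counter = field(default_factory=Counter)
    # session_id -> (time of last event, basket holds unpaid items)
    open_sessions: dict[str, tuple[int, bool]] = field(default_factory=dict)

    def on_event(self, event: dict) -> None:
        """Update the running summaries from a single event."""
        self._expire_sessions(event["ts_ms"])
        sid = event["session_id"]
        _, unpaid = self.open_sessions.get(sid, (0, False))
        kind = event["type"]
        if kind == "view":
            self.product_views[event["product_id"]] += 1
        elif kind == "add_to_cart":
            unpaid = True
        elif kind == "order":
            self.total_orders += 1
            unpaid = False  # the basket was paid for
        self.open_sessions[sid] = (event["ts_ms"], unpaid)

    def _expire_sessions(self, now_ms: int) -> None:
        """Close sessions idle longer than the gap; unpaid baskets count as abandoned."""
        expired = [sid for sid, (last, _) in self.open_sessions.items()
                   if now_ms - last > SESSION_GAP_MS]
        for sid in expired:
            _, unpaid = self.open_sessions.pop(sid)
            if unpaid:
                self.abandoned_baskets += 1

    def most_viewed(self, n: int) -> list[tuple[str, int]]:
        return self.product_views.most_common(n)


summary = LiveSummary()
summary.on_event({"session_id": "a", "type": "view", "product_id": "p1", "ts_ms": 0})
summary.on_event({"session_id": "a", "type": "add_to_cart", "product_id": "p1", "ts_ms": 2000})
print(summary.total_orders, summary.abandoned_baskets, summary.most_viewed(3))
```

Sessions are closed by the passage of event time as seen in later events, similar in spirit to session windows in Kafka Streams and Apache Beam; if the stream goes quiet, abandoned baskets are only counted once new events arrive.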


Within three months, we transformed the existing batch-based process into a data streaming platform built on mature, popular open-source tools. Since our customer already relied on Google Cloud Platform, we took advantage of its managed services, such as BigQuery, Dataflow, CloudSQL and Memorystore, among others. We delivered seven streaming applications, as well as DevOps scripts that set up Kubernetes and all the necessary tools.

As a result, LPP S.A. can attach custom dashboards that display sales volume and can feed its recommendation engine cost-effectively in real time.

Thanks to the deployed solution, LPP S.A. can build an all-encompassing real-time marketing strategy from the ground up: from real-time, on-site product recommendations now to a multichannel, customer-centric approach, known as Customer360, in the near future.

Szymon Chojnacki
Big Data Architect at LPP S.A.

"They [SoftwareMill] stuck to the three-month timeline, doing everything to deliver a ready solution. (...) thanks to the infrastructure, we’re able to compete with market leaders."
