
Windowing data in Big Data Streams - Spark, Flink, Kafka, Akka

26 Oct 2016

Increasingly, it's not enough to process a “big data” dataset offline; data has to be processed as it arrives, in a streaming (online) fashion. This brings a whole new set of challenges.

Actions can rarely be taken based on single data elements. We need to aggregate the results somehow to get valuable insights. In the case of data streams, this usually means making decisions based on the data received within a time window.
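To make aggregation over a time window concrete, here is a minimal Python sketch of a tumbling (fixed-size, non-overlapping) window that sums values by event time. The `(event_time, value)` tuple shape and the `tumbling_window_sums` function are hypothetical, chosen only for illustration; the sketch works on a finite list rather than an unbounded stream, so it ignores ordering and late data.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical event shape: (event_time_in_seconds, value).
Event = tuple[int, float]


def tumbling_window_sums(events: Iterable[Event], size: int) -> dict[int, float]:
    """Sum values per non-overlapping window of `size` seconds, keyed by window start."""
    sums: dict[int, float] = defaultdict(float)
    for ts, value in events:
        window_start = ts - ts % size  # align the event time down to its window
        sums[window_start] += value
    return dict(sorted(sums.items()))
```

For example, with 10-second windows, events at seconds 1 and 4 fall into the window starting at 0, events at 11 and 12 into the window starting at 10, and an event at 25 into the window starting at 20.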

There are a lot of choices to be made when partitioning data into windows: should they be sliding or tumbling; should boundaries be determined by event time or processing time; should the grouping be by time or by sessions? And finally, how should late data points be handled, if at all?
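Several of these choices can be illustrated with small, hypothetical Python helpers over plain integer event times; they are a sketch under stated assumptions, not the API of any of the tools discussed. `sliding_windows` assigns one timestamp to every overlapping window it belongs to, `session_windows` groups events separated by less than an inactivity gap, and `split_late` uses a simple assumed watermark (the largest event time seen so far minus a maximum expected delay) to detect events that arrive after their tumbling window has closed. All three use event time; a processing-time variant would use the wall-clock arrival time instead of `ts`.

```python
from typing import Iterable


def sliding_windows(ts: int, size: int, slide: int) -> list[int]:
    """Start times of every window [start, start + size) that contains ts.

    Windows start at multiples of `slide`; when slide == size the windows
    do not overlap, i.e. they degenerate into tumbling windows.
    """
    starts = []
    start = ts - ts % slide  # latest window start at or before ts
    while start > ts - size:  # this window still covers ts
        starts.append(start)
        start -= slide
    return sorted(starts)


def session_windows(timestamps: Iterable[int], gap: int) -> list[tuple[int, int]]:
    """Group event times into sessions separated by at least `gap` of inactivity.

    Returns (first_event_time, last_event_time) per session. Sorting first makes
    this a batch-style sketch; a streaming engine would merge sessions incrementally.
    """
    sessions: list[tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] < gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend the current session
        else:
            sessions.append((ts, ts))  # inactivity gap reached: open a new session
    return sessions


def split_late(event_times: Iterable[int], size: int, max_delay: int) -> tuple[list[int], list[int]]:
    """Split event times (given in arrival order) into on-time and late ones.

    Assumed watermark: largest event time seen so far minus `max_delay`.
    An event is late if its tumbling window [start, start + size) ended
    at or before the current watermark, i.e. the window was already closed.
    """
    on_time: list[int] = []
    late: list[int] = []
    watermark = float("-inf")
    for ts in event_times:
        window_end = ts - ts % size + size
        if window_end <= watermark:
            late.append(ts)  # e.g. dropped, or handled separately
        else:
            on_time.append(ts)
        watermark = max(watermark, ts - max_delay)
    return on_time, late
```

For example, with 10-second windows sliding every 5 seconds, an event at second 7 lands in the windows starting at 0 and 5. With 10-second tumbling windows and a maximum delay of 5 seconds, events arriving in the order 3, 12, 25, 7, 18 leave 7 and 18 late: once 25 has been seen, the watermark is 20, so the windows ending at 10 and 20 are already closed.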

With so many variables, each tool solves the problem differently. In the presentation, we'll first define the possible characteristics of a data windowing mechanism, and then compare the approaches taken by Apache Kafka, Flink, Spark and Akka Streams, with code examples.

Slides
