Comparing Java Streams with Jox Flows

Both Java Streams (introduced way back in Java 8) and Jox's Flows (part of an open-source virtual-thread-based safe concurrency & streaming library) allow transforming data in a streaming fashion.

From such a shallow description, it might seem that both serve the same purpose, and can be used to solve the same problems, which would make one of these implementations unnecessary (and since Java Streams are in the standard library, that would probably be Jox!).

What is the difference, then? It's a question I often face when I give talks on Jox and virtual-thread-based streaming; hence, let's examine these differences in more detail. As you'll find out, when we look more closely, even though the APIs look similar, both approaches have distinct use-cases and both might be used in a single codebase, for different purposes.

Java Streams

Let's first examine the characteristics of Java Streams. Here's a simple example of an initially infinite stream, which is then transformed using high-level operations, before being consumed by collecting the results into a list:

List<Integer> result = Stream
  .iterate(0, i -> i + 1)
  .map(i -> i * 2)
  .filter(i -> i % 3 == 2)
  .limit(10)
  .collect(Collectors.toList());

Even though the initial stream is infinite, the example above doesn't try to generate all natural numbers, as Java streams are lazy: until a terminal operation is invoked (such as .collect), nothing happens—only a blueprint of the transformation is created. The execution of the pipeline is highly optimized, producing and transforming only the data that will ultimately be needed, with a rich set of intermediate and terminal operations.
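This laziness is easy to observe directly. In the sketch below, a counter (a detail added here for illustration) records how many elements the source actually generates: none while the pipeline is being built, and exactly three once a terminal operation runs:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazinessDemo {
    static final AtomicInteger produced = new AtomicInteger();

    static List<Integer> firstThreeDoubled() {
        // Building the pipeline creates only a blueprint: peek() has not run yet.
        Stream<Integer> pipeline = Stream.iterate(0, i -> i + 1)
            .peek(i -> produced.incrementAndGet())
            .map(i -> i * 2);
        // The terminal operation pulls exactly as many elements as needed.
        return pipeline.limit(3).toList();
    }

    public static void main(String[] args) {
        System.out.println(firstThreeDoubled()); // [0, 2, 4]
        System.out.println(produced.get());      // 3: only three elements were ever generated
    }
}
```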

The above illustrates the most common use case for Java Streams: transforming collections, using a functional programming-inspired API, with lambdas supplied to higher-order functions such as map. But there's more!

With the arrival of Stream Gatherers in Java 24, Java Streams have become extensible. The versatility and flexibility of Java Streams has increased significantly, as it is now possible to implement custom intermediate operations without much effort.

For example, there are a couple of built-in gatherers, such as mapConcurrent (leveraging virtual threads), or windowFixed and windowSliding for data windowing; third-party libraries provide additional ones, as exemplified by more-gatherers. This gives you functionalities such as distinctBy, zip, or sampling. Equipped with such a rich set of operators, you can get quite far!

Stream
  .iterate(0, i -> i + 1)
  .gather(Gatherers.mapConcurrent(4, i -> i * 2))
  .gather(Gatherers.windowFixed(10))
  .findFirst();
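Writing your own gatherer is also straightforward. As a minimal sketch, here's a sequential gatherer that drops consecutive duplicates (the name dedupeConsecutive is made up for this example; it's not a built-in):

```java
import java.util.List;
import java.util.stream.Gatherer;
import java.util.stream.Stream;

public class DedupeConsecutive {
    // A custom intermediate operation: emit an element only if it differs
    // from the previously emitted one.
    static <T> Gatherer<T, ?, T> dedupeConsecutive() {
        return Gatherer.ofSequential(
            // Mutable per-run state: the last element seen (null at the start).
            () -> new Object[] { null },
            (state, element, downstream) -> {
                if (element.equals(state[0])) {
                    return true; // duplicate: skip it, keep consuming upstream
                }
                state[0] = element;
                return downstream.push(element);
            }
        );
    }

    public static void main(String[] args) {
        List<Integer> out = Stream.of(1, 1, 2, 2, 2, 3, 1)
            .gather(dedupeConsecutive())
            .toList();
        System.out.println(out); // [1, 2, 3, 1]
    }
}
```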

Stream gatherers expand the scope of Java Streams to handle more use-cases that require infinite and side-effecting streams. For example, you could perform HTTP requests as part of Gatherers.mapConcurrent, limiting the concurrency so that the target server is not overloaded, and sourcing the data from an infinite stream created using Stream.generate.
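A sketch of that pattern, with a simulated blocking call standing in for the HTTP request (the fetch helper is invented for illustration; mapConcurrent runs each invocation on its own virtual thread and preserves encounter order):

```java
import java.util.List;
import java.util.stream.Gatherers;
import java.util.stream.Stream;

public class MapConcurrentDemo {
    // Stand-in for a blocking HTTP request; the sleep simulates network latency.
    static String fetch(int id) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return "response-" + id;
    }

    static List<String> fetchFirst(int n) {
        // An infinite source of "request ids", with at most 4 requests in flight,
        // so the target server is not overloaded.
        return Stream.iterate(0, i -> i + 1)
            .gather(Gatherers.mapConcurrent(4, MapConcurrentDemo::fetch))
            .limit(n)
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(fetchFirst(8)); // [response-0, response-1, ..., response-7]
    }
}
```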

Limitations of Java Streams

As you might suspect, there's a "but …"! While the design of Java Streams provides a very approachable API, which is easy to learn, and gives you a lot of power, there are limits to what can be done with the abstraction.

The limitations come from the fact that Java Streams are pull-based: the consumer controls when (and whether) new elements are produced. This works great when transforming collections, or when working with infinite streams, as long as all transformations are performed synchronously: the consumer becomes the "conductor" of how the stream elements are produced. That's true for all the operators we have seen so far.

Where this design falls short is in situations where the producer should be in control, or where asynchrony is needed. Examples include:

  • time-based operations
  • merging streams that run concurrently
  • buffering (when the producer is faster than the consumer)

This is where you need a push-based design. Coupled with backpressure (so that memory usage is always bounded), you get Reactive Streams implementations such as RxJava, Project Reactor, Pekko Streams, or … Jox Flows, but more on that later.

As a side note, if the pull-based design is limited, why was it chosen for Java? That would probably be best answered by Java architects. However, such a choice makes sense if you take into account the context in which the solution was created: to provide a high-level, "functional programming"-like API for transforming Java collections.

Indeed, this design is consistent with Java's collections and is its natural extension. There, the core abstraction is the Iterator, which is pull-based (it's you—the consumer—that is calling .next() in a loop). The target execution model of Java Streams is synchronous & mostly single-threaded, where a pull-based design is simpler and provides straightforward, well-known error handling based on try-catch.
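This pull-based core is directly visible: any Stream can be exposed as an Iterator, and elements flow through the pipeline only when the consumer asks for them:

```java
import java.util.Iterator;
import java.util.stream.Stream;

public class PullDemo {
    public static void main(String[] args) {
        // Stream.iterator() exposes the pull-based core of the pipeline.
        Iterator<Integer> it = Stream.iterate(0, i -> i + 1)
            .map(i -> i * 2)
            .iterator();

        // Each next() call pulls exactly one element through the transformations:
        // the consumer is in full control of the pace.
        System.out.println(it.next()); // 0
        System.out.println(it.next()); // 2
        System.out.println(it.next()); // 4
    }
}
```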

For the use-cases for which Java Streams have been designed, a pull-based design makes the most sense. Even if these use cases have now been somewhat extended with stream gatherers, you still need to keep their fundamental limitations in mind.

Enter Jox Flows

Jox Flows describe asynchronous transformation pipelines. At their core, they have a different design from Java Streams, even though at the surface the API might seem similar.

Indeed, we can rewrite the two examples of Java Streams into Jox Flows:

List<Integer> result = Flows
  .iterate(0, i -> i + 1)
  .map(i -> i * 2)
  .filter(i -> i % 3 == 2)
  .take(10)
  .runToList();

And:

Flows
  .iterate(0, i -> i + 1)
  .mapPar(4, i -> i * 2)
  .grouped(10)
  .take(1)
  .runToList();

Same as Java Streams, Jox Flows are lazy: you first describe the streaming transformation, and the actual processing happens only when you run it, using one of the run* terminal methods (such as runToList or runDrain).

But of course, having a separate library that just duplicates what's available out-of-the-box in Java wouldn't make any sense. That's where the different design of Jox comes in. It's a push-based system, where the producer is in control of when elements are emitted to the consumers.

This has quite a lot of consequences. For example, we can now implement time-based windowing:

eventsFlow
  .groupedWithin(100, Duration.ofSeconds(5))
  .mapPar(5, batch -> sendHttpRequest(batch))
  .runDrain();

This might be useful in telemetry or logging pipelines, where events are continuously generated and need to be batched for more efficient downstream processing, such as HTTP uploads or storage writes. Grouping by time ensures responsiveness even when the event rate is low, avoiding latency from waiting for large batches to fill.

Or we can merge two flows: both event flows run concurrently in the background, and the resulting flow contains the elements of both as they arrive (non-deterministically interleaved). Because of the push-based nature, not only producers but also intermediate operations can dictate when elements are emitted. Here, the merge transformation is the "conductor": it emits an element whenever either of the merged flows emits one:

eventsFlow1
  .merge(eventsFlow2)
  .map(...)

Merges are typical in event-driven systems, where multiple asynchronous sources, such as user interactions and background events, need to be processed together in real-time. With a merge-based approach, the system can react promptly to any incoming data, regardless of its origin.

Another operator that leverages asynchrony is .buffer, which is useful when the upstream producer is fast and the consumer is occasionally slower.

Speaking of fast producers and slow consumers: a purely push-based design would risk unconstrained memory usage. That's why all buffers in Jox are bounded; if the consumers can't keep up, the buffers eventually fill up and the producers are blocked. This backpressure is essential for resource safety.
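The underlying mechanism can be sketched in plain Java with a bounded queue and a virtual thread (this is an illustration of the principle, not Jox's actual internals): when the buffer is full, the producer's put() blocks, so it can never run arbitrarily far ahead of the consumer.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class BackpressureDemo {
    static int produceAndConsume(int count) throws InterruptedException {
        // A bounded buffer with capacity 4 between producer and consumer.
        var buffer = new ArrayBlockingQueue<Integer>(4);

        Thread producer = Thread.ofVirtual().start(() -> {
            for (int i = 0; i < count; i++) {
                try {
                    buffer.put(i); // blocks once 4 elements are waiting: backpressure
                } catch (InterruptedException e) {
                    return;
                }
            }
        });

        int sum = 0;
        for (int i = 0; i < count; i++) {
            Thread.sleep(1); // deliberately slow consumer
            sum += buffer.take();
        }
        producer.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        // All 100 elements arrive despite the slow consumer; memory stays bounded.
        System.out.println(produceAndConsume(100)); // 0 + 1 + … + 99 = 4950
    }
}
```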

Summary

Java Streams and Jox Flows are distinct in the same way as Java Streams and Reactive Streams implementations are distinct. Java Streams haven't rendered Reactive Streams obsolete, as they serve a different purpose. What Jox Flows bring you is the functionality and solutions for use-cases handled by Reactive Streams implementations, in a direct-style, virtual-thread-friendly package.

Where Java Streams shine is in transformations of (finite or infinite) collections, run synchronously, typically on a single thread. Backpressure is not needed (as the consumer controls the pace of running the stream), everything is deterministic, and control flow and error handling are straightforward.

Where Jox Flows are most useful is in integrating with external I/O and event streams (such as message queues, Kafka topics, WebSocket frames, large binary payloads received over HTTP, etc.). In such cases, it's the producer that controls when data is available. Concurrency is simplified by the declarative nature of operations such as .merge, .mapPar, or .buffer, and the processing can be synchronous or asynchronous. And because Jox Flows are virtual-thread-native, error handling and control flow are as straightforward as with Java Streams.

We would appreciate your feedback on Jox Flows, including which operators and integrations we should prioritize for the next release. Additionally, we are interested in hearing about any features you miss, particularly in comparison to incumbent Reactive Streams implementations.
