Implementing Raft using Project Loom

30 Aug 2022. 24 minutes read


In the previous article, we examined in detail an implementation of Raft using a functional effect system. This included an evaluation of the strengths and weaknesses of such an approach: we looked at how functional programming, type safety, having an effect system in general, and using Scala and ZIO in particular impact the readability and correctness guarantees of the code.

Given that one of the main markets for a functional effect system is concurrent programming, I think it might be very interesting to examine how the same problem would be solved using a different approach to concurrency: Project Loom, which will be previewed in Java 19.

We'll still use the Scala programming language so that we vary only one component of the implementation, which should make the comparison easier. However, instead of representing side effects as immutable, lazily-evaluated descriptions, we'll use direct, virtual-thread-blocking calls. But let's not get ahead of ourselves; first, let's introduce the main actors.

If you prefer reading the code first and prose second, it's all on GitHub, with side-by-side implementations of the Raft consensus algorithm using Scala+ZIO and Scala+Loom.

What is Project Loom?

Project Loom introduces lightweight threads to the Java platform. Before, each thread created in a Java application corresponded 1-1 to an operating system thread. Loom introduces the notion of a VirtualThread, which is cheap to create (both in terms of CPU and memory) and has low execution overhead. Virtual threads are multiplexed onto a much smaller pool of system threads with efficient context switches.

As OS threads are expensive to create and switch between, a whole ecosystem of solutions emerged to use these scarce resources efficiently. This includes thread pools, executors, and, to some degree, various reactive and asynchronous programming techniques. Not all these will be gone with Project Loom (quite the contrary), but for sure we'll have to rethink our approach to concurrency in a number of places.

If you've seen green threads or fibers in other programming languages, virtual threads are similar concepts. In fact, we've used fibers extensively in our previous implementation of Raft, when using a functional effect system. However, it's now the Java runtime itself that manages these fibers/virtual threads instead of the library code.

As "fibers" are now a native construct, the goal of Loom is to enable the familiar "direct" (also referred to as "blocking" or "synchronous") style of programming, without the need for constructs such as Futures or IOs. The usual control structures (for loops, ifs, try-catch for managing exceptions) should be usable again in the presence of side-effecting computations. On top of that, we're promised meaningful stack traces!

This wouldn't be possible without a second crucial component of Loom: retrofitting existing code, which would normally block the underlying OS thread, to be Loom-aware. That means that a Thread.sleep will now only signal to the virtual thread executor that the currently executing thread should be put aside for some time, and another virtual thread can be run instead. The same is true for most, if not all, I/O operations.
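To make the "cheap to create" and "Loom-aware sleep" claims concrete, here's a minimal sketch. It assumes JDK 21 or later, where `Thread.ofVirtual()` is a standard API; on Java 19 and 20 it requires `--enable-preview`. The sketch starts 10,000 virtual threads, each of which calls `Thread.sleep`. While a virtual thread sleeps, its carrier OS thread is free to run the others, so the whole batch should take on the order of the sleep duration, not 10,000 times it. The thread count and timings are illustrative only.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Assumes JDK 21+, where Thread.ofVirtual() is a standard (non-preview) API.
val taskCount = 10_000
val completed = new AtomicInteger(0)

val start = System.nanoTime()
val threads = (1 to taskCount).map { _ =>
  Thread.ofVirtual().start { () =>
    // On a virtual thread, sleep only parks the virtual thread; the carrier
    // OS thread is released to run other virtual threads in the meantime.
    Thread.sleep(100)
    completed.incrementAndGet()
  }
}
threads.foreach(_.join())
val elapsedMillis = (System.nanoTime() - start) / 1_000_000
println(s"$taskCount virtual threads finished in $elapsedMillis ms")
```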

What is Raft?

Raft, a consensus algorithm for managing a replicated log, got an introduction in the previous article, so if you'd like a refresher, head over there!


Loom on a Raft, Dall-E

Why do we need Loom?

A very reasonable question to ask is: Do we need Loom at all to implement Raft?

The answer is: we don't. We could have implemented Raft using "normal" Java, as many people have done successfully.

However, we are not only after implementing Raft, but implementing it in a way that is free of technical details (to the extent to which this is possible), readable, and relatable to what's described in the Raft paper (which is proven to be correct).

And that's where Loom helps: it enables lightweight concurrency, removing much of the boilerplate known from using executors and thread pools. ZIO—the functional effects library we've used previously—has the same goal. Hence the comparison.

At least in this case, we're more after using Loom for the programming model, to create an understandable (and hence human-verifiable) implementation of Raft, rather than for performance—where Loom also has a lot to offer.

Saft: the implementation

Once again, for a detailed description of how Saft (the Scala Raft implementation we are investigating) is implemented, I'm going to redirect you to the appropriate section in the previous article, as that part is mostly unchanged. Alternatively, the readme in the source code contains a short description of the files involved, as well as a suggested "reading order".

In some places, the way side-effects are handled and sequenced is different, and that's what we'll focus on. But the data structures, the code layout, and the overall architecture are exactly the same in both the ZIO and Loom implementations.

The heart of Saft, the Node class, follows the same pattern as before. Events, including election/heartbeat timeouts, incoming requests from other nodes or clients, and responses to requests sent by the node itself, are read off a queue and processed one by one.
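The "read events off a queue and process them one by one" pattern can be sketched independently of Saft. The `Event` enum, the `process` function and the `Int` state below are hypothetical stand-ins, not Saft's actual `ServerEvent` and `NodeRole`. The point is that a single (virtual) thread owns the state, so handling events needs no locks. The sketch assumes JDK 21+.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical, simplified stand-ins for Saft's events and node state.
enum Event:
  case Increment(by: Int)
  case Stop

// Pure transition function: current state + event => next state.
def process(state: Int, event: Event): Int = event match
  case Event.Increment(by) => state + by
  case Event.Stop          => state

// The loop takes one event at a time off the queue. queue.take() blocks,
// which on a virtual thread only parks that (cheap) virtual thread.
def runLoop(queue: LinkedBlockingQueue[Event], initial: Int): Int =
  var state = initial
  var running = true
  while running do
    val event = queue.take()
    state = process(state, event)
    if event == Event.Stop then running = false
  state

val queue = new LinkedBlockingQueue[Event]()
var finalState = -1 // written by the loop thread; join() below makes it visible
val loopThread = Thread.ofVirtual().start { () =>
  finalState = runLoop(queue, initial = 0)
}

// Other threads (timers, network handlers, ...) only ever add events.
List(1, 2, 3).foreach(n => queue.add(Event.Increment(n)))
queue.add(Event.Stop)
loopThread.join()
println(finalState) // 6
```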


Orange juice and a loom, Dall-E

Where is Loom used?

Before we dive into the ZIO-Loom comparison, let's take a look at where Loom—or more generally, threading and concurrency—is used in our implementation.

Although Raft is a prime example of a distributed algorithm, there isn't that much concurrency in it. For sure, there are multiple nodes running in parallel and exchanging messages, but each of them is a separate node. As far as a single node is concerned, especially in our event-driven, actor-like implementation, concurrency is reduced, on purpose, to a minimum.

After all, the less concurrency there is, the easier the system is to understand. While we do get some help from the compiler and the implementation's construction in verifying correctness, there's still a lot of manual work. Testing only gets us so far: it might show that the code behaves properly in some scenarios, but that's no guarantee that race conditions or deadlocks won't happen.

But, back to where Loom is used. First of all, each Saft Node is run on a dedicated thread. This is especially important when running an in-memory Saft simulation with multiple nodes on a single JVM. However, for that alone, we could just as easily have used a dedicated OS thread.

Secondly, Loom is used whenever something needs to be done in the background: that is broadcasting messages to other nodes, sending replies to clients, and handling timeouts. We could have implemented that using Executors and Futures. But that would have some drawbacks:

having to think about properly creating, sizing, and scheduling tasks onto the executor; if concurrency is supposed to be lightweight, we shouldn't worry about these things
muddying the code with concurrency-related infrastructure code

With Loom, we don't have to think about executors or pass them around. If something needs to happen in the background, we simply create a new virtual thread and use ordinary virtual-thread-blocking calls.

Abstracting over Loom

Before we dive into concrete examples, you might notice that in the code, we don't create virtual threads directly. Instead, we're using an instance of a custom, suspicious-looking Loom class to fork code blocks so that they run asynchronously.


Loom in the style of Kandinsky, Dall-E

A Loom instance wraps a StructuredTaskScope. StructuredTaskScope isn't part of JEP 425 (virtual threads) but of the structured concurrency API defined in JEP 428. Both are part of Project Loom: in Java 19, virtual threads are a preview feature, and structured concurrency is an incubator module.

The structured concurrency JEP defines APIs to work with threads in a structured way: the scope in which threads live should correspond to the structure of the code. For example, if a thread is started in a code block, such as a method, it should complete before the method completes. In many cases, this makes it much easier to understand concurrent code and to make sure that no threads leak.
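Even without a dedicated API, the structured rule can be followed by hand: fork, then join everything before returning, so that threads started in a method complete before the method completes. Below is a minimal sketch with plain virtual threads (JDK 21+). The `forkVirtual` and `fetchBoth` helpers are hypothetical. `StructuredTaskScope` packages this pattern, plus cancellation and error handling, into an API; the sketch doesn't try to replicate that.

```scala
import java.util.concurrent.CompletableFuture

// Hypothetical helper (not a JDK API): runs `body` on a new virtual thread and
// exposes both the thread and a future holding its result (or failure).
def forkVirtual[T](body: => T): (Thread, CompletableFuture[T]) =
  val result = new CompletableFuture[T]()
  val thread = Thread.ofVirtual().start { () =>
    try result.complete(body)
    catch case e: Throwable => result.completeExceptionally(e)
  }
  (thread, result)

// The forked threads' lifetimes are bounded by this method's body: both are
// joined before it returns, so neither can outlive (leak from) the call.
def fetchBoth(): (Int, String) =
  val (t1, f1) = forkVirtual { Thread.sleep(50); 21 * 2 }
  val (t2, f2) = forkVirtual { Thread.sleep(50); "hello" }
  t1.join()
  t2.join()
  (f1.join(), f2.join())

val result = fetchBoth()
println(result) // (42,hello)
```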

The main class defined in JEP 428 is StructuredTaskScope, which is quite low-level. It's easy to misuse it, and it requires calling its methods in a particular order and in correct contexts. However, it does enable implementing scenarios such as racing two computations or running a number of computations in parallel and interrupting all on the first error, ensuring proper cleanup.

In our case, we want to group all virtual threads created when running a Node, so that it can be cleanly shut down. Providing an orderly way to stop a service is not only good practice and good manners, but in our case, also very useful when implementing the Raft in-memory simulation.

A StructuredTaskScope provides exactly that: it enables forking computations into new virtual threads, and has a shutdown method that interrupts all running tasks and completes when all of the started threads complete. However, because it is non-trivial to use, we'll wrap it with a custom Loom class:

import jdk.incubator.concurrent.StructuredTaskScope

class Loom private (scope: StructuredTaskScope[Any]):
  def fork(t: => Unit): Cancellable =
    val future = scope.fork(() => t)
    // this Single Abstract Method is converted to a Cancellable
    () => future.cancel(true) 

trait Cancellable:
  def cancel(): Unit

We'll need to cancel individual tasks as well, hence we introduce the custom Cancellable interface that exposes only the cancel operation of the Future returned by StructuredTaskScope.fork. We also use Scala's by-name parameters to provide nicer syntax for Loom.fork.

There's one important requirement of StructuredTaskScope: it can only be used in a structured way. That's not something that's enforced by the Java type system or the compiler (I doubt that would be at all possible in Java), so we simply have to keep in mind to use the construct correctly.

Initially, I tried to use it in a non-structured way. I created an instance in the driver of the simulation and shut the scope down from the outside rather than from within. I quickly ran into deadlocks or weird exceptions, which might in part be because it's an incubating API under development.

To meet this structural requirement, we need to create a parent thread (virtual, of course!) that runs a given computation, providing it with a Loom instance. Only when this computation completes, or when the parent thread is interrupted, do we close the scope:

object Loom:
  def apply(t: Loom => Unit): Cancellable =
    val th = Thread.startVirtualThread { () =>
      val scope = new StructuredTaskScope[Any]()
      try t(new Loom(scope))
      // ignore so that no exception is logged & finish
      catch case _: InterruptedException => () 
      finally
        scope.join()
        scope.close()
    }
    () => th.interrupt()

Starting a Node now amounts to creating a Loom instance and returning the Cancellable created by Loom.apply (which interrupts the main thread on which NodeLoop.run runs). This way, our Loom object ensures that StructuredTaskScope is used correctly:

class Node(...):
  def start(): Cancellable =
    Loom { loom =>
      logger.info("Node started")
      try
        val initialTimer = Timer(loom, conf, comms)
        val timer = initialTimer.restartElection
        val initialState = persistence.get
        val role = 
          NodeRole.Follower(initialState, FollowerState(None), timer)
        new NodeLoop(loom, nodeId, comms, stateMachine, 
          conf, persistence).run(role)
      finally logger.info("Node stopped")
    }

The timer

An important component in Saft is the Timer, which schedules election & heartbeat events. When a follower or candidate node doesn't receive any communication for a period of time, it should schedule an election. However, if the leader sends, e.g., a heartbeat before the election timeout elapses, we should restart the timer. This is implemented by cancelling the current timer and starting a new one.

Starting the timer amounts to forking a thread that sleeps for the timeout duration and then puts the timeout event on the queue. Cancelling the timer then means interrupting the sleeping thread:

class Timer(loom: Loom, conf: Conf, comms: Comms, 
    currentTimer: Cancellable):
  private def restart(timeout: Duration): Timer =
    currentTimer.cancel()
    val newTimer = loom.fork {
      Thread.sleep(timeout.toMillis)
      comms.add(ServerEvent.Timeout)
    }
    new Timer(loom, conf, comms, newTimer)
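Stripped of Saft's types, the timer boils down to a virtual thread that sleeps and then enqueues an event, and cancelling it means interrupting that thread. Below is a minimal sketch, assuming JDK 21+ and using plain `Thread.ofVirtual()` instead of the `Loom` wrapper. The `startTimer` function and the string events are hypothetical. An interrupted `Thread.sleep` throws `InterruptedException`, and that is what prevents a cancelled timer's event from being enqueued.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical simplified timer: sleeps for `millis`, then puts `event` on `queue`.
// Returns a cancel function, which interrupts the sleeping virtual thread.
def startTimer(queue: LinkedBlockingQueue[String], millis: Long, event: String): () => Unit =
  val thread = Thread.ofVirtual().start { () =>
    try
      Thread.sleep(millis)
      queue.add(event)
    catch case _: InterruptedException => () // cancelled: enqueue nothing
  }
  () => thread.interrupt()

val queue = new LinkedBlockingQueue[String]()

// "Restarting" the timer = cancelling the current one and starting a new one.
val cancelFirst = startTimer(queue, 200, "timeout-1")
cancelFirst()
val cancelSecond = startTimer(queue, 50, "timeout-2")

val firstEvent = queue.take() // blocks until the second timer fires
Thread.sleep(300)             // the first timer would have fired by now, had it not been cancelled
println(firstEvent)           // timeout-2
```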

Replying to new entry requests

The NewEntry request sent by clients to a Raft leader node deserves special treatment, as there's no immediate reply. Instead, we need to put the request aside and reply that the entry has been added only after it has been applied to the local state machine, which requires replication to the majority of nodes.

This means that the Node will have to process a number of other events before a reply can be sent. That's why, when a new entry request is being processed, we create a CompletableFuture[NewEntryResponse] and return it to handleEvent, which knows how to respond to client requests. There, we start a new thread that blocks on the future (yes! that's now allowed, as blocking a virtual thread is cheap) and sends the response:

def handleEvent(event: ServerEvent, role: NodeRole): (() => Unit, NodeRole) =
  event match
    case ServerEvent.RequestReceived(ne: NewEntry, respond) =>
      val (responseFuture, newRole) = newEntry(ne, role)
      // the reply is returned as a function, run by the caller after state is persisted
      (() => loom.fork(respond(responseFuture.get)), newRole)
    // other cases
    // other cases

Why do we use Java's CompletableFuture instead of Scala's Future and Promise? Java's Future has a blocking .get that makes it straightforward to use with Loom. We could use Scala's Await, but the API is not that nice to read.

But before the thread that eventually sends the reply is created, the future is set aside, waiting for the entry to be replicated. Once this happens (in appendEntriesResponse), we complete the futures, causing the replies to be sent.
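The "park the request, reply later" flow can be illustrated in isolation. In this sketch (JDK 21+), `pending` plays the role of the futures set aside by `newEntry`. The virtual thread blocks on `get`, as `loom.fork(respond(responseFuture.get))` does in handleEvent. Completing the future later, the analogue of what happens in `appendEntriesResponse`, releases the reply. The names `pending`, `replies` and `onNewEntry`, and the string responses, are hypothetical.

```scala
import java.util.concurrent.{CompletableFuture, ConcurrentLinkedQueue}
import scala.collection.mutable

// Replies "sent" to clients; a concurrent queue, as they're added from other threads.
val replies = new ConcurrentLinkedQueue[String]()

// Futures of entries awaiting replication, keyed by a (hypothetical) entry index.
// Only accessed from the main thread here, so a plain mutable map suffices.
val pending = mutable.Map.empty[Int, CompletableFuture[String]]

// Handling a new-entry request: set the future aside and fork a virtual thread
// which blocks on it; blocking only parks that cheap virtual thread.
def onNewEntry(index: Int): Thread =
  val responseFuture = new CompletableFuture[String]()
  pending(index) = responseFuture
  Thread.ofVirtual().start(() => replies.add(responseFuture.get()))

val responder = onNewEntry(1)
Thread.sleep(50)
val repliedEarly = !replies.isEmpty // nothing can be sent before replication

// Later, once the entry has been replicated and applied: complete the future.
pending.remove(1).foreach(_.complete("entry 1 committed"))
responder.join()
println(replies.peek()) // entry 1 committed
```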

This also nicely demonstrates that Loom does not eradicate Futures: it replaces some (maybe most) of their usages, but they are still a very useful construct. Having a handle to a computation is convenient and often necessary to implement logic such as the one above.

Broadcasting messages to other nodes

Every now and then, the leader has to broadcast messages to other nodes, be it heartbeats or new entries. This has to happen in the background so that we can serve other events while the sending is in progress. This is trivial, as we simply start a new virtual thread:

def doSend(to: NodeId, 
    msg: RequestMessage with FromServerMessage): Unit =
  loom.fork {
    logger.debug(s"Send to node${to.number}: $msg")
    comms.send(to, msg)
  }
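Because forking is so cheap, a broadcast is simply one virtual thread per recipient, and a slow node doesn't delay the others. Below is a self-contained sketch (JDK 21+). The `send` function, its artificial per-node latency and the integer node ids are hypothetical stand-ins for `comms.send`. The threads are joined at the end only so that the example can inspect the result; Saft's `doSend` doesn't wait.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.jdk.CollectionConverters.*

// Hypothetical transport: "sends" by recording the recipient after a simulated delay.
val delivered = new ConcurrentLinkedQueue[Int]()
def send(to: Int, msg: String): Unit =
  Thread.sleep(if to == 2 then 300 else 10) // node 2 is slow
  delivered.add(to)

val otherNodes = List(2, 3, 4, 5)

// Fork one virtual thread per recipient; the caller isn't blocked by any send.
val sends = otherNodes.map(to => Thread.ofVirtual().start(() => send(to, "Heartbeat")))

Thread.sleep(150)
val deliveredBeforeSlowNode = delivered.asScala.toSet // fast nodes are already done
sends.foreach(_.join())
```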

Comparing the Loom and ZIO implementations

Let's get to the really interesting part: how do the two implementations compare? Do they differ in the guarantees they give? What do we gain by using an effect system over Loom, and what do we gain the other way round? Finally, which one is BETTER? (spoiler: it depends—sorry).

Bird's-eye view

If we zoom out just a little bit, both codebases are nearly identical. That's because most of the code is copy-pasted!

Both the Loom and ZIO versions use the same immutable data structures to model the domain: server state, events, and node roles. They have the same interfaces for communication, persistence, and the state machine to which entries are applied. Finally, the overall architecture and code structure of the Node implementation are the same.

ZIO is more of an implementation detail here. The focus of this and the previous article is on functional effect systems in general, and we could have used cats-effect with a similar result. Still, I'll refer to the original implementation simply as "ZIO", for the prosaic reason that the name is much shorter.

Fundamental differences can be found on two levels:

representing and sequencing side-effecting computations
managing concurrency

We'll cover both (and more) in subsequent sections.


Loom from a bird's-eye view, Dall-E

From ZIO to Loom

What's important to keep in mind is that the ZIO implementation came first, and the result might have been completely different had I started with Loom, and then translated to ZIO. I might never know.

That said, translating the code that did require changes was almost mechanical. The most common operation, sequencing two effects in ZIO (a flatMap invocation), which we often wrote as a for-comprehension:

for {
  result1 <- effect1
  result2 <- effect2
} yield combine(result1, result2)

becomes a ;, although in Scala the semicolons are inferred:

val result1 = effect1()
val result2 = effect2()
combine(result1, result2)

The latter is slightly more familiar if you are coming from a C-like background (you probably are), but the two don't differ that much in readability. As a concrete example, compare these two implementations of the startCandidate method, which is invoked when the election timeout is triggered. First, the ZIO version:

def startCandidate(state: ServerState, timer: Timer): UIO[NodeRole] =
  // On conversion to candidate, start election: 
  // Increment currentTerm, Vote for self
  val newState = state.incrementTerm(nodeId)
  for {
    _ <- ZIO.log(s"Became candidate (term: ${newState.currentTerm})")
    // Reset election timer
    newTimer <- timer.restartElection
    // Send RequestVote RPCs to all other servers
    _ <- ZIO.foreachDiscard(otherNodes)(otherNodeId =>
      doSend(otherNodeId, RequestVote(newState.currentTerm, 
        nodeId, newState.lastIndexTerm))
    )
  } yield NodeRole.Candidate(newState, CandidateState(1), newTimer)

And the Loom one:

def startCandidate(state: ServerState, timer: Timer): NodeRole =
  // On conversion to candidate, start election: 
  // Increment currentTerm, Vote for self
  val newState = state.incrementTerm(nodeId)
  logger.info(s"Became candidate (term: ${newState.currentTerm})")

  // Reset election timer
  val newTimer = timer.restartElection

  // Send RequestVote RPCs to all other servers
  otherNodes.foreach(otherNodeId => doSend(otherNodeId, 
    RequestVote(newState.currentTerm, nodeId, newState.lastIndexTerm)))

  NodeRole.Candidate(newState, CandidateState(1), newTimer)

Apart from sequencing, the code above iterates over all other nodes to broadcast RequestVote messages. In ZIO, this is done using a specialised iteration operation that is ZIO-aware and sequences the resulting effects correctly, so that we get a single effect with the combined result. In the Loom version, we can use "normal" collection iteration, here .foreach, which is equivalent to a for-loop.

Concurrency

Even concurrency feels similar—at least on a syntactic level. Let's look at the timer-reset method:

// ZIO
def restart(timeout: UIO[ServerEvent.Timeout.type]): UIO[Timer] =
  for {
    _ <- currentTimer.interrupt
    newFiber <- timeout.flatMap(comms.add).fork
  } yield new Timer(conf, comms, newFiber)

// Loom
def restart(timeout: Duration): Timer =
  currentTimer.cancel()
  val newTimer = loom.fork {
    Thread.sleep(timeout.toMillis)
    comms.add(ServerEvent.Timeout)
  }
  new Timer(loom, conf, comms, newTimer)

Thanks to the nice syntax provided by our Loom.fork method (which required rather minimal effort), if we parsed these into sufficiently high-level abstract syntax trees, we would get the same result.

Are the concurrency constructs really equivalent in both approaches? On the surface, it might seem so. However, if we go a bit deeper, that's not always the case.

An important remark here is that in Saft, we are only using a tiny fraction of ZIO's concurrency API, and on quite a low level. Most higher-level operations don't apply to our problem. On the other hand, Loom is a foundation on top of which concurrency libraries can be built, so it wouldn't even make sense to compare ZIO's high-level API with Loom in the first place.

Supervision

The first difference that is quite apparent is fiber supervision. In ZIO, we get automatic fiber supervision: if a fiber creates other fibers, the children are automatically bound to their parent's lifecycle. Hence, interrupting the parent causes the children to be interrupted as well.

With virtual threads, that's not the case. The threads don't naturally form any kind of hierarchy. We have to resort to using StructuredTaskScope to achieve a similar effect. In fact, these scopes behave kind of like the fibers in ZIO: if you create a scope within a scope, it is automatically bound to the parent-scope lifecycle.

With the ZIO approach, it's harder to get it wrong. You can opt out of automatic supervision, but if you stick to defaults, it's simply not possible to use the API incorrectly (as far as supervision is concerned). With Loom, you have to make additional effort to ensure that no threads leak. Plus, you might need to wrap the low-level API, just as we did using the Loom class.
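To illustrate what "manual supervision" involves, here is a deliberately tiny, hypothetical `Scope` class. It is not the JDK's `StructuredTaskScope` and is far less robust. It remembers every virtual thread it forks, and `shutdown` interrupts all of them and waits until they have terminated, so nothing outlives the scope. Plain `Thread.ofVirtual().start` gives no such guarantee: a child keeps running even if the thread that started it is interrupted. The sketch assumes JDK 21+.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.jdk.CollectionConverters.*

// Hypothetical minimal scope: tracks the threads it forks so they can be stopped together.
class Scope:
  private val threads = new ConcurrentLinkedQueue[Thread]()

  def fork(body: => Unit): Thread =
    val t = Thread.ofVirtual().start(() => body)
    threads.add(t)
    t

  // Interrupt all forked threads, then wait until every one of them has terminated.
  def shutdown(): Unit =
    threads.asScala.foreach(_.interrupt())
    threads.asScala.foreach(_.join())

val scope = new Scope
val children = (1 to 3).map { _ =>
  scope.fork {
    try Thread.sleep(60_000) // would block for a minute if left alone
    catch case _: InterruptedException => () // exit promptly when interrupted
  }
}
scope.shutdown()
println(children.count(_.isAlive)) // 0
```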

Interruption

The ability to interrupt a fiber or a virtual thread is crucial when writing concurrent code. Quite often, we are no longer interested in the result of a computation, maybe because of a timeout or because another one completed faster.

Both ZIO and Loom allow interrupting forked tasks. And we use this functionality extensively, e.g. for the timer. But the implementations are completely different. In Java, we have the usual interruption mechanism, which causes an InterruptedException to be thrown in the interrupted code. This is a normal exception, so it might be caught and ignored. Moreover, not every blocking call is interruptible—but this is a technical, not a fundamental limitation, which at some point might be lifted.
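The weakness mentioned above is easy to reproduce. Interruption surfaces as an ordinary exception, so a `catch` that swallows it lets the thread carry on as if nothing happened. Here, that's a hypothetical over-eager retry loop. By the time `Thread.sleep` throws `InterruptedException`, the thread's interrupted flag has already been cleared, so the conventional fix is to restore it with `Thread.currentThread().interrupt()` or to stop. Both worker functions are hypothetical, and the sketch assumes JDK 21+.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

val attemptsMade = new AtomicInteger(0)
val swallowed = new AtomicInteger(0)

// Wrongly treats interruption as a transient error: swallows it and retries.
def stubbornWorker(): Unit =
  while attemptsMade.get < 3 do
    attemptsMade.incrementAndGet()
    try Thread.sleep(100)
    catch case _: InterruptedException => swallowed.incrementAndGet()

val flagRestored = new AtomicBoolean(false)

// Well-behaved: stops on interruption and restores the (cleared) interrupted flag.
def politeWorker(): Unit =
  try Thread.sleep(10_000)
  catch
    case _: InterruptedException =>
      Thread.currentThread().interrupt()
      flagRestored.set(Thread.currentThread().isInterrupted)

val stubborn = Thread.ofVirtual().start(() => stubbornWorker())
val polite = Thread.ofVirtual().start(() => politeWorker())
Thread.sleep(20)
stubborn.interrupt()
polite.interrupt()
stubborn.join() // still completes all 3 attempts, despite the interrupt
polite.join()   // returns promptly
```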

Keep in mind that Java had to implement interruption for virtual threads in a backwards-compatible way. I have no idea how it would be done if that constraint weren't there, but that's more of an academic discussion.

In ZIO, interruption is non-recoverable. In an effect system, an interpreter evaluates descriptions of potentially side-effecting computations. Hence ZIO has much more control over how interruptions are handled. Our code, when written using ZIO, simply has no way of "catching" an interruption and recovering from it. It is possible to define uninterruptible regions, but once the interpreter leaves such a region, any pending interruption requests will be processed.

The points at which interruption might happen are also quite different. In Loom, these are only blocking calls. In ZIO, each time we sequence two effects (using flatMap or for), we create a potential interruption point.

Type safety

On the type safety front, there isn't much difference. After all, does it matter if a function returns a UIO[Unit] or just Unit? In the first case, the effect wrapper indicates that the result contains the description of some side effects that should be run. But the second is the same: the only point of running a function returning a Unit is for that method to run side-effects. We get exactly the same information.

The situation is a bit different with a method returning UIO[NodeRole] vs. just NodeRole. Here, the Loom version might or might not run side-effects. However, as far as the Raft implementation was concerned, this didn't really matter. In this aspect, there are no additional type-safety benefits from the wrapped representation.

Laziness

A fundamental difference between the two implementations, though not at all obvious when reading the code, is that in the ZIO version, we create lazily-evaluated descriptions of computations and combine these into larger blocks. In Loom, side effects are evaluated eagerly.

This also means that in ZIO, the definition ordering is separated from computation ordering. In the Loom implementation, these two coincide: the moment we define a side-effecting computation, it is being evaluated.

Does it matter in general? Yes. Did it matter in Saft's implementation? Slightly. There's a single case, in the handleEvent method, where we have to make the response function explicitly lazy: we create a function () => respond(theResponse), whereas in ZIO we simply create a respond(theResponse) description. This is needed to meet Raft's requirement that state persistence happens before sending a reply.

The above case is an example of a more general property of lazily vs. eagerly evaluated code: in some cases, when using Loom, you'll have to pay additional attention and explicitly write code so that it's lazily evaluated. In ZIO, this does not matter, as everything is lazy. The impact might be smaller or larger, but ZIO eliminates one potential source of bugs here.
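The difference can be shown in a few lines of plain Scala. In this sketch, `log` records the order of side effects, and `respond` and `persist` are hypothetical stand-ins for Saft's reply and persistence steps. Calling `respond(...)` directly runs it immediately. Wrapping it as `() => respond(...)` defers it until the caller decides to run it, which lets persistence happen first, as Raft requires.

```scala
import scala.collection.mutable.ListBuffer

val log = ListBuffer.empty[String]
def respond(response: String): Unit = log += s"sent $response"
def persist(): Unit = log += "persisted"

// Eager: the reply is sent while the handler runs, i.e. before persistence.
def handleEager(): Unit =
  respond("ok")

// Lazy: the handler only returns the reply as a function; the caller runs it later.
def handleLazy(): () => Unit =
  () => respond("ok")

handleEager()
persist()
val eagerOrder = log.toList // List(sent ok, persisted): wrong order

log.clear()
val reply = handleLazy()
persist()
reply()
val lazyOrder = log.toList // List(persisted, sent ok): persistence first
```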

Testing

As mentioned before, testing any system, but especially a distributed algorithm, will only get you so far: you'll verify that your implementation works in some scenarios, but for the most part, you'll still need to manually verify that it's correct.

But verifying the basic scenarios is still valuable. How does testing using ZIO and Loom compare?

On a syntactic level, things are again quite similar, with the same almost mechanical process needed to translate between the two. Compare the Loom and ZIO implementations.

The tricky part when testing Saft is that Raft is a time-based algorithm, with all the consequences that brings. In the Loom implementation, we have no choice but to live with time-sensitive tests. For example, to wait for a leader to be elected, we need to continuously probe the nodes (sleeping between each attempt), or take the simpler approach of waiting long enough that an election has most probably completed successfully. If you've ever written tests that involve Thread.sleep, you probably know that they are fragile and prone to flakiness.
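Continuously probing the nodes and sleeping between attempts typically ends up as a small polling helper. The `eventually` function below is a hypothetical sketch; test libraries such as ScalaTest ship their own variants. Its timeout and poll interval are exactly the kind of magic numbers that make such tests slow or flaky. The "leader elected" condition is simulated by a virtual thread that sets a flag after a delay. The sketch assumes JDK 21+.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.duration.*

// Hypothetical polling helper: re-checks `condition` until it holds or `timeout` elapses.
def eventually(timeout: FiniteDuration, interval: FiniteDuration)(condition: => Boolean): Boolean =
  val deadline = timeout.fromNow
  var satisfied = condition
  while !satisfied && deadline.hasTimeLeft() do
    Thread.sleep(interval.toMillis)
    satisfied = condition
  satisfied

// Simulated cluster: a "leader" appears after roughly 200 ms.
val leaderElected = new AtomicBoolean(false)
Thread.ofVirtual().start { () =>
  Thread.sleep(200)
  leaderElected.set(true)
}

val found = eventually(timeout = 5.seconds, interval = 50.millis)(leaderElected.get)
val neverTrue = eventually(timeout = 300.millis, interval = 50.millis)(false)
```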

The ZIO implementation is in a better situation. The environment in which the program description is run is fully controlled, so in tests ZIO uses a TestClock along with a test interpreter. We can arbitrarily push the clock forward: in a test, time does not flow on its own, only when we request it to.

When the clock is pushed forward, the interpreter makes sure that all effects on all currently running fibers are fully evaluated (until all fibers become suspended, waiting on some conditions) before returning control to the test code. That way, we can write fast (in a couple of milliseconds we can cover seconds, minutes or hours of test-clock time!), predictable, reproducible tests.

Error handling

There isn't a lot of error handling going on in Saft, but there's a possibility of introducing a bug in a couple of places. The contract of Comms.send, the method that is supposed to send a message (such as RequestVote or AppendEntries) to another node, is that all exceptions have to be handled.

In Java, we could reify that requirement in the method's signature through the absence of checked exceptions. However, checked exceptions have major usability flaws and haven't stood the test of time; they're often considered a weak part of Java's design. Scala doesn't have checked exceptions at all (although Scala 3 experiments with another approach), so in the Loom implementation, all we can do is state that requirement in a comment.
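Without checked exceptions, the best the Loom version can do is document the "never throws" contract and enforce it at the implementation boundary. Here is one possible way to do that. `Transport` is a hypothetical low-level interface that may throw, and `SafeComms` is a hypothetical wrapper whose `send` catches and records non-fatal exceptions, so the contract holds by construction rather than by convention alone. `NonFatal` comes from the Scala standard library. It does not match fatal errors such as `OutOfMemoryError`, nor `InterruptedException`, so cancellation still propagates.

```scala
import scala.util.control.NonFatal
import scala.collection.mutable.ListBuffer

// Hypothetical low-level transport, which may fail with arbitrary exceptions.
trait Transport:
  def deliver(to: Int, msg: String): Unit

// Contract (only expressible as a comment): `send` never throws non-fatal exceptions.
class SafeComms(transport: Transport, errors: ListBuffer[String]):
  def send(to: Int, msg: String): Unit =
    try transport.deliver(to, msg)
    catch
      // NonFatal doesn't match InterruptedException, so interruption still propagates.
      case NonFatal(e) => errors += s"Failed to send to node $to: ${e.getMessage}"

val errors = ListBuffer.empty[String]
val flaky = new Transport {
  def deliver(to: Int, msg: String): Unit =
    if to == 2 then throw new java.io.IOException("connection refused")
}

val comms = new SafeComms(flaky, errors)
comms.send(1, "AppendEntries") // succeeds
comms.send(2, "AppendEntries") // fails, but the failure is contained and recorded
```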

The ZIO implementation is once again in a better situation. The wrapper types that we use to represent effects include information on the possible errors that might occur. In the case of Comms, the signature of send is:

def send(toNodeId: NodeId, msg: Message): UIO[Unit]

where the UIO type by definition represents a computation in which all errors are handled. Exceptions might still occur, but they are then considered defects in the implementation, or fatal errors (we can always divide by 0!). Hence, the expectation is included in the signature, and with normal usage of ZIO's API, the compiler will verify that we return the correct type. For example, sending an HTTP request will return a Task[Response], which can fail with arbitrary exceptions; only after adding an error handler does this become a UIO[Response].

As with the concurrency API, we've only scratched the surface of ZIO's error handling API, which aims to improve upon Java/Loom's try-catch to provide polymorphic abstraction and full inference. But that's a separate topic.


Correctness of the implementation

Finally, do the two implementations differ in what kind of correctness properties are guaranteed? After all, that's what we are after. We can't really prove that the implementation is correct (for that, we'd need a much more sophisticated language), but there are some guardrails that help us out.

And here, the two implementations don't really differ. The properties that are checked are more or less the same; we've covered them in the previous article, but to reiterate the main points:

each request & response is handled (due to exhaustivity of pattern matching)
each request receives a response of the appropriate type (due to typing of handler methods)
edge conditions (empty log) are properly handled (due to Option usage)
state is safe from a concurrency point of view (due to immutability)

These stem from the fact that we are using a type-safe, immutable-first language (Scala) and functional programming, not from the way effects are represented. We've also identified some properties where the tools we've chosen—concurrency libraries and the construction of the code—help us in writing a correct implementation:

side-effects are properly sequenced (due to explicit ordering)
persistence always happens, and always before replies (due to the construction of the run method)
broadcasts happen in parallel (due to forking)
election / replication conditions are met (due to helper methods in State)

As mentioned earlier, proper sequencing needs a bit more mindfulness in the Loom implementation than in the ZIO one, so that we don't inadvertently run side-effects at the moment of their construction. Other than that, both implementations are similar here as well.

Pros & cons


Supervising a loom, Dall-E

For the table lovers out there (I'm one), let's summarise the pros & cons of the Loom-based implementation, just as we did with the ZIO one:

| | Pros | Cons |
|---|---|---|
| Functional programming | Immutability; local reasoning; access to old & new state | Old, stale values might be used |
| Type safety | Code navigation & completion; exhaustivity checking; opaque types; optionals | |
| Loom | Effortless concurrency; familiar syntax; supervision through `StructuredTaskScope`; using "normal" control-flow constructs | Testing; manual laziness, when needed; manual supervision |
| Other | Representing node role as a data structure | |

And finally, here is a summary of Loom vs. ZIO, but only in the scope of the Saft implementation! Keep in mind that we do not aim to run a comprehensive comparison here. Raft is a single, specific use-case that uses neither Loom's nor ZIO's capabilities to their full extent.

| | The good | The bad |
|---|---|---|
| Loom | Effortless concurrency; familiar syntax; supervision possible; normal control-flow constructs (`for`, `if`, `try`) | Testing; manual supervision; laziness only on demand; interruption inherits Java's problems |
| ZIO | Effortless concurrency; well-designed interruption; testing; automatic supervision; uniform effect representation (always lazy); errors as values | Syntax overhead due to wrapper types; needs dedicated control-flow methods (`ZIO.foreach` etc.) |

ZIO vs Loom: the verdict

Loom has the upper hand when it comes to syntax familiarity and simpler types (no viral Future / IO wrappers). ZIO, on the other hand, wins in its interruption implementation, testing capabilities, and uniformity. When it comes to concurrency, to the degree that we've been using it, there haven't been significant differences.

As far as Saft, our Scala Raft implementation, is concerned, I'd say it's a tie. I'm happy with both implementations; hopefully, both are readable and make it easy to relate implementation fragments to the Raft paper.

We're just at the start of a discussion on how to further evolve our effect systems. There's been a heated exchange on that topic on the Scala Contributors forum, and a summary of it is available as well. Another interesting related presentation is Daniel Spiewak's "Case for effect systems", where he argues that Loom obsoletes Future, but not IO.

But in the end, what matters is the code. The Saft repository awaits, with two implementations of Raft for you to compare and play around, using either the in-memory simulation or the HTTP+JSON based one. If you find any problems or places where it's not clear how things work—please do let me know!
