Contents

Contents

Deterministic Concurrent Testing Using Fray

Deterministic Concurrent Testing Using Fray featured image

Testing concurrent code is hard. If you run a multi-threaded test and it passes, you never know if the code is correct or if it's just luck. The only way to be certain is through formal verification; however, despite significant progress in this area, writing and maintaining formally verified code remains very expensive and time-consuming. Hence, it's only applicable to a narrow spectrum of use cases.

That's where deterministic concurrent testing comes in. It doesn't give you a guarantee that your multi-threaded code is correct, but it also provides much more than unit or stress tests (that is, when you run your code in a loop, hoping that enough thread interleavings will occur to find any problems).

Fray is one such library, which enables writing concurrent tests for the JVM, deterministically simulating various thread interleaving, and if needed, replaying runs that failed, using a standard Java debugger.

Let's take a closer look at how Fray can be used and how it works.

What kind of bugs Fray detects

Before we begin, to reiterate: when writing a concurrent test using Fray, you won't get a guarantee that every concurrency bug will be found. Usually, the search space (the number of possible thread interleavings) is too large, as it grows exponentially with each synchronization point (such as acquiring a lock, invoking blocking I/O, or reading an atomic integer).

However, Fray explores the search space in a "probabilistically intelligent" way, implementing the latest research on the subject, and thus maximizing the probability that if a bug exists, it will be found.

On a more technical level, the test code is first instrumented at bytecode load-time. Then, Fray runs it in multiple iterations. It integrates well with build systems such as Maven or Gradle, testing frameworks such as JUnit, and any JVM-based languages. A concurrent test fails if there's a deadlock, or if any thread started during the test ends with an exception.

Here's a simple Fray-enabled test case, using JUnit:

@ExtendWith(FrayTestExtension.class)
public class FrayDemoTest {
    // Similar examples might include: rate limiters, CPU credits, ...
    class Bank {
        volatile int balance = 100;

        void withdraw(int amount) {
            if (balance >= amount) {
                balance = balance—amount;
            }

            assert(balance >= 0);
        }
    }

    @ConcurrencyTest
    public void bankTest() throws InterruptedException {
        var bank = new Bank();
        Thread t1 = new Thread(() -> {
            bank.withdraw(75);
        });
        Thread t2 = new Thread(() -> {
            bank.withdraw(75);
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}

Despite using volatile to protect memory from concurrent access, this code is not correct, as there exists a thread interleaving where the assertion fails, and we end up with a negative account balance:

timet1t2
1if (balance >= amount)true
2if (balance >= amount)still true!
3balance = balance - amountbalance is 25
4balance = balance - amountbalance is -50!

The problem is that if followed by an assignment is not an atomic operation. In fact, there are three separate operations (read in if, read old value, write new value).

Note that what we have above in the table is a sequential interleaving: only one thread runs at any given time. Despite this, we still have a failure. Here, the bug is straightforward to spot, and in fact, IntelliJ flags it with a warning. However, real-world bugs might be—and usually are—much better hidden.

If we run this test, we'll get a failure, with the following output:

[ERROR] Failures:
[ERROR] com.sofwaremill.test.FrayDemoTest.bankTest
[INFO]   Run 1: PASS
[INFO]   Run 2: PASS
[INFO]   Run 3: PASS
[INFO]   Run 4: PASS
[INFO]   Run 5: PASS
[INFO]   Run 6: PASS
[INFO]   Run 7: PASS
[ERROR]   Run 8: FrayDemoTest.bankTest java.lang.AssertionError

The first seven interleavings were fine; only the 8th one attempted ended with an exception. When you run this code on your machine, you may encounter the failure earlier or later—choosing the interleavings is, to a significant extent, random.

The target/fray/fray-report directory contains the details on the failing test run, including the random seeds:

[~/projects/fray-test/target/fray/fray-report]% cat recording/random.json
{"type":"org.pastalab.fray.core.randomness.ControlledRandom","integers":[148390750,743789262,418850112]}

We can now modify the test run so that it runs only this specific thread interleaving:

@ExtendWith(FrayTestExtension.class)
public class FrayDemoTest {
    class Bank {
        // …
    }

    @ConcurrencyTest(replay="~/projects/fray-test/target/fray/fray-report/recording")
    public void bankTest() throws InterruptedException {
        // …
    }
}

Thus, reproducing the bug makes it possible to debug it. Note that to debug the concurrent code, you'll need to use a Fray-instrumented JVM (currently Java 25). In IntelliJ, that amounts to changing the JVM in the Run/Edit configurations screen:

fray%20intellij

What kind of bugs Fray doesn't detect

Fray assumes (but doesn't require) that your code is free from data races. In practice, this means that your code should utilize atomics, volatile fields, locks, and other similar mechanisms to protect access to mutable data correctly. Data-race (i.e., concurrent access to unprotected memory) bugs are out of scope of Fray's verification.

For example, slightly modifying the previous example makes the test pass:

@ExtendWith(FrayTestExtension.class)
public class FrayDemoTest {
    class Bank {
        int balance = 100;

        void withdraw(int amount) {
            if (balance >= amount) {
                balance = balance—amount;
            }

            assert(balance >= 0);
        }
    }

    @ConcurrencyTest
    public void bankTest() throws InterruptedException {
        var bank = new Bank();
        Thread t1 = new Thread(() -> {
            bank.withdraw(75);
        });
        Thread t2 = new Thread(() -> {
            bank.withdraw(75);
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}

Note that the balance field is no longer volatile. Of course, the bug is still present—in fact, it has now become worse, as reading & writing balance from multiple threads is not synchronized. When reading, we can now get a stale value, and when writing, there are no guarantees that the new value will be visible to other threads (running on different cores).

You might say that this "new" bug isn't due to thread interleaving (different sequences of threads running alternately), but due to concurrently writing to the same memory location. Hence, Fray is a tool to detect bugs caused by incorrect thread interleavings; it is not a tool to detect bugs caused by concurrent memory writes. Fray verifies if concurrency & synchronization tools are used correctly. In yet other words, concurrency bugs are detected, but not parallelism bugs.

Concurrency is a structural property: it enables making progress on multiple threads without any of them having to run to completion. Parallelism is a runtime property, meaning that multiple threads actually run simultaneously.

How Fray works

The limitation—or requirement—described in the previous section comes from how Fray executes your code, and how it tests different thread interleavings. The threads started in the test are run concurrently, but not in parallel: that is, at most one thread executes at any time.

Fray implements this with a technique it's calling shadow locking. The instrumentation that Fray performs at bytecode load-time is inserting locks around all concurrency resources. For example, each volatile field has a dedicated lock. It is acquired before being used (read/written), and released immediately afterwards. Similarly, there's a lock around each semaphore, atomic, etc. Finally, each thread also has a dedicated lock, which is acquired when the thread starts. The situation is a bit more complicated with synchronized blocks, but it works similarly and takes nesting into account.

Initially, all locks are held by the Fray test runtime. Fray knows which & how many threads are waiting to acquire each lock, and releases one of them (which one—depends on a configurable strategy). The test-code thread is running until it's once again waiting on a lock. Fray is notified of this, and once again chooses a single thread to run (possibly a different one), releasing the appropriate lock after consulting the strategy once again.

That way, depending on which threads we choose to run at each synchronization point, we get a specific thread interleaving. Another test run yields another interleaving—as the strategy will most likely choose other threads (the strategies are randomized).

Hence, you can simulate several different thread interleavings (1000 per run by default), where each run corresponds to some sequential execution of the threads involved. You get concurrency, but you don't have parallelism.

fray

There are several strategies to choose from, such as random walk or Partial Order Sampling, which assigns and shuffles thread priorities in a predetermined way. Empirical studies have found that the POS strategy is currently the most effective one. This, however, depends on the specific use case. You might configure the strategy used as part of the @ConcurrentTest annotation.

The instrumentation that Fray performs must happen both on the JVM level (standard library, for example), as well as with your code. That's why an instrumented JVM is needed to run Fray's concurrency tests.

Fray in Jox

I discovered Fray during the recent ICFP/Splash conference, due to my interest in implementing better testing for the Jox Java library, which provides virtual-thread-based safe concurrency & streaming.

More specifically, Jox contains an implementation of Go-like, selectable channels, following a paper by Kotlin's developers.

Jox Channels are already thoroughly tested using a mix of unit & stress tests, but a more systematic approach to concurrency testing is exactly what I've been looking for. Fray is an ideal fit: Jox's channel implementation is a heavy user of volatile fields, atomic get/set/CAS operations, LockSupport.park(), and the like.

As a result, Fray-enabled tests are now run during every CI run using various channel sizes (0—rendezvous, 1, 10) and internal segment sizes (4 & 32) to boost both the confidence in the existing implementation, and provide better coverage when introducing new features:

fray%20ci%20build

Although Fray can be used as a drop-in replacement for existing tests, which might be as simple as replacing @Test with @ConcurrentTest, in the case of Jox, I created a separate Fray-only test suite.

There are two reasons for this: first, many existing Jox tests are unit-like, meaning they often test specific functionalities without involving multiple threads. Second, the tests that involve threads are written using virtual-thread-based structured concurrency, which Fray does not yet support. The dedicated test suite contains tests that use "traditional" Java threads.

More on concurrent testing

If you'd like to dive deeper into the rationale behind creating Fray, details on how it works, along with a comparison with other tools and case studies on applying Fray to existing software (such as Kafka and Guava), and the bugs that were found, take a look at the paper.

There are, of course, other similar tools, which might be worth exploring. One such tool, also presented at ICFP/Splash, is Lincheck. Lincheck can be used with arbitrary JVM-based code, but shines best when testing data structures written in Kotlin, especially ones that leverage Kotlin's coroutines. In fact, Lincheck is used to verify the implementation of channels, from which Jox is derived.

Fray and similar tools can increase our confidence in the concurrent code that we write; however, ultimately, that verification will be only as good as the test suite. Writing tests that provide sufficient coverage of possible thread-competition scenarios is still something a developer (or an AI) must do.

Blog Comments powered by Disqus.