envelopmenuskypeburger-menulink-externalfacebooktwitterlinkedin2crossgithub-minilinkedin-minitwitter-miniarrow_rightarrow_leftphonegithubphone-receiverstack-overflow

Race condition when caching using the get-compute-put pattern

TL;DR: Beware of using the get-if absent compute-put sequence to cache values in a multi-threaded environment. Use Caffeine & the atomic get-or-create operation.

Recently we encountered an interesting problem with the way we've been using a Guava cache, which caused a race condition and in effect "random" testsuite failures. Under some circumstances, when read & write operations were run concurrently, the cache ended up containing stale values.

We have two operations, which can be summarized as:

Cache<K, V> cache = ...

public V read(K key) {
  V result = cache.getIfPresent(key);
  if (result == null) {
    result = readFromDatabase(key);
    cache.put(key, result);
  }

  return result;
}

public void write(K key, V value) {
  writeToDatabase(key, value);
  // force the value to be re-read from 
  // the database on next read(key) invocation
  cache.invalidate(key); 
};

This get-if absent compute-put pattern is quite popular when using caches. However, it can cause stale values to be stored in the cache in a multi-threaded environment. Suppose we have two threads, T1 and T2. The first will invoke read, and the second will invoke write for the same key, on an initially empty cache. As these are threads, they might get interleaved in many ways, for example like this:

T1: read(key)
T1:   V result = cache.getIfPresent(key); // returns null
T1:   if (result == null) // it is null
T1:     result = readFromDatabase(key); // returns v1

T2: write(key, v2)
T2:   writeToDatabase(key, v2);
T2:   cache.invalidate(key);

T1:     cache.put(key, result)

Now, after this sequence, the cache will contain an incorrect mapping: key -> v1, while the database contains a key -> v2 mapping.

How to solve this? Well, we would need some kind of an atomic get-or-create operation, which would only run invalidate after creating a missing value for a key is complete and the result is in the cache.

As we were already using a Guava Cache, our first try was to use a cache loader, which you provide when constructing the cache (or, equivalently, you can provide a loader method as a second parameter to cache.get). Here's the full test code:

AtomicReference<Integer> db = new AtomicReference<>(1);

LoadingCache<String, Integer> c = CacheBuilder.newBuilder().build(
    new CacheLoader<String, Integer>() {
        public Integer load(String key) {
            println("Reading from db ...");
            Integer v = db.get();
            sleep(1000L);
            println("Read from db: " + v);
            return v;
        }
    }
);

Thread t1 = new Thread(() -> {
    Integer g = c.get("k");
    println("Got from cache: " + g);
});

Thread t2 = new Thread(() -> {
    sleep(500L);
    println("Writing to db ...");
    db.set(2);
    println("Wrote to db");
    c.invalidate("k");
    println("Invalidated cached");
});

t1.start();
t2.start();

t1.join();
t2.join();

System.out.println();
println("In cache: " + c.get("k"));

You can view the whole source here. The t1 thread reads a value from the cache, which will invoke the cache loader when the value is missing. We simulate a delay in writing the value back to the cache, after it's read from the DB using the battle-proven Thread.sleep. The t2 thread first makes sure that the read in t1 from the database already happened, again by using the highly sophisticated machinery of Thread.sleep, then updates the "database" (here, an in-memory atomic integer) and invalidates the cache.

Unfortunately, this doesn't quite work as expected, here's the output (format: [thread name] ([ms since start]): [msg]):

Thread-1 (123): Reading from db ...
Thread-2 (612): Writing to db ...
Thread-2 (612): Wrote to db
Thread-2 (613): Invalidated cached
Thread-1 (1142): Read from db: 1
Thread-1 (1145): Got from cache: 1

main (1146): In cache: 1

As you can see, the invalidate operation is invoked right away, despite an ongoing cache load for the k key.

Luckily, not all hope is lost, as we have Tomasz Dziurko in our team, who pointed me to Caffeine. The code using Caffeine is almost exactly the same, except for imports and possibility of using lambdas. The main difference is that instead of CacheBuilder.newBuilder(), you need to use Caffeine.newBuilder(). However, the output is radically different:

Thread-1 (122): Reading from db ...
Thread-2 (624): Writing to db ...
Thread-2 (624): Wrote to db
Thread-1 (1146): Read from db: 1
Thread-2 (1146): Invalidated cached
Thread-1 (1146): Got from cache: 1

main (1147): Reading from db ...
main (2149): Read from db: 2
main (2149): In cache: 2

As you can see, the invalidation operation in t2 is only run after the cache load completes for k, causing the value to be re-computed on subsequent cache gets (in the main thread). This causes the cache to contain the right mapping (to the updated value). How is that possible? Caffeine has key-scoped locks; when invoking cache.get with a cache loader defined, the lock for the given key is held until the value is computed.

As a bonus, Caffeine supports asynchronous operations, so the values to be cached can be computed asynchronously using CompletableFutures (and later converted to Scala Futures using scala-java8-compat, as was in my case). See the async test for details.

Update 1: as Ben Manes pointed out, there's a pending Guava bug. Not sure if it will get "fixed" though, as it's over 2 years old.

Update 2: Viktor Klang pointed out that it's possible to implement such a cache (with some functionality restrictions) using a ConcurrentIdentityHashMap and Scala's Futures, see this twitter thread for details.