Contents

Contents

Understanding Capture Checking in Scala

Understanding Capture Checking in Scala featured image

Capture checking is an upcoming Scala feature that allows you to track which designated values (capabilities) are captured (i.e., stored as references) by arbitrary other values. This tracking happens at compile time and is currently an opt-in mechanism that can be enabled via an import.

There are two good resources to get started with capture checking: the official Scala docs and Nicolas Rinaudo's article. Still, it took me some time to understand how and why capture checking works. Thanks to the help from Martin Odersky, and the OOPSLA talk by Yichen Xu, I think I now have a pretty good grasp of the mechanism. Hence, I hope the following will be a valuable companion to these two excellent sources.

Capture checking has been significantly improved in the upcoming Scala 3.8. We'll be using the 3.8.0-RC3 version in the examples below.

What's a capture?

A value is captured by another if it's somehow retained in its object tree. For example, given:

case class JsonParser(inputStream: InputStream, config: JsonConfig)

an instance parser: JsonParser captures the input stream & the config with which it was created. Now, certain captures are worthwhile tracking, while others are not. Here, we might want to track where the inputStream is used & retained, so that when we close it, we are sure that nobody is using it anymore. On the other hand, tracking config is probably useless.

Example: resources

Let's take a look at a couple of examples of capture checking in action. First, the use case we've talked about above—tracking the lifetime of an input stream.

We might define a utility function, which opens an input stream for a given file name, applies the provided closure, and ensures that the stream is closed (try running the examples with scala-cli!):

//> using scala 3.8.0-RC3

import language.experimental.captureChecking
import java.io.{FileInputStream, InputStream}

def withFile[T](name: String)(op: InputStream^ => T): T =
  val f = new FileInputStream(name)
  try op(f)
  finally f.close()

Such a function might be used legally, e.g., to read the first byte of the file:

@main def main =
  withFile("data.txt"): in =>
    in.read()

However, when we try to "leak" the input stream (note that after withFile completes, the stream is no longer valid—it's closed), we get a compiler error (which is … quite complex and opaque, but remember it's still an experimental, under-development feature):

withFile("data.txt")(identity)

[error] ./main.scala:17:24
[error] Found:    (x: java.io.InputStream^'s1) ->'s2 java.io.InputStream^'s3
[error] Required: java.io.InputStream^ => java.io.InputStream^'s4
[error]
[error] Note that capability cap cannot be included in outer capture set 's4.
[error]
[error] where:    =>  refers to a fresh root capability created in method 
                  main3 when checking argument to parameter op of method withFile
[error]           ^   refers to the universal root capability
[error]           cap is a root capability associated with the result 
                  type of (x: java.io.InputStream^): java.io.InputStream^'s4
[error]   withFile("data.txt")(identity)
[error]                        ^^^^^^^^

Why is that? The input stream is marked as a capability: its type is InputStream^. The ^ is crucial: it designates the value as tracked. Any type followed by a ^ is a tracked capability.

If a capability leaks outside its scope, it's visible (as here, it would leak outside the op closure), we get a compile error. We'll dive a bit deeper into how it exactly works shortly.

Example: Ox concurrency scope

Another example of capture checking is opening a concurrency scope, as in Ox. The scope, an instance of the Ox class, is passed to the provided code block via a context function, where fork invocations might occur. These forks require that a given Ox is in scope, otherwise there's a compile-time error.

As before, we would like to ensure that the scope doesn't leak outside of the code block. This legal usage compiles fine:

trait Ox
trait Fork[T]:
  def join(): T

def supervised[T](op: Ox^ ?=> T): T = ???
def fork[T](t: => T)(using Ox^): Fork[T] = ???

val result1 = supervised:
  val f1 = fork(2+3) 
  val f2 = fork(4*5) 
  f1.join() + f2.join()

While this gives a compiler error, as we would be able to fork a new thread by invoking result2, after the scope has completed:

val result2 = supervised:
  () => fork(println("thread 1"))

[error] ./main.scala:33:5
[error] Found:    (contextual$2: Ox^'s1) ?->'s2 () ->{contextual$2} Fork[Unit]^'s3
[error] Required: (Ox^) ?=> () ->'s4 Fork[Unit]^'s5
[error]
[error] Note that capability contextual$2 cannot be included in outer capture set 's4.

Expect a capture-checked Ox soon!

Example: transitive checking

What if we capture a capability in a value and try to leak it? Won't that fool the system? Well, of course not. Even more: we get fine-grained capture tracking in the code.

Imagine we have a streaming JSON parser, which captures an input stream:

trait JsonObject
class StreamingJsonParser(in: InputStream^):
  def nextObject(): Option[JsonObject] = ???

// counts the number of objects in the given file
withFile("data.txt"): in =>
  var count = 0
  val parser = new StreamingJsonParser(in)
  var obj = parser.nextObject()
  while parser.nextObject().isDefined do
    count += 1

  count

This is all legal and would compile just fine. But if we try to leak the parser, we get an error:

withFile("data.txt"): in =>
  val parser = new StreamingJsonParser(in)
  parser

[error] ./main.scala:52:25
[error] Found:    (in: java.io.InputStream^'s1) ->'s2
[error]   StreamingJsonParser{val in²: java.io.InputStream^{in}}^{in}
[error] Required: java.io.InputStream^ =>
[error]   StreamingJsonParser{val in²: java.io.InputStream^'s3}^'s4
[error]
[error] Note that capability in cannot be included in outer capture set 's4.

Types of captures, and the meaning of ^

What's the type of parser in the above example? As it captures a capability, it becomes a capability itself. So its type must be at least StreamingJsonParser^. But we can be more precise than that: we know exactly which capabilities the parser captures. This can be written down as a type:

val parser: StreamingJsonParser^{in}

Meaning that parser is a capability that captures the in capability. In general, you might enumerate any number of capabilities in the ^{...}. That's called a capture set (it's unordered).

Every capability has a capture set. Wait, but what about ^? What's the capture set here? Well, a type followed by ^ means that it's a capability with some, unknown capture set. It might capture arbitrary values: maybe none, maybe 1, maybe 67.

^ in function/ method/ constructor parameters

When a value with a ^ type appears as a method parameter, function parameter, or constructor argument, it means we can provide a capability with an unknown capture set.

It's not important what the exact capture set of that parameter is—it's only relevant that the value has some capture set, and that, because of the capture set, its usage should be tracked. You might also think of it as a capture-polymorphic function. For example:

def deserialize(parser: JsonParser^): Option[MyCaseClass]

Is equivalent to (yes, that's valid syntax):

def deserialize[C^](parser: JsonParser^{C}): Option[MyCaseClass]

The argument can be read as: a JsonParser which captures an arbitrary capability set. So you might say that whenever there's a ^ parameter in a function, it's capability-polymorphic. The actual parameter might capture, e.g., an input stream, or two input streams, but that's not a concern of the deserialize function.

Capture checking type hierarchy

^-types are new Scala types, and they not only fit into the inheritance hierarchy, but they also create an inheritance hierarchy of their own.

Firstly, we saw that ^ means that a value might capture some unknown capabilities. Hence, every value that captures, for example, in: InputStream^, also captures some unknown set of capabilities—but not the other way round. Hence, JsonParser^{in} <: JsonParser^. Another way of thinking about it: there are fewer instances of JsonParser that just capture in, than there are instances of JsonParser that capture arbitrary input streams / values.

Secondly, a capture set specifies what values might be captured, not which values are captured exactly. When we see a JsonParser^{in1}, we know that it might capture in1: InputStream^, but we are certain that it doesn't capture in2: InputStream^.

Hence, the type of a capture can be widened. A parser which might capture in1, is also a parser which might capture in1 and in2. But not the other way round! Hence, JsonParser^{in1} <: JsonParser^{in1, in2}.

Finally, we might have an in-memory parser implementation, which doesn't capture any values. Its type will be JsonParser—no capture set. This is equivalent to JsonParser^{}. Of course, we might also view this as a parser that might capture in1. Hence, JsonParser^{} <: JsonParser^{in1}.

Summing up, we get the following hierarchy:

JsonParser = JsonParser^{} 
          <: JsonParser^{in1} 
          <: JsonParser^{in1, in2} 
          <: JsonParser^

Of course, regular subtyping also applies, hence FileJsonParser^{in} <: JsonParser^{in}, if the FileJsonParser class implements the JsonParser trait.

An updated type hierarchy

You might remember the unified type hierarchy from Scala's docs. We can now create a variant that takes into account capturing types. Primitives can't be capabilities, so we'll restrict the diagram to AnyRefs:

hierarchy

The diagram, of course, depends on what capabilities are currently in scope. Here, I assumed that's c1, c2, and c3, but it might be an arbitrary other set.

You might notice that Any is still at the top of the hierarchy. That's because in the compiler, Any and Any^ are aliased. So both Any^ and Any^{c1} are still just an Any.

Why does withFile(identity) fail?

Equipped with the knowledge of the type hierarchy behind capture checking, let's now try to understand why withFile failed in our first example. As a reminder, here's the definition and problematic usage:

def withFile[T](name: String)(op: InputStream^ => T): T =
  val f = new FileInputStream(name)
  try op(f)
  finally f.close()

withFile("data.txt")(identity)

The type of f will be inferred to be FileInputStream. However, we need to provide an InputStream^ to op. Luckily, using our type hierarchy rules, we can widen the type as:

FileInputStream = FileInputStream^{} <: FileInputStream^ <: InputStream^

Also remember that since the InputStream^ is a parameter of the op function, in fact it means that it takes a capability with some unknown capture set. We can rewrite this (now using imaginary syntax) as a capability-polymorphic function:

def withFile[T](name: String)(op: [C^] => InputStream^{C} => T): T =
  val f = new FileInputStream(name)
  try op(f)
  finally f.close()

withFile("data.txt")(identity)

Now, when op is identity, as in the provided function, the T is inferred to be InputStream^{C} as well, for some unknown capture set C. However, the scope in which this C is visible is restricted to the invocation of op.

Through a mechanism called avoidance, capture sets might be widened to ones that are visible from the outside (if you are interested, it's covered in slightly more detail in the docs). However, here we have a set of unknown capabilities; hence, we cannot widen it, as we don't know what to widen! This triggers the error.

Mapping this to the error message that we got:

[error] ./main3.scala:17:24
[error] Found:    (x: java.io.InputStream^'s1) ->'s2 java.io.InputStream^'s3
[error] Required: java.io.InputStream^ => java.io.InputStream^'s4
[error]
[error] Note that capability cap cannot be included in outer capture set 's4.
[error]
[error] where:    =>  refers to a fresh root capability created in method main3 when checking argument to parameter op of method withFile
[error]           ^   refers to the universal root capability
[error]           cap is a root capability associated with the result type of (x: java.io.InputStream^): java.io.InputStream^'s4
[error]   withFile("data.txt")(identity)
[error]                        ^^^^^^^^

The "cap is a root capability associated with the result type" is our C, and "Note that capability cap cannot be included in outer capture set 's4." refers to the fact that C cannot escape its scope.

Guarantees

Capture checking enhances the compiler, providing more compile-time guarantees. It might be worth recapping what these guarantees exactly are.

When you see a method with a signature:

def deserialize(parser: JsonParser^): MyCaseClass

It's guaranteed, that the result (here, MyCaseClass), does not capture (store a reference to) the capability parser. If it did, the type would have to be at least MyCaseClass^{parser}.

If we have:

def createParser(is: InputStream^): JsonParser^{is}

We know that the returned value might capture the is capability. Still, it's guaranteed that it doesn't capture any other capability (that is currently in scope). is itself might capture other capabilities (since it is a capability), but which exactly is unknown to the method.

Tying capture-checked capabilities with given-capabilities

How does this tie in with effects and tracking when they are used? Let's say we have two "capabilities" (here, the term is used in the general sense, not the one discussed above, but soon they'll be united) that we want to track: network usage and filesystem usage.

Both of these can be accessed through instances of the Network and FileSystem classes. Using the proposed Scala model, these would be passed implicitly as given parameters to methods, which need them. For example:

def sendRequest(url: String, body: String)(using Network) = …
def writeToFile(path: Path, content: String)(using FileSystem) = …

The dependency on such a capability can also be expressed using context functions, as recently covered in another article by Nicolas.

Callers of sendRequest or writeToFile will, of course, need to somehow provide these dependencies, presumably from their implicit scope. And this chain might continue all the way up to main. So far, we don't need anything new from the Scala compiler: that's already working since Scala 3.0.

However, we probably do want to track the retention of these dependencies. Having a method with a signature:

def lookupUser(id: String)(using network: Network^): User^{network}

not only informs us that the implementation of lookupUser uses the network, but also that that capability (now we use the word in both the general and capture-checking senses) is retained. Maybe parts of the User are lazily loaded? Or, maybe there's a bug in our implementation.

Of course, to make this work, network must be a tracked capability beforehand, as witnessed by the ^ next to the type declaration. But don't worry if you forget the ^: as long as at the root (in main), the instance that you provide is a tracked capability, the compiler will happily inform you that you have incompatible types, and you'll have to add ^ in the appropriate places.

If you look at the second example with Ox, it uses this mechanism precisely. An Ox instance is an implicitly-passed capability that allows you to start new threads. And thanks to capture checking, we are certain it won't be captured incorrectly.

Function types

Function types deserve special attention, as I think they are the only ones where the meaning of the type changes with the introduction of capture checking.

That's because the function type (=>) now means a function that can capture an unknown set of capabilities. If we have val f: Int => String = …, that's a closure: a function which captures whatever values it uses from the surrounding environment. Hence, such a function can capture an unknown, arbitrary set of capabilities. Which makes it a capability itself! Every => function is tracked.

There are also untracked functions: they're called pure and are written A -> B. If you have a g: Int -> String = …, it's guaranteed that g does not capture anything. Of course, the usage of this type will be verified by the compiler.

Hence, A => B is the same as (A -> B)^ and can be written as A ->^ B.

Why this change? Well, it turns out that this design minimizes the number of changes needed to Scala's codebase (and hopefully to our codebases as well!) to leverage capture checking while ensuring correctness. Think about List.map and Iterator.map, and which one retains the argument, and which doesn't. There's more on this in the reference documentation.

Note that the -> only applies in the context of types. Its meaning on the value level, as a pair constructor, remains the same. So we have:

val f: String -> Int = s => s.length
val p: (String, Int) = "hello" -> 42

In practice, this means that if you have any class that captures a function, it will either become a capability itself, or you'll have to change the captured function type to be pure. For example:

// captures `f` which itself captures some (unknown) set of capabilities
case class Container(f: Int => Int)

// correct: f is in the capture set of c
val f: Int => Int = x => x + 1
val c: Container^{f} = Container(f)

// correct: the type of the provided function is inferred to be pure
val c2: Container = Container(x => x + 1)

// also correct: explicitly providing a pure function
val g: Int -> Int = x => x + 1
val c3: Container = Container(g)

// incorrect: the type of c4 should be Container^, and can't be more 
// specific, as the capability is anonymous
val c4: Container = Container((x => x + 1): Int => Int)

We've just touched the tip of the iceberg

There's more to the capture checker than we covered in this introduction. First of all, captures are transitive: if we know that v1 captures v2, and v2 captures v3, then v1 also captures v3. This further enhances the subtyping relationship.

Moreover, some types can always be considered a capability (when they extend SharedCapability). Also, capabilities in capture sets can be filtered by their classifier, e.g., as Control or Mutable.

These more advanced concepts are well covered in the documentation mentioned before.

Rust in Scala?

We're still not done! Given the foundation provided by the capture checker, the Scala team is now working on using it to implement separation checking. This will allow tracking the use of mutable values and the aliasing of references. Does it ring a bell? That's one of the headline features of Rust's borrow checker!

We've already seen how resource usage might be made safe, and separation checking might allow eliminating another category of bugs: unsafe sharing of mutable state between threads, which might lead to memory race conditions. So far this has been exclusive to the Rust compiler. Still, now we might get the same in Scala (while keeping automatic memory management, and avoiding the complications of the borrow checker).

Scala once again enhances the set of safety properties that can be practically checked by the compiler!

Try capture checking today

Scala 3.8 comes with an improved capture checker (still experimental) and a fully capture-checked standard library that uses all of the features mentioned above.

As capture checking is opt-in, by default, the changes in the standard library have no effect. But when you explicitly enable capture checking with an import:

import language.experimental.captureChecking

You can leverage the work done by the Scala team and enjoy the new safety benefits.

Reviewed by Jacek Bizub.

Blog Comments powered by Disqus.