
Trying out Unison, part 1: code as hashes

Adam Warski

05 Sep 2022, 6 minutes read


Unison is an upcoming programming language and distributed runtime. It's functional and statically typed, and it introduces some genuinely interesting ideas that set it apart from "mainstream" languages.

If you're into Scala, you might have heard of Paul Chiusano and Rúnar Bjarnason, the co-founders of Unison Computing (a public benefit corporation). They're the authors of Functional Programming in Scala, one of the best introductions to modern FP.

The leading idea of Unison is that code is content-addressed. But what does that mean?

If you find the following interesting, take a look at the remaining parts of the series: part 2: organising code; part 3: effects through abilities; and part 4: from the edge, to the cloud.

Content-addressed code

In all mainstream languages, code is stored as plain text. This text is parsed and then interpreted or compiled whenever we change it or want to run the program. The text is the source of truth.

Unison is different: code is stored in a database... somehow. We don't really need to know how exactly. Judging by the extension of the file in ~/.unison, it's SQLite, but that's just a fun fact, not something you need during development.

And it's not code in a textual form that is stored in the database. Instead, the database stores abstract syntax trees, or rather some representation of them. These trees are keyed by their hashes. That's where the "content addressing" comes from.

These hashes depend only on the structure of the code, not on the names used. For example, consider:

funnyAdd : Nat -> Nat -> Nat
funnyAdd x y = x + y + 1

The hash of this function is #g9l97dio, and it's the same as the hash of this one:

amusingAdd : Nat -> Nat -> Nat
amusingAdd a b = a + b + 1

That's because both have the same AST structure; only the names differ. However, the function below has a different hash, and Unison considers it a different function, even though it does the same thing. The two are equivalent semantically but not syntactically, and that's the key difference here:

comicalAdd : Nat -> Nat -> Nat
comicalAdd x y = y + x + 1

[Screenshot: ucm processing the definitions above]
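To make the "same behaviour, different hash" point concrete, here is a small sketch of watch expressions (lines starting with >, which ucm evaluates whenever the scratch file is saved). It assumes funnyAdd and comicalAdd from above are in scope, either in the same scratch file or already added to the codebase. The inputs 2 and 3 are arbitrary.

```unison
-- Assumes funnyAdd and comicalAdd (defined above) are in scope, e.g. in
-- the same scratch file or already added to the codebase.
-- Same result, different hashes: both of these evaluate to 6.
> funnyAdd 2 3
> comicalAdd 2 3

-- This compares the results, not the hashes, so it evaluates to true.
> funnyAdd 2 3 == comicalAdd 2 3
```

Unison can't know that x + y and y + x always agree. It only compares syntax trees, so the two definitions remain distinct in the codebase.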

If you've encountered other functional languages, especially Haskell, Unison's syntax should be familiar. But even if you haven't, and you come from another corner of the programming world, you should be able to get up to speed quite quickly.

Names

Names are just labels attached to hashes. A hash can have several labels at once (above, we created two names, funnyAdd and amusingAdd, for the same hash), and you can also end up with hashes that have no names at all.

This makes renaming trivial: the only thing that changes is the mapping from the label to the hash. No code, that is, no syntax tree, is modified. Since other definitions refer to a function by its hash, none of them need to change either.

Writing code

If everything is in a database, how do you create and later edit the code? Everything happens through scratch files and the Unison Codebase Manager, ucm. ucm is a REPL-like command-line tool that lets you query and modify your local codebase, as well as interact with remote ones.

When you start ucm, it monitors all *.u files in the current directory. Whenever you save such a file, ucm parses and typechecks it, and then either reports errors or offers to add the new or modified definitions to the codebase. You can also evaluate functions, run side-effecting code, and run tests, all from ucm.

[Screenshot: ucm reacting to a saved scratch file]
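A minimal scratch file illustrating this workflow might look as follows. The file name scratch.u and the triple function are hypothetical, chosen only for illustration. After saving, ucm should list triple as a new definition, and running add in ucm stores it in the codebase. The watch expression is only evaluated and displayed.

```unison
-- Hypothetical scratch file (e.g. scratch.u) in the directory ucm watches.
-- On save, ucm parses and typechecks it, then offers to add `triple`.
triple : Nat -> Nat
triple n = n * 3

-- Watch expression: evaluated by ucm on each save (here to 42),
-- but not added to the codebase as a definition.
> triple 14
```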

Editing also happens through scratch files: running edit [name] in ucm renders a textual representation of the function into the scratch file, using the names currently associated with the relevant hashes. You can then edit the function and save the scratch file when you're done. ucm picks up the changes automatically and offers to add them back to the database.

The Unison docs offer a really great "newcomer experience". Apart from the mandatory installation instructions, you'll find a quickstart, a tour, and a number of deep-dive topics that explain both what's innovative about Unison and the essential parts of any language (control flow and data structures). It's a young language, but others could certainly learn from the way Unison presents itself to the world!

What happens if the changes break the existing code? Let's say we have the following two definitions:

kiwi : Nat -> Nat
kiwi x = x * 2

orange : Nat -> Nat
orange x = kiwi (kiwi x)

Some time later, we ask ucm to edit kiwi and change its definition so that it now requires an extra parameter:

kiwi : Nat -> Nat -> Nat
kiwi x a = x * a

If we save the file and run update, our new definition lands in the codebase and we can happily use our improved kiwi function. But wait... won't orange be broken now? After all, it calls the single-parameter version of kiwi.

It turns out that nothing is broken. We've introduced a new function and given it the kiwi label, but that doesn't remove the old function. Both are stored as ASTs, with the label mappings kept separately, and orange still refers to the old kiwi by its hash. However, the "old kiwi" has lost its label, so it is now rendered as a nameless hash.

[Screenshot: orange rendered with the old kiwi shown as a nameless hash]
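One way to check that nothing broke is a scratch file containing just a watch expression. This sketch is an extrapolation from the behaviour described above, not a step shown in the original walkthrough. It assumes kiwi has already been updated and orange not yet edited. Because orange still points at the old kiwi's hash, it should keep computing x * 2 * 2.

```unison
-- Assumes kiwi has been updated to the two-parameter version,
-- but orange has not been edited yet.
-- orange still refers to the old kiwi by hash (x * 2), so this should
-- evaluate to 12, exactly as before the update.
> orange 3
```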

If this got you worried about refactoring, Unison has you covered. When you update a function (such as kiwi), it tracks the definitions that depend on it and works out which of them might need an update, as orange does here. To see the list, we just need to issue the todo command.

[Screenshot: output of the todo command]

We can now run edit 1 (the first and only item on our todo list) or edit orange in ucm, and fix the definition so that it works with the updated kiwi.

The downside is that the old, now nameless kiwi shows up as a raw hash in the scratch file:

orange : Nat -> Nat
orange x = #hmt4gnn927 (#hmt4gnn927 x)

That's a bit cryptic, and it doesn't even compile if you try to re-add the same definition without changes. However, as I've seen in Unison's docs, this is one of the areas of the "developer experience" the team is currently working on. Let's fix the orange definition. While we're at it, we can also add a watch expression, prefixed with >. ucm evaluates it alongside our updated function whenever the scratch file is saved, so we can quickly check that things work correctly:

orange : Nat -> Nat
orange x = kiwi 2 (kiwi 3 x)

> orange 3

[Screenshot: ucm evaluating the watch expression]
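Note that the fixed orange above computes kiwi 2 (kiwi 3 x) = 2 * (3 * x), so > orange 3 now yields 18, whereas the original version returned 12. That's fine if the goal is just to make it compile against the new kiwi. If you wanted to preserve the old behaviour, a hypothetical variant like the following would pass 2 as the new second argument instead:

```unison
-- Hypothetical alternative fix: pass 2 as kiwi's new second argument,
-- so orange keeps its original behaviour of multiplying by 4.
orange : Nat -> Nat
orange x = kiwi (kiwi x 2) 2

-- Evaluates to 12, the same as orange 3 before kiwi changed.
> orange 3
```

Watch expressions like these are a cheap way to confirm that a refactoring driven by the todo list didn't silently change behaviour.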

The UI

You've already seen small snippets of ucm output above, but Unison also offers another way of rendering the code stored in the database. Running ui from ucm opens a convenient, browsable and searchable view of our code in the default browser. Just take a look:

[Screenshot: the Unison codebase UI in the browser]

Can YOUR language do that? We often get similar functionality in IDEs, but here it's baked into Unison's tooling. Speaking of IDEs: there's no dedicated Unison IDE yet, but you can use any text editor for the scratch files. There's syntax highlighting for VS Code, which provides the bare minimum for working with the language comfortably.

Next

Are we ready to let go of storing code as text files? It's hard to change people's habits. But if you want to create a language that's truly different, not just superficially so, some habits will have to be broken.

Content-addressed code is the "big idea" behind Unison, as its authors say. It has a lot of interesting implications, which I hope to cover in subsequent articles. We'll look at code organisation next, but before that, I hope Unison has captured your interest and you're already running brew install unisonweb/unison/unison-language (or the equivalent for your platform) to try it out!
