Trying out Unison, part 2: organising code

Dependency hell, package managers, publishing, supply chain attacks—these are just a few of the problems that might be solved by Unison! Let's see how.

In the first part of this series on Unison, we covered the "big idea" behind it: content-addressed code. We looked at how Unison stores code in a database, where each function is keyed and referenced by the hash of its abstract syntax tree.

Luckily, when coding, we don't have to use the hashes directly. Instead, we can assign names to hashes; a single hash can have many names. Names can be reassigned, implementing "instant refactoring", without the need to recompile anything.

Still, working with one huge and messy bag of names would be rather hard, and probably not too pleasant. Not to mention cooperating with others. That's why Unison also has namespaces, which let us organise our code neatly.

If you find the following interesting, take a look at the remaining parts of the series: part 3: effects through abilities and part 4: from the edge, to the cloud.

Names on a tree

Namespaces form a tree-like structure that we all know so well e.g. from the filesystem or, if you're a bit more modern, from Google Drive or similar. The easiest way to browse namespaces is using the UI, which we've briefly introduced in the last article as well. Jumping ahead a bit, you can browse the code created by the Unison team on Unison Share.

While the UI might be the easiest option, the simplest one is of course using Unison's REPL-like tool, ucm! That's where you'll create namespaces, change them, move code around, merge, fork, etc.

In fact, if you followed Unison's quickstart, you've probably created your first namespace already. There's not much to it: a simple cd changes the focus (absolute references start with .):

.> cd myapp

  ☝️  The namespace .myapp is empty.

.myapp>

As always, ucm tries to be helpful, telling us that we've entered a fresh namespace.

Adding a library

We have complete freedom as to the structure of the namespaces; however, Unison recommends a set of conventions that will ensure that our codebase is consistent with other Unison codebases out there.
The namespace structure for a simple application might look as follows:

myapp
  main
    (all the definitions of the development version of the app)
    lib
      base
      http
  prs
    aFeature
    aBug
  releases
    v1
    v2

The code we develop on a day-to-day basis lives in myapp.main. That's also where we store all of our dependencies, in the myapp.main.lib namespace. The dependency that each project will most probably need is base, Unison's standard library.

The first step in each project is forking the base library into your project:

.myapp> fork .base lib.base

  Done.

It's not a problem if you have multiple applications in your code database, each using a different version of Unison's std lib. As each lives in its own namespace, it has its own name -> hash mappings. Updating the std lib in a namespace is a local operation—it doesn't influence any other namespaces.

Any other external dependencies also go into lib. We can either fork locally available namespaces into lib, or pull in remote ones:

.myapp> pull stew.public.projects.http.releases.v1 lib.http

  Downloaded 6208 entities.

  ✅

  The destination lib.http was empty, and was replaced instead of being merged.

Breaking out of library hell

You don't have to worry about the dependencies of your dependencies: they won't conflict with what you're using. Such conflicts are something we all know too well from other programming languages as dependency hell.

All functions that are used by a library are referenced by hash. So nothing will get evicted, overwritten, replaced or otherwise silently broken just by adding a library as a dependency. The dependencies come with their own namespace and their own lib sub-namespaces, but that doesn't replace in any way the name mappings that you have for your application.

Keep in mind that the code database is still a global, single, giant map hash -> AST. However, using namespaces, we have local mappings of name -> hash, which are independent of one another. The same name might mean something totally different in different apps (namespaces). And different names in different apps might resolve to the same hash.
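To make the "global map plus local name mappings" idea concrete, here is a minimal Python sketch. It is only a toy model, not how Unison is implemented: definitions are plain strings rather than ASTs, a truncated SHA-256 stands in for Unison's hashing, and the names (`store`, `app_one`, `app_two`, `lookup`) are made up for illustration.

```python
import hashlib

# Global store: hash -> definition (a plain string stands in for an AST here).
store = {}

def add_definition(body: str) -> str:
    """Store a definition under the hash of its content and return that hash."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:8]
    store[digest] = body
    return digest

inc_hash = add_definition("x -> x + 1")
dec_hash = add_definition("x -> x - 1")

# Each namespace keeps its own, independent name -> hash mapping.
app_one = {"step": inc_hash}
app_two = {"step": dec_hash, "next": inc_hash}

def lookup(namespace: dict, name: str) -> str:
    """Resolve a name in one namespace, then fetch the definition by hash."""
    return store[namespace[name]]

print(lookup(app_one, "step"))  # same name, different definition in each app
print(lookup(app_two, "step"))
print(app_one["step"] == app_two["next"])  # different names, same hash
```

The store is shared, but changing what `step` means in `app_two` never touches `app_one`: only that namespace's own dictionary is edited.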

Of course, if you want to pass data structures from one library to another, they must match. The essential complexity of data type migration is still here; we're only getting rid of the accidental complexity. However, here Unison also has some answers with the help of structural types.

Resolving names

Resolving names might sound trivial, but it's actually very important. When writing a function in the scratch file and referencing some other definitions by their name, how does Unison know what we have in mind?

That depends on the namespace we are currently in: Unison will try to resolve the names using what's available (recursively) in the current namespace. Locally defined names take precedence over names from libraries.

Additionally, the std lib is just another library. Hence, if we don't fork base, we won't even be able to use numbers!
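The rule described above can be modelled in a few lines of Python. This is a simplified sketch of the behaviour as described in this article, not ucm's actual resolution algorithm. The nested-dict layout, the `resolve` function and the hashes are all hypothetical.

```python
from typing import Optional

# A namespace is modelled as a nested dict:
# string values are hashes, dict values are sub-namespaces.
namespace = {
    "greet": "#local1",
    "lib": {
        "base": {"greet": "#base1", "plus": "#base2"},
    },
}

def resolve(ns: dict, name: str) -> Optional[str]:
    """Return the hash for `name`, preferring definitions directly in `ns`."""
    value = ns.get(name)
    if isinstance(value, str):
        return value
    # Otherwise search the child namespaces (including lib) depth-first.
    for child in ns.values():
        if isinstance(child, dict):
            found = resolve(child, name)
            if found is not None:
                return found
    return None

print(resolve(namespace, "greet"))  # the local definition wins over lib.base
print(resolve(namespace, "plus"))   # found only in lib.base
print(resolve({"greet": "#local1"}, "plus"))  # no base forked: nothing to find
```

The last call mirrors the point about the std lib: if base has not been forked into the namespace, even basic names have nothing to resolve to.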

It all works quite intuitively, but it's still good to know so that you can properly build your mental model of how Unison operates.

Publishing a library

If you ever wanted to publish a Java (or Scala) library so that others can use it, you know it's a painful and slow process. And it's not even that the publishing itself is slow (it is), but the whole ceremony of setting up the build tool, getting the right access rights, waiting for synchronisation to Maven Central, etc. makes you reconsider your career choices a couple of times.

With Unison, there's not really much to talk about. You just push a namespace and it ends up on your public Unison Share account. Unison sends the necessary hashes and name mappings:

.example1> push.create adamw.public.example1

  Uploaded 34 entities.

  View it on Unison Share: https://share.unison-lang.org/@adamw/code/latest/namespaces/public/example1

Others can pull the code right away. What about cutting releases? The process amounts to forking your main namespace into e.g. releases.v12 and pushing that. As a maintainer of a couple of open-source Scala projects, I find this depressingly simple.

Updating a library

Let's do a simple case study of how a nested dependency can be updated. Let's say we have an app, headbook, which depends on a library, leftpad, which in turn depends on a logging library, log4u.

Completely hypothetically, suppose it turns out that log4u v1 has a bug—it allows remote code execution. We need to update all of our systems as fast as possible. However, adding the patched version of log4u as a library to our application won't solve the problem—we've just added some updated name -> hash mappings to our namespace, while leftpad is still referencing and using the hashes from the old log4u version.

You can find both versions of log4u, leftpad, and the headbook code on Unison Share. You might be disappointed that leftpad doesn't really do any padding, but it's an MVP, so we'll leave this detail as an exercise for the reader.

Now, if you pull headbook's code and try to run it, you'll see that it uses the vulnerable v1 version of log4u, which simulates some unpleasant side-effects:

.headbook.main> run openHeadbook
[INFO] Left padding Like me! (running `rm -rf /` now)

-- Adding the fixed `log4u` (v2) doesn't change anything
.headbook.main> fork .log4u.releases.v2 lib.log4u
.headbook.main> run openHeadbook
[INFO] Left padding Like me! (running `rm -rf /` now)

What we have to do is update all usages of log to the fixed version, recursively, in the entire namespace (including libraries). This is possible thanks to patches—a Unison feature that we haven't encountered yet.

Each time you update a definition in a namespace, the patch file is updated as well. The patch file is a series of mappings specifying which old hashes have been replaced by new hashes. When releasing a library (which, as we've seen, amounts to a fork), the patch file is forked alongside all other definitions. After a release, you should remove the patch file from main to start accumulating new differences. Here's the patch that comes with the fixed log4u:

.headbook.main> view.patch lib.log4u.patch

  Edited Terms: 1. leftpad.lib.log4u.log -> 2. lib.log4u.log

Such a patch can be recursively applied to a namespace, updating the definitions as required and fixing the vulnerability:

.headbook.main> patch lib.log4u.patch
.headbook.main> run openHeadbook
[INFO] Left padding Like me!
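Why does applying the patch help when forking the fixed library did not? The following Python sketch models the idea under simplifying assumptions: a definition is a body plus the hashes of its dependencies, a patch is a plain old hash -> new hash dictionary, and `add` and `apply_patch` are hypothetical helpers rather than ucm internals.

```python
import hashlib

store = {}  # hash -> (body, tuple of dependency hashes)

def add(body, deps=()):
    """Store a definition; its hash covers the body and its dependency hashes."""
    key = hashlib.sha256((body + "|" + ",".join(deps)).encode()).hexdigest()[:8]
    store[key] = (body, tuple(deps))
    return key

def apply_patch(h, patch):
    """Return the hash of definition `h` after replacing patched dependencies."""
    if h in patch:
        return patch[h]
    body, deps = store[h]
    new_deps = tuple(apply_patch(d, patch) for d in deps)
    # A changed dependency yields a new definition with a new hash.
    return h if new_deps == deps else add(body, new_deps)

log_v1 = add("log: print and run the payload")
log_v2 = add("log: print only")
leftpad = add("leftpad: call log", [log_v1])
open_headbook = add("openHeadbook: call leftpad", [leftpad])

namespace = {"openHeadbook": open_headbook}
patch = {log_v1: log_v2}

# Rebind every name to the patched version of its definition.
namespace = {name: apply_patch(h, patch) for name, h in namespace.items()}
print(namespace["openHeadbook"] != open_headbook)  # the name now points elsewhere
```

In this model the old definitions stay in the store untouched. The patch creates new versions of leftpad and openHeadbook that reference the fixed log, and only the name mappings in the patched namespace are moved over to them.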

Until next time

Namespaces are a simple code organisation mechanism, but they end up being quite a powerful tool. There's no upfront structure (at least so far), which can be seen both as an advantage and a problem; for example, you can end up forking the whole project namespace instead of a single release quite easily. Luckily, all of this is reversible.

We've seen that organising code in an AST-hash-keyed database can have some surprising consequences: no more dependency hell and complicated eviction rules. Moreover, with the addition of tools such as Unison Share, we don't need package managers, and publishing your code is trivial.

Supply chain attacks are harder as well: you can't simply replace a version of a library, hoping that it will get used automatically by all dependent applications (as the hash would change). And even if one part of your program ends up using a compromised version, it won't automatically evict usages of the same library in other versions.

It's not all roses, of course. For example, Unison also has to re-invent the pull request flow, as the code is not text-based. There is a temporary solution using GitHub issues, but I suspect that we'll get an improved experience in Unison Share someday.

We also can't browse code on GitHub or other hosting platforms as we used to, since, again, text is not the primary source of truth. However, we do get the local Unison UI and Unison Share, which fill some of those needs. Finally, there's also quite a lot missing on the IDE side: having e.g. in-editor auto-complete or go-to-definition is really convenient. Some of this is available through ucm, but requires explicit command line instructions.

However, these are all improvements that I'm sure Unison will gain as it matures. We still have some Unison features to cover, so until then, I hope you'll have fun exploring it on your own!
