
Trying out Unison, part 4: from the edge to the cloud

Adam Warski

05 Jan 2023 · 10 minutes read


In the previous installments of the "Trying out Unison" series, we first explored its core feature, content-addressed code, thanks to which a function's identity is determined by what it does, not by its name (names are just labels). Then, we examined how this impacts the way you organize your code and manage dependencies with the help of namespaces. Finally, we looked at Unison's abilities, an algebraic effect system that provides features such as error handling and dependency injection.

These traits alone make Unison a language that improves on the status quo of mainstream programming languages: trivial rename refactorings, no dependency hell, and constraining effectful computations without sacrificing readability, to name just a few. Unison is different but simultaneously focuses on "programmer experience", which makes it an exciting subject to study.

However, Unison also has a "one more thing": Unison Cloud. Although it's not open-source and is currently available only in a closed beta, combined with the features we've just mentioned it might make distributed programming far more approachable than it is today, and really fun to program!

And not only that: we can also expect performance benefits, thanks to the "bring the computation to the data" model, as opposed to the "bring the data to the computation" approach often seen in the wild.

Is that the cloud we were promised?

Going remote

Unison's support for distributed and, more generally, networked computing comes in three parts. First is the user-facing programming interface, contained in the distributed namespace. The core abstraction is the Remote ability, which allows starting computations on remote nodes and awaiting their results.

The second part is the Unison Cloud client which, apart from configuring access to the cloud service, provides the run function. That function handles the Remote ability, interpreting it in terms of IO + Exception, which can then be handled locally:

run : '{Remote, Http, Atomic, Channels, Scratch} a ->{IO, Exception} a
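To make this a bit more concrete, here's a minimal, hedged sketch of what handling Remote locally might look like. The client setup and credentials are omitted, and the trivial computation passed to run is just a placeholder for the Remote-using code shown later in this post; printLine and Nat.toText come from the base library:

hello : '{IO, Exception} ()
hello _ =
  use Nat +
  -- a pure thunk fits within the required ability set; real code would use Remote operations
  answer = run '(40 + 2)
  printLine (Nat.toText answer)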

The final component is the implementation of the Unison Cloud server: time-sharing, communication, maintaining cluster state, etc. The client and server components contain the serialization and networking logic: all the boring but necessary things you would prefer not to worry about. The server is the part we can't use yet, but we can take a peek at what to expect.

As a Unison distributed systems programmer, the most basic operation you'll encounter is forking a computation to happen on a remote node using forkAt:

forkAt : Location g -> '{g, Remote, Exception} t ->{Remote} Task t

This function takes a Location—an abstract representation of a single server or a set of servers (a region). That's where the computation will eventually be run.

The second parameter is the computation itself. It will be sent over to the remote node using the same mechanism we've seen before: anytime we fork a library using Unison Share, Unison code is shipped over the network. This process relies on function hashes and, more generally, on the content-addressed code concept.

The remote node needs to obtain the function with the hash corresponding to whatever is provided as an argument to forkAt, along with any of its dependencies (again, same as during forking). This might, but doesn't have to, involve multiple network round trips: the target server might already have the entire code graph locally, or only parts of it.

The computation can itself use the Remote ability (to communicate with other nodes), throw exceptions, and use any of the abilities supported by the location (more on this later). The above is encoded in the '{g, Remote, Exception} t type. The ' quoting here is crucial: it means that this is a lazily evaluated code fragment, or after desugaring, simply a () ->{g, Remote, Exception} t thunk.
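To make the desugaring concrete, here's a small sketch: the two definitions below should be interchangeable, as the quote is only syntactic sugar for a thunk that ignores its (unit) argument.

quoted : Location g ->{Remote} Task Nat
quoted loc =
  use Nat +
  forkAt loc '(1 + 1)

desugared : Location g ->{Remote} Task Nat
desugared loc =
  use Nat +
  forkAt loc (_ -> 1 + 1)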

Also, note that there's no danger of conflicts, eviction errors, or name clashes, as we're operating on the level of hashes. The same supervisor can run code that uses different versions of the base Unison library, and it doesn't care.

A task, but not a monad

A forkAt invocation is itself a computation that uses the Remote ability; it returns a Task, which represents a running computation on some (remote or local) node.

Task might remind you of the Futures or Promises known from other languages, and in some respects the similarity holds. For example, you can create a new, empty task using Remote.empty! and later complete it with a value or an error using Remote.tryComplete.

However, as Unison offers lightweight threads and programming in the direct style (instead of the monadic one), you won't see any flatMaps or other operations for chaining multiple tasks together. Instead, you await a task, and once the result is there, you use it to compute the next value:

await : Task a ->{Remote} a
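Here's a small, hedged sketch of what that looks like in practice (the arithmetic is arbitrary): the awaited result is an ordinary value, so the "continuation" is simply the next line of code, with no flatMap in sight.

addThenDouble : Location g -> Nat ->{Remote} Nat
addThenDouble loc n =
  use Nat +
  t1 = forkAt loc '(n + 1)
  x = await t1              -- suspends the current lightweight thread until t1 completes
  t2 = forkAt loc '(x + x)
  await t2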

Where's my task

An important aspect of distributed programming in Unison is that we can precisely control where computations happen. And while the general theme is to "bring the computation to the data" instead of "bringing the data to the computation", it's still possible (without much effort) to do things less efficiently than we could.

Let's take a look at a simple example, right from the introductory docs to the distributed namespace:

example1 loc =
  t1 = forkAt loc '(1 + 1)
  t2 = forkAt loc '(2 + 2)
  await t1 + await t2

The code above runs two computations at the given location loc (most probably in parallel) and then awaits the results of both. Note that the awaiting is done locally—that is, on the node where the example1 function is being run. Hence, the results of both the t1 and t2 tasks are shipped from the remote loc to the calling location. Here, the data and the reduction of the results (+) are trivial, so it's not a problem, but in general, it might be an issue.

Let's look at a slightly modified example that tries to minimize the amount of data shipped between nodes:

example2 loc =
  use Nat +
  t1 = forkAt loc '(1 + 1)
  t2 = forkAt loc '(2 + 2)
  t3 = forkAt (task.location t1) '(await t1 + await t2)
  await t3

You might wonder why we fork t3 at task.location t1—which corresponds to wherever t1 was run—instead of just using loc. loc might be a general region where computations can be forked, rather than a single specific node. In such a case, we make sure that t3 runs at least on the same node as t1 (t2, however, might land on a different one). Either way, the amount of data shipped between nodes decreases.
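If we wanted to guarantee that no intermediate results travel between nodes at all, we could pin the follow-up tasks to whichever node t1 landed on. A hedged sketch, in the same spirit as example2:

example3 loc =
  use Nat +
  t1 = forkAt loc '(1 + 1)
  here = task.location t1
  t2 = forkAt here '(2 + 2)
  t3 = forkAt here '(await t1 + await t2)
  await t3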

It's all about what's not there

While there's no magic—specifying where each part of your computation should take place and how the results are combined might well be essential complexity of the distributed programming problem domain—it's easy to overlook what's absent from the above snippets of code.

Note that to run a function on a remote node, we've just passed it to forkAt. And to obtain its result, we've just called await. No serialization, no hard choices between JSON and Protobuf (these are made for us), no establishing connections, etc. At the same time, a remote invocation is still distinct from a local one, as it requires the Remote ability, so we're not falling for the RPC fallacy. Still, this is probably as close as we can get to making a remote call behave like a local one while retaining safety. Most (if not all) of the accidental complexity is gone.

I know what you're thinking: errors might happen, but these can be handled appropriately. Yes, you can add retries by introducing an ability handler for Remote which yields another Remote computation (but this time with network retries enabled, see Remote.retrying). Similarly, you can add timeouts, bulkheading, circuit breakers, etc.
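As a rough illustration only (Remote.retrying is mentioned in the docs, but its exact signature is an assumption here), the shape of such code could be: wrap the same Remote computation in a handler that re-runs failed remote operations.

retriedSum loc =
  use Nat +
  -- assumption: Remote.retrying takes a retry count and a delayed Remote computation,
  -- re-handling it with network retries enabled; the real signature may differ
  Remote.retrying 3 'let
    t1 = forkAt loc '(1 + 1)
    t2 = forkAt loc '(2 + 2)
    await t1 + await t2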

And just think about it: a rolling upgrade is simply using a new function hash in a service that's exposed to the outside world—no need to redeploy binaries across our server farms. Unison takes care of propagating the appropriate function hashes on the first invocation, and the old computations, which are already in progress, run unchanged.

Spark-like datasets

Unison offers a couple of higher-level constructs, building on top of Remote and Task. One of them is Seq—a distributed, lazily-evaluated data structure. It can represent data sets spanning many nodes, with the total size of the data exceeding the memory on any one node.

There's a great series of blog posts on how this data structure is designed, so I'll just recommend reading through it!

Scoped locations

An important note is that not all locations are equal. Instead, they are parameterized with the abilities that code running at that location might use.

For example, in the current (closed beta) Unison Cloud, we can run computations with the following abilities:

{Remote, Http, Atomic, Channels, Scratch}

This means, for example, that we can't do arbitrary I/O, as the IO ability is missing. However, we can communicate with other nodes in the cloud, perform HTTP requests, store some mutable state in memory, communicate using channels, and memoize requests in a cache.
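A hedged sketch of how this restriction surfaces in the types: cloudLoc below stands for a hypothetical location offering exactly the closed-beta ability set, and whatever we fork there has to fit within that set.

onCloud : Location {Http, Atomic, Channels, Scratch} ->{Remote} Nat
onCloud cloudLoc =
  use Nat +
  -- a pure thunk fits within any ability set, so this typechecks;
  -- a thunk requiring {IO} would be rejected, since IO is not offered by cloudLoc
  await (forkAt cloudLoc '(1 + 1))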

You can do quite a lot given these abilities, while the platform can ensure security and full isolation of the functions that it's running.

And, of course, other locations might provide access to a different set of abilities. These might give extra power, but possibly at an additional cost.

The Unison computer

All of the above makes the nodes in a Unison Cloud deployment behave like one large computer. Too good to be true? Largely, yes: Unison Cloud is only available as a closed beta, and we all know that the devil lies in the details.

Distributed programming with Unison is elegant and programmer-friendly: it's the kind of computing you want to do. But then, you'll have to integrate with the outside world. And making sure all these integrations are in place will be a lot of work.

An important thing to keep in mind: Unison is a closed, gated system. It's so elegant partly because everything is written in Unison—which is also a weakness, as everything has to be written in Unison. So all of your favorite utility libraries will need a Unison version.

However, Unison might be ready to tackle that problem. At some point, there will be a need for a Foreign Function Interface (FFI) so that we can call, e.g., a native numerical algorithm on some data available locally on a node. An ability with a dedicated handler is ideal for such an integration.
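To illustrate the idea (this is pure speculation, and nothing like it exists today): such an FFI could plausibly be exposed as an ability whose handler calls into native code, keeping the effect visible in the types.

-- hypothetical: an ability whose (native) handler would run an optimized numerical routine
ability NativeMath where
  multiplyMatrices : [[Float]] -> [[Float]] -> [[Float]]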

Cloud at the edge

Apart from "the cloud", there's another trend: edge computing. This means bringing the computations close to the user—either on the user's device (such as a smartphone) or somewhere near the user (e.g., a cell phone base station or an edge server of a content delivery network). If you are interested in edge computing, you've probably heard about "bringing the computation to the data": exactly the concept that is implemented in Unison Cloud.

And that's no coincidence: Unison and its model are especially well suited to this use case. Computations can be easily shipped to the edge—it's the same mechanism once again: all we need is the hash of the function to run, and some peers from which the function might be requested. Moreover, Unison computations are sandboxed. This is similar to what WebAssembly offers, where it's considered a major security feature.

The sandboxing is implemented using Unison's abilities. Suppose we specify that a particular edge location can only run functions using specific abilities. In that case, we can be sure that no other operations are run (as long as there are no unsafe "backdoors" available from within the Unison language—so far, this holds).

And most importantly, the whole process is very lightweight and transparent to the programmer. We only send the code (hashes) necessary to run a given function: no less and no more! Moreover, we have access to a functional, garbage-collected programming language with direct syntax for effectful computations. Looks promising!

What's next?

This concludes our mini-series on Unison. While we are waiting for an open beta of Unison Cloud, I can only recommend getting familiar with the language in its single-node version. It has undoubtedly been a very educational and inspiring experience.

I'm looking forward to Unison gaining more traction, and I hope to write some open-source Unison in the future!
