Trying out Unison, part 4: from the edge to the cloud
In the previous installments of the "Trying out Unison" series, we've first explored its core feature, content-addressed code, thanks to which a function's identity is determined by what it does, not by its name (names are just labels). Then, we've examined how this impacts how you might organize your code and manage dependencies with the help of namespaces. Finally, we've looked at Unison's abilities, an algebraic effect system that provides features such as error handling or dependency injection.
These traits alone make Unison a language that improves on the status quo of mainstream programming languages: trivial rename refactorings, no dependency hell, and constraining effectful computations without sacrificing readability, to name just a few. Unison is different but simultaneously focuses on "programmer experience", which makes it an exciting subject to study.
However, Unison also has a "one more thing": Unison Cloud. Although it is not open-source and is available only in a closed beta, the features we've just mentioned might make distributed programming far more approachable than it is today, and really fun!
And not only that: we can expect performance benefits as well, thanks to the "bring computation to the data" model, as opposed to "bring data to the computation", often seen in the wild.
Is that the cloud we were promised?
I've definitely been thinking about "the cloud we were promised" a lot lately. A few things come to mind
(A) Why can't I just run my code? (PaaS)
(B) Why is principal of least privilege so hard? (IAM)
(C) Why do I need 10+ daemon sets to keep my k8s nodes healthy? https://t.co/Kg74mqXKVq
- Danny Hermes (@bossylobster), October 7, 2022
Going remote
Unison's support for distributed and, more generally, networked computing comes in three parts. First is the user-facing programming interface, contained in the distributed namespace. The core abstraction is the Remote ability, which allows starting computations on remote nodes and awaiting their results.
The second part is a Unison Cloud client, which, apart from configuring access to the cloud service, provides the run function. That function handles the Remote ability, interpreting it in terms of IO + Exception, which can be handled locally:
run : '{Remote, Http, Atomic, Channels, Scratch} a ->{IO, Exception} a
The final component is the implementation of the Unison Cloud server: time-sharing, communication, maintaining cluster state, etc. The client and server components contain serialization and networking logic: all the boring but necessary things you would prefer not to worry about. The server is the part we can't use yet, but we can take a peek at what to expect.
As a Unison distributed systems programmer, the most basic operation you'll encounter is forking a computation to happen on a remote node using forkAt:
forkAt : Location g -> '{g, Remote, Exception} t ->{Remote} Task t
This function takes a Location: an abstract representation of a single server or a set of servers (a region). That's where the computation will eventually be run.
The second parameter is the computation itself. It will be sent over to the remote node using the same mechanism we've seen before: anytime we're forking a library using Unison Share, Unison code is shipped over the network. This process uses function hashes and, more generally, the content-addressed code concept.
The remote node needs to obtain a function with the hash corresponding to whatever is provided as an argument to forkAt, along with any dependencies (again, same as during forking). This might involve multiple network round trips, as the target server might already have the entire code graph locally or only parts of it.
The computation can itself use the Remote ability (to communicate with other nodes), throw exceptions, and use any of the abilities supported by the location (more on this later). The above is encoded in the '{g, Remote, Exception} t type. The ' quoting here is crucial: it means that this is a lazily evaluated code fragment, or after desugaring, simply a () ->{g, Remote, Exception} t thunk.
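To make the quoting concrete, here is a minimal sketch in plain Unison (no cloud access needed), using the ' and ! syntax that this article uses. The names delayed, explicit and forced are made up for illustration; the point is only that a quoted expression and a function taking unit are the same kind of value, and that nothing is computed until the thunk is forced.

```unison
-- A delayed computation, written with the ' quoting sugar.
delayed : 'Nat
delayed = '(1 + 1)

-- The same thing spelled out: a function from unit to Nat.
explicit : () -> Nat
explicit = _ -> 1 + 1

-- Forcing: !delayed is sugar for applying the thunk to ().
forced : Nat
forced =
  use Nat +
  !delayed + explicit ()
```

forkAt needs the unevaluated form precisely because the computation has to be shipped to another node before it is run.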
Also, note that there's no danger of conflicts, eviction errors, or name clashes, as we're operating on the level of hashes. The same supervisor can run code that uses different versions of the base Unison library, and it doesn't care.
A task, but not a monad
As a result of the forkAt invocation, we get a computation that uses the Remote ability, and returns a Task, which represents a running computation on some (remote or local) node.
Task might look similar to a Future or Promise known from other languages. In some aspects, that's true. For example, you can create a new task using Remote.empty! and later complete it with a value or error, using Remote.tryComplete.
However, as Unison offers lightweight threads and programming in the direct style (instead of the monadic one), you won't see any flatMaps, or other operations which allow chaining multiple tasks together. Instead, you can await on a task, and when the result is there, use it to compute the next value:
await : Task a ->{Remote} a
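As a sketch of what this direct style looks like when one remote step depends on another, consider the snippet below. It assumes only the forkAt and await signatures quoted in this article and has not been checked against the closed-beta cloud; the name chained is hypothetical.

```unison
-- Two dependent remote steps, with no flatMap in sight:
-- the result of the first await is an ordinary Nat that the
-- second forked computation simply closes over.
chained : Location g ->{Remote} Nat
chained loc =
  use Nat + *
  first = await (forkAt loc '(1 + 1))
  second = await (forkAt loc '(first * 10))
  second + 1
```

In a Future-based API the second step would typically live inside a flatMap callback; here sequencing is just one line following another.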
Where's my task
An important aspect of distributed programming in Unison is that we can precisely control where computations happen. And while the general theme is to "bring the computation to the data", instead of "bringing the data to the computation", it's still possible (without much effort) to do things not as efficiently as possible.
Let's take a look at a simple example, right from the introductory docs to the distributed namespace:
example1 loc =
  t1 = forkAt loc '(1 + 1)
  t2 = forkAt loc '(2 + 2)
  await t1 + await t2
The code above will run two computations at the given location loc (most probably in parallel), and then await the results of both. Note that the await is done locally, that is, on the node where the example1 function is being run. Hence, the results of both the t1 and t2 tasks are shipped from the remote loc to the calling location. Here this data and the reduction of results (+) are trivial, so it's not a problem, but in general, it might be an issue.
Let's look at a slightly modified example that tries to minimize the amount of data shipped between nodes:
example2 loc =
  use Nat +
  t1 = forkAt loc '(1 + 1)
  t2 = forkAt loc '(2 + 2)
  t3 = forkAt (task.location t1) '(await t1 + await t2)
  await t3
You might wonder why we fork t3 at task.location t1 (which corresponds to wherever t1 was being run) instead of just using loc. The reason is that loc might be a general region where computations might be forked, rather than a single specific node. In such a case, we make sure that t3 runs at least on the same node as t1. However, t2 might have run on a different one. Either way, the amount of data shipped between nodes is decreased.
It's all about what's not there
While there's no magic—specifying where each part of your computation should take place and how the results are combined might be essential complexity of the distributed programming problem domain—it's easy to overlook what's absent in the above snippets of code.
Note that to run a function on a remote node, we've just passed it to forkAt. And to obtain its result, we've just called await. No serialization, no hard choices between JSON and Protobuf (these are made for us), no establishing connections, etc. At the same time, a remote invocation is still distinct from a local one, as it requires the Remote ability. Hence we're not falling for the RPC fallacy. However, we're probably as close to making a remote call behave like a local one as possible, while retaining safety. Most (if not all) of the accidental complexity is gone.
I know what you're thinking: errors might happen, but these can be handled appropriately. Yes, you can add retries by introducing an ability handler for Remote which yields another Remote computation (but this time with network retries enabled, see Remote.retrying). Similarly, you can add timeouts, bulkheading, circuit breakers, etc.
And just think about that: making a rolling upgrade is simply using a new function hash in a service that's exposed to the outside world, with no need for redeploying binaries across our server farms. Unison takes care of propagating the appropriate function hashes on the first invocation. Invocations of the old version that are already in progress run unchanged.
Spark-like datasets
Unison offers a couple of higher-level constructs, building on top of Remote and Task. One of them is Seq: a distributed, lazily-evaluated data structure. It can represent data sets spanning many nodes, with the total size of the data exceeding the memory on any one node.
There's a great series of blog posts on how this data structure is designed, so I'll just recommend reading through it!
Scoped locations
An important note is that not all locations are equal. Instead, they are parameterized with the abilities that code running at that location might use.
For example, in the current (closed beta) Unison Cloud, we can run computations with the following abilities:
{Remote, Http, Atomic, Channels, Scratch}
This means, for example, that you can't do arbitrary I/O, as the IO ability is missing. However, we can communicate with other nodes in the cloud, perform HTTP requests, store some mutable state in memory, communicate using channels, and memoize requests in a cache.
You can do quite a lot given these abilities, while the platform can ensure security and full isolation of the functions that it's running.
And, of course, we might have other locations providing access to another set of abilities. These might give extra power, but also possibly additional cost.
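The way a location's type constrains what can be forked there can be sketched as follows. Everything named here is hypothetical: Metrics is a made-up ability standing in for "some extra power", and a location offering it is assumed to have the type Location {Metrics}. Only forkAt and await, with the signatures quoted earlier, come from the article.

```unison
-- A made-up ability, standing in for a capability that only
-- some locations provide.
unique ability Metrics where
  record : Text ->{Metrics} ()

-- A computation that requires Metrics in order to run.
report : '{Metrics, Remote, Exception} Nat
report = 'let
  Metrics.record "report started"
  42

-- Accepted by the typechecker: the location provides Metrics,
-- so g in forkAt's signature is instantiated to {Metrics}.
runReport : Location {Metrics} ->{Remote} Nat
runReport loc = await (forkAt loc report)
```

Under these assumptions, passing report to forkAt with a location whose ability set lacks Metrics would be rejected at compile time, which is the property the platform relies on for isolation.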
The Unison computer
All of the above makes nodes in a Unison Cloud deployment behave like one large computer. Too good to be true? Largely, yes. Unison Cloud is only available as a closed beta, and we all know that the devil is in the details.
Distributed programming with Unison is elegant and programmer-friendly: it's the kind of computing you want to do. But then, you'll have to integrate with the outside world. And making sure all these integrations are in place will be a lot of work.
An important thing to keep in mind: Unison is a closed, gated system. It's so elegant partly because everything is written in Unison—which is also a weakness, as everything has to be written in Unison. So all of your favorite utility libraries will need a Unison version.
However, Unison might be ready to tackle that problem. At some point, there will be a need for a Foreign Function Interface (FFI) so that we can call e.g., a native numerical algorithm with some data available locally on a node. An ability with a dedicated handler is ideal for such integration.
Cloud at the edge
Apart from "the cloud", there's another trend: edge computing. This means bringing the computations close to the user—either on the user's device (such as a smartphone) or somewhere near the user (e.g., a cell phone base station or an edge server of a content delivery network). If you are interested in edge computing, you've probably heard about "bringing the computation to the data": exactly the concept that is implemented in Unison Cloud.
And that's no coincidence: Unison and its model are especially well suited for this use case. Computations can be easily shipped to the edge using the same mechanism once again: all we need is the hash of the function to run, and some peers from which the function might be requested. Moreover, Unison computations are sandboxed. This is similar to what WebAssembly offers, where it's considered a major security feature.
The sandboxing is implemented using Unison's abilities. Suppose we specify that a particular edge location can only run functions using specific abilities. In that case, we can be sure that no other operations are run (as long as there are no unsafe "backdoors" available from within the Unison language—so far, this holds).
And most importantly, the whole process is very lightweight and transparent to the programmer. We only send the code (hashes) necessary to run a given function: no less and no more! Moreover, we have access to a functional, garbage-collected programming language with direct syntax for effectful computations. Looks promising!
What's next?
This concludes our mini-series on Unison. While we are waiting for an open beta of Unison Cloud, I can only recommend getting familiar with the language in its single-node version. It has undoubtedly been a very educational and inspiring experience.
I'm looking forward to Unison gaining more traction, and I hope to write some open-source Unison in the future!