Free and tagless compared - how not to commit to a monad too early
With reactive and functional programming becoming popular, quite often we see values wrapped in monads in our method signatures: be it `Future`, `Task` or `DBIOAction`. Which monad to choose as "the" monad which we will predominantly use is often an important design decision. Maybe it's a good idea to defer this commitment as long as possible? Free monads and final tagless encoding offer a solution.
First we'll take a look at a simple method which implements some business logic using `Future` values, and then we'll see how to refactor the code without committing to a particular monad. We'll then further compare the "free" and "tagless" approaches, looking at how to express higher-level operations using lower-level ones, and how to mix distinct operation sets.
TL;DR: If you are familiar with how free monads and the final-tagless encoding work, you can jump to the last section for some considerations on when to choose which approach, and a summary.
Initial code & problem statement
Here's a very simple `LoyaltyPoints` class with an `addPoints` method which operates on `Future`s, probably similar to what you've seen many times:
```scala
import java.util.UUID
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // map/flatMap on Future need an ExecutionContext

case class User(id: UUID, email: String, loyaltyPoints: Int)

trait UserRepository {
  def findUser(id: UUID): Future[Option[User]]
  def updateUser(u: User): Future[Unit]
}

class LoyaltyPoints(ur: UserRepository) {
  def addPoints(userId: UUID, pointsToAdd: Int): Future[Either[String, Unit]] = {
    ur.findUser(userId).flatMap {
      case None => Future.successful(Left("User not found"))
      case Some(user) =>
        val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
        ur.updateUser(updated).map(_ => Right(()))
    }
  }
  // other methods ...
}
```
The code looks fine at first sight: we look up a user; return an error description if there's no user with the given id; and, if the user is found, add some points and update the user in the repository. We suspect that the user repository implementation is asynchronous and side-effecting, as the results are wrapped in a `Future`.
However, do we really use any of `Future`'s features anywhere? Is there any reason why this code uses a `Future` and not, let's say, a `Task`? No! The only things we need are `map`, `flatMap` and `unit` (which is called `Future.successful` here). In other words, any monad would do.
Our code is too specialized; the intent of the method is blurred by using a specific result container in the signature. Looking at the signature only, we don't know if the implementation uses `Future`-specific operations or not. Testing requires waiting on the future values for an arbitrary period of time. Finally, we are constraining other parts of the code to only use containers which are `Future`-compatible.
In other words, we have mixed the description of a solution to this (quite trivial) business problem (looking up a user, adding points) with the interpretation of how side-effects should be executed.
How to fix this? We'll take a look at two popular solutions: free monads and final tagless interpreters.
Refactorings
Both of the solutions have one primary goal in common: creating a domain-specific language (DSL) in which we'll express a solution to the business problem, and then defining (potentially multiple) interpreters for that language. In other words, separating problem description from interpretation.
We'll be using Cats to provide the basic "functional" infrastructure, that is an implementation of `Monad`, `Free` etc. Of course, using Scalaz would work equally well.
All of the code below is available in compilable form on GitHub.
Free monad
Free monads have recently attracted a lot of interest, and there are plenty of good introductions. See for example John de Goes's Modern Functional Programming, or Free monads - what? and why? on this blog.
When describing a solution to a problem using the free monad, we first need to define a set of basic instructions, which are represented as data types (they form an ADT). In our case, this will be:
```scala
import cats.free.Free

sealed trait UserRepositoryAlg[T]
case class FindUser(id: UUID) extends UserRepositoryAlg[Option[User]]
case class UpdateUser(u: User) extends UserRepositoryAlg[Unit]

type UserRepository[T] = Free[UserRepositoryAlg, T]

def findUser(id: UUID): UserRepository[Option[User]] = Free.liftF(FindUser(id))
def updateUser(u: User): UserRepository[Unit] = Free.liftF(UpdateUser(u))
```
The type `T` specifies the result of the operation. `Free[UserRepositoryAlg, T]` enriches our basic instruction set with the possibility to return pure values and sequence operations; it also provides a monad instance so that we can e.g. use `for`-comprehensions to combine the basic instructions.
We also define helper methods `findUser` and `updateUser` which lift "bare" instructions into the free monad context.
The description of a program which adds points for a user can now be expressed as follows:
```scala
def addPoints(userId: UUID, pointsToAdd: Int):
    UserRepository[Either[String, Unit]] = {
  findUser(userId).flatMap {
    case None => Free.pure(Left("User not found"))
    case Some(user) =>
      val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
      updateUser(updated).map(_ => Right(()))
  }
}
```
Note that it's not that different from the original method. The crucial difference is, however, that we return a data structure, a value, which uses abstract instructions, without specifying in any way how to interpret those instructions. Looking at the signature, we know that this method returns a description of a program which results in an `Either[String, Unit]` and uses the `UserRepository` instruction set.
This program-as-data representation is a value like any other, which allows us to, e.g., compose multiple values into bigger programs, run them conditionally, interpret them multiple times, or not at all.
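To see that such a program really is just data, here's a stripped-down sketch using only the standard library. The `Free` below is a hypothetical minimal encoding: its constructors are loosely modelled on the `Pure`, `Suspend` and `FlatMapped` cases cats uses internally, but this is not cats' code, and the helpers lift instructions with `Suspend` where the article uses `Free.liftF`. Building `addPoints` runs nothing; we can pattern-match on the resulting value and find its first instruction.

```scala
import java.util.UUID

case class User(id: UUID, email: String, loyaltyPoints: Int)

sealed trait UserRepositoryAlg[T]
case class FindUser(id: UUID) extends UserRepositoryAlg[Option[User]]
case class UpdateUser(u: User) extends UserRepositoryAlg[Unit]

// Hypothetical minimal Free (not cats' implementation): Pure returns a value,
// Suspend embeds one instruction, FlatMapped sequences a step with a continuation.
sealed trait Free[S[_], A] {
  def flatMap[B](f: A => Free[S, B]): Free[S, B] = FlatMapped[S, A, B](this, f)
  def map[B](f: A => B): Free[S, B] = flatMap[B](a => Pure[S, B](f(a)))
}
final case class Pure[S[_], A](a: A) extends Free[S, A]
final case class Suspend[S[_], A](sa: S[A]) extends Free[S, A]
final case class FlatMapped[S[_], B, A](fb: Free[S, B], f: B => Free[S, A]) extends Free[S, A]

type UserRepository[T] = Free[UserRepositoryAlg, T]
type Result = Either[String, Unit]

// Lifting bare instructions (the role Free.liftF plays with cats)
def findUser(id: UUID): UserRepository[Option[User]] =
  Suspend[UserRepositoryAlg, Option[User]](FindUser(id))
def updateUser(u: User): UserRepository[Unit] =
  Suspend[UserRepositoryAlg, Unit](UpdateUser(u))

def addPoints(userId: UUID, pointsToAdd: Int): UserRepository[Result] =
  findUser(userId).flatMap[Result] {
    case None => Pure[UserRepositoryAlg, Result](Left("User not found"))
    case Some(user) =>
      val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
      updateUser(updated).map[Result](_ => Right(()))
  }

// Building the program executes nothing: it is a plain value we can inspect
val userId = new UUID(0L, 1L)
val program: UserRepository[Result] = addPoints(userId, 10)

val firstInstruction: Option[Any] = program match {
  case FlatMapped(Suspend(instruction), _) => Some(instruction)
  case _                                   => None
}
```

Only the first instruction is visible up front: everything after it hides inside the continuation, which is why inspection and optimization of free programs usually work on such prefixes.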
How about actually running the code? To do that, we need to provide an interpreter (which is separate from the problem description):
```scala
import java.util.UUID
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // cats' Monad[Future] needs an ExecutionContext
import cats.~>
import cats.implicits._

val futureInterpreter = new (UserRepositoryAlg ~> Future) {
  override def apply[A](fa: UserRepositoryAlg[A]): Future[A] = fa match {
    case FindUser(id) =>
      /* go and talk to a database */
      Future.successful(None)
    case UpdateUser(u) =>
      /* as above */
      Future.successful(())
  }
}

val result: Future[Either[String, Unit]] =
  addPoints(UUID.randomUUID(), 10).foldMap(futureInterpreter)
```
We need to provide an interpretation for every instruction, resulting in the target monad, here `Future`. We can then interpret the whole program using the `foldMap` method, which gives us the result (`Either[String, Unit]`) wrapped in our target monad.
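What `foldMap` does can be sketched with a hand-rolled `Free`, plus minimal stand-ins for cats' `~>` and `Monad`. The interpreter `InMemoryUsers` is hypothetical: it targets the identity monad `Id` and keeps users in a mutable map. This naive `foldMap` simply recurses on the structure, so unlike cats' version it is not stack-safe; it only illustrates the idea of mapping every instruction and gluing the results with the target monad's `flatMap`.

```scala
import java.util.UUID
import scala.collection.mutable

case class User(id: UUID, email: String, loyaltyPoints: Int)
type Result = Either[String, Unit]

sealed trait UserRepositoryAlg[T]
case class FindUser(id: UUID) extends UserRepositoryAlg[Option[User]]
case class UpdateUser(u: User) extends UserRepositoryAlg[Unit]

// Minimal stand-ins for cats' Free, ~> and Monad (illustrative only)
sealed trait Free[S[_], A] {
  def flatMap[B](f: A => Free[S, B]): Free[S, B] = FlatMapped[S, A, B](this, f)
  def map[B](f: A => B): Free[S, B] = flatMap[B](a => Pure[S, B](f(a)))
}
final case class Pure[S[_], A](a: A) extends Free[S, A]
final case class Suspend[S[_], A](sa: S[A]) extends Free[S, A]
final case class FlatMapped[S[_], B, A](fb: Free[S, B], f: B => Free[S, A]) extends Free[S, A]

trait ~>[F[_], G[_]] { def apply[A](fa: F[A]): G[A] }

trait Monad[M[_]] {
  def pure[A](a: A): M[A]
  def flatMap[A, B](ma: M[A])(f: A => M[B]): M[B]
}

type Id[A] = A
implicit val idMonad: Monad[Id] = new Monad[Id] {
  def pure[A](a: A): Id[A] = a
  def flatMap[A, B](ma: Id[A])(f: A => Id[B]): Id[B] = f(ma)
}

// Naive foldMap: interpret each instruction with `nt`, sequence with M.flatMap.
// It recurses on the JVM stack, so (unlike cats' foldMap) it is not stack-safe.
def foldMap[S[_], M[_], A](prog: Free[S, A], nt: S ~> M)(implicit M: Monad[M]): M[A] =
  prog match {
    case Pure(a)           => M.pure(a)
    case Suspend(sa)       => nt(sa)
    case FlatMapped(fb, f) => M.flatMap(foldMap(fb, nt))(b => foldMap(f(b), nt))
  }

def findUser(id: UUID): Free[UserRepositoryAlg, Option[User]] =
  Suspend[UserRepositoryAlg, Option[User]](FindUser(id))
def updateUser(u: User): Free[UserRepositoryAlg, Unit] =
  Suspend[UserRepositoryAlg, Unit](UpdateUser(u))

def addPoints(userId: UUID, pointsToAdd: Int): Free[UserRepositoryAlg, Result] =
  findUser(userId).flatMap[Result] {
    case None => Pure[UserRepositoryAlg, Result](Left("User not found"))
    case Some(user) =>
      val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
      updateUser(updated).map[Result](_ => Right(()))
  }

// Hypothetical in-memory interpreter into Id, e.g. for tests
class InMemoryUsers(initial: User*) extends ~>[UserRepositoryAlg, Id] {
  val store: mutable.Map[UUID, User] = mutable.Map.empty[UUID, User] ++= initial.map(u => u.id -> u)
  def apply[A](fa: UserRepositoryAlg[A]): Id[A] = fa match {
    case FindUser(id)  => store.get(id)
    case UpdateUser(u) => store.update(u.id, u)
  }
}

val alice = User(new UUID(0L, 1L), "alice@example.com", 5)
val users = new InMemoryUsers(alice)

val added: Result =
  foldMap[UserRepositoryAlg, Id, Result](addPoints(alice.id, 10), users)
val missing: Result =
  foldMap[UserRepositoryAlg, Id, Result](addPoints(new UUID(0L, 2L), 10), users)
```

Both programs go through the same interpreter; the same program values could just as well be handed to a `Future`-based interpreter instead.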
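An important note is that with the above free monad interpreter we get stack-safety: regardless of how deeply nested our `flatMap`s are, we won't get a stack overflow caused by deeply nested recursive calls. This doesn't rely on the target monad's `flatMap` being stack-safe: cats' `foldMap` is implemented using the target monad's `tailRecM`, which lawful cats instances are expected to implement in constant stack space.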
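The core trick behind stack-safe interpretation can be illustrated without cats. Below, a hypothetical `Runner.runId` interprets a hand-rolled `Free` into `Id` in a `@tailrec` loop, re-associating left-nested `FlatMapped` nodes to the right before running them (similar in spirit to what cats' `Free#step` does), so the JVM stack doesn't grow with the number of `flatMap`s. This is a simplified sketch for `Id` only, and the `Counter`/`Tick` language is made up for the demo; cats generalises the idea to other target monads via `tailRecM`.

```scala
import scala.annotation.tailrec

// Hand-rolled Free (illustrative stand-in, not cats' Free)
sealed trait Free[S[_], A] {
  def flatMap[B](f: A => Free[S, B]): Free[S, B] = FlatMapped[S, A, B](this, f)
  def map[B](f: A => B): Free[S, B] = flatMap[B](a => Pure[S, B](f(a)))
}
final case class Pure[S[_], A](a: A) extends Free[S, A]
final case class Suspend[S[_], A](sa: S[A]) extends Free[S, A]
final case class FlatMapped[S[_], B, A](fb: Free[S, B], f: B => Free[S, A]) extends Free[S, A]

trait ~>[F[_], G[_]] { def apply[A](fa: F[A]): G[A] }
type Id[A] = A

object Runner {
  // Loops instead of recursing: (fc flatMap g) flatMap f is rewritten to
  // fc flatMap (c => g(c) flatMap f) until the head is Pure or Suspend.
  @tailrec
  def runId[S[_], A](prog: Free[S, A], nt: S ~> Id): A = prog match {
    case Pure(a)                          => a
    case Suspend(sa)                      => nt(sa)
    case FlatMapped(Pure(b), f)           => runId[S, A](f(b), nt)
    case FlatMapped(Suspend(sb), f)       => runId[S, A](f(nt(sb)), nt)
    case FlatMapped(FlatMapped(fc, g), f) => runId[S, A](fc.flatMap(c => g(c).flatMap(f)), nt)
  }
}

// A made-up one-instruction language: Tick(by) asks the interpreter for a number
sealed trait Counter[A]
final case class Tick(by: Int) extends Counter[Int]

val counterInterp: Counter ~> Id = new (Counter ~> Id) {
  def apply[A](fa: Counter[A]): Id[A] = fa match {
    case Tick(by) => by
  }
}

// 100,000 left-nested flatMaps; a naively recursive interpreter would likely overflow here
val deep: Free[Counter, Int] =
  (1 to 100000).foldLeft[Free[Counter, Int]](Pure[Counter, Int](0)) { (acc, _) =>
    acc.flatMap[Int](n => Suspend[Counter, Int](Tick(1)).map[Int](t => n + t))
  }

val total: Int = Runner.runId[Counter, Int](deep, counterInterp)
```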
To sum up, using the free monad:
- programs are data; built from constructors of an ADT
- each operation is reified as a value
- we can pattern-match on the programs (which are values) to transform & optimize them
- interpretation is deferred until an interpreter for the basic instructions is provided
- interpretation is stack-safe
Final tagless
The final tagless approach attracts quite a lot of interest as well. It's best described by Oleg Kiselyov, one of its authors, in Typed Tagless Final Interpreters: Lecture Notes.
Before we go any deeper, it's important to note that ultimately the two approaches are equal in expressive power: it's possible to easily transform one representation into the other. Hence, these are different encodings of the same general idea. (If you are into category theory, both are initial in that sense.)
With final tagless, we define our basic instruction set (the "algebra") as a trait, parametrized by the resulting container:
```scala
trait UserRepositoryAlg[F[_]] {
  def findUser(id: UUID): F[Option[User]]
  def updateUser(u: User): F[Unit]
}
```
And in the description of the solution, we simply use these operations:
```scala
import cats.Monad
import cats.implicits._ // flatMap/map syntax for any F[_] with a Monad instance

class LoyaltyPoints[F[_]: Monad](ur: UserRepositoryAlg[F]) {
  def addPoints(userId: UUID, pointsToAdd: Int): F[Either[String, Unit]] = {
    ur.findUser(userId).flatMap {
      case None => implicitly[Monad[F]].pure(Left("User not found"))
      case Some(user) =>
        val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
        ur.updateUser(updated).map(_ => Right(()))
    }
  }
}
```
This is almost identical to the original code! The major difference is that instead of using `Future`, we have parametrized both the `UserRepositoryAlg` and `LoyaltyPoints` classes with the resulting container.
If you look at other examples of how final tagless is implemented in Scala, you'll notice some differences. Here, we only require `F[_]` to be a monad at the use-site. Sometimes, this constraint is already added in `UserRepositoryAlg`. Also, we pass in an implementation of `UserRepositoryAlg` as an explicit parameter; in some examples, this will be an implicit value. Another difference you might notice is returning a function `UserRepositoryAlg[F] => F[Either[String, Unit]]` instead of the wrapped result directly. But the idea remains the same.
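The function-returning variant can be sketched like this. To keep it self-contained, `Monad` and `Id` are minimal stand-ins for cats' versions, `Monad` methods are called explicitly instead of through cats syntax, and `NoUsers`/`SingleUser` are hypothetical interpreters. The program is built once and then applied to different algebra implementations.

```scala
import java.util.UUID

case class User(id: UUID, email: String, loyaltyPoints: Int)
type Result = Either[String, Unit]

// Minimal stand-in for cats.Monad
trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  def map[A, B](fa: F[A])(f: A => B): F[B] = flatMap[A, B](fa)(a => pure[B](f(a)))
}

type Id[A] = A
implicit val idMonad: Monad[Id] = new Monad[Id] {
  def pure[A](a: A): Id[A] = a
  def flatMap[A, B](fa: Id[A])(f: A => Id[B]): Id[B] = f(fa)
}

trait UserRepositoryAlg[F[_]] {
  def findUser(id: UUID): F[Option[User]]
  def updateUser(u: User): F[Unit]
}

// The program is a function waiting for an implementation of the algebra
def addPoints[F[_]](userId: UUID, pointsToAdd: Int)(
    implicit M: Monad[F]): UserRepositoryAlg[F] => F[Result] =
  ur =>
    M.flatMap[Option[User], Result](ur.findUser(userId)) {
      case None => M.pure[Result](Left("User not found"))
      case Some(user) =>
        val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
        M.map[Unit, Result](ur.updateUser(updated))(_ => Right(()))
    }

// Two hypothetical interpreters into Id
object NoUsers extends UserRepositoryAlg[Id] {
  def findUser(id: UUID): Option[User] = None
  def updateUser(u: User): Unit = ()
}

class SingleUser(user: User) extends UserRepositoryAlg[Id] {
  var saved: Option[User] = None
  def findUser(id: UUID): Option[User] = if (id == user.id) Some(user) else None
  def updateUser(u: User): Unit = { saved = Some(u) }
}

val alice = User(new UUID(0L, 1L), "alice@example.com", 5)
val program: UserRepositoryAlg[Id] => Result = addPoints[Id](alice.id, 10)

val withoutUser: Result = program(NoUsers)
val single = new SingleUser(alice)
val withUser: Result = program(single)
```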
In this encoding, instead of building a data structure, the solution to our business problem is an expression in the base language (Scala), built from the functions which form the basic instruction set. We don't reify the program as a value, but express the results in the target monad. However, we still keep problem description separate from interpretation, as `F[_]` is abstract and we only know that it's a `Monad`.
And now to run the code:
```scala
trait FutureInterpreter extends UserRepositoryAlg[Future] {
  override def findUser(id: UUID): Future[Option[User]] =
    Future.successful(None) /* go and talk to a database */

  override def updateUser(u: User): Future[Unit] =
    Future.successful(()) /* as above */
}

val result: Future[Either[String, Unit]] =
  new LoyaltyPoints(new FutureInterpreter {}).addPoints(UUID.randomUUID(), 10)
```
An interpreter is simply an implementation of the algebra trait; when passed to the `LoyaltyPoints` class, we obtain a result in the target monad. Quite straightforward!
Unlike with the free monad, here stack safety depends on the stack safety of the target monad (in the case of `Future`, we're fine). It's possible to make the final tagless encoding partially stack-safe regardless of the target monad; see Adelbert Chang's investigations.
Both here and with the free monad approach, we can quite easily define interpreters dedicated to testing, which interpret side-effects e.g. into the identity (`Id`) monad, making the test code easier to read and reason about.
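A test-oriented version of the tagless example might look like the sketch below. It is self-contained, so `Monad` and `Id` are minimal stand-ins rather than cats' types (cats ships its own `Id` together with a `Monad` instance for it), and `InMemoryUsers` is a hypothetical interpreter backed by a mutable map. Because the target is `Id`, results are plain values and the assertions need no waiting.

```scala
import java.util.UUID
import scala.collection.mutable

case class User(id: UUID, email: String, loyaltyPoints: Int)
type Result = Either[String, Unit]

// Minimal stand-in for cats.Monad
trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  def map[A, B](fa: F[A])(f: A => B): F[B] = flatMap[A, B](fa)(a => pure[B](f(a)))
}

// The identity "container": a value of type Id[A] is just an A
type Id[A] = A
implicit val idMonad: Monad[Id] = new Monad[Id] {
  def pure[A](a: A): Id[A] = a
  def flatMap[A, B](fa: Id[A])(f: A => Id[B]): Id[B] = f(fa)
}

trait UserRepositoryAlg[F[_]] {
  def findUser(id: UUID): F[Option[User]]
  def updateUser(u: User): F[Unit]
}

class LoyaltyPoints[F[_]](ur: UserRepositoryAlg[F])(implicit M: Monad[F]) {
  def addPoints(userId: UUID, pointsToAdd: Int): F[Result] =
    M.flatMap[Option[User], Result](ur.findUser(userId)) {
      case None => M.pure[Result](Left("User not found"))
      case Some(user) =>
        val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
        M.map[Unit, Result](ur.updateUser(updated))(_ => Right(()))
    }
}

// Hypothetical test interpreter: effects go to an in-memory map, results are plain values
class InMemoryUsers(initial: User*) extends UserRepositoryAlg[Id] {
  val store: mutable.Map[UUID, User] = mutable.Map.empty[UUID, User] ++= initial.map(u => u.id -> u)
  def findUser(id: UUID): Option[User] = store.get(id)
  def updateUser(u: User): Unit = store.update(u.id, u)
}

val alice = User(new UUID(0L, 1L), "alice@example.com", 5)
val repo = new InMemoryUsers(alice)
val loyalty = new LoyaltyPoints[Id](repo)

val added: Result = loyalty.addPoints(alice.id, 10)
val notFound: Result = loyalty.addPoints(new UUID(0L, 2L), 10)
```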
To sum up, final tagless:
- programs are expressions built from generic functions
- operations are done directly in the target monad
- stack safety depends on the target monad
- interpretation is done at the moment of expression construction
- pattern matching and optimization is possible, but harder to implement
- potentially less overhead as operations are not reified as values
As a side note, where does the name come from? "Tagless", as there's no need to create "tags", that is, to reify operations as values. "Final", as the interpretation is done in the target monad, not deferred. At least that's my understanding!
Combining languages
So far we've been using a single set of instructions: a single domain-specific language. What if we have multiple sets of such operations, that is, multiple languages? How do we compose them into a single program?
Building on the previous example, we'll now send an e-mail after points have been added for a user. In the original, `Future`-based version, this is how we could implement it:
```scala
case class User(id: UUID, email: String, loyaltyPoints: Int)

trait EmailService {
  def sendEmail(email: String, subject: String, body: String): Future[Unit]
}

trait UserRepository {
  def findUser(id: UUID): Future[Option[User]]
  def updateUser(u: User): Future[Unit]
}

class LoyaltyPoints(ur: UserRepository, es: EmailService) {
  def addPoints(userId: UUID, pointsToAdd: Int): Future[Either[String, Unit]] = {
    ur.findUser(userId).flatMap {
      case None => Future.successful(Left("User not found"))
      case Some(user) =>
        val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
        for {
          _ <- ur.updateUser(updated)
          _ <- es.sendEmail(user.email, "Points added!",
            s"You now have ${updated.loyaltyPoints}")
        } yield Right(())
    }
  }
}
```
Notice that we now have two sets of basic operations: one operating on users (`UserRepository`), and one operating on emails (`EmailService`). We'll treat those as two instruction sets, with which we'll build the business logic (`LoyaltyPoints.addPoints`).
Combining using Free
As before, the basic instructions are represented as data (ADTs). We have two case class families:
```scala
sealed trait UserRepositoryAlg[T]
case class FindUser(id: UUID) extends UserRepositoryAlg[Option[User]]
case class UpdateUser(u: User) extends UserRepositoryAlg[Unit]

sealed trait EmailAlg[T]
case class SendEmail(email: String, subject: String, body: String)
  extends EmailAlg[Unit]
```
If we want to build programs which use both languages, we need to combine these instruction sets into one. We can do this with a `Coproduct` (called `EitherK` in cats 1.0 and later), which is like Scala's `Either`, but for containers (types of the form `F[_]`):
```scala
type UserAndEmailAlg[T] = Coproduct[UserRepositoryAlg, EmailAlg, T]
```
The coproduct is yet another ADT which wraps one of two ADTs: either a user instruction on the left, or an email instruction on the right. The programs which we build will now have the type `Free[UserAndEmailAlg, T]`. To wrap a basic instruction, e.g. `FindUser`, into the target `Free[UserAndEmailAlg, T]` type, we now not only need to wrap it in `Free.liftF`, but also in `Coproduct.leftc`. It would be quite tedious to do this by hand, especially if we combined a larger number of languages. That's where the `Inject` typeclass (`InjectK` in cats 1.0 and later) comes into play; it automates embedding instructions of any of the combined languages into the target type:
```scala
class Users[F[_]](implicit i: Inject[UserRepositoryAlg, F]) {
  def findUser(id: UUID): Free[F, Option[User]] = Free.inject(FindUser(id))
  def updateUser(u: User): Free[F, Unit] = Free.inject(UpdateUser(u))
}

object Users {
  implicit def users[F[_]](implicit i: Inject[UserRepositoryAlg, F]): Users[F] =
    new Users
}

class Emails[F[_]](implicit i: Inject[EmailAlg, F]) {
  def sendEmail(email: String, subject: String, body: String): Free[F, Unit] =
    Free.inject(SendEmail(email, subject, body))
}

object Emails {
  implicit def emails[F[_]](implicit i: Inject[EmailAlg, F]): Emails[F] =
    new Emails
}
```
What happens here is that, for each instruction set, we create an implicit value which exposes methods such as `findUser`. These methods know how to embed a basic instruction in the target type `F`, which is left abstract so that we don't have to commit to a set of languages too early. Yes, that's boilerplate: it's one of the costs of using free in Scala.
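Here's a simplified, self-contained sketch of the `Coproduct` + `Inject` idea. `Coproduct` and `Inject` below are local stand-ins, not cats' types, and the two `Inject` instances are written by hand for this one combination, whereas cats derives them generically. The `Users`/`Emails` helpers stop at the instruction level: they produce a `UserAndEmailAlg` instruction instead of also lifting it into `Free`, which `Free.inject` additionally does.

```scala
import java.util.UUID

case class User(id: UUID, email: String, loyaltyPoints: Int)

sealed trait UserRepositoryAlg[T]
case class FindUser(id: UUID) extends UserRepositoryAlg[Option[User]]
case class UpdateUser(u: User) extends UserRepositoryAlg[Unit]

sealed trait EmailAlg[T]
case class SendEmail(email: String, subject: String, body: String) extends EmailAlg[Unit]

// "Either for containers": a single instruction from one of the two languages
final case class Coproduct[F[_], G[_], A](run: Either[F[A], G[A]])
type UserAndEmailAlg[A] = Coproduct[UserRepositoryAlg, EmailAlg, A]

// Simplified Inject: embeds instructions of a smaller language F into a combined language G
trait Inject[F[_], G[_]] {
  def inj[A](fa: F[A]): G[A]
}

// Hand-written instances for this particular combination
implicit val usersInCombined: Inject[UserRepositoryAlg, UserAndEmailAlg] =
  new Inject[UserRepositoryAlg, UserAndEmailAlg] {
    def inj[A](fa: UserRepositoryAlg[A]): UserAndEmailAlg[A] =
      Coproduct[UserRepositoryAlg, EmailAlg, A](Left(fa))
  }

implicit val emailsInCombined: Inject[EmailAlg, UserAndEmailAlg] =
  new Inject[EmailAlg, UserAndEmailAlg] {
    def inj[A](fa: EmailAlg[A]): UserAndEmailAlg[A] =
      Coproduct[UserRepositoryAlg, EmailAlg, A](Right(fa))
  }

// Helpers in the spirit of the article's Users/Emails, generic in the combined language F
class Users[F[_]](implicit i: Inject[UserRepositoryAlg, F]) {
  def findUser(id: UUID): F[Option[User]] = i.inj[Option[User]](FindUser(id))
}

class Emails[F[_]](implicit i: Inject[EmailAlg, F]) {
  def sendEmail(email: String, subject: String, body: String): F[Unit] =
    i.inj[Unit](SendEmail(email, subject, body))
}

val aliceId = new UUID(0L, 1L)
val users = new Users[UserAndEmailAlg]
val emails = new Emails[UserAndEmailAlg]

val find: UserAndEmailAlg[Option[User]] = users.findUser(aliceId)
val send: UserAndEmailAlg[Unit] =
  emails.sendEmail("alice@example.com", "Points added!", "You now have 15")
```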
Finally, we can implement the `addPoints` method:
```scala
def addPoints(userId: UUID, pointsToAdd: Int)(
    implicit ur: Users[UserAndEmailAlg],
    es: Emails[UserAndEmailAlg]): Free[UserAndEmailAlg, Either[String, Unit]] = {
  ur.findUser(userId).flatMap {
    case None => Free.pure(Left("User not found"))
    case Some(user) =>
      val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
      for {
        _ <- ur.updateUser(updated)
        _ <- es.sendEmail(user.email, "Points added!",
          s"You now have ${updated.loyaltyPoints}")
      } yield Right(())
  }
}
```
As you may have noticed, the implementation is again almost the same as the non-refactored method, and again we managed to abstract from the specific target monad. The implicit `Users[UserAndEmailAlg]` and `Emails[UserAndEmailAlg]` values are created automatically by the implicit definitions we've written before, this time with a specific combination of languages. The `Inject` instances are also derived automatically.
What about interpretation? We can specify the interpreters for the instruction sets independently, and then combine them:
```scala
val futureUserInterpreter = new (UserRepositoryAlg ~> Future) {
  override def apply[A](fa: UserRepositoryAlg[A]): Future[A] = fa match {
    case FindUser(id) =>
      /* go and talk to a database */
      Future.successful(None)
    case UpdateUser(u) =>
      /* as above */
      Future.successful(())
  }
}

val futureEmailInterpreter = new (EmailAlg ~> Future) {
  override def apply[A](fa: EmailAlg[A]): Future[A] = fa match {
    case SendEmail(email, subject, body) =>
      /* use smtp */
      Future.successful(())
  }
}

val futureUserOrEmailInterpreter = futureUserInterpreter or futureEmailInterpreter

val result: Future[Either[String, Unit]] =
  addPoints(UUID.randomUUID(), 10).foldMap(futureUserOrEmailInterpreter)
```
Each interpreter (a `~>`, which in cats is an alias for `FunctionK`) has an `or` method, which creates a coproduct interpreter from two basic interpreters.
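Conceptually, combining two interpreters with `or` boils down to dispatching on which side of the coproduct an instruction sits. Here's a minimal sketch with local stand-ins for `~>` and `Coproduct`, a hypothetical `or` function specialised to `UserAndEmailAlg` (cats' `or` is generic), and interpreters into `Id` which record what they were asked to do.

```scala
import java.util.UUID
import scala.collection.mutable

case class User(id: UUID, email: String, loyaltyPoints: Int)

sealed trait UserRepositoryAlg[T]
case class FindUser(id: UUID) extends UserRepositoryAlg[Option[User]]
case class UpdateUser(u: User) extends UserRepositoryAlg[Unit]

sealed trait EmailAlg[T]
case class SendEmail(email: String, subject: String, body: String) extends EmailAlg[Unit]

trait ~>[F[_], G[_]] { def apply[A](fa: F[A]): G[A] }
type Id[A] = A

final case class Coproduct[F[_], G[_], A](run: Either[F[A], G[A]])
type UserAndEmailAlg[A] = Coproduct[UserRepositoryAlg, EmailAlg, A]

// Route left instructions to the user interpreter, right ones to the email interpreter
def or[H[_]](users: UserRepositoryAlg ~> H, emails: EmailAlg ~> H): UserAndEmailAlg ~> H =
  new (UserAndEmailAlg ~> H) {
    def apply[A](fa: UserAndEmailAlg[A]): H[A] = fa.run match {
      case Left(userInstruction)   => users(userInstruction)
      case Right(emailInstruction) => emails(emailInstruction)
    }
  }

val log = mutable.ListBuffer.empty[String]

val userInterp: UserRepositoryAlg ~> Id = new (UserRepositoryAlg ~> Id) {
  def apply[A](fa: UserRepositoryAlg[A]): Id[A] = fa match {
    case FindUser(id) =>
      log += s"find $id"
      None
    case UpdateUser(u) =>
      log += s"update ${u.id}"
      ()
  }
}

val emailInterp: EmailAlg ~> Id = new (EmailAlg ~> Id) {
  def apply[A](fa: EmailAlg[A]): Id[A] = fa match {
    case SendEmail(to, subject, _) =>
      log += s"email $to: $subject"
      ()
  }
}

val combined: UserAndEmailAlg ~> Id = or[Id](userInterp, emailInterp)

val aliceId = new UUID(0L, 1L)
val found: Option[User] =
  combined(Coproduct[UserRepositoryAlg, EmailAlg, Option[User]](Left(FindUser(aliceId))))
combined(Coproduct[UserRepositoryAlg, EmailAlg, Unit](
  Right(SendEmail("alice@example.com", "Points added!", "You now have 15"))))
```

When interpreting a whole program, `foldMap` calls such a combined interpreter once for every instruction it encounters.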
Combining using tagless
Similarly, we need to define the basic instruction sets. As before, each instruction set forms a trait, parametrized by the target monad:
```scala
trait UserRepositoryAlg[F[_]] {
  def findUser(id: UUID): F[Option[User]]
  def updateUser(u: User): F[Unit]
}

trait EmailAlg[F[_]] {
  def sendEmail(email: String, subject: String, body: String): F[Unit]
}
```
However, combining the two languages is much simpler. Before, we took an implementation of `UserRepositoryAlg[F]` as a parameter; now we need to take an additional parameter, an implementation of `EmailAlg[F]`:
```scala
class LoyaltyPoints[F[_]: Monad](ur: UserRepositoryAlg[F], es: EmailAlg[F]) {
  def addPoints(userId: UUID, pointsToAdd: Int): F[Either[String, Unit]] = {
    ur.findUser(userId).flatMap {
      case None => implicitly[Monad[F]].pure(Left("User not found"))
      case Some(user) =>
        val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
        for {
          _ <- ur.updateUser(updated)
          _ <- es.sendEmail(user.email, "Points added!",
            s"You now have ${updated.loyaltyPoints}")
        } yield Right(())
    }
  }
}
```
The code doesn't differ much from the original, but the signature states clearly that we can run it using any monad. For interpretation, we now need to create two traits, each implementing the corresponding instruction set:
```scala
trait FutureUserInterpreter extends UserRepositoryAlg[Future] {
  override def findUser(id: UUID): Future[Option[User]] =
    Future.successful(None) /* go and talk to a database */

  override def updateUser(u: User): Future[Unit] =
    Future.successful(()) /* as above */
}

trait FutureEmailInterpreter extends EmailAlg[Future] {
  override def sendEmail(email: String, subject: String,
                         body: String): Future[Unit] =
    Future.successful(()) /* use smtp */
}

val result: Future[Either[String, Unit]] =
  new LoyaltyPoints(new FutureUserInterpreter {}, new FutureEmailInterpreter {})
    .addPoints(UUID.randomUUID(), 10)
```
As you can see, adding additional instruction sets is almost a trivial task when using the final tagless approach.
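For instance, a test setup for the combined tagless program can plug in two independent in-memory interpreters, as sketched below. `Monad` and `Id` are minimal stand-ins for cats' versions (so `Monad` methods are called explicitly rather than through cats syntax), and `InMemoryUsers` and `RecordingEmails` are hypothetical names.

```scala
import java.util.UUID
import scala.collection.mutable

case class User(id: UUID, email: String, loyaltyPoints: Int)
type Result = Either[String, Unit]

trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  def map[A, B](fa: F[A])(f: A => B): F[B] = flatMap[A, B](fa)(a => pure[B](f(a)))
}

type Id[A] = A
implicit val idMonad: Monad[Id] = new Monad[Id] {
  def pure[A](a: A): Id[A] = a
  def flatMap[A, B](fa: Id[A])(f: A => Id[B]): Id[B] = f(fa)
}

trait UserRepositoryAlg[F[_]] {
  def findUser(id: UUID): F[Option[User]]
  def updateUser(u: User): F[Unit]
}

trait EmailAlg[F[_]] {
  def sendEmail(email: String, subject: String, body: String): F[Unit]
}

// Two algebras, one extra constructor parameter: that's all the "combining" needed
class LoyaltyPoints[F[_]](ur: UserRepositoryAlg[F], es: EmailAlg[F])(implicit M: Monad[F]) {
  def addPoints(userId: UUID, pointsToAdd: Int): F[Result] =
    M.flatMap[Option[User], Result](ur.findUser(userId)) {
      case None => M.pure[Result](Left("User not found"))
      case Some(user) =>
        val updated = user.copy(loyaltyPoints = user.loyaltyPoints + pointsToAdd)
        M.flatMap[Unit, Result](ur.updateUser(updated)) { _ =>
          val body = s"You now have ${updated.loyaltyPoints}"
          M.map[Unit, Result](es.sendEmail(user.email, "Points added!", body))(_ => Right(()))
        }
    }
}

class InMemoryUsers(initial: User*) extends UserRepositoryAlg[Id] {
  val store: mutable.Map[UUID, User] = mutable.Map.empty[UUID, User] ++= initial.map(u => u.id -> u)
  def findUser(id: UUID): Option[User] = store.get(id)
  def updateUser(u: User): Unit = store.update(u.id, u)
}

// Records e-mails instead of sending them
class RecordingEmails extends EmailAlg[Id] {
  val sent: mutable.ListBuffer[(String, String, String)] = mutable.ListBuffer.empty
  def sendEmail(email: String, subject: String, body: String): Unit = {
    sent += ((email, subject, body))
    ()
  }
}

val alice = User(new UUID(0L, 1L), "alice@example.com", 5)
val users = new InMemoryUsers(alice)
val emails = new RecordingEmails
val loyalty = new LoyaltyPoints[Id](users, emails)

val outcome: Result = loyalty.addPoints(alice.id, 10)
```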
Compiling to a lower-level instruction set
Our domain-specific languages can form a hierarchy, some being higher-level, some lower-level. In such cases, we want to be able to "compile" our high-level instruction set to the lower-level instruction set, that is, express the high-level concepts using a simpler, more basic language.
In our example, we'll express the user-related instruction set in terms of a key-value store instruction set. Note that one instruction in the high-level language can translate to multiple instructions in the lower-level language.
The basic, `Future`-based implementation can take the following form:
```scala
case class User(id: UUID, email: String, loyaltyPoints: Int) {
  def serialize: String = id.toString + "," + loyaltyPoints + "," + email
}

object User {
  def parse(s: String): User = {
    val parts = s.split(",")
    User(UUID.fromString(parts(0)), parts(2), parts(1).toInt)
  }
}

trait KVStore {
  def get(k: String): Future[Option[String]]
  def put(k: String, v: String): Future[Unit]
}

trait UserRepository {
  def findUser(id: UUID): Future[Option[User]]
  def updateUser(u: User): Future[Unit]
}

class UserRepositoryUsingKVStore(kvStore: KVStore) extends UserRepository {
  override def findUser(id: UUID): Future[Option[User]] =
    kvStore.get(id.toString).map(serialized => serialized.map(User.parse))

  override def updateUser(u: User): Future[Unit] = {
    val serialized = u.serialize
    for {
      _ <- kvStore.put(u.id.toString, serialized)
      _ <- kvStore.put(u.email, serialized) // let's say we also maintain a by-email index
    } yield ()
  }
}
```
We'll omit the `addPoints` implementation as it is unchanged; what's new is an implementation of `UserRepository` which uses a `KVStore`. How can we express this using free monads or tagless final?
Compiling using free monads
To compile the high-level `UserRepositoryAlg` instruction set (algebra) into a lower-level language, we first have to define that lower-level language:
```scala
sealed trait KVAlg[T]
case class Get(k: String) extends KVAlg[Option[String]]
case class Put(k: String, v: String) extends KVAlg[Unit]

type KV[T] = Free[KVAlg, T]

def get(k: String): KV[Option[String]] = Free.liftF(Get(k))
def put(k: String, v: String): KV[Unit] = Free.liftF(Put(k, v))
```
Nothing new, just following the same procedure as before. What's new is the way interpretation is done. First, we interpret `UserRepositoryAlg` in terms of `KV` (notice that we interpret the algebra, i.e. single instructions, in terms of `KVAlg`-based programs, not in terms of single `KVAlg` instructions):
```scala
val userToKvInterpreter = new (UserRepositoryAlg ~> KV) {
  override def apply[A](fa: UserRepositoryAlg[A]): KV[A] = fa match {
    case FindUser(id) =>
      get(id.toString).map(_.map(User.parse))
    case UpdateUser(u) =>
      val serialized = u.serialize
      for {
        _ <- put(u.id.toString, serialized)
        _ <- put(u.email, serialized) // we also maintain a by-email index
      } yield ()
  }
}
```
That's an interpreter like any other: it interprets an instruction set in a target monad. In this case, the target monad is also a free monad, but that doesn't matter as far as interpretation is concerned. Now we need a second-stage interpreter, which interprets `KVAlg` instructions in terms of our target monad; in our example, that's `Future`:
```scala
val kvToFutureInterpreter = new (KVAlg ~> Future) {
  override def apply[A](fa: KVAlg[A]): Future[A] = fa match {
    case Get(k) => /* go and talk to a database */ Future.successful(None)
    case Put(k, v) => /* as above */ Future.successful(())
  }
}
```
Note that we no longer need an interpreter from `UserRepositoryAlg` to `Future`; the intermediate language takes care of that:
```scala
val result: Future[Either[String, Unit]] =
  addPoints(UUID.randomUUID(), 10)
    .foldMap(userToKvInterpreter)
    .foldMap(kvToFutureInterpreter)
```
We need to call `foldMap` twice, as the first call results in a free monad using the `KVAlg` instructions, and the second one does the interpretation into the `Future` monad.
Note that both the original program and the program interpreted into the `KV` free monad (the first stage of interpretation) are regular Scala values. They can be composed, manipulated, passed to methods etc. like any other value. Moreover, we can potentially perform optimizations on the structure by pattern matching on the result of the interpretation.
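The two-stage compilation can be sketched end to end without cats. The hand-rolled `Free`, `~>`, `Monad` and naive (not stack-safe) `foldMap` are illustrative stand-ins; `KV` programs get a hypothetical `Monad` instance so that they can serve as the target of the first stage, and an in-memory map plays the role of the database. The sketch also records that the first stage touches nothing: only the second stage reads and writes the store.

```scala
import java.util.UUID
import scala.collection.mutable

case class User(id: UUID, email: String, loyaltyPoints: Int) {
  def serialize: String = id.toString + "," + loyaltyPoints + "," + email
}
object User {
  def parse(s: String): User = {
    val parts = s.split(",")
    User(UUID.fromString(parts(0)), parts(2), parts(1).toInt)
  }
}

// Illustrative stand-ins for cats' Free, ~> and Monad
sealed trait Free[S[_], A] {
  def flatMap[B](f: A => Free[S, B]): Free[S, B] = FlatMapped[S, A, B](this, f)
  def map[B](f: A => B): Free[S, B] = flatMap[B](a => Pure[S, B](f(a)))
}
final case class Pure[S[_], A](a: A) extends Free[S, A]
final case class Suspend[S[_], A](sa: S[A]) extends Free[S, A]
final case class FlatMapped[S[_], B, A](fb: Free[S, B], f: B => Free[S, A]) extends Free[S, A]

trait ~>[F[_], G[_]] { def apply[A](fa: F[A]): G[A] }
trait Monad[M[_]] {
  def pure[A](a: A): M[A]
  def flatMap[A, B](ma: M[A])(f: A => M[B]): M[B]
}
type Id[A] = A
implicit val idMonad: Monad[Id] = new Monad[Id] {
  def pure[A](a: A): Id[A] = a
  def flatMap[A, B](ma: Id[A])(f: A => Id[B]): Id[B] = f(ma)
}

def foldMap[S[_], M[_], A](prog: Free[S, A], nt: S ~> M)(implicit M: Monad[M]): M[A] =
  prog match {
    case Pure(a)           => M.pure(a)
    case Suspend(sa)       => nt(sa)
    case FlatMapped(fb, f) => M.flatMap(foldMap(fb, nt))(b => foldMap(f(b), nt))
  }

// High-level language
sealed trait UserRepositoryAlg[T]
case class FindUser(id: UUID) extends UserRepositoryAlg[Option[User]]
case class UpdateUser(u: User) extends UserRepositoryAlg[Unit]
def findUser(id: UUID): Free[UserRepositoryAlg, Option[User]] = Suspend[UserRepositoryAlg, Option[User]](FindUser(id))
def updateUser(u: User): Free[UserRepositoryAlg, Unit] = Suspend[UserRepositoryAlg, Unit](UpdateUser(u))

// Low-level language
sealed trait KVAlg[T]
case class Get(k: String) extends KVAlg[Option[String]]
case class Put(k: String, v: String) extends KVAlg[Unit]
type KV[A] = Free[KVAlg, A]
def get(k: String): KV[Option[String]] = Suspend[KVAlg, Option[String]](Get(k))
def put(k: String, v: String): KV[Unit] = Suspend[KVAlg, Unit](Put(k, v))

// KV programs form a monad, so they can be the target of the first stage
implicit val kvMonad: Monad[KV] = new Monad[KV] {
  def pure[A](a: A): KV[A] = Pure[KVAlg, A](a)
  def flatMap[A, B](ma: KV[A])(f: A => KV[B]): KV[B] = ma.flatMap(f)
}

// Stage 1: each user instruction becomes a small KV program
val userToKv: UserRepositoryAlg ~> KV = new (UserRepositoryAlg ~> KV) {
  def apply[A](fa: UserRepositoryAlg[A]): KV[A] = fa match {
    case FindUser(id) => get(id.toString).map(_.map(User.parse))
    case UpdateUser(u) =>
      val serialized = u.serialize
      put(u.id.toString, serialized).flatMap(_ => put(u.email, serialized)) // by-email index
  }
}

// Stage 2: KV instructions run against an in-memory map standing in for a real store
val store = mutable.Map.empty[String, String]
val kvToId: KVAlg ~> Id = new (KVAlg ~> Id) {
  def apply[A](fa: KVAlg[A]): Id[A] = fa match {
    case Get(k)    => store.get(k)
    case Put(k, v) => store.update(k, v)
  }
}

val alice = User(new UUID(0L, 1L), "alice@example.com", 15)
val program: Free[UserRepositoryAlg, Option[User]] =
  updateUser(alice).flatMap(_ => findUser(alice.id))

val kvProgram: KV[Option[User]] = foldMap[UserRepositoryAlg, KV, Option[User]](program, userToKv)
val storeSizeAfterStage1: Int = store.size // stage 1 only built another program
val result: Option[User] = foldMap[KVAlg, Id, Option[User]](kvProgram, kvToId)
```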
Compiling using tagless
Again, we first define the lower-level language and its interpretation to `Future`:
```scala
trait KVAlg[F[_]] {
  def get(k: String): F[Option[String]]
  def put(k: String, v: String): F[Unit]
}

trait KvToFutureInterpreter extends KVAlg[Future] {
  override def get(k: String): Future[Option[String]] =
    Future.successful(None) /* go and talk to a database */

  override def put(k: String, v: String): Future[Unit] =
    Future.successful(()) /* as above */
}
```
The definition of the business logic and the user-related algebra remains the same. What changes is the way interpretation is done: we have to create an interpreter for `UserRepositoryAlg` in terms of `KVAlg`:
```scala
class UserThroughKvInterpreter[F[_]: Monad](kv: KVAlg[F])
  extends UserRepositoryAlg[F] {

  override def findUser(id: UUID): F[Option[User]] =
    kv.get(id.toString).map(_.map(User.parse))

  override def updateUser(u: User): F[Unit] = {
    val serialized = u.serialize
    for {
      _ <- kv.put(u.id.toString, serialized)
      _ <- kv.put(u.email, serialized) // we also maintain a by-email index
    } yield ()
  }
}
```
Quite similar to the original version, but with the monad type abstract. Usage is as one would expect:
```scala
val result: Future[Either[String, Unit]] =
  new LoyaltyPoints(new UserThroughKvInterpreter(new KvToFutureInterpreter {}))
    .addPoints(UUID.randomUUID(), 10)
```
Again, compiling to a lower-level language is a rather straightforward task.
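A self-contained sketch of the same layering, with `Id` as the target and a hypothetical in-memory `KVAlg` implementation, shows how the compiled interpreter could be tested. `Monad` and `Id` are minimal stand-ins for cats' versions, and the `Monad` methods are called explicitly instead of through cats syntax.

```scala
import java.util.UUID
import scala.collection.mutable

case class User(id: UUID, email: String, loyaltyPoints: Int) {
  def serialize: String = id.toString + "," + loyaltyPoints + "," + email
}
object User {
  def parse(s: String): User = {
    val parts = s.split(",")
    User(UUID.fromString(parts(0)), parts(2), parts(1).toInt)
  }
}

trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
  def map[A, B](fa: F[A])(f: A => B): F[B] = flatMap[A, B](fa)(a => pure[B](f(a)))
}
type Id[A] = A
implicit val idMonad: Monad[Id] = new Monad[Id] {
  def pure[A](a: A): Id[A] = a
  def flatMap[A, B](fa: Id[A])(f: A => Id[B]): Id[B] = f(fa)
}

trait KVAlg[F[_]] {
  def get(k: String): F[Option[String]]
  def put(k: String, v: String): F[Unit]
}

trait UserRepositoryAlg[F[_]] {
  def findUser(id: UUID): F[Option[User]]
  def updateUser(u: User): F[Unit]
}

// The user algebra implemented on top of any KVAlg[F], for any monad F
class UserThroughKvInterpreter[F[_]](kv: KVAlg[F])(implicit M: Monad[F]) extends UserRepositoryAlg[F] {
  def findUser(id: UUID): F[Option[User]] =
    M.map[Option[String], Option[User]](kv.get(id.toString))(_.map(User.parse))

  def updateUser(u: User): F[Unit] = {
    val serialized = u.serialize
    M.flatMap[Unit, Unit](kv.put(u.id.toString, serialized))(_ => kv.put(u.email, serialized))
  }
}

// Hypothetical in-memory key-value store with results in Id
class InMemoryKV extends KVAlg[Id] {
  val store: mutable.Map[String, String] = mutable.Map.empty[String, String]
  def get(k: String): Option[String] = store.get(k)
  def put(k: String, v: String): Unit = store.update(k, v)
}

val kv = new InMemoryKV
val users = new UserThroughKvInterpreter[Id](kv)
val alice = User(new UUID(0L, 1L), "alice@example.com", 15)

users.updateUser(alice)
val found: Option[User] = users.findUser(alice.id)
```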
When to use free?
As you have probably seen from the previous examples, free is more complex to use than tagless, and requires some amount of boilerplate (partly because we use Scala, partly because that's how it works). So why would you use free over tagless? As always: it depends, but I think one can come up with some general rules.
I think that the biggest advantage of free is that programs become values, which can be passed as arguments, returned, combined, sequenced etc. For example, in Slick, database queries and operations are expressed as instances of `DBIOAction`, which is their custom free monad implementation. Because a `DBIOAction` is only a description of what should be executed (no actions are performed when the action is created), a number of such values can be independently created, combined into a single large `DBIOAction`, and then marked to be executed in a single transaction using the `.transactionally` method. This way of demarcating transactions is really convenient, makes it easier to re-use query descriptions in various contexts, and retains the separation of problem description from execution.
While final-tagless requires less boilerplate and makes it much easier to combine multiple languages, it requires making everything generic in the resulting container `F[_]`. Free, again, benefits from the fact that it's just a value: it doesn't need to live in a parametrized "environment".
Maybe a good guideline could be as follows: for expressing higher-level business concepts, where there's a large number of languages (instruction sets), the final tagless approach will be much more convenient, as it's much easier to combine languages in final-tagless than in free. For more cross-cutting concerns, free might be a better choice. It's also entirely feasible to compile a higher-level final-tagless program into a lower-level free monad, and then interpret it further (e.g. compile a final-tagless program which has `User`/`Product`/etc. instructions into a free monad which has database instructions)! But of course, it all depends on the problem you are trying to solve, how you and your team approach it, and your preferences.
Summing up
Here's a short summary of how free compares to tagless-final:
Free monad | Final tagless |
---|---|
Program is data | Program is an expression |
Programs are built from constructors (ADT) | Programs are built from functions |
Plain values | Expressions parametrized by a type constructor (`F[_]`) |
Values represent abstract syntax | Expressions are a denotation of the program in the target monad |
Interpretation is deferred | Interpretation happens when the expression is created |
Stack-safe | Stack-safety depends on the target monad |
Each operation is reified as a value | Can have less overhead, without creating intermediate objects |
Easy pattern matching for inspection and optimization | Pattern matching possible, but harder |
Combining languages using `Coproduct`s and implicit `Inject` instances, some boilerplate | Combining languages using multiple algebras, less boilerplate |
There are a number of other blogs and presentations which try to compare the two approaches; here are a few I used for my research:
- Alternatives to GADTs in Scala by Paul Chiusano
- Chris Birchall's Free vs Tagless final
- Functional web services by Markus Hauck
- Stop paying for Free Monads by Mark Hopkins
All of the code used in this article is available on GitHub.
What are your experiences with using free monads and the tagless-final encoding? Maybe you have some advice on when to use one or the other? It would be great to find out your opinion! Please don't hesitate to comment :).