Comparing Scala relational database access libraries

Adam Warski

18 Apr 2017 · 33 minutes read

Relational databases may no longer be the only, or even the primary, storage option; but they certainly aren't going anywhere. Accessing them is still a common task in our programming life, and that translates to many libraries which aim to help us write correct, readable and maintainable RDBMS-related code.

What are the options for accessing a relational database when you are using Scala? The main contender seems to be Slick; however, other popular options include Doobie, Quill and ScalikeJDBC. We will be comparing these four, looking at some common use-cases. If you read Manuel Bernhardt's article on a similar subject from 3 years ago, you will notice that some of the libraries have stood the test of time, while others are brand new.

The goal of this article is to provide an overview of the different approaches the above libraries take. The task of choosing which library is "best" (or whether you should just stick with Hibernate) is left to the reader :).

For the impatient, a very short TL;DR is available in the summary.

Updates

  • 2018/02/08: Updating Doobie to 0.5.0 (minor changes)
  • 2018/02/05: Updating versions: Slick to 3.2.1; Doobie to 0.4.4; Quill to 2.3.3; ScalikeJDBC to 3.2.0. Updating the Quill section.
  • 2017/06/28: Fixing Quill plain sql example; ScalikeJDBC supports reactive streams

Example model & db setup

In the comparison, we will be using a model with 3 domain classes in one-to-many relations: City, MetroSystem and MetroLine. Each city can have many metro systems, and each metro system can have many lines:

case class CityId(id: Int) extends AnyVal
case class City(id: CityId, name: String, population: Int, 
  area: Float, link: Option[String])

case class MetroSystemId(id: Int) extends AnyVal
case class MetroSystem(id: MetroSystemId, cityId: CityId, 
  name: String, dailyRidership: Int)

case class MetroLineId(id: Int) extends AnyVal
case class MetroLine(id: MetroLineId, systemId: MetroSystemId, 
  name: String, stationCount: Int, trackType: TrackType)

object TrackType extends Enumeration {
  type TrackType = Value
  val Rail = Value(1)
  val Monorail = Value(2)
  val Rubber = Value(3)
}
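
The library-specific mappings shown later use a TrackType.byIdOrThrow helper to turn a raw integer back into an enumeration value. Its implementation isn't shown in this article; a minimal sketch of such a helper, defined inside the TrackType object, could be:

// Look up a value by its numeric id, failing fast for ids that don't
// correspond to any track type.
def byIdOrThrow(id: Int): TrackType =
  values.find(_.id == id).getOrElse(
    throw new IllegalArgumentException(s"Unknown TrackType id: $id"))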

All of the code in runnable form is available on GitHub. The database is initialized using Flyway with a schema corresponding to the above domain classes, and then filled with example data:

CREATE TABLE city(
    id SERIAL,
    name VARCHAR NOT NULL,
    population INTEGER NOT NULL,
    area FLOAT NOT NULL,
    link VARCHAR
);
ALTER TABLE city ADD CONSTRAINT city_id PRIMARY KEY(id);

INSERT INTO city(id, name, population, area, link) VALUES
    (1, 'Warszawa', 1748916, 517.24, 'http://www.um.warszawa.pl/en'),
    (2, 'Paris', 2243833, 105.4, 'http://paris.fr'),
    (3, 'Chongqing', 49165500, 82403, NULL);

-- similarly for metro_system and metro_line

Last but not least, the code is set up to use PostgreSQL, but it should be fairly simple to switch to another database.
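
For reference, here's a minimal sketch of such Flyway-based initialization (the connection details are assumed; the actual setup lives in the GitHub repo):

import org.flywaydb.core.Flyway

// Apply the SQL migrations (such as the schema above) to the test database.
val flyway = new Flyway()
flyway.setDataSource("jdbc:postgresql:sql_compare", "postgres", "")
flyway.migrate()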

Slick

Slick is currently the most popular (at least in terms of GitHub stars) database library for Scala and is developed by Lightbend. What is Slick? As its website says:

"Functional Relational Mapping for Scala";
Slick is a modern database query and access library for Scala. It allows you to work with stored data almost as if you were using Scala collections while at the same time giving you full control over when a database access happens and which data is transferred.

Let's look at some aspects of working with the library when implementing common tasks.

Mapping

The first thing that we have to do is create a mapping for our model. The mapping (meta-model) is completely separate from the domain classes, which are most commonly case classes (as in our example) and don't have to know anything about the database layer. The mapping defines the table and column names:

class Cities(tag: Tag) extends Table[City](tag, "city") {
  def id = column[CityId]("id", O.PrimaryKey, O.AutoInc)
  def name = column[String]("name")
  def population = column[Int]("population")
  def area = column[Float]("area")
  def link = column[Option[String]]("link")
  def * = (id, name, population, area, link) <> (City.tupled, City.unapply)
}
lazy val cities = TableQuery[Cities]

Full mappings are available in SlickTest.scala. The * is a projection which defines how to map a tuple of column values to/from the base case class. As you've probably noticed, we have some custom types in our model. The first are the AnyVal id-wrappers, such as CityId, which ensure that we don't mix up ids of various domain objects. The second is the TrackType enumeration, which is mapped to an integer value in the database. For both, we need to create a new implicit column type, with functions converting to and from the "raw" value:

implicit lazy val cityIdColumnType = 
  MappedColumnType.base[CityId, Int](_.id, CityId)

implicit lazy val trackTypeColumnType = 
  MappedColumnType.base[TrackType, Int](_.id, TrackType.byIdOrThrow)

Quite clean and straightforward. Absence of a column type is a compile-time error.

DB Connection

Before we do any work, we need to let Slick know how to connect to the database. For that, we need to create two values: first, an instance of JdbcBackend#DatabaseDef, and second, a JdbcProfile. The former can be created given the database configuration: an address, username, password, connection pool config and so on. The latter specifies the capabilities that we can use in our queries, which vary from database to database (but most are common):

val db = Database.forURL("jdbc:postgresql:sql_compare", 
  driver = "org.postgresql.Driver")
val jdbcProfile = PostgresProfile

import jdbcProfile.api._

The last import brings into scope implicits needed to determine the capabilities of the profile.

Queries

Now that we have the model mapped, we can run some queries. First, let's try a simple query, selecting all cities with over 4 million inhabitants (if you are following the source, you probably noticed there are more examples there):

val query: DBIOAction[Seq[(String, Int)], NoStream, Read] = cities
  .filter(_.population > 4000000)
  .map(c => (c.name, c.population))
  .result

val queryResult: Future[Seq[(String, Int)]] = db.run(query)

The queries are constructed using a collection-like interface (here we use filter, which translates to a WHERE clause, and map, which translates to the columns which should be selected). Queries are completely type-safe, and operate on the "lifted" meta-model, that is, the mapping that we have defined earlier. For example, _.population references Cities.population, which is of the "lifted" type Rep[Int], not the City.population model field directly.

An important feature is that the query value is just a description of the query; nothing is executed at the point of creating a query. Only later, when we invoke db.run, the query is interpreted and sent asynchronously to the database, hence we are getting back a Future with the query results. As a side-note, although not spelled out in code, DBIOAction is in fact an implementation of the free monad, with a built-in interpreter to the Future monad. As a bonus, the type of the query description also reveals if it's a streaming query or not, and if it reads or writes to the database.

The query descriptions (DBIOActions) are immutable and re-usable, hence common query fragments can be easily factored out and later composed.
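
For instance, a base query can be defined once and specialized in several ways - a sketch, using the cities mapping from above:

// A reusable query fragment: cities ordered by population, descending.
val citiesByPopulation = cities.sortBy(_.population.desc)

// Two different queries built from the same fragment.
val topTen = citiesByPopulation.take(10).result
val topTenWithLink = citiesByPopulation.filter(_.link.isDefined).take(10).result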

It's also worth noting that db.run shields us from all the ugly details of JDBC connection/session management, making everything resource-safe and making it impossible to leak connections.

A more complex query

What about a more complex query? Well, let's try selecting metro systems with city names and a total count of lines in each system, sorted by that count in descending order. In SQL that would be:

SELECT ms.name, c.name, COUNT(ml.id) as line_count
    FROM metro_line as ml
    JOIN metro_system as ms on ml.system_id = ms.id
    JOIN city AS c ON ms.city_id = c.id
    GROUP BY ms.id, c.id
    ORDER BY line_count DESC;

How to translate that to Slick?

case class MetroSystemWithLineCount(metroSystemName: String, cityName: String, 
  lineCount: Int)

val query: DBIOAction[Seq[MetroSystemWithLineCount], NoStream, Read] = metroLines
  .join(metroSystems).on(_.systemId === _.id)
  .join(cities).on(_._2.cityId === _.id)
  .groupBy { case ((_, ms), c) => (ms.id, c.id, ms.name, c.name) }
  .map { case ((msId, cId, msName, cName), lines) => (msName, cName, lines.length) }
  .sortBy(_._3.desc)
  .result
  .map(_.map(MetroSystemWithLineCount.tupled))

Looking at the query might at first raise some questions. Why the nested tuples, _2 & _3 and double map invocations? To explain: first, we need to join three tables. Each join results in a tuple of model classes. Hence, after the joins we get a query returning nested tuples: ((MetroLine, MetroSystem), City). Then, we group the results by the specified columns, which yields a query returning tuples of the columns by which we group (here: (ms.id, c.id, ms.name, c.name)) and a collection of the grouped rows.

The first map converts these grouped tuples to the values that we want to SELECT: system & city names, plus the count of the grouped rows. That's a database-level map, which influences the generated SQL query. The results of this map must be one of the Slick-defined basic types, which can be extended, but that's a bit more complicated than just adding a custom column type.

The second map operates on the results after they are read from the database into memory, and converts a tuple to a case class instance.

With any query-generating embedded DSL, the question always is: what kind of SQL does it produce? Here everything looks fine, just as we would expect:

select x2."name", x3."name", count(1) 
from "metro_line" x4, "metro_system" x2, "city" x3 
where (x4."system_id" = x2."id") and (x2."city_id" = x3."id") 
group by x2."id", x3."id", x2."name", x3."name" 
order by count(1) desc

Dynamic queries

How about constructing dynamic queries? Since we are only creating a description using immutable values, this should be quite straightforward, and indeed, here's how you could select dynamically constrained metro lines:

val minStations: Option[Int] = Some(10)
val maxStations: Option[Int] = None
val sortDesc: Boolean = true

val query = metroLines
  .filter { line =>
    List(
      minStations.map(line.stationCount >= _),
      maxStations.map(line.stationCount <= _)
    ).flatten.reduceLeftOption(_ && _).getOrElse(true: LiteralColumn[Boolean])
  }
  .sortBy(l => if (sortDesc) l.stationCount.desc else l.stationCount.asc)
  .result

Plain SQL?

Finally, if we have a really complicated query, we can run it in plain SQL and map its results to a case class; we need to provide an implicit GetResult instance which maps a raw database row to our model:

case class MetroSystemWithCity(metroSystemName: String, cityName: String, 
  dailyRidership: Int)

implicit val getMetroSystemWithCityResult = GetResult(r => 
  MetroSystemWithCity(r.nextString, r.nextString, r.nextInt))

val query = sql"""SELECT ms.name, c.name, ms.daily_ridership
                    FROM metro_system as ms
                    JOIN city AS c ON ms.city_id = c.id
                    ORDER BY ms.daily_ridership DESC""".as[MetroSystemWithCity]

Transactions

Combining several database operations to run in a single transaction is quite straightforward thanks to the separation of query description and execution. If you have a number of DBIOActions (query descriptions) which you'd like to run in a single transaction, first you need to combine them, usually using flatMap, and then call .transactionally on the result:

val combinedAction: DBIOAction[Int, NoStream, _] = for {
  _       <- insertAction("Invalid", 0, 0, None) // has type: DBIOAction[..]
  deleted <- deleteAction                        // has type: DBIOAction[Int, ..]
} yield deleted

val queryResult = db.run(combinedAction.transactionally)

The combined query descriptions are also a DBIOAction, and so can be combined further!

The above, and additional runnable examples are available in SlickTests.

Doobie

Our second contender, Doobie, is gaining popularity especially in the FP Scala community. Doobie introduces itself as:

doobie is a pure functional JDBC layer for Scala. It is not an ORM, nor is it a relational algebra; it just provides a principled way to construct programs (and higher-level libraries) that use JDBC.

Let's see what it means in practice!

Mapping

There's no meta-model in Doobie; you just use basic types (Int, String etc.), plain case classes, tuples or HLists. Hence, there's also no process of mapping your model (e.g. a case class) to a database table. Instead, you simply write the SQL queries. Yes, you read that right - the very thing that Hibernate, JPA, or the above-described Slick try to abstract away is here a first-class citizen.

However, we are not back to stitching SQL queries by hand, trying to remember to properly escape parameters to avoid SQL injection and precompiling queries using plain JDBC. Doobie offers substantial help in constructing queries, and mapping their results to our domain objects, as we'll soon see.

But first, we still need to make sure that our custom types can be properly handled by Doobie. It turns out that our type-safe ids (CityId etc.), which are value classes - that is, they extend AnyVal - are supported out-of-the-box. Hence, only TrackType needs special treatment:

implicit val trackTypeMeta: Meta[TrackType] = 
  Meta[Int].xmap(TrackType.byIdOrThrow, _.id)

As you can see, the mechanism here is quite similar to the one in Slick - for a type to be supported, there needs to be an implicit Meta instance, which provides conversions to/from another supported type (here: Int). The lack of a Meta instance needed to handle a type is a compile-time error.

DB Connection

To connect to the database, we need to create an instance of a Transactor, which can be constructed given configuration such as the database connection string, username and password. There are different transactor implementations, depending on whether we want connections to be pooled or not. Finally, when creating the transactor, we also specify the monad in which our queries will be run - but more on that later:

val xa = Transactor.fromDriverManager[IO](
  "org.postgresql.Driver", "jdbc:postgresql:sql_compare", null, null)

Queries

Let's start, as before, with a simple query to select all cities with over 4 million inhabitants (this and more examples are available in the sources):

val limit = 4000000

val program = 
  sql"select id, name, population, area, link from city where population > $limit"
    .query[City] 
    .to[List]

program.transact(xa).unsafeRunSync().foreach(println)

As you can see, queries are created with the sql string interpolator. This not only creates the Doobie data structure representing a query, but also provides proper escaping of values embedded in the query (here: $limit). We then specify that this is a query (as opposed to e.g. an update), whose results will be mapped to the City class. Finally, we are interested in a list of all results, not a single value.

Similarly to Slick, Doobie creates a description of the query; the execution is separate. As the Doobie docs state, "doobie programs are values". However unlike in Slick, here we explicitly get back a scalaz or cats free monad (depending on the doobie flavor we choose), aliased to the ConnectionIO type.
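
Spelled out with types, the description from the example above is simply a ConnectionIO value (a sketch, reusing the limit value defined before):

import doobie._
import doobie.implicits._

// A pure description: nothing touches the database until it's interpreted.
val program: ConnectionIO[List[City]] =
  sql"select id, name, population, area, link from city where population > $limit"
    .query[City]
    .to[List]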

This opens up the possibility to interpret doobie programs in many target monads (not only Future, as in Slick). In this example we are using the IO monad, but we could also use any other which implements the typeclasses from cats-effect (e.g. Monix's Task). This gives a lot of flexibility when integrating with the rest of the system we work on.

Having the query description, we run it using .transact(xa), which in this case interprets the program in the IO monad. For this particular monad, to "unwrap" and actually run the side-effecting code we need to invoke unsafeRunSync(), which returns (synchronously) a List[City]. The unwrapping will be different for each target monad.

Doobie also shields us from any details of connection/session management, so that we don't have a chance to leak connections.

Complex queries

As Doobie queries are written directly in SQL, that's exactly what's sent to the database - there's no generation step; there's also no need to learn a DSL and its quirks to generate just the query we need. Here's how we would select metro systems with city names and a total count of lines in each system, sorted by that count in descending order:

case class MetroSystemWithLineCount(metroSystemName: String, 
  cityName: String, lineCount: Int)

val program = sql"""
  SELECT ms.name, c.name, COUNT(ml.id) as line_count
    FROM metro_line as ml
    JOIN metro_system as ms on ml.system_id = ms.id
    JOIN city AS c ON ms.city_id = c.id
    GROUP BY ms.id, c.id
    ORDER BY line_count DESC
  """.query[MetroSystemWithLineCount].to[List]

program.transact(xa).unsafeRunSync().foreach(println)

There are, of course, downsides to this SQL-first approach.

First of all, the column and table names aren't defined in a single, central place. That's a direct consequence of the lack of a meta-model. This means more work if you change the schema.

Secondly, the queries aren't type-checked in any way during compilation. You won't get a warning when trying to compare incompatible types, when there's a syntax error in your SQL query, or when the selected columns don't map to the target case class.

Type-safe queries are the whole reason why embedded DSLs for creating SQL queries are created in the first place; so what's Doobie's solution to that problem? Doobie has built-in support for checking at runtime that queries are correct. When the .check method on a query is invoked, the query won't be run; instead, using the currently configured database, a detailed analysis of the query is performed, for example:

sql"select name from city".query[String].check.unsafeRunSync()

/*
Analyzing query for correctness

  select name from city

  ✓ SQL Compiles and Typechecks
  ✓ C01 name VARCHAR (varchar) NOT NULL  →  String
*/

These checks can be incorporated into a specs2/scalatest test suite, of course failing the test if the query is incorrect.
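
For example, here's a hedged sketch of such a test using the doobie-scalatest module (the names are assumed to match the setup above):

import cats.effect.IO
import doobie._
import doobie.implicits._
import doobie.scalatest.IOChecker
import org.scalatest.FunSuite

// Fails the suite if a query doesn't compile against the schema,
// or if its column types don't line up with the mapped Scala types.
class QueryCheckSpec extends FunSuite with IOChecker {
  override val transactor: Transactor[IO] = Transactor.fromDriverManager[IO](
    "org.postgresql.Driver", "jdbc:postgresql:sql_compare", null, null)

  test("city name query typechecks") {
    check(sql"select name from city".query[String])
  }
}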

Dynamic queries

Queries can also be built dynamically using sql fragments. These fragments are concatenated using ++, but unlike stitching Strings together, you don't have to worry about whitespace and escaping parameters. Following the Slick example, here's how you could select metro lines with dynamic constraints:

val minStations: Option[Int] = Some(10)
val maxStations: Option[Int] = None
val sortDesc: Boolean = true

val baseFr = 
  fr"select id, system_id, name, station_count, track_type from metro_line"

val minStationsFr = minStations.map(m => fr"station_count >= $m")
val maxStationsFr = maxStations.map(m => fr"station_count <= $m")
val whereFr = List(minStationsFr, maxStationsFr).flatten.reduceLeftOption(_ ++ _)
  .map(reduced => fr"where" ++ reduced)
  .getOrElse(fr"")

val sortFr = fr"order by station_count" ++ (if (sortDesc) fr"desc" else fr"asc")

val program = (baseFr ++ whereFr ++ sortFr).query[MetroLine].to[List]

Transactions

Finally, what about transactions? Well, here again the case is very similar to Slick. Since Doobie programs are values, it's just a matter of composing a number of query descriptions (programs) into a larger query description (program), which can later be composed further, and passing that to the interpreter:

val combinedAction = for {
  _       <- insertAction("Invalid", 0, 0, None) // has type: ConnectionIO[..]
  deleted <- deleteAction                        // has type: ConnectionIO[Int]
} yield deleted

val queryResult = combinedAction.transact(xa).unsafeRunSync()

All of the above examples and more, in runnable form, are available in DoobieTests.

Quill

The main focus of the third library we'll be looking at, Quill, is type-safe queries. As we can read on Quill's website:

Quill provides a Quoted Domain Specific Language (QDSL) to express queries in Scala and execute them in a target language. The library’s core is designed to support multiple target languages, currently featuring specializations for Structured Query Language (SQL) and Cassandra Query Language (CQL).

Of course here we'll be only looking at SQL.

Mapping

Quill takes a convention-over-configuration approach, and if you follow the convention, you won't need any database mapping code. Rows are mapped to case classes, with class names converted to table names, and field names converted to table columns. When creating Quill's context (which is the main entry point to the database), some global name transformations can be specified, such as lower-casing or converting CamelCase to snake_case (in fact, our running example follows exactly these conventions).

If needed, we can customize case-class-to-table mappings; for example, the entry point to querying a table following the convention would be query[MetroLine]. To customize the mappings, the entry point needs to be (not all columns have to be mapped):

querySchema[MetroLine](
  "metro_line",
  _.id -> "id", 
  _.stationCount -> "station_count"
)

What about our custom types? AnyVal-extending classes are again supported out-of-the-box, and to support the TrackType enumeration, we need to provide two implicits, each of which maps one way (this is similar to Slick/Doobie, where however a single implicit contained the mappings both ways):

implicit val encodeTrackType = MappedEncoding[TrackType, Int](_.id)
implicit val decodeTrackType = MappedEncoding[Int, TrackType](TrackType.byIdOrThrow)

As before, the lack of a type mapper for a type used in a queried case class is a compile-time error.

DB Connection

Configuring the database connection is most conveniently done through TypesafeConfig (it can be done without it, but it's much more cumbersome). In our case, the configuration file is quite simple:

ctx.host=localhost
ctx.port=5432
ctx.user=postgres
ctx.database=sql_compare

We specify the name of the configuration key when creating a Quill context, as well as the type of the global name transformations:

lazy val ctx = new PostgresAsyncContext(SnakeCase, "ctx")
import ctx._

There are several implementations of the context. As with Slick's profiles, a context defines the capabilities we can use when creating queries. So the operations available when creating a CassandraSyncContext will differ in some cases from the ones available when using a MysqlJdbcContext. The context also specifies whether the results are wrapped when executing queries: either in a Future, or not at all (synchronous calls).

For Postgres, there are two contexts available: PostgresJdbcContext and PostgresAsyncContext. The latter uses a relatively new non-blocking (non-JDBC) driver for fully asynchronous database access, and that's also the one we'll be using in the tests.

Queries

As in the previous sections, we start with a simple query to select all cities with over 4 million inhabitants (just in case, the full sources of Quill tests are here):

val bigLimit = 4000000

val q = quote {
  query[City].filter(_.population > lift(bigLimit)).map(_.name)
}

val result: Future[Seq[String]] = ctx.run(q)

The main mechanism for constructing queries is based on quoting: a Scala macro analyses the Scala code inside the quote, and constructs an AST, which can later be translated to target languages, such as CQL and SQL. This allows regular, collection-like Scala code to be compiled to queries.

To start a query, we need an entry point: here it's query[City]; we then constrain it using filter (which translates to SQL's WHERE), adding a condition which restricts the population. However, we want to use an external value (bigLimit). Normally, quotations shouldn't reference any values outside the quote's scope. But if a value needs to be embedded into a quote, it has to be lifted, as is the case here. Of course, only values of supported types can be embedded into queries this way.

To execute the quoted query, we need to call ctx.run. Here again we have a form of separating query description (Quoted[_]) and query execution. When calling run in a given context, the query will be generated at compile-time (you will in fact see the generated SQL query in the compiler logs). This sounds great - there is, however, one important catch. If we add an explicit type to the query:

val q: Quoted[Query[String]] = quote {
  query[City].filter(_.population > lift(bigLimit)).map(_.name)
}

val result: Future[Seq[String]] = ctx.run(q)

you will notice in the logs that the query is no longer generated at compile-time, but instead becomes a "dynamic query", which means that the query will be generated at run-time. While the query still works, in my opinion this severely limits the usefulness of the compile-time generation mechanism, as any attempt to extract e.g. common query creation code into methods which return queries (where it's considered good style to add type annotations) will cause queries to be generated at run-time, not compile-time. So this mechanism is quite brittle, and the programmer needs to be cautious not to fall into run-time land.

It's important to note that in Slick, all queries are generated at run-time.

An alternative to ctx.run is ctx.runIO, which doesn't run the given query, but instead returns a description of the query, represented as an IO datatype (a free monad; similar to Slick's DBIOAction and Doobie's ConnectionIO). Such a query can later be executed (potentially multiple times) using performIO. IO also gives the possibility to track effects, very much like Slick's DBIOAction.
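
A short sketch of the same simple query expressed as an IO value:

// Describe the query as a value, without executing it...
val io = ctx.runIO(q)

// ...and execute the description (potentially multiple times) later;
// with the async context, the result is again wrapped in a Future.
val result: Future[Seq[String]] = performIO(io)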

Complex queries

Here's the query to select metro systems with city names and the total count of lines in each system:

case class MetroSystemWithLineCount(metroSystemName: String, cityName: String, 
  lineCount: Long)

val q = quote {
  (for {
    ((ml, ms), c) <- query[MetroLine]
      .join(query[MetroSystem]).on(_.systemId == _.id)
      .join(query[City]).on(_._2.cityId == _.id)
  } yield (ml, ms, c))
    .groupBy { case (ml, ms, c) => (ms.id, c.id, ms.name, c.name) }
    .map { case ((msId, cId, msName, cName), aggregated) => 
      MetroSystemWithLineCount(msName, cName, aggregated.size) 
    }
}

val result = ctx.run(q)

Similarly to Slick, joining two tables/model classes yields a tuple, so when joining three tables we get nested tuples - hence the _2 in the join condition (as we need to "unpack" the first join result). We then formulate the group by condition on the tuple and map the results to our target case class (MetroSystemWithLineCount). Note that unlike in Slick, we can use a non-mapped case class inside the "query builder" (here - the quotation), without the need to first map to a tuple and only later to MetroSystemWithLineCount.

What's the generated SQL? As one would expect:

SELECT x15.name, x17.name, COUNT(*) 
FROM metro_line x14 
  INNER JOIN metro_system x15 ON x14.system_id = x15.id 
  INNER JOIN city x17 ON x15.city_id = x17.id 
GROUP BY x15.id, x17.id, x15.name, x17.name

Dynamic queries

When creating queries dynamically, quite naturally the queries are generated at run-time, not compile-time. Here's what our dynamic-query example looks like in Quill:

val minStations: Option[Int] = Some(10)
val maxStations: Option[Int] = None
val sortDesc: Boolean = true

val allFilter = quote {
  (ml: MetroLine) => true
}
val minFilter = minStations.map(limit => quote {
  (ml: MetroLine) => ml.stationCount >= lift(limit)
}).getOrElse(allFilter)

val maxFilter = maxStations.map(limit => quote {
  (ml: MetroLine) => ml.stationCount <= lift(limit)
}).getOrElse(allFilter)

val sortOrder = if (sortDesc) quote { Ord.desc[Int] } else quote { Ord.asc[Int] }

val q = quote {
  query[MetroLine]
    .filter(ml => minFilter(ml) && maxFilter(ml))
    .sortBy(_.stationCount)(sortOrder)
}

val result = ctx.run(q)

We first create quoted query fragments for the conditions & sort order, and finally compose them into a single query.

Plain SQL

How about falling back to plain SQL for really complex queries? Here we can use the infix interpolator to convert a string to a quotation; the mapping is done by convention as before:

case class MetroSystemWithCity(metroSystemName: String, cityName: String, 
  dailyRidership: Int)

val q = quote {
  infix"""SELECT ms.name as metro_system_name, c.name as city_name, 
    ms.daily_ridership as daily_ridership
    FROM metro_system as ms
    JOIN city AS c ON ms.city_id = c.id
    ORDER BY ms.daily_ridership DESC""".as[Query[MetroSystemWithCity]]
}

val result = ctx.run(q)

Transactions

What about transaction support, if we want to run a couple of queries together? When using the async drivers, this is possible by using a special ExecutionContext (which carries the currently running transaction's context) that needs to be in scope when running the queries:

def insert(c: City)(implicit ec: ExecutionContext): Future[City] = ctx.run {
  query[City].insert(lift(c)).returning(_.id)
}.map(id => c.copy(id = id))

def delete(id: CityId)(implicit ec: ExecutionContext): Future[Long] = ctx.run {
  query[City].filter(_.id == lift(id)).delete
}

ctx.transaction { implicit ec =>
  for {
    inserted <- insert(City(CityId(0), "Invalid", 0, 0, None))
    deleted <- delete(inserted.id)
  } yield deleted
}

This mechanism might be error-prone. If we have a "default" execution context in scope, and we omit the implicit parameter declaration e.g. in the header of the delete method, we won't get a compile-time error, and the delete code simply won't take part in the transaction.
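
To illustrate the pitfall (a sketch, not code from the repository):

import scala.concurrent.ExecutionContext.Implicits.global

// Compiles without complaint, but uses the global execution context instead of
// the transactional one - so this delete will NOT take part in a surrounding
// ctx.transaction block.
def delete(id: CityId): Future[Long] = ctx.run {
  query[City].filter(_.id == lift(id)).delete
}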

In the synchronous drivers, the transactions are based on a thread-local value.

Note that this approach is quite different from what we've seen in Slick/Doobie, as the query description always maps to a single statement that can be executed. In other words, this query representation is not composable. However, as already mentioned, queries can also be represented using a custom IO monad. Using this approach, multiple queries can be composed and run together, also in a transaction, for example:

def insert(c: City)(implicit ec: ExecutionContext): IO[City, Effect.Write] = 
  ctx.runIO {
    query[City].insert(lift(c)).returning(_.id)
  }.map(id => c.copy(id = id))

def delete(id: CityId)(implicit ec: ExecutionContext): IO[Long, Effect.Write] = 
  ctx.runIO {
    query[City].filter(_.id == lift(id)).delete
  }

val combined = for {
  inserted <- insert(City(CityId(0), "Invalid", 0, 0, None))
  deleted <- delete(inserted.id)
} yield {
  println(s"Deleted $deleted rows")
  println()
}

println("Transactions with IO")
performIO(combined.transactional)

Similarly to Slick, a composite IO can be run as a single transaction by using the .transactional method.

All of the above examples in runnable form are available in QuillTests.

ScalikeJDBC

ScalikeJDBC, as the name would suggest, aims to be a Scala wrapper for the JDBC we all like/dislike. On the website, we can find out that:

ScalikeJDBC is a tidy SQL-based DB access library for Scala developers. This library naturally wraps JDBC APIs and provides you easy-to-use and very flexible APIs. What’s more, QueryDSL makes your code type-safe and reusable.

What's it like to use ScalikeJDBC?

Mapping

In its basic form, ScalikeJDBC doesn't offer support for mapping a result set to e.g. case classes; in some cases, reading individual fields and creating objects from them by hand is what needs to be done. However, there's also a convenient autoConstruct macro which generates code mapping a result set to a case class. As with Quill, the field names are transformed using a convention, which by default converts CamelCase to snake_case. An important thing to keep in mind is that the mapping is one-way only: from a result set to a case class. To e.g. insert a new row into the table (which would require a mapping in the other direction), one needs to enumerate all the columns.

To centralize the table name and provide an entry point for creating queries, ScalikeJDBC provides the SQLSyntaxSupport class:

class CitySQL(db: NamedDB) extends SQLSyntaxSupport[City] {
  override def connectionPoolName: Any = db.name
  override def tableName: String = "city"

  def apply(rs: WrappedResultSet, rn: ResultName[City]): City = 
    autoConstruct[City](rs, rn)
}

We'll get back to NamedDB soon. To support custom types, again we need implicit values in scope - this time, both for our id value classes and the track type:

implicit val cityIdTypeBinder: TypeBinder[CityId] = new TypeBinder[CityId] {
  def apply(rs: ResultSet, label: String): CityId = CityId(rs.getInt(label))
  def apply(rs: ResultSet, index: Int): CityId = CityId(rs.getInt(index))
}

implicit val trackTypeTypeBinder: TypeBinder[TrackType] = new TypeBinder[TrackType] {
  def apply(rs: ResultSet, label: String): TrackType = 
    TrackType.byIdOrThrow(rs.getInt(label))
  def apply(rs: ResultSet, index: Int): TrackType = 
    TrackType.byIdOrThrow(rs.getInt(index))
}

Note that again, this is a one-way mapping: from the result set, to the target type. Converting the types the other way has to be done by hand.
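
For example, an insert has to unwrap the custom types itself - a sketch (insertLine is illustrative, not code from the repository):

// TypeBinder only covers reads, so when writing we extract the raw values by hand.
def insertLine(ml: MetroLine)(implicit session: DBSession): Int =
  sql"""insert into metro_line(system_id, name, station_count, track_type)
        values (${ml.systemId.id}, ${ml.name}, ${ml.stationCount}, ${ml.trackType.id})"""
    .update().apply()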

DB Connection

Configuration in ScalikeJDBC is global, and is done through the ConnectionPool object. You can create several named connection pools:

ConnectionPool.add('tests, "jdbc:postgresql:sql_compare", "postgres", "")
def db: NamedDB = NamedDB('tests)

The NamedDB instance is used to execute queries, run transactions etc. It's important that the value is always created anew - hence a def, not a val in the definition - as it should only be used once. A NamedDB instance wraps a single session/connection taken from the pool.

Unlike the previously described libraries, ScalikeJDBC also exposes a lower-level interface, making it possible to directly interact with JDBC connections and sessions - and hence also making it possible to leak these connections if they are not closed properly. However, we'll only deal with the resource-safe way of using the database objects.

Queries

ScalikeJDBC provides two ways of constructing queries: either by interpolating strings, similarly to Doobie (an example follows at the end of this section), or via a semi-typesafe API. We'll use the latter option. Here's the simple query selecting all cities with over 4 million inhabitants:

val bigLimit = 4000000

val c = citySQL.syntax("c")
val p = withSQL {
  select.from(citySQL as c).where.gt(c.population, bigLimit)
}.map(citySQL.apply(_, c.resultName)).list()

val result: List[City] = db.readOnly { implicit session =>
  p.apply()
}

As you can see, the API tries to mirror the way you write SQL queries. Prior to writing the query, we need to declare the aliases that we'll use (here: c), which are later needed e.g. to select fields on which we want to filter in the where clause. When the query is complete, withSQL converts the SQLBuilder to an SQL object, representing a query which can be executed. Here, again, we have a separation between the description of the query and its execution.

The built query results can then be mapped - here indirectly, using the autoConstruct macro - to the desired shape. Finally, we use a brand new db object to run a read-only transaction. The implicit session is used in the p.apply() call, and using that session (and its connection) the query is synchronously executed.

Speaking of which, currently all JDBC calls are synchronous, and hence blocking. An async version, using non-blocking database drivers, is being developed.

The API is semi-typesafe, as only some errors will be detected at compile-time. For example, c.population is a dynamic macro call, so the fact that population is a field of City will be checked at compile-time, but you won't get auto-complete in your favorite IDE. However, the types of the compared values (in .gt(c.population, bigLimit)) are not checked, and we could have used e.g. a string instead of an int without the compiler complaining.
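
For comparison, here's how the string-interpolation flavor mentioned earlier might look for the simple query (a sketch, with the column readers spelled out by hand):

val bigLimit = 4000000

val result: List[City] = db.readOnly { implicit session =>
  sql"select id, name, population, area, link from city where population > $bigLimit"
    .map(rs => City(CityId(rs.int("id")), rs.string("name"), rs.int("population"),
      rs.float("area"), rs.stringOpt("link")))
    .list()
    .apply()
}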

Complex queries

Simple queries are always simple; what about a more complex one? Here's the fourth incarnation of our query, in which we select metro systems with city names and a total count of lines in each system:

case class MetroSystemWithLineCount(metroSystemName: String, cityName: String, 
  lineCount: Int)

val (ml, ms, c) = (metroLineSQL.syntax("ml"), metroSystemSQL.syntax("ms"), 
  citySQL.syntax("c"))

val p = withSQL {
  select(ms.result.column("name"), c.result.column("name"), 
    sqls"count(ml.id) as line_count")
    .from(metroLineSQL as ml)
    .join(metroSystemSQL as ms).on(ml.systemId, ms.id)
    .join(citySQL as c).on(ms.cityId, c.id)
    .groupBy(ms.id, c.id)
    .orderBy(sqls"line_count").desc
}
  .map(rs => MetroSystemWithLineCount(rs.string(ms.resultName.name), 
    rs.string(c.resultName.name), rs.int("line_count")))
  .list()

val result: List[MetroSystemWithLineCount] = db.readOnly { implicit session =>
  p.apply()
} 

Again, this resembles quite closely what you would write in plain SQL but, as with the simple query, is only partially type-safe. For example, there's no validation that we join all the tables, or that we group by fields by which grouping is allowed, etc. However, the field names are checked.

To express the aggregation and the sort column, we had to embed plain SQL fragments, which in ScalikeJDBC are created using the sqls string interpolator.

Dynamic queries

How about creating queries dynamically? Here's our example:

val minStations: Option[Int] = Some(10)
val maxStations: Option[Int] = None
val sortDesc: Boolean = true

val ml = metroLineSQL.syntax("ml")
val p = withSQL {
  select.from(metroLineSQL as ml)
    .where(sqls.toAndConditionOpt(
      minStations.map(ms => sqls.ge(ml.stationCount, ms)),
      maxStations.map(ms => sqls.le(ml.stationCount, ms))
    ))
    .orderBy(ml.stationCount)
    .append(if (sortDesc) sqls"desc" else sqls"asc")
}.map(metroLineSQL.apply(_, ml.resultName)).list()

val result: List[MetroLine] = db.readOnly { implicit session =>
  p.apply()
}

There's some built-in support for creating dynamic queries in the sqls object, which we use here: we pass a list of optional query fragments to toAndConditionOpt, which in turn joins the defined fragments with AND.

One problem that I noticed when creating this query is the inability to assign query fragments to local values. If we convert the above to what might seem an equivalent form, we get a compiler error:

val p = withSQL {
  val s = select.from(metroLineSQL as ml)
    .where(sqls.toAndConditionOpt(
      minStations.map(ms => sqls.ge(ml.stationCount, ms)),
      maxStations.map(ms => sqls.le(ml.stationCount, ms))
    ))
    .orderBy(ml.stationCount)

  s.append(if (sortDesc) sqls"desc" else sqls"asc")
}.map(metroLineSQL.apply(_, ml.resultName)).list()

/*
Compile error:
Error:(214, 21) type mismatch;
 found   : scalikejdbc.PagingSQLBuilder[Nothing]
 required: scalikejdbc.SQLBuilder[A]
*/

This is because the SQL builder has an unconstrained type parameter, which is inferred to be Nothing when the builder is assigned to a val; when the whole query is written without local vals, the type parameter remains undetermined and can be passed to withSQL. I'm not sure what the role of this type parameter is, but in some cases it makes writing queries harder, and can cause weird compiler errors.

Transactions

To run a couple of statements in a single transaction, we need to use the same implicit session for all of them, and execute the statements one by one (execution is synchronous):

def insertCity(...)(implicit session: DBSession): City = {
  // ...
}

def deleteCity(id: CityId)(implicit session: DBSession): Int = {
  withSQL {
    val c = citySQL.syntax("c")
    delete.from(citySQL as c).where.eq(c.id, id.id)
  }.update().apply()
}

val deletedCount = db.localTx { implicit session =>
  val inserted = insertCity("Invalid", 0, 0, None)
  deleteCity(inserted.id)
}

As in Quill, and unlike in Slick/Doobie, we can't compose the values that describe queries into larger descriptions - the query representation is not compositional.

All of the ScalikeJDBC tests (more than described above) can be found in ScalikejdbcTests.

Reactive?

Asynchronous

In several places we have mentioned that code is executed asynchronously, but what does it really mean in terms of database access? Well, there are two possible levels.

First, we can have non-blocking drivers that talk to the database, on top of which a fully asynchronous database access layer can be built. However, most of the existing drivers implement JDBC, which is blocking (synchronous). Only Quill provides integration with async (non-JDBC) drivers for MySQL/PostgreSQL, while integration with ScalikeJDBC is in the alpha stage. Hence, when using Quill's async driver, the Futures you are getting are truly non-blocking all the way down.

Secondly, you can build an asynchronous execution layer on top of the blocking JDBC drivers. That's the approach taken by Slick. In Slick, there's a dedicated thread pool which executes the blocking JDBC actions, and a potentially larger connection pool, which is used to run the database actions. The connections aren't tied to threads, and there might be many more actions "in progress", which are processing data, than simultaneous database calls. If a transaction contains multiple database interactions, they might end up being executed on different threads. This way, Slick manages to achieve high parallelism with a relatively small thread pool for running the blocking JDBC code.

Of course, embracing the synchronous nature of the JDBC driver calls and running transactions synchronously is also an option. ScalikeJDBC runs all code synchronously; Quill as well, if you are using the "traditional" JDBC drivers. Doobie can interpret the program to run in the Task monad, which can be executed asynchronously; however, the potentially multi-step ConnectionIO program will, upon interpretation, get a connection allocated, execute all actions synchronously, and release the connection back to the pool. So even though the database interactions can be run in the background, they will execute synchronously.

Streaming

In most cases when interacting with a database, we just want to read query results into memory and process them all at once; but what about the cases where we actually work with a large dataset, and would like to get and process the results incrementally - as a stream?

In Slick, any collection-valued query can be streamed by obtaining a Reactive Streams Publisher. ScalikeJDBC recently added support for Reactive Streams as well. This opens the possibility to integrate with one of the many frameworks and libraries which also implement the reactive streams protocol.
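
A minimal sketch of obtaining such a publisher from Slick, using the db and cities mapping from earlier:

import org.reactivestreams.Publisher

// Stream the results instead of loading them all into memory; the publisher
// can be consumed e.g. by Akka Streams via Source.fromPublisher.
val cityPublisher: Publisher[City] = db.stream(cities.result)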

Doobie can also stream results - any query can be converted to a scalaz-stream/fs2 Process, allowing us to process results as they are emitted by the database.

ORMs?

Note that all of the above approaches are quite different from what we know from ORMs. There's no mapping to "objects" - rather, there's mapping to database row representations. There's no inheritance support, or navigating one-to-one, one-to-many etc. relations using the object graph. This of course creates some problems (or the need for more code) but also has benefits. But remember that you can always use Hibernate in Scala, if that's what you really need.

Summary

As you can see, the libraries differ quite significantly in their approach to making your - the programmer's - life easier when accessing a relational database. Do you want to write type-safe queries, or do you prefer to hand-craft SQL? Are you willing to define a meta-model, is a convention-over-configuration approach fine with you, or do you prefer to map results to domain objects by hand? As always, there are tradeoffs and choices to be made.

To help out, here's a short summary of some of the features we've been looking at:

| Feature | Slick | Doobie | Quill | ScalikeJDBC |
| --- | --- | --- | --- | --- |
| Model representation | Case classes/tuples, or hand-mapped types | Case classes, primitives, tuples, HLists, Shapeless records | Case classes | None OR case classes w/ macros add-on |
| Meta-model | Lifted | None | By convention | None OR by convention when using macros add-on |
| Query language | Embedded DSL | SQL | Quoted DSL | SQL / embedded DSL |
| Type-safe queries | Yes | No | Yes | Partial |
| Asynchronous | Async layer on top of blocking drivers | No | No OR fully asynchronous, depending on driver | No OR fully asynchronous (in development) |
| Results wrapped in (monad) | Future | IOLite, Task (scalaz, monix, fs2) or any custom | None (sync), Future (async) or IO | None |
| Transactions | Compose queries | Compose queries | Pass implicit ExecutionContext OR compose queries | Pass implicit Session |
| Query representation | Compositional | Compositional | Single statement OR compositional | Single statement |
| Streaming | Reactive streams | fs2 | No | Reactive streams |

By no means does this comparison aim to be comprehensive. Rather, it tries to show the differences among the leading RDBMS access libraries for Scala. Hopefully, using the above information, picking your favorite one will be a bit easier - or maybe it just got harder? :)

It's highly probable that I got some of the details of how Slick/Doobie/Quill/ScalikeJDBC work wrong or imprecise. If so - please let me know!

Finally, if you'd like to browse and experiment with the code, it's available on GitHub.
