
Developing your own Content Management System — Preface

Building a custom CMS

You may wonder why anyone would do such a thing instead of using something that's already on the market. Why reinvent the wheel? And you would be right: there are plenty of Content Management System (CMS) products you can use out of the box, and you should be able to find one that fits your needs.

Yet sometimes you invest your time and energy in the wrong solution, end up implementing your own CMS anyway, and find that it was the best option after all. These are the notes from such a journey.


Background

I joined a project that had been in development for a long time, with the architectural decisions already made and the technology already selected. My task was to take over from teammates who were leaving the project and continue their work. Everything looked easy and reasonable.

The project is a Customer Communication Management system with a strong emphasis on a web-first approach: all you need is a web browser to start preparing a communication across a wide range of channels (PDFs, email, web pages, etc.).

To manage the contents used in a communication, one of the top Java-based open-source CMS implementations was selected, which, at a glance, seemed to be a perfect fit.

And then we hit the wall!

Dead-end

It soon became clear that with the selected CMS it wouldn't be possible to implement one very simple requirement:

  • content used in a template (or another content) cannot be deleted by a user

What does that mean? When you develop a communication template, you combine different contents into a template that can be used to prepare the communication, and at the same time no one should be able to delete any of the contents the template uses. A very simple and logical requirement, yet most CMSs cannot enforce it, as they treat contents as separate, independent assets.

A standard CMS typically supports file-system-like navigation and structures (files and folders), which is easy for users. Yet none of the ones we evaluated gave us the ability to query parent-child relations, which we needed for another requirement:

  • a folder based permission system

There were more requirements we couldn't satisfy with the selected CMS, so we decided to implement our own.

Relations

We were looking for a solution that would let us easily answer these questions:

  • Who is the parent of this asset?
  • Who are the children of this folder?
  • What uses this asset?
  • Where is this asset used?

Plus a few others, but these were the most important for us to evolve the project further.

NOTE: by asset I mean any template, content, file, or image that can be created in the system or uploaded into it.

If you take a closer look at these requirements, you arrive at the following relations:

  • parentOf & childOf — to represent folder-files relations
  • uses & usedBy — to represent asset to asset usage relation (content uses image, image is used by the content, template uses content, content is used by the template)

At this point it should be clear that a graph database is the natural fit for these relations. We chose Dgraph.
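Before reaching for a graph database, the two relation pairs can be prototyped in plain Scala. The sketch below (hypothetical names, not our production code) shows how keeping both edge directions makes the "content in use cannot be deleted" rule trivial to enforce:

```scala
// Minimal in-memory sketch of the four relations (illustrative only).
final case class Graph(
  parentOf: Map[String, Set[String]] = Map.empty, // folder -> children
  childOf:  Map[String, Set[String]] = Map.empty, // child  -> folders
  uses:     Map[String, Set[String]] = Map.empty, // asset  -> assets it uses
  usedBy:   Map[String, Set[String]] = Map.empty  // asset  -> assets using it
) {
  // Record that `user` uses `used`, maintaining both edge directions.
  def addUse(user: String, used: String): Graph =
    copy(
      uses   = uses.updated(user, uses.getOrElse(user, Set.empty) + used),
      usedBy = usedBy.updated(used, usedBy.getOrElse(used, Set.empty) + user)
    )

  // The core requirement: an asset referenced by anything cannot be deleted.
  def canDelete(id: String): Boolean =
    usedBy.getOrElse(id, Set.empty).isEmpty
}
```

What the in-memory version cannot do is keep these inverse edges consistent and queryable at scale across many services, which is exactly the job of a graph database.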

Storage

With the relation part solved, we had to think about how to store the content of the assets. Basically, there were two kinds:

  • text — a JSON representation of an asset definition
  • binary — used for images & PDFs

We also expected more writes than reads, especially with the autosave functionality of our in-browser designers. Around that time, a Walmart case study on storing images in Cassandra came out, and we decided to take a similar approach.
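A common pattern when storing large binaries in Cassandra is to split them into fixed-size chunks so no single row grows too big. A rough sketch of that chunking step (names and the chunk size are illustrative, not taken from our codebase):

```scala
// Split a binary payload into fixed-size chunks, each addressed by (assetId, index).
// The chunk size is illustrative; real limits depend on your Cassandra tuning.
final case class Chunk(assetId: String, index: Int, bytes: Array[Byte])

def chunk(assetId: String, payload: Array[Byte], chunkSize: Int = 512 * 1024): List[Chunk] =
  payload
    .grouped(chunkSize)
    .zipWithIndex
    .map { case (bytes, i) => Chunk(assetId, i, bytes) }
    .toList

// Reads sort by index and concatenate the chunks back into the original payload.
def reassemble(chunks: List[Chunk]): Array[Byte] =
  chunks.sortBy(_.index).flatMap(_.bytes).toArray
```

Each chunk then maps naturally onto one Cassandra row keyed by the asset id and chunk index.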

Tying things together

With the new technology stack selected, all that was left was to design the new CMS. Dgraph supports transactions, whereas Cassandra doesn't, so our update flow had to be as follows:

  1. start a Dgraph transaction
  2. write data into Cassandra
  3. then write data into Dgraph
  4. finally commit the transaction

With this approach we always have consistent data in Dgraph, at worst with some orphan data in Cassandra, which was an acceptable trade-off.
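The four steps above can be sketched with stubbed stores (hypothetical traits standing in for the real clients). The key ordering property: the graph only ever references a blob after the blob exists, so a failure mid-flow leaves orphan rows in Cassandra but never dangling references in Dgraph.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-ins for the real Dgraph and Cassandra clients.
trait GraphTx {
  def mutate(json: String): Future[Unit]
  def commit(): Future[Unit]
}
trait BlobStore {
  def write(key: String, bytes: Array[Byte]): Future[Unit]
}

// The caller opens the transaction (step 1); then: Cassandra (2), Dgraph (3), commit (4).
def saveAsset(key: String, bytes: Array[Byte], meta: String)
             (blobs: BlobStore, tx: GraphTx): Future[Unit] =
  for {
    _ <- blobs.write(key, bytes) // an orphan here is acceptable if later steps fail
    _ <- tx.mutate(meta)         // graph references the blob only after it exists
    _ <- tx.commit()             // graph state becomes visible atomically
  } yield ()
```

The for-comprehension sequences the futures, so each step runs only after the previous one succeeds.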

Relation model

Our first task was to learn Dgraph, which, thanks to its step-by-step tutorial, was fairly easy. Dgraph supports GraphQL now, but when we started there was no such functionality. Within two weeks we had a basic grasp of Dgraph: how to write queries and mutations (operations that modify state), how its triples work, how to use the native JSON support, and so on.

Frankly, only after two years of using Dgraph can I finally say I'm proficient at it. No tutorial or docs give you that level of confidence; only practice does.

With a basic understanding of Dgraph, we had to implement a very simple model to represent the relations. In the Dgraph world, you operate on nodes and edges (also called predicates). A node is your entity, while an edge/predicate is a relation to another object, which can be either another node or a scalar value (such as the node's name).

Nodes & Edges/Predicates

To represent folder-file structure, we just needed two entities:

  • a container node, which can contain other nodes (e.g. the root folder, folders)
  • an asset node, which represents an asset (template, content, image, PDF)

Internally, Dgraph uses a Uid (Universal IDentifier) to identify each node and to represent relations between nodes. To meet the basic requirements, each node in our system needs the following predicates:

  • parentOf: List[Uid]
  • childOf: List[Uid]
  • uses: List[Uid]
  • usedBy: List[Uid]

NOTE: we started with Dgraph 1.0, which supported only one-to-many relations; that's why childOf uses a List instead of a single Uid to represent a one-to-one relation. The next major release of Dgraph added support for one-to-one relations, so it would be good to migrate our relation model someday.

Besides these relations, we also defined a few scalar predicates, such as:

  • uid
  • name
  • type of the node
  • who created it
  • when the node was created
  • state of the asset

and so on.
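Put together, a node's predicates map naturally onto a Scala case class. The field names follow the predicates above; the class itself is an illustrative sketch, not our exact model:

```scala
final case class Uid(value: String)

// One class covers both containers and assets; `nodeType` distinguishes them.
final case class Node(
  uid: Option[Uid] = None,     // assigned by Dgraph on the first write
  name: String,
  nodeType: String,            // e.g. "folder", "template", "image"
  parentOf: List[Uid] = Nil,
  childOf: List[Uid] = Nil,    // a List only for Dgraph 1.0 compatibility
  uses: List[Uid] = Nil,
  usedBy: List[Uid] = Nil,
  createdBy: String = "",
  createdAt: Long = 0L,
  state: String = ""
)
```

With Dgraph's native JSON support, a class like this can be serialized straight into a mutation and decoded straight out of a query result.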

parent-child & uses-used relations

Just two entities were needed to represent all the possible assets and relations in the system, and this model has held up for two years.

Object-Relational Mapping Framework

As the whole model was just two entities, Dgraph supports JSON out of the box, and we were using Scala, implementing a simple ORM on top of the existing Dgraph Java client was an easy task. We decided to fetch a whole folder structure at once, with all its children, which gave us a smooth way to navigate the entire subtree of a folder.

While implementing the ORM, we also built a tiny DSL to construct DQL queries with compile-time type checking, instead of just concatenating strings.
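Our DSL is internal, but the idea can be illustrated with a tiny hypothetical combinator set (invented names, not the real API): typed values render to a DQL-like query string, and the Scala compiler checks that the pieces fit together before any string exists:

```scala
// A toy query builder in the spirit of our DSL (names are invented for illustration).
final case class Pred(name: String)

final case class Query(fn: String, preds: List[Pred]) {
  // Render to a DQL-ish string; the real DSL also handles variables and filters.
  def toQueryString: String =
    s"{ result(func: $fn) { ${preds.map(_.name).mkString(" ")} } }"
}

def query(fn: String)(preds: Pred*): Query = Query(fn, preds.toList)
```

Because predicates are values rather than string fragments, typos and malformed queries surface at compile time instead of at runtime against Dgraph.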

At the base of the ORM were just three functions:

findBy

def findBy[T: Decoder](
  query: DgraphQuery // <-- our DSL
)(implicit tx: AsyncTransaction): Future[CmsResult[List[T]]] = {
  tx.queryWithVars(query.toQueryString, query.getValuesMap.asJava)
    .toScala
    .map { res =>
      // derive a Circe decoder for Dgraph's result envelope
      implicit val dgraphResultDecoder: Decoder[DgraphResult[T]] =
        deriveDecoder

      val json = res.getJson.toStringUtf8
      decode[DgraphResult[T]](json)
    }
    .flatMap {
      case Left(error) =>
        CmsResult.failed(error).pure[Future]

      case Right(r) =>
        CmsResult(r.result).pure[Future]
    }
}

buildMutation

def buildMutation[T: Encoder](assets: List[T]): Mutation = {
  val setJson = assets.asJson.noSpaces
  Mutation
    .newBuilder()
    .setSetJson(
      ByteString
        .copyFromUtf8(setJson)
    )
    .build()
}

executeMutation

def executeMutation(
  mu: Mutation
)(implicit tx: AsyncTransaction): Future[List[Uid]] = {
  tx.mutate(mu).toScala map { result =>
    // collect the Uids Dgraph assigned to the newly created nodes
    result.getUidsMap.asScala
      .values
      .map(Uid(_))
      .toList
  }
}

By combining the above functions with different sets of Circe decoders, we could retrieve any set of predicates from the nodes we were interested in. As you probably noticed, all the calls are wrapped in a transaction, which is passed implicitly.
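The "different decoders over the same function" trick can be shown without Circe: two decoder instances pull different predicate subsets out of the same raw result. This is a simplified stand-in for the real Circe setup, with invented names:

```scala
// Simplified stand-in for Circe: a Decoder extracts a typed view from a raw row.
trait Decoder[T] { def decode(row: Map[String, String]): Option[T] }

final case class NameOnly(name: String)
final case class NameAndState(name: String, state: String)

implicit val nameDecoder: Decoder[NameOnly] = row =>
  row.get("name").map(NameOnly.apply)

implicit val stateDecoder: Decoder[NameAndState] = row =>
  for { n <- row.get("name"); s <- row.get("state") } yield NameAndState(n, s)

// The findBy analogue: one fetch, the caller picks the view via the Decoder in scope.
def decodeRow[T](row: Map[String, String])(implicit d: Decoder[T]): Option[T] =
  d.decode(row)
```

The call site only changes the type parameter; the decoder instance resolved implicitly decides which predicates are read.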

Summary

To reach the first stable version of our CMS, two developers spent around three months implementing and adjusting the base set of functions, extending the DSL, and fixing some race conditions. All of this was exposed as a microservice with REST endpoints handling the particular types of assets.

Sure, more bugs and issues were discovered over the next two years, but even the top CMS solutions have bugs, and you can wait months for them to be fixed. Here, we could fix one in a day or two.

As you probably noticed, the requirements were quite specific, which is why implementing our own CMS turned out to be the best option after all.
