Content Management System: Versioning
In my previous post, I explained why we decided to implement our own custom CMS solution. It didn’t cover every detail and variation, but it should give you a glimpse of why it was the best choice for us.
In this post, I would like to focus on one of the most important features every CMS should have: a versioning system.
Photo by Kelly Sikkema on Unsplash
Don’t confuse Versioning with Version Control. The latter is a system that supports several people working together in a collaborative environment without stomping on each other’s work. A CMS should also support Version Control, but this can be implemented in various ways: it can support checking content out and in, so that people work on their individual copies and then merge the results, or it can allow real-time collaborative work, as in Google Docs.
Versioning means keeping track of different versions of content when it’s changed by editors. This brings a few advantages.
Advantages of versioning
Firstly, you can simply roll back your changes. If you made a mistake, just hit Cancel and start from scratch. Or you can revert to the latest saved draft version (depending on how the CMS was implemented). It’s like using Undo, but on the whole content.
Secondly, you can see how content changed over time, which can be very important in liability-conscious enterprises. You can identify which version of the content was presented to the end user, either by a Version ID or by the timespan during which that version was in use. Check your last bank account statement or insurance policy conditions and you will see the version identifier.
Thirdly, versioning can be used to audit and monitor modifications. You can easily figure out who modified/created a given version, what changes were made in comparison to the previous version, and when those changes were made.
Finally, you can control when a new version is available to be used, which other versions are in use, and so on.
NOTE: I found a very handy article about Versioning vs Version Control that gave me better insight into how we should manage Versioning.
All of the above features were implemented in our home-made CMS.
Draft, Submitted, Published, Archived
In our communication preparation process, we had to support the following states of assets:
- draft: a work-in-progress version; only the author or an editor can modify it, and it cannot be used in other assets,
- submitted: a stable version that was submitted for review via a workflow process; it cannot be used in other assets until it is approved,
- published: essentially the same version as submitted, but users can use it to create a communication,
- archived: a previous version of an asset for which a newer submitted version already exists; it cannot be used in other assets, but it is still referenced from already published assets and used in communication.
This process is more complicated than what other CMS products offer. It is also not the final solution, as there are plans to support more sophisticated flows and states.
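The four states above can be sketched as a small state machine. This is only an illustration: the names `State`, `ALLOWED_TRANSITIONS`, `can_transition`, `is_editable`, and `usable_in_new_communication` are hypothetical, and the strictly linear order of transitions is an assumption inferred from the descriptions, not the actual workflow.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Assumption: a strictly linear lifecycle. A real workflow may also allow
# e.g. rejecting a submitted version back to draft.
ALLOWED_TRANSITIONS = {
    State.DRAFT: {State.SUBMITTED},
    State.SUBMITTED: {State.PUBLISHED},
    State.PUBLISHED: {State.ARCHIVED},
    State.ARCHIVED: set(),
}


def can_transition(current: State, target: State) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS[current]


def is_editable(state: State) -> bool:
    """Only a draft can be modified by its author or an editor."""
    return state is State.DRAFT


def usable_in_new_communication(state: State) -> bool:
    """Only a published version can be picked to create a communication."""
    return state is State.PUBLISHED
```

Keeping the rules in one table makes it easier to add the more sophisticated flows later without touching the callers.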
To support the above states, we implemented two more relations in Dgraph:
- previous version: a pointer to the previous version, if it exists,
- next version: a pointer to the next version, if it exists.
So when you create the first version of an asset, which is a draft, and then submit it, you get a node with neither a previous nor a next relation on it. When you create another version of an existing asset, the two nodes representing this asset are linked: the older one gets a next relation and the newer one gets a previous relation.
You can easily imagine versioning as a linked list of versions with different states:
versioning presented as linked list
With the linked list implementation in place, we were able to show the whole history of an asset with its changes over time. Also, please remember that with each version we stored the whole definition of the asset itself, so we could easily compare changes (although this had not been implemented yet).
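As a rough sketch of how such a history can be read, the snippet below walks the previous/next pointers of a doubly linked list held in memory. The `VersionNode` class and the `history` function are hypothetical names for illustration; in the real system these relations live in Dgraph and would be fetched with a query.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VersionNode:
    version_id: str
    state: str
    previous_version: Optional[str] = None  # pointer to the older version
    next_version: Optional[str] = None      # pointer to the newer version


def history(nodes: Dict[str, VersionNode], version_id: str) -> List[str]:
    """Return all version IDs of the chain, ordered from oldest to newest."""
    current = nodes[version_id]
    # Walk back to the very first version.
    while current.previous_version is not None:
        current = nodes[current.previous_version]
    # Walk forward, collecting every version on the way.
    ordered = [current.version_id]
    while current.next_version is not None:
        current = nodes[current.next_version]
        ordered.append(current.version_id)
    return ordered
```

Starting from any version in the chain gives the same result, which is convenient when a user opens an older version and asks for the full history.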
Static and version ID
Having previous/next relations required us to introduce a universal identifier for each asset: an ID that can be used in references to that asset and in our web designers when opening the asset to view or edit it.
We needed a way to group all the versions under one immutable umbrella ID, with the linked list of versions beneath it.
We introduced two new terms:
- static ID: an ID assigned to the first version of the created asset; it never changes during the whole lifetime of the asset,
- version ID: a unique ID that identifies each particular version of the asset in the linked list; you can also use it in Dgraph queries when looking for an exact asset version.
Both of these IDs were implemented as UUIDs. In the diagram below, I have used ordinary numbers to make the relations easier to read.
Static and Version IDs representation
NOTE: As you may notice, there is no next version reference pointing to the newly created draft version. This was on purpose: it allows reverting changes in drafts and dropping drafts without messing with the previous/next relations of stable versions.
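The interplay of Static ID, Version ID, and the missing next pointer can be illustrated with a small in-memory sketch. Everything here is hypothetical (`AssetVersion`, `VersionStore`, and its methods), and two details are assumptions rather than facts from our implementation: the draft keeps a previous pointer to the version it was created from, and the next pointer is set only at submit time.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AssetVersion:
    static_id: str   # shared by every version of the asset
    version_id: str  # unique per version
    state: str
    previous_version: Optional[str] = None
    next_version: Optional[str] = None


class VersionStore:
    def __init__(self) -> None:
        self.versions: Dict[str, AssetVersion] = {}

    def create_asset(self) -> AssetVersion:
        """Create the first draft; this is where the static ID is assigned."""
        first = AssetVersion(str(uuid.uuid4()), str(uuid.uuid4()), "draft")
        self.versions[first.version_id] = first
        return first

    def create_draft(self, base_version_id: str) -> AssetVersion:
        """Start a new draft from a submitted version."""
        base = self.versions[base_version_id]
        if base.state != "submitted":
            raise ValueError("a draft can only start from a submitted version")
        draft = AssetVersion(base.static_id, str(uuid.uuid4()), "draft",
                             previous_version=base.version_id)
        # Deliberately NOT setting base.next_version here.
        self.versions[draft.version_id] = draft
        return draft

    def submit(self, version_id: str) -> None:
        """Submit a draft and only now link it into the stable chain."""
        draft = self.versions[version_id]
        if draft.state != "draft":
            raise ValueError("only a draft can be submitted")
        draft.state = "submitted"
        if draft.previous_version is not None:
            previous = self.versions[draft.previous_version]
            previous.next_version = draft.version_id
            previous.state = "archived"

    def discard_draft(self, version_id: str) -> None:
        """Drop a draft; stable versions need no cleanup."""
        if self.versions[version_id].state != "draft":
            raise ValueError("only a draft can be discarded")
        del self.versions[version_id]
```

Because the stable version never points at the draft, discarding a draft is a single delete with nothing to repair in the chain.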
You may wonder how we decide which version to use when. Frankly, it’s simple: it depends on the context.
- When creating a new version (a new draft), you can start only from the latest submitted version. If a draft already exists and you created it, exactly that draft version will be used.
- When using one asset in another asset (a text content in another text content), you can only select a submitted version of the asset. Once the uses/used by relations are set and the outer asset is submitted, the versions are sealed.
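The two rules above can be sketched as two small lookup functions. The names `Version`, `version_for_editing`, and `version_for_embedding` are hypothetical, and the sketch assumes the versions of one asset are passed in ordered from oldest to newest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    version_id: str
    state: str
    author: str


def latest_submitted(versions: List[Version]) -> Optional[Version]:
    """Return the newest submitted version, if any."""
    for version in reversed(versions):
        if version.state == "submitted":
            return version
    return None


def version_for_editing(versions: List[Version], user: str) -> Optional[Version]:
    """Prefer the user's own draft; otherwise start from the latest submitted version."""
    for version in reversed(versions):
        if version.state == "draft" and version.author == user:
            return version
    return latest_submitted(versions)


def version_for_embedding(versions: List[Version]) -> Optional[Version]:
    """Only a submitted version can be used inside another asset."""
    return latest_submitted(versions)
```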
Uses & Used By relations between different versions of assets
By implementing Static and Version IDs, with support of uses & used by and previous & next version relations, we were able to implement all the requirements that our CMS had to fulfill.
There is more
Another requirement that was discovered during the development process was the ability to swap in different versions of the assets used in a template when assembling a test communication.
As I mentioned in my previous post, each asset is represented by two structures: a node in the graph database (Dgraph) and a content definition (either JSON or a binary one stored in Cassandra).
In the case of the JSON definition, we defined uses & used by relations using just a Static ID. We didn’t put a direct reference to the Version ID behind that Static ID; such information was stored only in the graph, as presented above. Which version to use depended on which version of the graph of assets we were evaluating.
Relation in content definition
In our assembling process, the first step was to collect all the versions used in the graph of assets. The result is just a simple map (we called it the Reference Map), where the keys are Static IDs and the values are Version IDs. Having the Version IDs, we knew exactly which versions of assets to use. So when the assembler was processing a given definition and detected a reference to another asset, it looked up the Reference Map and fetched the proper version of the asset from our CMS.
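A minimal sketch of this idea, using plain dictionaries in place of Dgraph and Cassandra. All names are hypothetical: `uses` stands for the uses edges between Version IDs, `static_id_of` maps a Version ID to its Static ID, and `definitions` stands in for the stored content definitions, whose shape (`text` plus a list of Static IDs in `uses`) is an assumption for illustration.

```python
from typing import Dict, List

ReferenceMap = Dict[str, str]  # Static ID -> Version ID


def build_reference_map(root_version_id: str,
                        uses: Dict[str, List[str]],
                        static_id_of: Dict[str, str]) -> ReferenceMap:
    """Collect every version reachable from the root through 'uses' edges."""
    reference_map: ReferenceMap = {}
    stack = [root_version_id]
    while stack:
        version_id = stack.pop()
        static_id = static_id_of[version_id]
        if static_id in reference_map:
            # Assumption: one version per Static ID; the first one found wins.
            continue
        reference_map[static_id] = version_id
        stack.extend(uses.get(version_id, []))
    return reference_map


def assemble(static_id: str,
             reference_map: ReferenceMap,
             definitions: Dict[str, dict]) -> str:
    """Resolve each Static ID through the map and join the texts."""
    definition = definitions[reference_map[static_id]]
    parts = [definition["text"]]
    for used_static_id in definition["uses"]:
        parts.append(assemble(used_static_id, reference_map, definitions))
    return " ".join(parts)
```

The definitions never mention a Version ID, so the same definition can be assembled against different maps without being modified.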
You can easily imagine a situation in which a user modifies the Reference Map on the fly to check whether a new version of an asset fits into the old shoes, so they can test the communication before even publishing the new version.
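Such an on-the-fly change can be as small as overlaying a few entries on a copy of the map. The function name `with_overrides` and the example IDs are hypothetical; the only assumption is that the map is a plain dictionary from Static ID to Version ID.

```python
from typing import Dict


def with_overrides(reference_map: Dict[str, str],
                   overrides: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the map with selected Static IDs pointing at other versions."""
    unknown = set(overrides) - set(reference_map)
    if unknown:
        # Overriding an asset that the communication does not use is likely a mistake.
        raise KeyError(f"not part of this communication: {sorted(unknown)}")
    return {**reference_map, **overrides}


# Example: try the communication with a newer header version.
original = {"header": "h-v1", "footer": "f-v3"}
trial = with_overrides(original, {"header": "h-v2"})
```

Returning a copy keeps the original map untouched, so the test assembly cannot affect what is already in use.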
Versioning is very hard to understand and even harder to implement. We started with just simple parent/child relations, then we added previous/next relations, and finally we implemented uses/used by relations. To be honest, if the order had been different, we wouldn’t have been able to do it.
Yet it was setting up these relations that made the Reference Map idea pop up. It happened in the middle of development; no one expected such a solution when we started our journey. These are the advantages of a home-made CMS that you discover over time. That’s why I believe it was the best option.