What is CAP Theorem?
In short, CAP is a mathematical theorem describing how our application will behave in the event of network partitioning. It is one of the most important laws currently in existence. Through the course of this text, I will share more information on this theorem and why it is important. By the time you’re done reading, you’ll also know why CAP may not be enough for modern-day systems.
Before we start, because CAP is inseparably related with distributed systems, I would like to add a quick word about them.
What are Distributed Systems?
In general, this term can be used to describe all systems that are spread through more than one node connected over a network of any type. Work in such a system is split across all nodes existing inside the system. The nodes coordinate their efforts to be more efficient than a single node environment.
The biggest pro of distributed systems is the ability to scale the system horizontally by adding more and more nodes to the cluster (of course such scaling has its limitations, you can read about them here). Additionally, such systems greatly reduce (or at least they should) problems with a single point of failure, helping to increase resilience and fault tolerance of the system.
Despite such advantages, distributed systems bring us to a whole new level of problems. Especially because they significantly increase the complexity of software - occuring issues are more complex and errors become harder to recreate, we can start to notice that bandwidth of our internet connection is important, network is not as reliable as we expected and that for sure the network is not secure.
Knowing what distributed systems are, we can move further to the CAP theorem which is inextricably related to such systems.
What is CAP Theorem?
It is a theorem created by Eric Brewer, its first version appeared in 1998 and it was later published as CAP principle in 1999. That is why the CAP theorem is sometimes called the Brewer's theorem. The formal proof of the principle was published in 2002 by a team from MIT consisting of Nancy Lynch and Seth Gilbert. Their publication made Brewer’s CAP a fully qualifying theorem.
Originally, it was stated as: In any distributed system with state, we can only provide two of the following three guarantees:
C as Consistency
State should be inline across the system, so every read request to any node in the system should return the same, most recent value or error. In CAP proof, the MIT team used a linearized consistency model, quite a strong and specific one. So talking about consistency in terms of CAP theorem, we should have in mind that we are talking about linearized consistency.
A as Availability
System is available so every request to a non-failing node will receive a non-error response, without any guarantees that the received response is consistent with the state across the system.
P as Partition Tolerance
Network partitions are a fact of life and we have to expect their occurrence in our system. It must continue to work despite any number of communication breakdowns between its nodes. As for now, there are tools and techniques that can be used to recover gracefully from network partition so we do not have to spend much time on this particular part. However, we still have to spend some time over the topic of partition discovery and management inside our system.
Be Aware: Both C and A are totally different concepts in terms of CAP than in terms of ACID.
In 2012, Brewer added some more information regarding the “two out of three” rule, effectively clarifying the whole concept of choosing between A or C in the inescapable event of network partitioning. After this clarification, the theorem can be expressed as: In any distributed data store, we need to choose if we want to sacrifice consistency or availability in the presence of network partitioning.
Now you can notice that all trials to picture CAP as triangle or Venn diagrams are somewhat misleading and not quite in line with what the author wanted to present. I believe that the theorem can be better pictured by some image presenting a confrontation or choice.
Furthermore, to what was stated above, the author added additional clarifications over the topic of the difference in meaning between consistency and availability used in CAP to similar concepts used in ACID.
In general, the whole theorem is built around how our application reacts to network partitioning, which is a thing in almost all modern systems. We have to choose what we want to sacrifice and what to provide in our systems.
Now when we have more insights into CAP, we can talk about why it is important nowadays.
Why is CAP Important?
CAP is applied to systems with state distributed in any manner. What it exactly means is that if your system holds any type of internal state within (and is more complex than REST API with the in-memory store), then no matter if you want it or not, CAP can be applied to your system and by extension, your system is limited by CAP’s regulations.
So no matter if you have plain old monolith or fancy new microservices, if you do not pay enough attention to handling network partitioning, it will bite you hard sooner or later. As stated above, you have to choose between availability and consistency while designing your system. The choice between them can be known to some of you as the CA (AC) axis.
According to CAP, we have two (in theory three) ways of designing our systems:
AP → systems that decided to sacrifice consistency and choose to provide higher availability
CP → systems that decided to sacrifice availability in the name of providing stronger consistency guarantees
The theoretical third way of designing systems is not exactly in play after clarifications from 2012 and not exactly matching the CAP definition itself. To provide both C and A, systems need to be effectively one node, not quite being a distributed system.
CA → systems that decide to provide both consistency and availability while dropping support for handling network partitioning.
You would probably want to know which way will be best suited for your use case. Before something terrible happens to your system running on production and you wake up in the middle of the night with a message that everything is down while they wanted the nine nines availability. On the other hand, your system can have an inconsistent state whose recovery will take a long time, because some of the nodes were down and did not get/provide required data.
To be exact, probably it will not be that bad in any case but your next morning at work after such a breakdown will not be a nice one for sure.
Now before we start digging into why CAP is not enough for modern-day applications, I will add short info about the PACELC theorem proposed by Daniel Abadi from Yale, which was probably the first scientific paper around the topic.
What is the PACELC Theorem?
The theorem was proposed by Abadi in his blog post from 2010 and later clarified in one of his further works from 2012. The main point of PACELC is addressing the concern of choosing between Latency (L) and Consistency (C) when the system is normally up and running and no network partitioning occurs - that is the reason why we have (E) as Else in the theorem name.
The whole theorem can be described in such a way: In case of Network partition (P), you have to choose between Availability (A) or Consistency (C) or else (E), when no partition occurs, you have to choose between Latency (L) and Consistency (C). Et voilà, we made a fully-fledged sentence from a 6-letter acronym.
After pointing out why CAP may not be enough from a scientific perspective, we can start explaining why it may not be enough from a technological and business perspective as well.
Why is CAP Not Enough?
Firstly, even claims about CAP theorem are more complex than they appear. Even if some systems claim to be fully CP, they can still have areas where they are not consistent. On the other hand, AP systems may have parts where they are not available. Most of the systems try to balance between both, even making different areas of the systems supporting different features depending on exact business requirements implied in a particular area.
Moreover, with the growing amounts of data and to meet users’ expectations, companies strive to achieve better performance – capable of handling tens of thousands requests per second, making their software faster – with response time in milliseconds and more available – nine nines are the holy grail and target of today's apps. Companies want systems to meet all of these expectations staying as consistent as possible.
In some cases, it requires moving the borders of what is technically possible. CAP is simply not enough for present-day large scale applications and their requirements. We need to have both high availability and reasonable consistency in our systems just because users expect us to provide them and they will stop using our systems if we do not. This is the reason why concepts like eventual consistency, sharding, and CRDTs are becoming more and more popular over the years.
Despite not being the best suited for most modern-day applications, CAP is still one of the most basic and useful laws applying to distributed systems. What is more, the choice between availability and consistency can be helpful also in less complex applications, by giving some general directions to follow while designing your architecture. I hope that I gave you some new and useful insight into the topic of CAP theorem. Thank you for your time.