Is it time to exit the cloud?
In the last few months, David Heinemeier Hansson (DHH) has generated some buzz on social media with a series of articles about leaving the cloud and having his company buy its own bare metal servers. He described the motivation, wrote about the costs, and published a kind of manifesto about the values guiding the cloud exit.
It’s always a good idea to rethink some common patterns and practices. Is the cloud always a good idea? Should we all think about running our own infrastructure?
Let’s think about it for a while.
What is the cloud, actually?
Let’s start with the important question: what is the cloud? Is it only, as a quite popular t-shirt puts it, “someone else’s computer”?
I disagree.
Sure, if you dig deep enough, there are bare metal machines underneath. But cloud computing is much more than that: it’s mostly software that helps you manage compute resources. In many cases you don’t even see the machines, for example with serverless applications or with Kubernetes on AWS Fargate or GKE Autopilot.
Fun fact: although I’ve been a sysadmin/DevOps engineer/platform engineer/you-name-it for almost 30 years, I can’t remember the last time I had to log in to a server over SSH. That’s why I’d like to describe a few very important things that the cloud gives you and that DHH doesn’t write a single word about.
IAM
If your organization has more than a few people, managing access to the infrastructure can get complicated. Sooner or later you’ll need a central place to add and remove users and grant them proper access rights. By “proper” I mean following the principle of least privilege. All the big cloud providers give you such tools “for free”.
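To make “least privilege” concrete: in AWS, permissions are expressed as JSON policy documents. The sketch below builds one in Python that allows listing and reading a single S3 bucket and nothing else (anything not explicitly allowed is denied by default). The bucket name `example-reports` is hypothetical, and how you attach the policy to a user, group, or role (console, Terraform, an SDK) is left out.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Return an IAM policy document granting read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",  # the current IAM policy language version
        "Statement": [
            {
                # Listing the bucket's keys is granted on the bucket ARN itself...
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # ...while reading objects is granted on the objects inside it.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Hypothetical bucket name, for illustration only.
policy = read_only_bucket_policy("example-reports")
print(json.dumps(policy, indent=2))
```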
Multi-AZ and multi-region
If your business is more than a very early-stage MVP and you take it seriously, you probably don’t want to be stuck in one data center (DC), even if it has multiple power and internet suppliers. So many things can go wrong, from a blackout to a fire that destroys the whole DC.
Moreover, in many cases two DCs are not enough. If you run a distributed system that relies on a majority quorum, you probably need the smallest odd number of AZs greater than 1, which is three: with two, losing either one means losing the majority.
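Why an odd number, and why not two? A minimal sketch of majority-quorum arithmetic, assuming one voting member per AZ (as you might place etcd or database replicas). It shows how many whole AZs you can lose while a majority can still be formed.

```python
def quorum_size(zones: int) -> int:
    """Smallest majority among `zones` voters (one voting member per AZ assumed)."""
    return zones // 2 + 1


def zone_failures_tolerated(zones: int) -> int:
    """How many whole AZs can be lost while a majority can still be formed."""
    return zones - quorum_size(zones)


for zones in range(1, 6):
    print(f"{zones} AZ(s): quorum={quorum_size(zones)}, "
          f"survives losing {zone_failures_tolerated(zones)} AZ(s)")
```

Two AZs survive no more AZ failures than one, and four survive no more than three, so three is the cheapest setup that keeps working after losing a whole AZ.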
Is running three DCs, each with multiple power supplies, multiple internet connections, and redundant cooling, an easy job? Absolutely not.
You may say it’s the vendor’s responsibility. The vendor’s? Then what about independence, the most important value DHH writes about?
Multi-region is even more complicated. A single location is fine if you live in a bubble and your customers are close to you, and there are a lot of country-specific services. But if your business is located on the US West Coast and your customers are in Central Europe (like me), believe me, those few dozen milliseconds of latency matter.
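A back-of-the-envelope sketch of why distance matters. It assumes light in optical fiber travels at roughly two thirds of its speed in vacuum and uses an assumed distance of about 9,000 km between the US West Coast and Central Europe. Real routes are longer and add queuing, so these figures are lower bounds only.

```python
SPEED_OF_LIGHT_KM_S = 299_792  # in vacuum
FIBER_FACTOR = 2 / 3           # assumption: light in optical fiber travels at ~2/3 c


def min_rtt_ms(distance_km: float) -> float:
    """Physical lower bound on round-trip time over fiber, ignoring routing and queuing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


# Assumed distance: roughly US West Coast to Central Europe.
rtt = min_rtt_ms(9_000)
# Without connection reuse or 0-RTT, a fresh HTTPS request needs about three
# round trips: TCP handshake, TLS 1.3 handshake, then the request itself.
first_response = 3 * rtt
print(f"RTT >= {rtt:.0f} ms, first response >= {first_response:.0f} ms")
```

This prints a floor of about 90 ms per round trip and about 270 ms before the first response arrives, before the server has done any work at all.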
Network
All cloud vendors provide something similar to AWS VPC. What is a virtual private cloud? It’s a way to manage the network itself (IPs, routing, firewalling) and network appliances like routers, NAT gateways, or load balancers through API calls. That makes it easy to automate and to describe as code with tools like Terraform or Pulumi.
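Describing a network as code starts with something as simple as an address plan. The minimal sketch below uses only Python’s standard `ipaddress` module, not Terraform or Pulumi themselves. It carves one public and one private subnet per AZ out of a VPC CIDR block. The zone names and the /16 and /20 sizes are hypothetical, and a plan like this is what you would then feed into your IaC tool’s subnet resources.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, zones: list[str], new_prefix: int = 20) -> dict:
    """Carve one public and one private subnet per AZ out of a VPC CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    free = vpc.subnets(new_prefix=new_prefix)  # generator of equally sized blocks
    plan = {}
    for zone in zones:
        # next() raises StopIteration if the VPC runs out of blocks.
        plan[f"{zone}-public"] = next(free)
        plan[f"{zone}-private"] = next(free)
    return plan


# Hypothetical zone names; a /16 split into /20s yields 16 blocks, 6 used here.
plan = plan_subnets("10.0.0.0/16", ["zone-a", "zone-b", "zone-c"])
for name, subnet in plan.items():
    print(f"{name:16} {subnet}")
```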
Are you able to do it with a bunch of servers connected to a switch?
It’s a similar story with load balancers: provisioning and configuring an LB in the cloud, including L7 routing, SSL/TLS termination, etc., takes only a few API calls.
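L7 routing means the load balancer looks inside the HTTP request, for example at its path, before choosing a backend. Below is a toy sketch of the idea using longest-prefix matching on the path. The rule table and service names are hypothetical. Real load balancers have their own matching semantics: AWS Application Load Balancer, for instance, evaluates listener rules in priority order. In the cloud you only declare such rules. On bare metal you also run, patch, and keep highly available the proxy that evaluates them.

```python
def route(path: str, rules: dict[str, str], default: str) -> str:
    """Return the backend for `path` using the longest matching path prefix."""
    matches = [prefix for prefix in rules if path.startswith(prefix)]
    if not matches:
        return default
    return rules[max(matches, key=len)]


# Hypothetical routing table: path prefix -> backend (target group) name.
rules = {
    "/api/": "api-service",
    "/api/admin/": "admin-service",
    "/static/": "static-assets",
}

print(route("/api/admin/users", rules, "web-frontend"))  # admin-service
print(route("/api/orders/42", rules, "web-frontend"))    # api-service
print(route("/about", rules, "web-frontend"))            # web-frontend
```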
Storage
Every server can have storage; that’s not a big deal. But there is a huge difference between disks attached to a particular machine and a network block device that can be attached to a server dynamically. Such storage systems are super expensive and complicated. Sure, there are OSS projects like Ceph that try to provide relatively cheap storage on commodity hardware, but operating a full-scale, highly available Ceph cluster is not easy (been there, and spent multiple nights on it).
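To put “expensive” in numbers: a highly available storage pool keeps several copies of every block, so the raw disk you buy is much larger than what applications can actually use. Here is a rough sketch that assumes 3-way replication (Ceph’s default replicated pool size) and keeps 15% of raw space free for recovery after a failure. The headroom figure and the cluster size in the example are hypothetical.

```python
def usable_capacity_tb(raw_tb: float, replicas: int = 3, headroom: float = 0.15) -> float:
    """Rough usable capacity of a replicated storage pool.

    raw_tb:   total raw disk capacity across all nodes
    replicas: copies stored of every object (3 is Ceph's default replicated pool size)
    headroom: fraction of raw space kept free so data can be re-replicated
              after a disk or node failure (an assumed value)
    """
    return raw_tb * (1 - headroom) / replicas


# Hypothetical cluster: 10 nodes x 12 disks x 8 TB = 960 TB raw.
raw = 10 * 12 * 8
print(f"{raw} TB raw -> ~{usable_capacity_tb(raw):.0f} TB usable")
```

Under these assumptions, well under a third of the purchased disk holds application data, and that is before counting the network and the people needed to run the cluster.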
If you run a relatively simple, stateless application, it’s easy to move it from one server to another. Storage, however, complicates things a lot.
Scaling
If your business is mature and you have a stable number of users, you don’t need to scale quickly. But for some applications, like e-commerce, quick scaling for a short period of time (think Black Friday) is a must-have. Auto-scaling in the cloud is quite easy to implement. If you need to scale your infrastructure out and in, that alone is a blocker for going bare metal only.
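The scaling decision itself is simple arithmetic. Kubernetes’ Horizontal Pod Autoscaler, for example, documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Below is a minimal Python sketch of that rule. The replica bounds and traffic numbers are hypothetical, and the real controller also applies a tolerance and stabilization windows that this sketch omits.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 2, max_replicas: int = 50) -> int:
    """HPA-style rule: ceil(current * currentMetric / targetMetric), clamped to bounds.

    Omits the real controller's tolerance band and stabilization windows.
    """
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical Black Friday spike: 4 replicas at 90% CPU, target 50%.
print(desired_replicas(4, 90, 50))   # 8  (ceil of 7.2)
# After the peak: 8 replicas at 10% CPU.
print(desired_replicas(8, 10, 50))   # 2  (ceil of 1.6)
```

The hard part on bare metal is not the formula. It is having idle machines on hand for the extra replicas at the peak and paying for them the rest of the year.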
Kubernetes
Not everybody needs Kubernetes. But I personally love it. Why? Because it gives you an abstraction not only over the hardware (as a VM does) but also over the resources I described above: network, storage, load balancing, and many others.
Of course, it’s possible to run a Kubernetes cluster on-premises, but you’ll have to provide many things yourself that the cloud gives you out of the box (OOTB).
You create a PersistentVolumeClaim (PVC)? It’s the cluster operator’s responsibility to provide a volume.
You create a Service of type LoadBalancer? It’s the cluster operator’s responsibility to provide one.
On bare metal, the operator is you: it’s your responsibility.
HA for the control plane? It’s on you. Certificates? The same. You’ve been warned.
Costs
Saving 7 million dollars over five years sounds great. But wait... are we only talking about CPU, memory, internet, and power?
Where are all the resources I described above? Where is all the software that lets us manage them? Building a system that provides management automation, HA, etc. is doable, but it’s very hard to estimate the effort and cost. IMVHO it’s not right to compare server amortization costs with cloud bills: these are different things.
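Here is a sketch of the comparison I have in mind: the cloud bill should be set against the full monthly cost of owning servers, not against hardware amortization alone. The cost categories and all numbers below are hypothetical placeholders, not figures from DHH’s articles. The hardest items to estimate, staff and tooling, are exactly the ones that tend to be left out.

```python
from dataclasses import dataclass


@dataclass
class OnPremMonthlyCost:
    """Monthly cost of running your own servers, beyond the hardware itself."""
    hardware_capex: float      # up-front price of the servers
    amortization_months: int   # e.g. 60 for five years
    colocation: float          # rack space, power, cooling, bandwidth per month
    extra_staff: float         # people for network, storage, HA, on-call per month
    tooling: float             # software replacing managed services per month

    def hardware_only(self) -> float:
        """The figure that usually gets compared with the cloud bill."""
        return self.hardware_capex / self.amortization_months

    def total(self) -> float:
        """What should actually be compared with the cloud bill."""
        return self.hardware_only() + self.colocation + self.extra_staff + self.tooling


# Purely made-up numbers, for illustration only.
onprem = OnPremMonthlyCost(hardware_capex=600_000, amortization_months=60,
                           colocation=5_000, extra_staff=20_000, tooling=2_000)
print(onprem.hardware_only(), onprem.total())  # 10000.0 37000.0
```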
DHH writes that they will do all this with an ops team of the same size. But are these the same people? Is an AWS cloud architect able to manage a network built on Cisco devices? Are they able to operate an HA storage cluster?
More questions than answers.
So - should we exit the cloud?
If you’ve read these articles and are wondering whether it’s time for your business to exit the cloud, I’d say: think twice. Or more.
Are you able to replace your cloud resources with bare metal servers? I bet that in most cases you aren’t, unless the only things you use in the cloud are plain servers (like AWS EC2). But who does that today? Probably nobody.
There are quite important questions to answer, though.
What about independence? Am I already vendor-locked? Am I able to move to another cloud with finite time and resources? Maybe using Kubernetes, which gives you an abstraction layer over the cloud vendor, is the way to go?
And what about the costs? If your business needs a lot of computing power, a hybrid solution might be a good idea.
I’ll repeat myself: it’s always a good time to rethink the solutions you use and look for optimizations.
Wrapping up
Are clouds already considered harmful? I strongly disagree. There is only one answer to the question asked in the title: it depends. Each business is different, and in most cases migrating legacy infrastructure to the cloud may bring a lot of benefits. But by “migrating” I don’t mean “let’s run it on EC2 instead of bare metal”. Maybe it’s time to use containers? Or Kubernetes? Or maybe part of your application can run as super cheap Lambda functions?
If you have such questions - we at SoftwareMill will be happy to find the answers and help you with the transition.
And what about the costs? Comparing the costs of the cloud and of bare metal servers is like comparing apples and oranges: these are different things. The cloud is not just someone else’s computer :).
Reviewed by: Adam Warski, Krzysztof Grajek, Rafał Maciak