Vertical Pod Autoscaling with GCP
A single-project experience with the Vertical Pod Autoscaler on a Google Kubernetes Engine cluster.
At Softwaremill, it is not uncommon for Software Developers to manage our clients' Kubernetes clusters day to day, without the need to hire a full-time DevOps specialist. Many of our developers convert business rules into Scala or Java code daily while also holding Kubernetes Administrator certifications. This reduces costs for our clients significantly and makes us better developers, able to speak the same language as DevOps when needed.
Of course, if we need to find out the best solution to problem X on the infrastructure side, or how to upgrade parts of the cluster without causing downtime, we also have highly skilled DevOps engineers available, whom we can always ask for advice, leveraging the rich experience they have gathered over the years, which, as you probably know, no certificate can provide.
Recently, while upgrading the GKE cluster for one of our clients, I noticed that some of our k8s services were not utilizing the resources they had allocated, which could increase the overall monthly cost of running the cluster. That is when I came across VPA, Vertical Pod Autoscaling. Below, I will try to answer common questions about VPA: what it is, what it's not, and why it's good to use it (or better not :) ).
What is Vertical Pod Autoscaling?
According to Google Cloud Kubernetes documentation:
Vertical Pod autoscaling frees you from having to think about what values to specify for a container’s CPU requests and limits and memory requests and limits. The autoscaler can recommend values for CPU and memory requests and limits, or it can automatically update the values.
What this means is that you don't have to explicitly set the limits and requests sections for your Pods; VPA will take care of setting the values for you (actually, it can either just print out recommendations or apply them for you, but more on that later).
This, of course, sounds great! But before you start creating your own VPA resources, I strongly advise you to read on :).
In theory, if you have horizontal autoscaling enabled for your Node Pool and vertical pod autoscaling enabled for your Deployments, you can stop worrying about running out of resources or paying more than needed for your cluster. Sadly, as always, it depends.
How to enable VPA?
The process of setting up VPA for your k8s Deployments is pretty straightforward. First of all, you need to make sure that the Vertical Pod Autoscaling option is enabled on your k8s cluster. If it isn't, you can always enable it on a running cluster, but bear in mind that for the duration of the upgrade (a couple of minutes), the cluster will not respond to kubectl commands.
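If you manage the cluster with the gcloud CLI, enabling the option can look roughly like this (a sketch: my-cluster and the zone are placeholders for your own values):
# Enable Vertical Pod Autoscaling on an existing GKE cluster;
# this is the update that temporarily makes the control plane unavailable.
gcloud container clusters update my-cluster \
    --enable-vertical-pod-autoscaling \
    --zone europe-west1-b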
Enabling this option gives us the ability to create resources like VerticalPodAutoscaler from the autoscaling.k8s.io/v1 API.
You can double-check if you have those resources available by executing:
kubectl api-resources | grep autoscaler
horizontalpodautoscalers            hpa             autoscaling          true    HorizontalPodAutoscaler
multidimpodautoscalers              mpa             autoscaling.gke.io   true    MultidimPodAutoscaler
verticalpodautoscalercheckpoints    vpacheckpoint   autoscaling.k8s.io   true    VerticalPodAutoscalerCheckpoint
verticalpodautoscalers              vpa             autoscaling.k8s.io   true    VerticalPodAutoscaler
Usage
As mentioned before, the usage is pretty simple. Create a VerticalPodAutoscaler YAML definition file, like you do for all other k8s resources, and deploy it to the cluster (with helm or kubectl apply).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: some-container-name-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: some-container-name
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: some-container-name
        minAllowed:
          cpu: "250m"
          memory: "250Mi"
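Assuming you saved the manifest above as vpa.yaml (a name picked just for this example), deploying it is a single command:
kubectl apply -f vpa.yaml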
The resourcePolicy section is optional, but it's good to set the minimum allowed values, for the reasons mentioned below.
The most important setting when you start playing around with the VPA is updateMode, which you can set to Off or Auto. Setting the update mode to Off allows you to observe the recommendations given by the VPA, without any automatic updates to limits and requests on Pods. With the Auto flag, the VPA takes care of setting the values for you.
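In either mode, you can inspect the current recommendations in the status section of the VPA object. A sketch of what this looks like, using the object defined above (the recommended values are purely illustrative):
kubectl get vpa some-container-name-vpa --output yaml
...
status:
  recommendation:
    containerRecommendations:
    - containerName: some-container-name
      lowerBound:
        cpu: 250m
        memory: 250Mi
      target:
        cpu: 500m
        memory: 500Mi
      upperBound:
        cpu: "1"
        memory: 1Gi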
Pros and cons
I have found using VPA in the Off update mode a bit useless, especially when running your cluster on GKE, where you can easily see resource utilization for your Pods or whole Services on a nice graph, browsable by days, weeks, and even months. Let's move on to the Auto update mode, which is a bit more interesting.
update mode, which is a bit more interesting.
One of the first things you will probably realize when using VPA in Auto mode is that the new settings for limits and requests are applied only by restarting the Pod. This can have consequences in situations like single-pod services or actions executed on container restarts.
Assuming that you are not in the aforementioned situation, and that, being a sane person, you have all your services backed by at least two Pods running on different nodes :), you are still not fully covered.
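Since in Auto mode the VPA applies new values by evicting Pods, it's worth making sure it cannot evict too many of them at once. The eviction mechanism respects PodDisruptionBudgets, so a minimal sketch like this one (assuming the Deployment's Pods are labeled app: some-container-name, and a reasonably recent cluster with the policy/v1 API) keeps at least one Pod running during VPA-triggered restarts:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: some-container-name-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: some-container-name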
The biggest con of using VPA is its inability to handle sudden increases in resource usage. For me, and for the services we usually deploy to our k8s cluster, this is a huge no-go. The resources reserved by the VPA for Pods that get hit at a particular hour by an external cron job running within our cluster make the whole solution useless. The time it takes to create a new Pod with updated resources is too long and breaks the whole flow for external services utilizing the Pods under VPA.
On top of that, creating your VerticalPodAutoscaler without the minAllowed section under resourcePolicy causes some of the Pods to never reach a running, healthy state, as the resources given by the VPA are too small for a Pod to respond to a simple healthcheck call.
Summary
Summing up, I believe VPA is a great attempt to solve a problem many k8s users have, and it probably has valid usage scenarios for services without sudden peaks in resource usage. This time, for us, it's just a nice toy to try out within Kubernetes.