Project overview
Reco.se asked for our assistance in managing their cloud spending, primarily in response to the rising costs of cloud computing. As their project expanded, it became imperative to pinpoint the right metrics and implement measures to control cloud usage effectively. By strategically optimizing cloud costs, we delivered significant and measurable results: our client gained improved financial predictability, more efficient resource management, and increased scalability.
Team
- 1 DevOps Engineer
Duration
- 1 month
Team role
- Cloud Architect
Industry
- Review platform (Marketing)
Technology
- Google Cloud Platform
- Kubernetes
- OpenTelemetry
- KEDA
Client Profile
Reco.se is the Swedish tech company behind the largest independent review site in Sweden, where customers share their experiences with a variety of companies.
Challenge
The primary challenge faced by the client was the escalating cloud computing costs that increased proportionally as their project expanded. Initially designed to be cost-efficient, the infrastructure struggled to keep up with the growing demands without incurring significant expenses.
The client’s goal was to minimize these cloud costs without compromising performance, necessitating a strategic overhaul of their cloud management practices. The challenge was thus twofold: managing and optimizing resource allocation in a dynamically scaling environment, and improving visibility into how resources were currently allocated.
Solution
To tackle the escalating cloud costs while enhancing performance, the following strategic measures were implemented:
Cost analysis and identification: The first step involved a detailed analysis to identify the most expensive components of the cloud infrastructure. By pinpointing the areas with the highest costs, we could target our efforts more effectively.
Committed use discounts: We introduced committed use contracts for compute instances, Cloud SQL, and Memorystore. These commitments unlocked substantial discounts from the cloud provider, reducing overall expenditure.
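The economics of a commitment are simple: a fixed share of the baseline is billed at a discounted rate, while the remainder stays on-demand. A small sketch, with entirely illustrative numbers (the discount rate, coverage share, and baseline are hypothetical placeholders, not the actual figures from this project):

```python
# Illustrative sketch: estimating savings from committed use discounts.
# The 30% discount, 70% coverage, and $10,000 baseline are hypothetical
# placeholders, not the rates or spend from this engagement.

def committed_cost(on_demand_monthly: float, committed_share: float,
                   discount: float) -> float:
    """Monthly cost when `committed_share` of the baseline is covered by
    a commitment with the given discount; the rest stays on-demand."""
    committed = on_demand_monthly * committed_share * (1 - discount)
    on_demand = on_demand_monthly * (1 - committed_share)
    return committed + on_demand

# $10,000/month baseline, 70% covered at a 30% discount:
baseline = 10_000.0
print(f"${committed_cost(baseline, 0.7, 0.3):,.0f}")  # ~$7,900, a ~21% saving
```

The trade-off is that the committed portion is billed whether or not it is used, which is why the cost analysis step matters: commitments only pay off on steady, well-understood baseline load.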
Enhanced observability: To gain deeper insights into system performance and pinpoint inefficiencies, we implemented detailed JVM observability. This was achieved by integrating the open-source OpenTelemetry toolkit with the existing cloud monitoring tools, providing granular data on resource usage and system behavior.
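The core idea is per-operation instrumentation: every call records a measurement under a named metric, and the aggregated series reveals where resources actually go. The production setup used OpenTelemetry's JVM instrumentation exporting to the cloud monitoring backend; the stdlib-only Python sketch below just illustrates the kind of per-operation latency data such instrumentation yields:

```python
# Minimal stdlib-only sketch of metric instrumentation. The real setup
# used OpenTelemetry on the JVM; this only shows the shape of the data.
import time
from collections import defaultdict
from functools import wraps

latencies_ms = defaultdict(list)  # metric name -> recorded durations (ms)

def instrument(func):
    """Record the wall-clock duration of each call under the function's name."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            latencies_ms[func.__name__].append(
                (time.perf_counter() - start) * 1000)
    return wrapper

@instrument
def handle_request():
    time.sleep(0.01)  # stand-in for real work

for _ in range(5):
    handle_request()

samples = latencies_ms["handle_request"]
print(f"handle_request: n={len(samples)}, max={max(samples):.1f} ms")
```

In OpenTelemetry the equivalent would be a histogram instrument recorded by auto-instrumentation agents, with no code changes needed on the JVM side.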
Resource allocation optimization: Based on the observability data, we reconfigured the resource allocations to better match the actual usage patterns. This optimization ensured that resources were not being underutilized or over-provisioned, thus cutting unnecessary costs.
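Right-sizing typically means deriving a request from observed usage percentiles plus headroom, rather than a static guess. A hypothetical sketch (the 95th percentile, 20% headroom, and sample data are illustrative assumptions, not the project's actual tuning):

```python
# Hypothetical right-sizing sketch: derive a CPU request from observed
# usage samples (p95 plus headroom) instead of a static over-provisioned
# value. Percentile, headroom, and samples are illustrative assumptions.
import math

def rightsized_request(samples_mcpu: list[int],
                       percentile: float = 0.95,
                       headroom: float = 0.20) -> int:
    """Suggested CPU request (millicores) from observed usage samples."""
    ordered = sorted(samples_mcpu)
    idx = min(len(ordered) - 1, math.ceil(percentile * len(ordered)) - 1)
    return math.ceil(ordered[idx] * (1 + headroom))

# A service provisioned at 2000 mCPU whose observed usage is mostly idle
# with the occasional spike:
observed = [120, 150, 180, 160, 140, 600, 170, 130, 155, 145]
print(rightsized_request(observed))  # well below the 2000 mCPU provisioned
```

Lowering requests this way is what lets the scheduler pack more workloads onto fewer nodes, which is where the node-count savings in the Results section come from.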
Dynamic scaling with KEDA: To handle variable loads efficiently, we introduced horizontal autoscaling using Kubernetes Event-driven Autoscaling (KEDA). This solution adjusts resources dynamically based on actual usage metrics and the length of Pub/Sub queues, ensuring that the system scales out during peak loads without incurring extra costs during quieter periods.
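For a queue-length trigger, the scaling decision KEDA drives via the Horizontal Pod Autoscaler is roughly "one replica per target-sized slice of the backlog", clamped between configured minimum and maximum replica counts. A simplified sketch of that calculation (the target of 100 messages per replica and the replica bounds are illustrative, not the project's actual configuration):

```python
# Rough sketch of the replica calculation behind a KEDA Pub/Sub trigger:
# desired replicas ~= ceil(backlog / target messages per replica),
# clamped to configured bounds. Target and bounds here are illustrative.
import math

def desired_replicas(backlog: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Replica count for a queue backlog, clamped to [min, max]."""
    raw = math.ceil(backlog / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))

# Quiet period: tiny backlog -> scale down to the floor.
print(desired_replicas(3, 100))      # 1
# Peak load: large backlog -> scale out, capped at max_replicas.
print(desired_replicas(4_500, 100))  # 20
```

In practice this lives in a KEDA ScaledObject manifest rather than application code; KEDA simply feeds the queue-length metric to the HPA, which applies this kind of formula on every reconciliation loop.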
Results
The strategic actions taken to optimize cloud costs brought significant and measurable results. The client gained:
Financial predictability crucial for the client’s budgeting and financial planning: The biggest win was the stabilization of the monthly cloud expenditure. Despite the increasing scale of the project, the cloud costs were successfully contained, ensuring that they did not escalate with the expansion of the workload.
Enhanced resource efficiency, particularly within the Kubernetes environment: By optimizing resource allocation and introducing advanced scaling mechanisms, we were able to accommodate more workloads on fewer Kubernetes nodes. This not only reduced the number of nodes required but also maximized the utility of each node, contributing further to cost reduction.
Scalability and readiness for future growth: KEDA allows the infrastructure to scale dynamically in response to real-time demand without manual intervention, ensuring that the system can handle increases in workload without performance degradation or unnecessary expenditure.