Kubernetes Rightsizing: Save Money and Improve Performance

Kubernetes rightsizing is the process you use to ensure that your Kubernetes cluster has the right amount of resources to run your workloads efficiently. K8s rightsizing includes CPU, memory, and storage, and it’s important to get right. It can be expensive to run Kubernetes, so you need to make sure you are not over-provisioning resources (and wasting money). On the flip side, if your K8s cluster doesn’t have enough resources, your workloads will suffer, leading to longer response times and more errors. It could even result in outages and downtime for your apps and services, which won’t make your end users very happy.

Gartner predicts that, by 2027, more than 70% of enterprises will use industry cloud platforms (ICPs) to accelerate their business initiatives (we typically refer to them as Internal Developer Platforms). In 2023, it was less than 15%, so adoption of cloud native technologies, including containers and Kubernetes, will increase very rapidly over the next few years. The time to understand Kubernetes rightsizing is now, before you’re already over-provisioning and overspending.

How to Get Kubernetes Rightsizing Just Right

To do Kubernetes rightsizing right, you need to understand a few key things:

Workloads: What workloads are you running on Kubernetes? What are their resource requirements? What are their performance and reliability requirements?
Kubernetes autoscaling: Is your cluster scaling up to meet demand? What resources are available to you?
Monitoring and logging: How are you monitoring and logging your Kubernetes clusters and workloads? This data will be essential in helping you identify rightsizing opportunities.

To truly get a good understanding of your workloads, you need the right data. Ideally, collect metrics over an appropriate amount of time in order to make informed decisions. Here’s some of the data you’ll need:

Resource usage metrics

CPU usage
Memory usage
Storage usage
Network usage
Pod restarts

Workload requirements

Minimum resource requirements (CPU, memory, storage)
Maximum resource requirements (CPU, memory, storage)
Startup time
Runtime characteristics
Dependencies

Cluster capacity

Node types and capabilities
Autoscaling minimums and maximums

Traffic patterns

Peak traffic
Average traffic
Traffic spikes
Traffic distribution across nodes and workloads

Additional data

Cost of running the cluster
SLAs for workloads
Risk tolerance

You can collect this information using a variety of tools; here are a few that can help you:

Goldilocks (an open source utility that helps you identify a starting point for setting resource requests and limits)
Kubernetes Metrics Server (collects container resource metrics and exposes them through the Kubernetes Metrics API)
Prometheus (open source monitoring and alerting toolkit)
Grafana (open source analytics and interactive visualization)
Cluster Autoscaler (adapts the number of Kubernetes nodes in the cluster to your requirements automatically)
Kube-State-Metrics (an agent to generate and expose cluster-level metrics)
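
Once usage samples have been exported from tools like these, a short script can condense them into the figures that matter for rightsizing. The sketch below is only an illustration: the function name `summarize_usage`, the sample values, and the choice of a nearest-rank 95th percentile are assumptions, not part of any of the tools listed.

```python
import math
from statistics import mean


def summarize_usage(samples):
    """Return the average, 95th percentile, and peak of a list of usage samples."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest sample with at least
    # 95% of all samples at or below it.
    rank = math.ceil(len(ordered) * 95 / 100)
    return {
        "average": mean(ordered),
        "p95": ordered[rank - 1],
        "peak": ordered[-1],
    }


# Hypothetical CPU usage samples for one container, in millicores.
cpu_millicores = [120, 135, 110, 480, 150, 140, 130, 125, 160, 145]
summary = summarize_usage(cpu_millicores)
print(summary)
```

With only a handful of samples, a single spike dominates both the peak and the percentile, which is one reason to collect metrics over a longer window before acting on them.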

Identify Rightsizing Opportunities

Cloud spend is a significant cost for many organizations, so it makes sense to ensure that resources are allocated as efficiently as possible. Still, it can be hard to get all the data needed to make informed decisions. Visibility into your Kubernetes environment at this level is often difficult to achieve, particularly if you have multiple teams, multiple clusters, and multiple clouds at play. That’s why we created Goldilocks. It creates a Vertical Pod Autoscaler (VPA) in recommendation mode for each workload in a namespace, queries those VPAs, and shows the suggested resource requests for each of your apps in a dashboard.
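
To make "recommendation mode" concrete, here is a minimal sketch of a VPA object that only produces recommendations and never changes running pods. It is not how Goldilocks itself is implemented; the helper name `recommendation_only_vpa`, the `-vpa` naming suffix, and the workload and namespace names are hypothetical. It assumes the Vertical Pod Autoscaler is installed in the cluster.

```python
import json


def recommendation_only_vpa(workload_name, namespace, kind="Deployment"):
    """Build a VerticalPodAutoscaler manifest that recommends but never applies changes."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {
            "name": f"{workload_name}-vpa",  # hypothetical naming convention
            "namespace": namespace,
        },
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": kind,
                "name": workload_name,
            },
            # "Off" means the VPA computes recommendations only.
            "updatePolicy": {"updateMode": "Off"},
        },
    }


# Hypothetical workload "checkout" in namespace "shop".
vpa = recommendation_only_vpa("checkout", "shop")
print(json.dumps(vpa, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest; the recommendations then appear in the status of the VPA object.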

Resource requests and limits let you specify how much CPU and memory Kubernetes reserves for each container when scheduling it (the request) and the maximum the container is allowed to consume (the limit). Setting both helps you prevent over-provisioning and ensures that your workloads have the resources they need to perform effectively. You should also use monitoring tools on an ongoing basis to track resource usage over time so you can identify workloads that are over- or under-provisioned. Combine this with cluster autoscaling to automatically scale your Kubernetes cluster up or down based on demand, so you only use resources when you need them.
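
One simple way to spot over- or under-provisioned workloads is to compare observed usage with the configured request. The sketch below assumes you already have a high-percentile usage figure for a container; the function name `classify_provisioning` and the 50% and 90% thresholds are illustrative assumptions that you would tune to your own risk tolerance.

```python
def classify_provisioning(request, observed_p95, low=0.5, high=0.9):
    """Classify a container by how much of its request it actually uses.

    request and observed_p95 must share a unit (for example millicores or MiB).
    low and high are hypothetical utilization thresholds.
    """
    if request <= 0:
        raise ValueError("request must be positive")
    utilization = observed_p95 / request
    if utilization < low:
        return "over-provisioned"
    if utilization > high:
        return "under-provisioned"
    return "ok"


# Hypothetical containers: (name, CPU request, observed p95 usage) in millicores.
containers = [("api", 1000, 200), ("worker", 500, 480), ("cache", 500, 350)]
for name, request, usage in containers:
    print(name, classify_provisioning(request, usage))
```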

Getting Started with Kubernetes Rightsizing

Note that Goldilocks is generally a good starting point for setting your resource requests and limits, but every environment is different, and you will still have to fine-tune your applications for your individual use cases. Here’s how to get started:

Start small. Don't try to rightsize every app or service in your entire Kubernetes environment at once. Start by using Goldilocks to identify a few over- or under-provisioned workloads. Once you have rightsized these workloads using these recommendations, you can work on fine-tuning them before moving on to the next ones.
Be iterative. Rightsizing is not a one-and-done process. Your workloads and clusters will change, and as they do, you’ll need to look at your rightsizing decisions again. Monitor your resource usage and adjust resource limits and requests as needed.
Test. Make sure that you test your changes thoroughly in a non-production environment before pushing them to production. Load testing in your non-production environment can also help you rightsize before shipping to production.

Automated Policy Management & Kubernetes Rightsizing

Because rightsizing is so important to your Kubernetes infrastructure, it’s considered a best practice to configure resource requests and limits for all containers. Without automated checks, it’s just too easy for someone to deploy an app that doesn’t meet best practices for reliability, cost efficiency, and security. Polaris is an open source policy engine that helps you validate and remediate Kubernetes deployments to ensure that configuration best practices are being followed.
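
The kind of check a policy engine performs can be illustrated with a few lines of code. The sketch below is not Polaris and does not use its API; it is a hypothetical stand-alone function, `missing_resource_settings`, that inspects a Deployment manifest (already loaded as a dictionary) and reports containers that lack CPU or memory requests or limits.

```python
REQUIRED_RESOURCES = ("cpu", "memory")


def missing_resource_settings(deployment):
    """Return a list of messages for containers missing requests or limits."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            values = resources.get(section) or {}
            for resource in REQUIRED_RESOURCES:
                if resource not in values:
                    problems.append(f"{name}: missing {section}.{resource}")
    return problems


# Hypothetical Deployment with a CPU limit left out.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [{
        "name": "web",
        "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"memory": "512Mi"},
        },
    }]}}},
}
print(missing_resource_settings(deployment))
```

A check like this could run in CI before a manifest is applied, so that missing settings are caught before they reach a cluster.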

Automated policy management solutions can analyze your workload and cluster data to identify workloads that are over- or under-provisioned. Automating the process of identifying rightsizing opportunities can help you to save time and resources. When workloads are over-provisioned, Kubernetes may scale up more than needed, while under-provisioned workloads will run out of memory or become CPU constrained (resulting in performance degradation, increased errors, reduced throughput, or outages). You can also monitor your resource usage over time and adjust your rightsizing policies as needed. This can help you to ensure that your Kubernetes workloads continue to be rightsized, even as workloads and clusters change.

Save Money & Improve Performance with Rightsizing

Kubernetes compute costs can be significant, especially for organizations deploying multiple clusters to production. Rightsizing helps reduce these costs by ensuring that each workload has the resources it needs and no more. It also improves performance by making sure each workload has enough resources to run efficiently, keeping your apps and services humming along.
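
As a rough back-of-the-envelope illustration of the cost side, the sketch below estimates monthly savings from lowering total CPU requests. Every number in it is made up for the example, including the price per core-hour, and the function name `monthly_cpu_savings` is hypothetical. It also assumes the freed capacity is actually released, for example by an autoscaler removing nodes; otherwise nothing is saved.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cpu_savings(requested_cores, rightsized_cores, price_per_core_hour):
    """Estimate monthly savings from reducing total CPU requests.

    Returns 0 when rightsizing increases the request, since that is a
    cost rather than a saving.
    """
    freed_cores = max(requested_cores - rightsized_cores, 0)
    return freed_cores * price_per_core_hour * HOURS_PER_MONTH


# Hypothetical cluster: 64 cores requested today, 40 after rightsizing,
# at an assumed price of $0.03 per core-hour.
savings = monthly_cpu_savings(64, 40, 0.03)
print(f"Estimated monthly savings: ${savings:.2f}")
```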