How to Set the Right Kubernetes Resource Limits

Kubernetes is a dynamic system that automatically adapts to your workload’s resource utilization.

Kubernetes has two levels of scaling:

Horizontal Pod Autoscaler (HPA) - Each Kubernetes deployment can be scaled automatically by an HPA. The HPA monitors the resource utilization of the pods in a deployment and adds or removes pods as needed to keep per-pod utilization within the specified targets.
Cluster Autoscaler - Scales the cluster itself by watching cluster-wide resource utilization and automatically adding or removing nodes. The Cluster Autoscaler will have a hard time doing its job if your resource requests are not set correctly: it relies on the scheduler to know that a pod won’t fit on the current nodes, and it relies on the pod’s resource requests to determine whether adding a new node would allow the pod to run.
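To make the HPA behavior concrete, here is a minimal sketch of the ratio the HPA uses to pick a replica count: the current replica count scaled by current utilization over target utilization, rounded up. CPU utilization is measured as a percentage of each pod's CPU request, which is one reason requests need to be realistic. The function name and the default replica bounds are assumptions for illustration, and the real controller also applies a tolerance and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """Approximate the replica count an HPA would aim for.

    Utilization values are percentages of the pods' resource request.
    min_replicas and max_replicas are illustrative defaults for this sketch.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    # Scale the current replica count by how far utilization is from the target.
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# Four pods running at 90% of their CPU request, with a 60% target.
print(desired_replicas(4, 90, 60))
```

If the CPU request is set far below what the pods really use, the measured utilization is inflated and the HPA will tend to add pods it does not need.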

A key feature of Kubernetes that enables both of these scaling actions is the capability to set specific resource requests and limits on your workloads. By setting sensible Kubernetes requests and limits on how much CPU and memory each pod uses, you ensure smooth application performance and maximize the utilization of your infrastructure.

Setting Kubernetes Resource Limits Correctly

Resource requests and limits for CPU and memory are at the heart of what allows the Kubernetes scheduler to do its job well. If a single pod is allowed to consume all of the node CPU and memory, then other pods will be starved for resources.
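As a rough illustration of why requests matter to the scheduler, the sketch below checks whether a new pod fits on a node by comparing the sum of requests against the node's allocatable capacity. The scheduler works from requests, not from live usage. The class and function names are hypothetical, and the real scheduler weighs many more factors (affinity, taints, other resource types) than this simplified check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits_on_node(allocatable, running, candidate):
    """Return True if the candidate's requests fit alongside the running pods.

    allocatable: Resources the node can hand out to pods.
    running: list of Resources requested by pods already on the node.
    candidate: Resources requested by the pod being scheduled.
    """
    used_cpu = sum(r.cpu_millicores for r in running)
    used_memory = sum(r.memory_mib for r in running)
    return (used_cpu + candidate.cpu_millicores <= allocatable.cpu_millicores
            and used_memory + candidate.memory_mib <= allocatable.memory_mib)


node = Resources(cpu_millicores=4000, memory_mib=16384)
running = [Resources(1500, 4096), Resources(2000, 8192)]
print(fits_on_node(node, running, Resources(500, 2048)))
print(fits_on_node(node, running, Resources(1000, 2048)))
```

A pod with no requests at all always passes a check like this, which is how a node can end up overcommitted.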

It is critical to set Kubernetes resource requests and limits correctly. Setting your limits too low on an application will cause problems. For example, if your memory limit is too low, Kubernetes will kill your application when it exceeds that limit, and a CPU limit that is too low will cause the application to be throttled. Meanwhile, if you set your requests and limits too high, you are overallocating and wasting resources, which means you will end up with a higher bill.
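For reference, this is roughly what the resources section of a container spec looks like, built here as a Python dictionary and printed as JSON. The container name, image, and all four values are placeholders chosen for illustration, not recommendations.

```python
import json


def container_resources(cpu_request, memory_request, cpu_limit, memory_limit):
    """Build the `resources` stanza of a container spec."""
    return {
        "requests": {"cpu": cpu_request, "memory": memory_request},
        "limits": {"cpu": cpu_limit, "memory": memory_limit},
    }


# Hypothetical container; the name, image, and values are placeholders.
container = {
    "name": "web",
    "image": "example.com/web:1.0",
    "resources": container_resources("250m", "256Mi", "500m", "512Mi"),
}

print(json.dumps(container, indent=2))
```

Requests are what the scheduler reserves for the container. Limits are the ceiling enforced at runtime.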

Finding Missing Kubernetes Resource Limits

Your first step is to identify any missing Kubernetes resource limits. Fairwinds offers an open source tool, Polaris, that checks for missing requests and memory limits. Polaris works well when you are managing a few clusters.
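To show the kind of check involved, here is a small sketch that scans a pod spec (as a Python dictionary) and reports which containers are missing requests or limits. This is not how Polaris is implemented. The function name and the example pod are hypothetical.

```python
REQUIRED_SETTINGS = (
    ("requests", "cpu"),
    ("requests", "memory"),
    ("limits", "cpu"),
    ("limits", "memory"),
)


def missing_resource_settings(pod_spec):
    """Map each container name to the resource settings it does not define."""
    findings = {}
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        missing = [
            f"{kind}.{name}"
            for kind, name in REQUIRED_SETTINGS
            if name not in (resources.get(kind) or {})
        ]
        if missing:
            findings[container["name"]] = missing
    return findings


# Hypothetical pod spec: one container is fully configured, one is not.
pod_spec = {
    "containers": [
        {"name": "web", "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"cpu": "500m", "memory": "512Mi"},
        }},
        {"name": "sidecar", "resources": {"requests": {"cpu": "50m"}}},
    ]
}

print(missing_resource_settings(pod_spec))
```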

If you are responsible for managing multiple clusters across many teams, you might want to look into Fairwinds Insights which can consistently apply Polaris across clusters.

Once you have found the missing Kubernetes resource requests and limits, you need to set them. But as mentioned earlier, setting them too low or too high can be a problem.

How to Set Kubernetes Resource Limits

It’s easy to say “just set Kubernetes resource limits”, but the hard part is knowing what values to set so that your app doesn’t crash and you don’t waste money. The open source tool Goldilocks helps you identify a starting point for Kubernetes resource requests and limits.

Goldilocks uses the Kubernetes Vertical Pod Autoscaler (VPA) in recommendation mode, so you can see suggested resource requests for each of your apps. It creates a VPA for each workload in a namespace and then queries those VPAs for their recommendations.
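For a sense of what a recommendation-only VPA looks like, the sketch below builds a VerticalPodAutoscaler manifest with its update mode set to "Off", which means the VPA computes recommendations without changing any pods. The function name, the "-recommender" name suffix, and the example workload are assumptions for illustration and are not taken from Goldilocks itself.

```python
import json


def recommendation_only_vpa(workload_name, namespace, kind="Deployment"):
    """Build a VPA manifest that only recommends and never updates pods."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {
            # Hypothetical naming scheme for this sketch.
            "name": f"{workload_name}-recommender",
            "namespace": namespace,
        },
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": kind,
                "name": workload_name,
            },
            # "Off" keeps the VPA in recommendation mode.
            "updatePolicy": {"updateMode": "Off"},
        },
    }


# Hypothetical workload and namespace.
print(json.dumps(recommendation_only_vpa("web", "production"), indent=2))
```

Once a VPA like this has observed the workload for a while, its suggestions appear in the object's status under the recommendation field, with a target and lower and upper bounds per container.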

Once again, if you want to apply this across multiple teams and clusters, Fairwinds Insights can help, and it also provides more Kubernetes cost insights. The latest feature, Kubernetes Cost Allocation, allows you to configure a blended price per hour for the nodes powering your cluster. With that information, and by tracking actual pod and resource utilization using Prometheus metrics, you can generate a relative cost for each workload. This helps you track your cloud spend and determine which features are driving cloud costs.
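As a back-of-the-envelope illustration of a relative workload cost, the sketch below charges a workload the node's hourly price multiplied by the average of its CPU share and memory share. The equal weighting of CPU and memory, the function name, and the example numbers are all assumptions for illustration. This is not the formula Fairwinds Insights uses.

```python
def relative_hourly_cost(node_price_per_hour, node_cpu_millicores, node_memory_mib,
                         workload_cpu_millicores, workload_memory_mib):
    """Estimate a workload's share of one node's hourly price.

    Assumes CPU and memory each account for half of the node price.
    """
    cpu_share = workload_cpu_millicores / node_cpu_millicores
    memory_share = workload_memory_mib / node_memory_mib
    return node_price_per_hour * (cpu_share + memory_share) / 2


# Hypothetical node: $0.20 per hour, 4 CPU cores, 16 GiB of memory.
# Hypothetical workload: 1 CPU core and 4 GiB of memory.
cost = relative_hourly_cost(0.20, 4000, 16384, 1000, 4096)
print(f"${cost:.3f} per hour")
```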

The important thing overall is that you rightsize your Kubernetes resource limits. Don’t get caught out. Set them right!

You can tour Fairwinds Insights via our sandbox environment, read our documentation or use it for free at any time.