Kubernetes Best Practice: How to (Correctly) Set Resource Requests and Limits

One of my biggest pet peeves when managing Kubernetes is finding workloads with no resource requests or limits. I was so frustrated by this that I created Goldilocks, an open source project that makes it easier to set initial resource requests and limits. In this post, I'll cover Kubernetes best practices for setting resource requests and limits correctly.

Kubernetes is a dynamic system that can automatically adapt to your workloads' resource utilization, and it does so at two levels. Each individual Kubernetes deployment can be scaled automatically using a Horizontal Pod Autoscaler (HPA), while the cluster as a whole is scaled using Cluster Autoscaler. An HPA monitors a target metric across the pods in a deployment (often CPU or memory usage) and adds or removes pods as necessary to keep that metric near a specified target. Cluster Autoscaler, meanwhile, handles scaling of the cluster itself: it adds nodes when pods cannot be scheduled because no existing node has enough free capacity, and it removes nodes that stay underutilized.
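To see why requests matter to an HPA, it helps to look at its core rule, which the Kubernetes documentation gives as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). For CPU and memory utilization targets, the metric is measured as a percentage of each pod's request. The Python sketch below mirrors that rule under simplifying assumptions: it keeps a 10% tolerance band (the documented default) but ignores stabilization windows, min/max replica bounds, and pods that are not yet ready. The function names are our own.

```python
import math


def cpu_utilization_percent(usage_millicores: list[int], request_millicores: int) -> float:
    """Average CPU utilization across pods, as a percentage of each pod's request."""
    per_pod = [100.0 * used / request_millicores for used in usage_millicores]
    return sum(per_pod) / len(per_pod)


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current_replicas * current_metric / target_metric).

    No scaling happens while the ratio stays within the tolerance band.
    Stabilization windows and min/max replica bounds are deliberately omitted.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Three pods, each requesting 500m CPU and using roughly 400m of it.
usage = [400, 420, 380]
utilization = cpu_utilization_percent(usage, request_millicores=500)
print(utilization)  # 80.0
print(desired_replicas(3, utilization, target_metric=50))  # 5
```

With the same actual usage but a 1000m request per pod, utilization would read 40% and the HPA would keep three replicas instead of scaling out. The request, not just the load, drives the decision.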

A key feature of Kubernetes that enables both of these scaling actions is the ability to set resource requests and limits on your workloads. A request is the amount of CPU or memory the scheduler reserves for a container when it places the pod on a node; a limit is the maximum the container is allowed to use. Both autoscalers depend on requests: an HPA measures CPU or memory utilization as a percentage of each pod's request, and Cluster Autoscaler decides whether a pending pod could fit on a node by comparing its requests with the node's free capacity. By setting sensible requests and limits on how much CPU and memory each pod uses, you can maximize the utilization of your infrastructure while keeping application performance smooth. Setting these values too low causes problems. For example, if a memory limit is too low, the container will be killed (OOMKilled) whenever it exceeds that limit, and a CPU limit that is too low throttles the application. Setting them too high, requests in particular, wastes resources: the scheduler reserves capacity your application never uses, so you need more nodes and end up with a higher bill.
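Requests and limits are set per container, under `resources.requests` and `resources.limits`, using Kubernetes quantity strings such as `500m` (half a CPU) or `256Mi` (256 mebibytes). The Python sketch below is a hypothetical pre-deployment check, not part of Goldilocks or kubectl: it parses a subset of quantity suffixes and flags missing values as well as requests larger than their limits, which the API server would reject anyway.

```python
from decimal import Decimal

# A subset of Kubernetes quantity suffixes; exponent forms such as "1e3" are not handled.
SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "m": Decimal("0.001"),
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '500m' (CPU) or '256Mi' (memory) to a plain number."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)


def check_resources(resources: dict) -> list[str]:
    """List problems in a container's `resources` block (an empty list means none found)."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for name in ("cpu", "memory"):
        if name not in requests:
            problems.append(f"missing {name} request")
        if name not in limits:
            problems.append(f"missing {name} limit")
        # The API server rejects a request that is larger than its limit.
        if (name in requests and name in limits
                and parse_quantity(requests[name]) > parse_quantity(limits[name])):
            problems.append(f"{name} request exceeds {name} limit")
    return problems


container_resources = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}
print(check_resources(container_resources))  # []
print(check_resources({"requests": {"cpu": "1", "memory": "1Gi"},
                       "limits": {"cpu": "500m"}}))
# ['cpu request exceeds cpu limit', 'missing memory limit']
```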

While Kubernetes best practices dictate that you should always set resource requests and limits on your workloads, it is not always easy to know what values to use for each application. As a result, some teams never set requests or limits at all, while others set them too high during initial testing and then never course-correct. The key to making scaling actions work properly is to dial in the requests and limits on each workload so that it runs efficiently.

Setting resource limits and requests is key to operating applications on Kubernetes clusters as efficiently and reliably as possible.

How to Set Kubernetes Resources

Goldilocks, an open source project by Fairwinds, helps teams allocate resources to their Kubernetes deployments and get those resource calibrations just right. Goldilocks is a Kubernetes controller that collects data about running pods and recommends how to set their resource requests and limits, which helps organizations understand their resource use, resource costs, and best practices for efficiency. Under the hood, Goldilocks uses the Kubernetes Vertical Pod Autoscaler (VPA), which takes into account the historical memory and CPU usage of your workloads along with the current resource usage of your pods. (While the VPA can also apply its recommendations for you, it is often best to use the VPA engine only to provide recommendations.) Essentially, Goldilocks creates a VPA for each deployment in a namespace and then queries that VPA for its recommendations.
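For illustration, here is roughly what a recommendation-only VPA looks like. The sketch below builds the object in Python (kubectl accepts JSON as well as YAML); `updateMode: "Off"` is the VPA setting that computes recommendations without evicting or resizing pods. We assume Goldilocks creates something similar, but the function and the `<deployment>-vpa` naming are hypothetical, not Goldilocks' actual code. Goldilocks itself only manages VPAs in namespaces you opt in, which its README describes doing with the label `goldilocks.fairwinds.com/enabled=true`; check the current documentation before relying on that.

```python
import json


def recommendation_only_vpa(deployment: str, namespace: str) -> dict:
    """Build a VPA object that only computes recommendations for one Deployment.

    updateMode "Off" tells the VPA not to evict or resize pods; it just
    publishes its recommendation in the object's status.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        # Hypothetical naming scheme, chosen only for this example.
        "metadata": {"name": f"{deployment}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "updatePolicy": {"updateMode": "Off"},
        },
    }


# The printed JSON could be piped to `kubectl apply -f -`.
print(json.dumps(recommendation_only_vpa("web", "demo"), indent=2))
```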

Without a dashboard, viewing these recommendations means using kubectl to query every VPA object, which quickly becomes tedious for medium-to-large deployments. That's where the Goldilocks dashboard comes in: once your VPAs are in place, their recommendations appear there.
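To give a sense of the manual route, the Python sketch below reads the JSON that `kubectl get vpa --all-namespaces -o json` returns and pulls out each container's `target` recommendation from the VPA's `status.recommendation.containerRecommendations` field. The helper names are hypothetical, `fetch_vpas` assumes kubectl and the VPA CRDs are installed, and the sample data is made up; it only mirrors the shape of a real VPA status.

```python
import json
import subprocess


def fetch_vpas() -> dict:
    """Return every VPA in the cluster as parsed JSON (requires kubectl and the VPA CRDs)."""
    result = subprocess.run(
        ["kubectl", "get", "vpa", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize_vpa_list(vpa_list: dict) -> list[tuple[str, str, str, dict]]:
    """Flatten a VPA list into (namespace, vpa name, container, target) rows."""
    rows = []
    for vpa in vpa_list.get("items", []):
        meta = vpa["metadata"]
        recommendation = vpa.get("status", {}).get("recommendation", {})
        # A freshly created VPA has no recommendation yet, so it contributes no rows.
        for rec in recommendation.get("containerRecommendations", []):
            rows.append((meta["namespace"], meta["name"], rec["containerName"], rec.get("target", {})))
    return rows


# Made-up sample shaped like `kubectl get vpa -o json` output.
sample = {"items": [
    {"metadata": {"name": "web-vpa", "namespace": "demo"},
     "status": {"recommendation": {"containerRecommendations": [
         {"containerName": "web",
          "lowerBound": {"cpu": "50m", "memory": "128Mi"},
          "target": {"cpu": "100m", "memory": "256Mi"},
          "upperBound": {"cpu": "400m", "memory": "512Mi"}}]}}},
    {"metadata": {"name": "new-vpa", "namespace": "demo"}, "status": {}},
]}

for namespace, name, container, target in summarize_vpa_list(sample):
    print(f"{namespace}/{name} {container}: cpu={target.get('cpu')} memory={target.get('memory')}")
```

Doing this across every namespace and re-running it as usage changes is exactly the chore the dashboard removes.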

The dashboard presents two types of recommendations depending on the quality of service (QoS) class you want for your deployments:

Guaranteed, which means the application takes precedence over other workloads when a node runs short of resources, so the resources it requests remain available to it. In this class, you set your resource requests and limits to exactly the same values (for both CPU and memory, in every container of the pod), which guarantees that the resources requested by the container are available to it when it gets scheduled. This QoS class generally makes for the most stable Kubernetes clusters.
Burstable, which means the application is guaranteed a minimum level of resources but can use more if and when they are available. Essentially, your resource requests are lower than your limits. The scheduler uses the requests to place the pod on a node, and the pod can then use more resources up to its limits: a container that exceeds its memory limit is killed, and one that reaches its CPU limit is throttled. When a node runs short of resources, Burstable workloads are generally removed before Guaranteed ones.

For the Guaranteed class, we recommend setting both your requests and your limits to the value in the VPA “target” field.
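The Guaranteed recommendation above can be sketched in a few lines of Python that turn one VPA container recommendation into a `resources` block. The Guaranteed mapping follows the text (requests and limits both set to `target`). The Burstable mapping, with requests from `lowerBound` and limits from `upperBound`, is our assumption for illustration and may not match what the dashboard shows; the function names and sample values are hypothetical.

```python
def guaranteed_resources(rec: dict) -> dict:
    """Guaranteed QoS: requests and limits both set to the VPA target."""
    target = {name: rec["target"][name] for name in ("cpu", "memory")}
    return {"requests": dict(target), "limits": dict(target)}


def burstable_resources(rec: dict) -> dict:
    """ASSUMED Burstable mapping: request from lowerBound, limit from upperBound."""
    return {
        "requests": {name: rec["lowerBound"][name] for name in ("cpu", "memory")},
        "limits": {name: rec["upperBound"][name] for name in ("cpu", "memory")},
    }


# Made-up recommendation shaped like one entry of status.recommendation.containerRecommendations.
rec = {
    "containerName": "web",
    "lowerBound": {"cpu": "50m", "memory": "128Mi"},
    "target": {"cpu": "100m", "memory": "256Mi"},
    "upperBound": {"cpu": "400m", "memory": "512Mi"},
}
print(guaranteed_resources(rec))
print(burstable_resources(rec))
```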

Note: a third QoS class, BestEffort, applies when no requests or limits are set at all. The application is then allocated resources only when all other requests are met, and it is among the first to be evicted when a node runs short. Use of BestEffort is not recommended.
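The three QoS classes follow mechanically from the requests and limits in a pod, so they can be expressed as a short function. This Python sketch follows the classification rules in the Kubernetes documentation for regular containers. The function name and input shape (a list of dicts like a pod spec's `containers` field) are our own, and it simplifies in two places, noted in the docstring.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort from its containers.

    Simplifications: init containers are ignored, and quantities are compared
    as literal strings, so "1" and "1000m" would count as different values.
    """
    anything_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        # Kubernetes defaults a missing request to the limit when a limit is set.
        requests = {**limits, **resources.get("requests", {})}
        if any(r in requests or r in limits for r in RESOURCES):
            anything_set = True
        for r in RESOURCES:
            if r not in limits or requests.get(r) != limits[r]:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"name": "web", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}}]))
# Guaranteed (requests default to the limits)
print(qos_class([{"name": "web", "resources": {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"}}}]))
# Burstable
print(qos_class([{"name": "web"}]))
# BestEffort
```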

Specializing Instance Groups for Your Cluster

If you are interested in fine-tuning the instances that your workloads run on, you can use different instance group types and node labels to steer workloads onto specific instance types.

Different business systems often have different-sized resource needs, along with specialized hardware requirements (such as GPUs). Kubernetes node labels let you attach key/value labels to each of your nodes. Pods, meanwhile, can be configured with a nodeSelector that lists the labels a node must carry, and a pod is scheduled only onto nodes that have every label in its nodeSelector. By combining instance groups of different instance types with appropriate labels, you can match the underlying hardware available from your cloud provider of choice to your workloads in Kubernetes.
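The nodeSelector rule is simple enough to sketch: a node is eligible only if its labels contain every key/value pair in the pod's nodeSelector. The Python below illustrates just that one filter (the real scheduler also checks resource requests, taints, affinity, and more), and the `instance-group` label key and node names are hypothetical.

```python
def matches_node_selector(node_selector: dict[str, str], node_labels: dict[str, str]) -> bool:
    """True if the node carries every key/value pair in the pod's nodeSelector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical nodes from two instance groups.
nodes = {
    "gpu-node-1": {"instance-group": "gpu", "kubernetes.io/os": "linux"},
    "general-node-1": {"instance-group": "general", "kubernetes.io/os": "linux"},
}
pod_selector = {"instance-group": "gpu"}
eligible = [name for name, labels in nodes.items() if matches_node_selector(pod_selector, labels)]
print(eligible)  # ['gpu-node-1']
```

An empty nodeSelector matches every node, which is why pods without one can land anywhere.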

If you have workloads of different sizes with different requirements, it can make strategic and economic sense to place them on different instance types and use labels to steer each workload onto the right one.

Spot instances tie into this idea. Most organizations are familiar with paying for instances on demand or reserving them for fixed terms. However, if you have workloads that can tolerate interruption, you may want to consider spot instances. These let you use the cloud provider's leftover capacity at a significant discount, at the risk of your instance being terminated when demand for regular on-demand instances rises.

If some of your business workloads can tolerate random instance termination, you can use the same node-labeling approach to schedule those workloads specifically onto spot instance groups and gain substantial savings.
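As a sketch of that approach, the Python below builds a pod for an interruptible batch workload and pins it to spot nodes with a nodeSelector. The `lifecycle: spot` label, the pod name, and the image are hypothetical; use whatever label your spot instance group actually applies (managed node groups from cloud providers typically set their own capacity-type label, so check your provider's documentation).

```python
import json

# Hypothetical label assumed to be applied to every node in the spot instance group.
SPOT_LABEL = {"lifecycle": "spot"}


def interruptible_pod(name: str, image: str) -> dict:
    """Pod spec for a workload that can tolerate its node disappearing."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Only nodes carrying the spot label are eligible for this pod.
            "nodeSelector": dict(SPOT_LABEL),
            "restartPolicy": "OnFailure",
            "containers": [{
                "name": name,
                "image": image,
                "resources": {
                    "requests": {"cpu": "500m", "memory": "512Mi"},
                    "limits": {"cpu": "500m", "memory": "512Mi"},
                },
            }],
        },
    }


print(json.dumps(interruptible_pod("report-builder", "example.com/report-builder:1.0"), indent=2))
```

In practice you would usually put the same nodeSelector in the pod template of a Deployment or Job, so a replacement pod is created if a spot node is reclaimed.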

How to Enable Kubernetes Resource Recommendations

Goldilocks is one of the tools Fairwinds Insights deploys to provide workload efficiency and performance optimizations. With Fairwinds Insights, Goldilocks can be deployed across multiple clusters so that information is available to teams in a single pane of glass. Fairwinds Insights adds data and recommendations on top of Goldilocks, including potential cost savings: its dashboard lists namespaces and deployments along with their average total cost and cost recommendations.

Fairwinds Insights is available to use for free.

Many organizations set their CPU and memory requests and limits too high, so when they apply the recommendations from Fairwinds Insights, they can fit more pods onto fewer Kubernetes worker nodes. When Cluster Autoscaler is enabled, nodes that are no longer needed are removed, which saves time and money.
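The arithmetic behind "more pods on fewer nodes" is worth making concrete. For identical pods, each node holds as many pods as its scarcer resource allows, so the node count is ceil(pods / pods per node). The Python sketch below uses hypothetical numbers and ignores DaemonSets, per-node pod limits, and other workloads sharing the nodes.

```python
import math


def nodes_needed(pods: int, cpu_req_m: int, mem_req_mi: int,
                 node_cpu_m: int, node_mem_mi: int) -> int:
    """Nodes needed to schedule `pods` identical pods, based on their requests alone."""
    # Each node fits as many pods as its scarcer resource (CPU or memory) allows.
    per_node = min(node_cpu_m // cpu_req_m, node_mem_mi // mem_req_mi)
    if per_node == 0:
        raise ValueError("a single pod's requests exceed a node's allocatable capacity")
    return math.ceil(pods / per_node)


# Hypothetical: 40 replicas on nodes with 4 CPUs and 16 GiB allocatable each.
before = nodes_needed(40, cpu_req_m=1000, mem_req_mi=2048, node_cpu_m=4000, node_mem_mi=16384)
after = nodes_needed(40, cpu_req_m=300, mem_req_mi=768, node_cpu_m=4000, node_mem_mi=16384)
print(before, after)  # 10 4
```

Here right-sizing the requests from 1 CPU / 2 GiB to 300m / 768 MiB cuts the footprint from 10 nodes to 4, and Cluster Autoscaler can then remove the other 6.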

Using software like Fairwinds Insights or open source tools like Goldilocks empowers developers to remove the guesswork by automating the recommendation process for them. In turn, it opens the door for you to increase the efficiency of your cluster and reduce your cloud spend.