FinOps "Crawl, Walk, Run" Maturity Model Applied to Kubernetes

FinOps has become an increasingly popular goal for many organizations. It helps unite finance teams and cloud operations teams so that they speak the same language, understand cloud costs, and see how those costs can be optimized.

The FinOps Foundation’s Maturity Model describes a “Crawl, Walk, Run” approach. The Foundation says that by following this model, “organizations [can] start small, and grow in scale, scope and complexity as business value warrants maturing a functional activity.”

The crawl, walk, run approach maps to the Kubernetes Maturity Model that Fairwinds put together, which the CNCF leaned on heavily when producing its Cloud Native Maturity Model. As users adopt Kubernetes, they take a similar crawl, walk, run approach to technology, people, process, and policy. The financial side should be part of that adoption too.

Crawl

According to the FinOps Foundation, the crawl phase is characterized by little reporting and tooling, basic KPIs, basic processes and policies, and measurements that only provide insight into the benefits of maturing the FinOps function.

So how does this map to Kubernetes? Kubernetes itself can be a great unknown for engineering leaders. While the decision to adopt Kubernetes is itself strategic, the nitty-gritty details of how resource allocation is configured can have major consequences. For example, one benefit of Kubernetes is autoscaling, but if no resource requests or limits are set, that benefit can quickly turn into a $50k overspend.
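To see why these settings matter for autoscaling, consider the replica calculation the Kubernetes documentation gives for the Horizontal Pod Autoscaler (HPA): desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), where CPU utilization is measured as a percentage of each pod's CPU request. The Python below is a simplified sketch of that formula for a single CPU utilization target. It assumes the default scaling tolerance of 0.1 and deliberately ignores min/max replica bounds, pod readiness, and stabilization windows; the function name and parameters are illustrative, not part of any Kubernetes API.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    usage_millicores: float,    # average CPU usage per pod
    request_millicores: float,  # CPU request per pod (0 means "not set")
    target_utilization: float,  # target as a percentage, e.g. 80 for 80%
    tolerance: float = 0.1,     # assumed default HPA tolerance
) -> int:
    """Simplified HPA replica calculation for a CPU utilization target."""
    if not request_millicores:
        # Utilization is a percentage of the request, so without a request
        # there is no utilization value to scale on.
        raise ValueError("CPU request not set; utilization cannot be computed")
    current_utilization = 100 * usage_millicores / request_millicores
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

With four pods averaging 400m of CPU against an 80% target, a 500m request keeps the deployment at four replicas, while an undersized 200m request makes the same load look like 200% utilization and scales it to ten. With no request at all, there is nothing for the autoscaler to measure against.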

In the Kubernetes crawl phase, the focus is less on adopting a complete FinOps culture and more on getting a handle on what’s actually happening in Kubernetes, since most organizations lack visibility into what’s happening within their clusters.

The most important part of the crawl phase is getting your CPU and memory settings right. Open source tools like Goldilocks help identify a baseline for setting Kubernetes resource requests and limits. This is sufficient in small environments of one to three clusters, where most likely only one or two engineers manage Kubernetes. Having a tool like Goldilocks is important for a few reasons:

It provides basic KPIs on resource allocation
It helps implement basic processes and policies, because teams can see which workloads in their clusters have resource requests and limits configured properly
It provides some measurement of each application’s resource usage and enables application right-sizing
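The kinds of checks described in the list above can be approximated with a small script. The sketch below is not how Goldilocks works; it is a hypothetical illustration that assumes each workload is a Deployment-shaped dict (for example, an item parsed from `kubectl get deployments -o json`), treats "configured properly" as all four request and limit fields being present, and derives a naive right-sizing baseline from a percentile of observed usage. `REQUIRED_FIELDS`, the function names, and the default percentile and headroom are all illustrative choices.

```python
from dataclasses import dataclass

# Hypothetical policy: every container must set all four of these fields.
REQUIRED_FIELDS = [
    ("requests", "cpu"),
    ("requests", "memory"),
    ("limits", "cpu"),
    ("limits", "memory"),
]


@dataclass
class ContainerAudit:
    workload: str
    container: str
    missing: list[str]  # e.g. ["limits.cpu"]; empty when fully configured


def audit_workloads(workloads: dict[str, dict]) -> list[ContainerAudit]:
    """Report missing resource fields for every container in every workload.

    Each value is assumed to be Deployment-shaped: containers live under
    spec.template.spec.containers, each with an optional "resources" key.
    """
    findings = []
    for workload, manifest in workloads.items():
        for container in manifest["spec"]["template"]["spec"]["containers"]:
            resources = container.get("resources", {})
            missing = [
                f"{section}.{key}"
                for section, key in REQUIRED_FIELDS
                if key not in resources.get(section, {})
            ]
            findings.append(ContainerAudit(workload, container["name"], missing))
    return findings


def coverage_kpi(findings: list[ContainerAudit]) -> float:
    """Basic KPI: percentage of containers with no missing fields."""
    if not findings:
        return 100.0
    configured = sum(1 for f in findings if not f.missing)
    return 100.0 * configured / len(findings)


def recommend_request(samples: list[float], percentile: int = 90,
                      headroom: float = 1.2) -> float:
    """Naive baseline: nearest-rank percentile of observed usage plus headroom."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = max(1, -(-percentile * len(ordered) // 100))  # integer ceiling division
    return ordered[rank - 1] * headroom
```

Tracking `coverage_kpi` over time gives a simple crawl-phase KPI, and the per-container `missing` lists show teams exactly which fields to fix. A percentile-plus-headroom rule is only one possible heuristic; a dedicated tool will use richer usage history.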

When crawling, it’s unlikely that all of this is being communicated upstream to the finance team, but establishing engineering processes and best practices now is fundamental to moving into the walk phase.

Walk

The FinOps Maturity Model describes the walk phase as one in which the capability is understood and followed within the organization, automation and processes are in place, and difficult edge cases are identified, though they are either not yet addressed or have plans in place to resolve them. KPIs become more specific in how they measure success.

By the time you reach the walk phase with Kubernetes, you are likely running multiple clusters and have many engineers working with it. This is where an open source tool doesn’t necessarily scale.

At this stage, Kubernetes users should investigate a platform that helps them understand their environment and alerts them when applications are not configured correctly. This understanding includes resource allocation for each workload or group of workloads: users should be able to allocate and group cost estimates by cluster, namespace, or label, which makes it easier to align reports with business context. This enables engineering leadership to work alongside finance teams to understand the true cost of Kubernetes usage and where idle or overhead costs exist. It also helps improve KPIs, because clusters and individual applications can be prioritized based on their overall usage and estimated potential savings. Often there are opportunities to optimize resource usage and reduce costs without impacting application performance.
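Grouping cost estimates by namespace or label, and surfacing idle cost, can be sketched in a few lines. This is a hypothetical illustration, not how any particular platform computes costs: it assumes blended per-vCPU-hour and per-GiB-hour prices (the figures below are made up), estimates each workload's cost from its requests because requests reserve node capacity, and treats any allocatable capacity that no workload has requested as idle.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical blended prices; substitute figures from your own cloud bill.
CPU_HOUR_USD = 0.03   # per vCPU-hour
GIB_HOUR_USD = 0.004  # per GiB-hour of memory


@dataclass
class Workload:
    name: str
    namespace: str
    cpu_request: float     # vCPU
    memory_request: float  # GiB
    labels: dict[str, str] = field(default_factory=dict)


def workload_cost(w: Workload, hours: float) -> float:
    """Estimate cost from requests, since requests reserve node capacity."""
    return hours * (w.cpu_request * CPU_HOUR_USD + w.memory_request * GIB_HOUR_USD)


def allocate(workloads: list[Workload], hours: float,
             key: Callable[[Workload], str]) -> dict[str, float]:
    """Sum cost estimates per group; `key` picks the grouping (namespace, label...)."""
    totals: dict[str, float] = defaultdict(float)
    for w in workloads:
        totals[key(w)] += workload_cost(w, hours)
    return dict(totals)


def idle_cost(cluster_cpu: float, cluster_memory: float,
              workloads: list[Workload], hours: float) -> float:
    """Cost of allocatable capacity that no workload has requested."""
    capacity = hours * (cluster_cpu * CPU_HOUR_USD + cluster_memory * GIB_HOUR_USD)
    return capacity - sum(workload_cost(w, hours) for w in workloads)
```

The same `allocate` call answers different business questions depending on the key, for example `key=lambda w: w.namespace` for a per-namespace view or `key=lambda w: w.labels.get("team", "unlabeled")` for a per-team view (the `team` label is a hypothetical convention). Allocating by requests rather than actual usage is a design choice; some teams charge the larger of the two instead.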

While you are “walking,” it is mainly the DevOps and platform engineering teams that look into the cost problem. That changes as you move to the run phase.

Run

According to the FinOps Foundation, organizations in the run phase are fully aligned and have adopted a robust model. Very high goals and KPIs are set for measuring success, and automation is preferred.

This is where Kubernetes service ownership maps to the FinOps Maturity Model. Service ownership means enabling developers to “code it, ship it, own it.” By the run phase, DevOps and engineering leaders have given developers the ability to see the cost of each workload within a cluster for themselves, and to understand how to improve it. This developer enablement is made possible by adopting software, such as Fairwinds Insights, that automatically scans clusters to provide a breakdown of cluster capacity and usage across namespaces, workloads, and labels. It also shows how much is spent on idle capacity and on shared versus app-specific resources.


In the run phase, it’s also possible to provide finance teams with showback reports of Kubernetes usage costs, allocate specific costs to development teams, and track savings over time. Advanced techniques such as spot instances and batch-oriented workloads can also help you run more cost-effectively.
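Showback reports need a rule for costs that no single team owns, such as idle capacity and shared services. The sketch below shows one common option, spreading that shared pool across teams in proportion to their direct costs, plus a period-over-period comparison for tracking savings. The function names, team names, and figures are hypothetical; an even split or a usage-weighted split are equally valid policies.

```python
def showback(direct: dict[str, float], shared: float) -> dict[str, float]:
    """Add a proportional share of the shared cost pool to each team's direct cost."""
    total_direct = sum(direct.values())
    if total_direct <= 0:
        raise ValueError("no direct costs to weight the shared pool by")
    return {team: cost + shared * cost / total_direct for team, cost in direct.items()}


def savings(previous: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Per-team change between two reporting periods (positive means money saved)."""
    teams = previous.keys() | current.keys()
    return {t: previous.get(t, 0.0) - current.get(t, 0.0) for t in sorted(teams)}
```

For example, with direct costs of 300 for `checkout` and 100 for `data` and a shared pool of 200, `showback` reports 450 and 150. Feeding each month's report into `savings` then shows which teams reduced spend and which ones grew.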

Decisio Health on Optimizing Kubernetes Resources

Fairwinds Insights provides Kubernetes cost optimization by monitoring resource usage to identify opportunities where cost can be reduced without impacting performance.

Glen Zangirolami, Principal DevOps Architect at Decisio Health, said, “The cloud will hammer you to death with bills. A major benefit of Fairwinds Insights is its resource utilization and cost optimization. We are able to use Fairwinds’ resource recommendations to provide details on our utilization and price our solution appropriately.”

“Understanding our cluster size is an important part of pricing our solutions correctly. Using Fairwinds Insights, we were able to rightsize the number of nodes and databases per cluster. That understanding helped us reduce our cost per cluster by 25%. With 25+ clusters in production and growing, that cost saving is significant.”

Decisio is also monitoring its service usage and setting CPU and memory limits based on Fairwinds’ recommendations to avoid over-allocating or starving resources. “It allows our team to tune application limits and requests as we continue to onboard new hospitals.”

A Final Thought

Many vendors have focused on helping organizations understand cloud spend, but support for understanding that spend on Kubernetes is often lacking or simply bolted on to what’s already been done. When you are using Kubernetes, getting expert insights from software built for it can make a huge difference in enabling your developers and truly embracing a FinOps model.