Many organizations have recently started running Kubernetes in production and are just beginning to see what their Kubernetes and cloud costs really look like. It is not uncommon for teams to discover that Kubernetes costs more than expected. As more workloads move to production, many organizations find that they need to control those costs, or at a minimum, understand them. Strategies and tools are available to help organizations understand their Kubernetes spend and implement cost avoidance, and adopting FinOps can support both goals.
FinOps is the practice of identifying the unit costs behind your cloud spend and bringing together the different silos in your organization, especially in larger organizations, for an ongoing conversation about, and better understanding of, cloud costs and how those costs factor into business decisions. Teams using Kubernetes find this especially challenging. Here are the six core principles of FinOps identified by the FinOps Foundation and how they apply when you are using Kubernetes:
1. Teams need to collaborate
To embrace FinOps, teams must work together. The finance team needs to move at the same speed and granularity that the IT team does. The engineering team must also think about cost in terms of efficiency and adopt cost as a metric it tracks. The organization must continuously improve its FinOps practice to increase efficiency and innovation, as well as define cloud and Kubernetes governance and implement controls for cloud usage.
2. Everyone takes ownership of their cloud usage
When feature and product teams can manage their own cloud usage and measure it against their budget, they become empowered to make better, more cost-efficient decisions. A Kubernetes ownership model helps teams increase their visibility into cloud spend across workloads and improve accountability by tracking costs at the team level.
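One common way to express that ownership model in Kubernetes is to label workloads with the team and cost center that own them, so cost tooling can group spend accordingly. A minimal sketch, assuming a hypothetical `checkout-api` deployment; the label keys (`team`, `cost-center`) are illustrative conventions, not a Kubernetes requirement:

```yaml
# Illustrative only: the label keys and values below are hypothetical
# conventions an organization might adopt for cost allocation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: team-payments
  labels:
    team: payments
    cost-center: "cc-1234"
    environment: production
spec:
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
        team: payments
        cost-center: "cc-1234"
    spec:
      containers:
        - name: checkout-api
          image: example.org/checkout-api:1.0
```

Applying the same labels consistently across every workload is what makes team-level cost tracking possible later.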
3. A centralized team drives FinOps
There are a few ways teams can govern and control cloud use centrally: Committed Use Discounts, Reserved Instances, and volume or custom discounts negotiated with cloud providers. With a centralized process for discount buying, engineering teams no longer have to think about rate negotiations. A centralized approach to cloud usage also enables granular allocation of costs, so the responsible teams and cost centers can see all of their costs, whether direct or shared.
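To make the effect of centralized discount buying concrete, here is a minimal sketch of how a central team might estimate the blended hourly rate for usage that is only partially covered by a commitment. The function name and all rates are hypothetical, not from any provider's price list:

```python
# Hypothetical sketch: estimating a blended hourly rate when only part of
# usage is covered by a commitment discount (RI, CUD, etc.).
def blended_rate(on_demand_rate: float, committed_rate: float, coverage: float) -> float:
    """coverage is the fraction of usage covered by the commitment (0.0-1.0)."""
    return coverage * committed_rate + (1 - coverage) * on_demand_rate

# e.g. $0.10/hr on demand, $0.06/hr committed, 70% of usage covered
rate = blended_rate(0.10, 0.06, 0.70)
print(f"${rate:.4f}/hr")  # 0.7 * 0.06 + 0.3 * 0.10 = 0.072
```

Reporting the blended rate per team or cost center lets engineers see their true effective price without being involved in the negotiation itself.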
4. Reports should be accessible and timely
Creating an environment where reports are timely and easy to access is critical, as it enables teams to act on feedback about spend and efficiency while it is still relevant. This can also help service owners evaluate whether resources are over- or under-provisioned. When resources are allocated automatically, it drives continuous improvement in cloud usage.
5. Decisions are driven by the business value of cloud
Analyzing trends and variance helps teams understand why costs increased and what they can do to avoid unnecessary cost. Internal teams can use benchmarks to demonstrate effective use of best practices and ensure those practices are being enforced. Benchmarking against industry peers can also help your organization understand how it is performing compared to others.
6. Leverage the cloud’s variable cost model
There is a lot of tooling that can help you rightsize your instances and services, ensuring your organization runs at appropriate resourcing levels. Knowing the right resourcing levels helps you make better decisions because you can compare pricing across different services and resource types based on your actual needs.
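As a toy illustration of the rightsizing comparison described above, the sketch below picks the cheapest instance type that fits measured peak usage plus some headroom. The instance names, sizes, and prices are entirely made up; real tooling would pull them from your provider's catalog:

```python
# Hypothetical sketch of rightsizing: pick the cheapest instance type
# (names and prices below are illustrative, not real provider pricing)
# that fits measured peak usage plus a headroom factor.
INSTANCE_TYPES = [
    # (name, vCPU, memory GiB, $/hour)
    ("small",  2,  4, 0.05),
    ("medium", 4,  8, 0.10),
    ("large",  8, 16, 0.20),
]

def cheapest_fit(peak_vcpu: float, peak_mem_gib: float, headroom: float = 1.2):
    """Return the cheapest instance covering peak usage * headroom, or None."""
    need_cpu = peak_vcpu * headroom
    need_mem = peak_mem_gib * headroom
    candidates = [t for t in INSTANCE_TYPES if t[1] >= need_cpu and t[2] >= need_mem]
    return min(candidates, key=lambda t: t[3]) if candidates else None

print(cheapest_fit(1.5, 3.0))  # needs 1.8 vCPU / 3.6 GiB -> "small" fits
```

The same comparison logic applies whether you are choosing node instance types or sizing pod resource requests.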
Right now, as we enter a potential recession, a lot of organizations are looking closely at what they are spending money on. Many organizations recently shifted to cloud-native architecture and adopted Kubernetes. If you are among them, this is an excellent time to look at your tooling, the cost of your cloud footprint, and whether to adopt a FinOps model for Kubernetes. Understanding why things cost what they do, what the actual costs are, and what the unit costs look like will help you make better, more informed decisions about your Kubernetes environment. There are several metrics you can look at to understand your spend, why costs may have increased, and whether those increases are aligned with your business goals.
For example, at Fairwinds, we look at cloud spend per cluster in our environment. We also look at spend per customer and then we slice the data in a couple of ways. That allows us to see whether our overall bill is going up. If it is, is the increase due to:
A larger feature set in our product?
A larger customer footprint?
An expanded customer base?
A bigger footprint for an individual customer?
If the overall bill is going up and the other elements are staying flat or going down, that is a signal to pay attention to. Reviewing cloud spend per customer is an ongoing process that provides visibility on cloud costs across the organization.
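The slicing described above can be sketched in a few lines. Here the billing line items, tag names, and dollar amounts are all hypothetical; in practice the data would come from your cloud bill joined with cluster metadata:

```python
# Hypothetical sketch of slicing spend per cluster and per customer.
# Line items, tags, and amounts are made up for illustration.
from collections import defaultdict

line_items = [
    {"cluster": "prod-1", "customer": "acme",   "cost": 120.0},
    {"cluster": "prod-1", "customer": "globex", "cost": 80.0},
    {"cluster": "prod-2", "customer": "acme",   "cost": 40.0},
]

def spend_by(key: str) -> dict:
    """Total cost grouped by the given tag (e.g. 'cluster' or 'customer')."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item[key]] += item["cost"]
    return dict(totals)

print(spend_by("cluster"))   # {'prod-1': 200.0, 'prod-2': 40.0}
print(spend_by("customer"))  # {'acme': 160.0, 'globex': 80.0}
```

Comparing these slices month over month is what lets you tell a growing customer base apart from a single customer's growing footprint.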
The difference between cloud spend and Kubernetes spend — and taking a FinOps approach to it — really stems from containers and other types of abstraction. That abstraction removes some of the artifacts that people are familiar with. For example, in the past you could look at a bill and see the cost of an instance, an instance ID, a volume you are familiar with, or look for a tag in your cloud environment.
Kubernetes introduces another layer of abstraction, in part because things in Kubernetes are short-lived. Nodes that are here today are gone tomorrow, and that becomes difficult to track when you are looking at cloud costs and shared resources. That makes it hard to allocate spend to cost per customer, cost per team, or cost per environment. Those allocation challenges increase complexity in the organization because Kubernetes injects another abstraction layer between the cloud API and what you are deploying and running.
There are two challenges for FinOps in Kubernetes that can be difficult for teams to tackle. The first is the way clusters are separated today. Kubernetes has a concept called a namespace, an isolation construct inside your cluster. It allows you to isolate resources, whether by application, team, or environment; you can think of it as a virtual cluster. That concept is tricky because a namespace spans the different nodes and line items you are tracking in your bill. The more one-to-many relationships you have, the more difficult costs are to track.
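A namespace is just another Kubernetes object, so it can carry the same ownership labels as the workloads inside it. A minimal sketch, with a hypothetical name and illustrative label keys:

```yaml
# Illustrative namespace used as an isolation and cost-allocation boundary;
# the name and label keys are hypothetical conventions.
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
  labels:
    team: payments
    environment: staging
```

Because every pod lives in exactly one namespace, labeled namespaces give cost tooling a stable grouping even as the pods and nodes underneath churn.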
The other challenge is scaling. One of the advantages of Kubernetes is that it continuously adds resources as you need them, making the environment more highly available or more scalable as traffic increases. That ability to scale drives up costs, though, because everything in the cloud is metered. When the finance team asks why costs increased, you need an easy way to explain what each application is costing. Having that information ready builds confidence within your organization, which is likely to further drive migration to Kubernetes because the finance team is confident it understands the related costs for the organization.
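Answering "what is each application costing?" usually means splitting a shared node's metered cost across the pods running on it. One common approach, sketched here with made-up pod names, requests, and a made-up node price, is to allocate in proportion to CPU requests:

```python
# Hypothetical sketch of allocating one node's hourly cost across the pods
# on it, in proportion to their CPU requests. All names and numbers are
# illustrative; real tooling also weighs memory and idle capacity.
def allocate_node_cost(node_cost: float, pod_requests: dict) -> dict:
    """pod_requests maps pod name -> requested CPU (cores)."""
    total = sum(pod_requests.values())
    return {pod: node_cost * cpu / total for pod, cpu in pod_requests.items()}

# A $0.40/hr node shared by three pods
shares = allocate_node_cost(0.40, {"checkout": 1.0, "search": 2.0, "batch": 1.0})
print(shares)  # {'checkout': 0.1, 'search': 0.2, 'batch': 0.1}
```

Summing these per-pod shares by namespace or team label is what turns a node-level cloud bill into an application-level answer for finance.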
Finance teams typically do not know how Kubernetes works, what Kubernetes is, or what the costs are. Especially in bigger organizations, old finance models say that to spend money, you make a business case, present it to finance or procurement, get approval, and then you spend it. The FinOps model forces a conversation between the business and engineering, where the finance team explains its constraints and the information it needs to forecast costs and budget requirements. The engineering side of the organization must be able to understand that larger context. Ideally, the engineering team understands and can explain what it can forecast, how it can create guardrails around spending, and the opportunity the cloud model presents in terms of the ability to scale the product or grow the customer footprint. This approach encourages partnership across the organization.
How do you talk to your engineers about managing costs? Most engineers just want to deploy things; they want to write code that runs, and they are not concerned about costs. The most important thing for engineers, users, or developers to understand about deploying infrastructure is the impact of what gets deployed. If they are not allocating resources correctly, or are overprovisioning, engineers need to understand that their behavior affects the profitability of the organization. Making incorrect cloud commitments can actually cost you more than using on-demand or pay-as-you-go pricing, which is dangerous in financial terms: you end up spending far more than you should by hedging your bets incorrectly. You need a cadence of conversations between your engineering and finance teams so that together they can pursue potential savings and understand the raw costs and the business impact of those decisions for the organization.
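The "incorrect commitment" danger above is simple arithmetic to demonstrate. The rates and usage hours below are made up purely to illustrate how an underused one-year commitment can exceed the on-demand cost:

```python
# Hypothetical arithmetic: a discounted one-year commitment that goes
# underused can cost more than paying on demand. All numbers are made up.
HOURS_PER_YEAR = 8760

def yearly_cost(rate: float, hours_used: float, committed: bool) -> float:
    """Committed capacity is paid for all year; on demand only when used."""
    return rate * (HOURS_PER_YEAR if committed else hours_used)

hours_used = 3000  # the workload only ran about a third of the year
on_demand = yearly_cost(0.10, hours_used, committed=False)  # ~300.0
committed = yearly_cost(0.06, hours_used, committed=True)   # ~525.6
print(on_demand, committed)  # here the "discount" costs about 75% more
```

Walking through numbers like these in a regular finance-and-engineering review makes the trade-off concrete for both sides.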
Creating an ongoing process where you review what people have been working on, look at the numbers together, and look at the unit costs helps you ask important questions, such as:
Is this cost reasonable?
Why or why not?
Why are these costs happening?
What can we do to mitigate costs if we think there is a problem?
Is the cost of mitigation worth it?
Are you going to create savings that are equal to or better than the time and money spent to find those savings?
Reviewing those things and thinking about them in terms of long-term costs and savings will help you make better Kubernetes cost avoidance decisions. It is an ongoing process that will help you build a FinOps practice that aligns Kubernetes costs to business decisions.
There is tooling that can help you understand your costs. AWS Cost Explorer is a great tool, and the other cloud providers offer equivalents. These tools help you understand cloud usage and the individual line items being consumed on the bill. You can use labeling and tagging to help attribute costs, and setting up namespaces and using them consistently will also help you better understand your costs.
Goldilocks is a tool that ties into the Vertical Pod Autoscaler (VPA), a Kubernetes add-on that can automatically vertically size your pods (that is, set your resource requests and limits). An open source tool, Goldilocks layers on top of the VPA and suggests changes based on the historical usage of your workloads.
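For context, a VPA object in recommendation-only mode looks like the sketch below; the deployment name and namespace are hypothetical. With `updateMode: "Off"`, the VPA records suggested requests and limits without changing running pods, which is the mode Goldilocks builds its dashboard on:

```yaml
# Illustrative VPA in recommendation-only mode; the target deployment
# name and namespace are hypothetical.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-api
  namespace: team-payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  updatePolicy:
    updateMode: "Off"
```

Reviewing the recommendations before applying them keeps rightsizing a deliberate, reviewable decision rather than an automatic one.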
When you look at a single cluster, it is easy to deploy a Helm chart or install and use a few tools, then review a dashboard or two. When you start adopting a FinOps approach across the organization and need to create standards across different environments, it is difficult to do that at scale. Kubernetes policies tend not to be consistent, and you may not understand all your costs across the organization. In that case, it is useful to have a platform that unifies that data and makes it easier to see what is happening in terms of Kubernetes and cloud costs across your organization. Fairwinds Insights uses open source tools, including Goldilocks, to form a platform that helps you deploy tooling across all your clusters, enforce policies automatically, and view that unified data easily for increased control and visibility.
Watch the webinar “Kubernetes cost avoidance: implementing FinOps” with Elisa Hebert, Vice President of Engineering Operations, Andy Suderman, CTO, and John Hashem III, Senior Solutions Architect, to learn more about how to implement FinOps at your organization.