Kubernetes Configuration Validation: Learn Why It's Not One and Done

If you're running Kubernetes in production, it's critical to validate the configuration of each of your workloads. Even small changes or omissions can lead to downtime, cost overruns, or, worse, a security breach. So what should you look for when it comes to Kubernetes configuration validation?

At a minimum, you should be checking for:

Security issues, such as over-permissioned containers and known Common Vulnerabilities and Exposures (CVEs)
Efficiency issues, such as missing CPU limits or excessive memory requests
Reliability issues, such as missing liveness and readiness probes
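To make these categories concrete, here is a minimal Python sketch that inspects a Deployment manifest, already parsed into a dictionary (for example with PyYAML's `yaml.safe_load`), for one or two issues in each category. The field names (`securityContext.privileged`, `allowPrivilegeEscalation`, `resources.limits.cpu`, `resources.requests.memory`, `livenessProbe`, `readinessProbe`) are standard Kubernetes container fields; the 4Gi memory threshold, the `Finding` type, and the function names are hypothetical choices for illustration. CVE detection requires an image scanner and is not covered here.

```python
from dataclasses import dataclass

# Subset of the suffixes Kubernetes accepts in memory quantities.
MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}

# Hypothetical policy threshold: flag memory requests above 4Gi.
MAX_MEMORY_REQUEST_BYTES = 4 * 1024**3


@dataclass
class Finding:
    container: str
    category: str  # "security", "efficiency", or "reliability"
    message: str


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '2G' to bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


def validate_deployment(manifest: dict) -> list:
    """Return findings for every container in a Deployment manifest."""
    findings = []
    for c in manifest["spec"]["template"]["spec"]["containers"]:
        name = c.get("name", "<unnamed>")
        security = c.get("securityContext", {})
        resources = c.get("resources", {})

        # Security: privileged containers and privilege escalation left enabled.
        if security.get("privileged", False):
            findings.append(Finding(name, "security", "runs as privileged"))
        if security.get("allowPrivilegeEscalation", True):
            findings.append(Finding(name, "security", "allowPrivilegeEscalation is not false"))

        # Efficiency: missing CPU limit, oversized memory request.
        if "cpu" not in resources.get("limits", {}):
            findings.append(Finding(name, "efficiency", "no CPU limit set"))
        memory = resources.get("requests", {}).get("memory")
        if memory is not None and parse_memory(memory) > MAX_MEMORY_REQUEST_BYTES:
            findings.append(Finding(name, "efficiency", f"memory request {memory} exceeds policy"))

        # Reliability: missing health probes.
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in c:
                findings.append(Finding(name, "reliability", f"{probe} not set"))
    return findings


deployment = {
    "spec": {"template": {"spec": {"containers": [{
        "name": "web",
        "image": "nginx:1.25",
        "securityContext": {"privileged": True},
        "resources": {"requests": {"memory": "8Gi"}},
        "readinessProbe": {"httpGet": {"path": "/healthz", "port": 80}},
    }]}}}
}

for finding in validate_deployment(deployment):
    print(f"[{finding.category}] {finding.container}: {finding.message}")
```

Against the sample Deployment this reports five findings: two security issues, two efficiency issues, and a missing liveness probe. Each rule is a simple, mechanical test over the manifest, which is exactly why it is worth automating.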

But it's not enough to look for these issues during code review; you should check for them automatically at every stage of the deployment lifecycle. Without a thorough program for validating Kubernetes configuration, mistakes are bound to slip through the cracks.

To keep your cluster healthy, you'll need to ensure configuration is checked:

During a CI process that runs on every pull request and merge
At admission time, as new resources enter the cluster
As a continuous scan on in-cluster resources

Step 1 - Stay Healthy with CI

If your company has a mature DevOps program, you're probably storing all of your configuration as Infrastructure as Code (IaC). That means every change to your infrastructure is (ideally) tracked in Git and goes through code review.

Code review is a great way to catch high-level issues, such as a change that won't actually accomplish its business goal or one that introduces a subtle security flaw. But many common issues are tedious to look for and easy to miss, such as a misconfigured security setting or a missing health probe.

Automated validation in CI is a great way to supplement code review and ensure that every change meets a consistent level of quality. By automating the most rote parts of a reviewer's job, you give them room to dig into the logic of a change and think through its impact.
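As a rough illustration, a CI step might look like the following Python sketch. It assumes manifests are stored as JSON files (Kubernetes accepts JSON as well as YAML), and the `check_manifest` and `run_ci_check` helpers and their rules are hypothetical. The key design point is the exit code: returning non-zero when any problem is found is what lets the pipeline block the pull request automatically.

```python
import json
from pathlib import Path

REQUIRED_PROBES = ("livenessProbe", "readinessProbe")


def check_manifest(manifest: dict) -> list:
    """Return problems found in a Deployment-like manifest (hypothetical rules)."""
    problems = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for c in pod_spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        if c.get("securityContext", {}).get("privileged", False):
            problems.append(f"{name}: runs as privileged")
        for probe in REQUIRED_PROBES:
            if probe not in c:
                problems.append(f"{name}: {probe} not set")
    return problems


def run_ci_check(paths: list) -> int:
    """Check every manifest file and return an exit code for the CI job."""
    failed = False
    for path in paths:
        manifest = json.loads(Path(path).read_text())
        for problem in check_manifest(manifest):
            # One line per problem so the author sees exactly what to fix.
            print(f"{path}: {problem}")
            failed = True
    return 1 if failed else 0
```

A pipeline step could invoke it as `sys.exit(run_ci_check(sys.argv[1:]))`, passing the manifest files touched by the pull request; most CI systems treat a non-zero exit status as a failed job.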

Step 2 - A Bouncer for Your Cluster

Once a pull request (PR) has been approved and its tests are passing, you should be safe to deploy. But what if the deployment does something subtly different from what was tested in CI? Or worse, what if someone has circumvented the review process, either by force-pushing to Git or by interacting directly with your Kubernetes cluster?

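Adding an Admission Controller is an important safeguard for keeping your cluster healthy. Admission controllers intercept requests to the Kubernetes API server before the objects are stored, so they work like a bouncer at the front door of your cluster: anything that doesn't adhere to your policies won't get in.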

Organizations often configure their Admission Controller to be less strict than CI, so that exceptions to non-critical rules can be granted during review. The Admission Controller then enforces only the highest-severity policies.
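The severity split can be sketched as the core of a validating admission webhook. The request and response envelope follows the `admission.k8s.io/v1` `AdmissionReview` format (the response echoes the request's `uid`, sets `allowed`, and may include a `status` and `warnings`), but the policies and their HIGH/LOW severities are hypothetical. Serving the handler over HTTPS and registering it with a `ValidatingWebhookConfiguration` are left out.

```python
HIGH, LOW = "high", "low"


def pod_spec_of(obj: dict) -> dict:
    """Pods carry their spec directly; Deployments and similar nest it in a template."""
    if obj.get("kind") == "Pod":
        return obj.get("spec", {})
    return obj.get("spec", {}).get("template", {}).get("spec", {})


def evaluate(obj: dict) -> list:
    """Return (severity, message) pairs. The severity assignments are hypothetical."""
    spec = pod_spec_of(obj)
    findings = []
    if spec.get("hostNetwork", False):
        findings.append((HIGH, "pod shares the host network namespace"))
    for c in spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        if c.get("securityContext", {}).get("privileged", False):
            findings.append((HIGH, f"{name}: privileged container"))
        if "cpu" not in c.get("resources", {}).get("limits", {}):
            findings.append((LOW, f"{name}: no CPU limit"))
        if "readinessProbe" not in c:
            findings.append((LOW, f"{name}: no readinessProbe"))
    return findings


def review(admission_review: dict) -> dict:
    """Build an admission.k8s.io/v1 AdmissionReview response for a request."""
    request = admission_review["request"]
    findings = evaluate(request["object"])
    blocking = [msg for severity, msg in findings if severity == HIGH]
    advisory = [msg for severity, msg in findings if severity != HIGH]

    response = {"uid": request["uid"], "allowed": not blocking}
    if blocking:
        # Only high-severity findings reject the request.
        response["status"] = {"code": 403, "message": "; ".join(blocking)}
    if advisory:
        # Lower-severity findings are surfaced as warnings instead of causing a rejection.
        response["warnings"] = advisory
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

High-severity findings reject the request with a 403, while low-severity ones come back as warnings, which `kubectl` displays to the user without blocking the change. CI can still enforce the full rule set; admission enforces only what must never reach the cluster.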

In a perfect world, issues would always be caught earlier in the development cycle, before they ever get sent to the Kubernetes API. But even the tightest DevOps programs have holes that an Admission Controller can plug.

Step 3 - When Good Workloads Go Bad

So your workload passed the CI process, and the Admission Controller let it through. Your app is running in production and everyone is happy. You should be done at this point, right?

Not quite. There are a few cases where a previously healthy deployment can start to rot:

Workloads can drift subtly over time. For instance, if you're using a mutable image tag, the Docker image used by your workload might be updated without any change to your IaC repository and without passing through the Admission Controller. Or a Kubernetes controller such as the VerticalPodAutoscaler might adjust the workload's resource requests.
New vulnerabilities (CVEs) are announced every day. An image that was deemed healthy before going into the cluster might now have a known exploitable vulnerability.
Your team will adopt new policies as you grow and mature. Those policies need to apply to existing workloads as well as new ones.
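A continuous scan can reuse the same idea on what's actually running. The Python sketch below evaluates a list of policies against pod objects shaped like the `items` returned by `kubectl get pods --all-namespaces -o json`. The two policies are hypothetical examples: one flags image references that aren't pinned by digest (since even version tags can be repointed, this sketch treats every tag as mutable), and one stands in for a rule adopted after the workloads were deployed. Fetching pods on a schedule and matching images against vulnerability data are left out.

```python
from typing import Callable, List, Optional

Policy = Callable[[dict], Optional[str]]


def image_tag(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it is pinned by digest."""
    if "@" in image:
        return None
    # Look only at the last path component so a registry port (host:5000) is not read as a tag.
    last_component = image.rsplit("/", 1)[-1]
    return last_component.split(":", 1)[1] if ":" in last_component else "latest"


def mutable_tag_policy(container: dict) -> Optional[str]:
    tag = image_tag(container["image"])
    if tag is None:
        return None
    return f"image {container['image']} uses tag '{tag}', which can be repointed; consider pinning by digest"


def cpu_limit_policy(container: dict) -> Optional[str]:
    # Hypothetical example of a policy adopted after these workloads were deployed.
    if "cpu" not in container.get("resources", {}).get("limits", {}):
        return "no CPU limit set"
    return None


POLICIES: List[Policy] = [mutable_tag_policy, cpu_limit_policy]


def scan(pods: list, policies: List[Policy] = POLICIES) -> List[str]:
    """Evaluate every policy against every container of every running pod."""
    report = []
    for pod in pods:
        meta = pod["metadata"]
        for c in pod["spec"]["containers"]:
            for policy in policies:
                message = policy(c)
                if message:
                    report.append(f"{meta['namespace']}/{meta['name']}/{c['name']}: {message}")
    return report
```

Because the policy list is just data, adding a newly adopted rule to `POLICIES` applies it to every existing workload on the next scan, not only to new deployments.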

Scanning resources that are already in your Kubernetes environment is an important step in ensuring the long-term health of your cluster. It is the closest you'll get to a true sense of your cluster's health, security posture, and policy compliance.

Kubernetes with Confidence

Putting these sorts of guardrails in place might seem like a chore, or a risk to productivity. We all know the pain of having our work rejected by an overzealous CI system or security policy.

But it's much easier to move fast when you know you won't break things. Changing resources in your Kubernetes cluster is scary, and simple mistakes can lead to angry users, lost revenue, and reputational damage. The right guardrails will help you ship confidently and sleep peacefully.

Fairwinds Insights

If you're looking for help implementing policy and best practices throughout the Kubernetes lifecycle, get in touch! Fairwinds Insights can help you run the best open source validation tools for Kubernetes in CI, Admission Control, and Live Scanning.

Fairwinds Insights is available to use for free.