3 Kubernetes Guardrails Every Ops Team Needs

In a true DevOps environment, Ops and Dev share responsibility for the Kubernetes environment. Ops ensures the core platform runs smoothly, while Dev is responsible for packaging their app and shipping it into the cluster.

It's not hard for Ops—assuming they're a team of experienced Kubernetes engineers—to build a cluster that's secure, efficient, and reliable. But once you let developers interact with the cluster, things can get hairy quickly. Developers typically want to focus on code and features, and are rewarded for doing so. Infrastructure is often an afterthought—they just want to "get it working."

It's up to the Ops team to put the right Kubernetes guardrails in place to guide their development teams towards best practices and policy compliance. This not only prevents critical issues like security breaches and major outages; it also helps development teams ship faster and more confidently, knowing their changes are being vetted for issues before getting into production.

Let's look at three major guardrails Ops teams should incorporate to help development teams succeed in a DevOps-oriented Kubernetes environment.

RBAC

The first guardrail any DevOps-driven organization should implement is Role-Based Access Control, or RBAC. RBAC determines exactly what actions each individual user or process is allowed to take in your environment.

At minimum, your organization should utilize three unique roles for different personas:

Cluster Admin - this is the highest level of privilege, and is capable of making any change to the cluster. This role should only be available to a few key people, and only used in "break glass" emergencies.
Ops Engineer - this role should have broad permission to modify the cluster with few restrictions. To do their job, Ops Engineers need a level of access that might be dangerous in other hands. However, you may want to cut back on sensitive permissions, like create on pods/exec or get on secrets.
Developer - this role should have a bare minimum of permissions. Often it is a read-only role, allowing developers to view the logs and status of their workloads. The role can also be bound to particular namespaces, so that one developer can't look at another team's logs.

In addition, you should create an "app deployment" role, which can be used inside of a continuous deployment process. This will need more permission than the developer role (since it will have to make modifications) but can be restricted from doing things like modifying the kube-system namespace.
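
As a rough illustration of the read-only, namespace-bound Developer role described above, the following sketch builds a Role and RoleBinding as plain Python dictionaries and prints them as JSON. The names developer-read-only, team-a, and team-a-developers are assumptions made up for this example, and the exact resources a developer needs to read will vary by organization.

```python
import json


def developer_read_only_role(namespace: str) -> dict:
    """Build a namespaced Role that can only read workloads and their logs."""
    read_only = ["get", "list", "watch"]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "developer-read-only", "namespace": namespace},
        "rules": [
            # Core API group: pod status and container logs.
            # Note what is absent: no secrets, no pods/exec.
            {"apiGroups": [""], "resources": ["pods", "pods/log"], "verbs": read_only},
            # Workload controllers, still read-only.
            {
                "apiGroups": ["apps"],
                "resources": ["deployments", "replicasets", "statefulsets"],
                "verbs": read_only,
            },
        ],
    }


def bind_role_to_group(role: dict, group: str) -> dict:
    """Build a RoleBinding that grants `role` to `group` in the role's namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": f"{role['metadata']['name']}-{group}",
            "namespace": role["metadata"]["namespace"],
        },
        "subjects": [
            {"kind": "Group", "name": group, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "Role",
            "name": role["metadata"]["name"],
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


# Hypothetical team namespace and group name.
role = developer_read_only_role("team-a")
binding = bind_role_to_group(role, "team-a-developers")

# kubectl accepts JSON as well as YAML, so this output could be applied directly.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [role, binding]}, indent=2))
```

Because both objects are namespaced, members of the group can read workloads in team-a only. An app deployment role could follow the same shape with write verbs added for the resources the pipeline manages.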

RBAC is a great start for implementing guardrails. Done right, it can contain the blast radius of honest mistakes a non-expert (or even a tired expert!) might make. But it's not enough: developers still have free rein in their namespace, and can easily misconfigure their workloads to create gaping holes in security, massive wastes of resources, or a terribly unreliable application.

Policy

It's imperative that operations teams put guardrails in place to ensure all application workloads are properly configured. Kubernetes offers a lot of ways to tweak the security profile, resource usage, and resiliency of your application, and the defaults are often not adequate.

For example, by default, every workload is allowed to run as the root user. In a containerized environment this might seem safe, but there are cases where attackers are able to escape the container and gain access to the host. If the container is running as root, the attacker then has root access to the host node, and can wreak havoc. Ensuring all applications run as non-root is an easy way to shrink your attack surface.

Similarly, Kubernetes will happily allow you to deploy a workload with no memory and CPU settings, or no health probes. But it's imperative to set these options if you want things like zero-downtime deployments and reasonable autoscaling behavior.

At minimum, your organization should enforce the following policies for user-facing applications:

Specify memory and CPU settings (both requests and limits)
Specify liveness and readiness probes
Tighten security:
  Disallow running as root, running as privileged, and privilege escalation
  Don't add any additional Linux capabilities
  Don't set hostIPC, hostPID, or hostNetwork
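
To make the minimum policies above concrete, here is a small sketch that checks a pod spec (as a Python dictionary) against them and returns a list of violations. The function name check_pod_spec and the message wording are assumptions for illustration. A real policy engine covers many more cases, such as init containers, and this sketch deliberately treats an unset allowPrivilegeEscalation as a violation so that the setting has to be explicit.

```python
def check_pod_spec(pod_spec: dict) -> list[str]:
    """Return human-readable violations of the minimum policies for one pod spec."""
    violations = []

    # Pod-level settings that share host namespaces with the container.
    for flag in ("hostIPC", "hostPID", "hostNetwork"):
        if pod_spec.get(flag):
            violations.append(f"pod sets {flag}")

    pod_sc = pod_spec.get("securityContext") or {}
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # Memory and CPU settings: both requests and limits.
        resources = container.get("resources") or {}
        for kind in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                if resource not in (resources.get(kind) or {}):
                    violations.append(f"{name}: missing resources.{kind}.{resource}")

        # Liveness and readiness probes.
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                violations.append(f"{name}: missing {probe}")

        # Security settings; a container-level value overrides the pod-level one.
        sc = container.get("securityContext") or {}
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            violations.append(f"{name}: runAsNonRoot is not true")
        if sc.get("privileged"):
            violations.append(f"{name}: runs as privileged")
        if sc.get("allowPrivilegeEscalation") is not False:
            violations.append(f"{name}: allowPrivilegeEscalation is not false")
        if (sc.get("capabilities") or {}).get("add"):
            violations.append(f"{name}: adds Linux capabilities")

    return violations


# Hypothetical workload that satisfies every check.
compliant = {
    "containers": [
        {
            "name": "web",
            "image": "example.com/web:1.0",
            "resources": {
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"cpu": "500m", "memory": "256Mi"},
            },
            "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
            "readinessProbe": {"httpGet": {"path": "/ready", "port": 8080}},
            "securityContext": {"runAsNonRoot": True, "allowPrivilegeEscalation": False},
        }
    ]
}

print(check_pod_spec(compliant))  # prints []
```

A bare spec such as {"containers": [{"name": "web"}]} fails every resource, probe, and explicit security check, which shows how far the defaults are from these policies.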

You might also want to implement some additional policies like:

Specific labeling schemes (e.g. making sure every workload has a costCenterCode)
Dropping all default Linux capabilities
Ensuring all workloads have a Horizontal Pod Autoscaler and a Pod Disruption Budget
Rejecting containers with known CVEs

These policies should be enforced by an Admission Controller. It checks every modification submitted to the Kubernetes cluster, and can reject any that would violate a policy. You'll likely also want to have some way for the Ops team to make exceptions in certain cases.
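
For a sense of what the accept-or-reject decision looks like, the sketch below handles the AdmissionReview object that Kubernetes sends to a validating admission webhook and builds the matching response. It leaves out the HTTPS server and webhook registration. The single privileged-container check is a stand-in for a real policy set, and the exempt_namespaces parameter is a hypothetical way for Ops to grant exceptions, not a feature of any particular tool.

```python
def privileged_containers(pod: dict) -> list[str]:
    """Stand-in policy: name every container that asks to run privileged."""
    containers = pod.get("spec", {}).get("containers", [])
    return [
        f"container {c.get('name', '<unnamed>')} must not run as privileged"
        for c in containers
        if (c.get("securityContext") or {}).get("privileged")
    ]


def review(admission_review: dict, exempt_namespaces: frozenset = frozenset()) -> dict:
    """Build the AdmissionReview response for an incoming Pod request."""
    request = admission_review["request"]
    namespace = request.get("namespace", "")

    # Ops-maintained exemptions skip the policy check entirely (hypothetical mechanism).
    if namespace in exempt_namespaces:
        problems = []
    else:
        problems = privileged_containers(request["object"])

    # The response must echo the request uid so the API server can match it.
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"code": 403, "message": "; ".join(problems)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


# Hypothetical request for a pod that asks for privileged mode.
sample = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "example-uid",
        "namespace": "team-a",
        "object": {
            "spec": {
                "containers": [{"name": "web", "securityContext": {"privileged": True}}]
            }
        },
    },
}

print(review(sample)["response"]["allowed"])  # prints False
print(review(sample, frozenset({"team-a"}))["response"]["allowed"])  # prints True
```

The rejection message is what the developer sees when the change is refused, so it is worth making it specific about which container and which policy caused the failure.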

Infrastructure-as-Code Scanning

Having an Admission Controller in place is a great start, but soon you'll find development teams complaining that it's slowing them down. This is the opposite of what we want from good Kubernetes guardrails!

The issue is that developers will only get feedback from the Admission Controller after they merge their changes to the main branch of their Infrastructure-as-Code (IaC) repository. This means that if there's an issue, they'll have to go back in, make more changes, and try again. This loop can be slow and fatiguing.

Instead, the same policies that are being enforced in the Admission Controller should also be enforced on Infrastructure-as-Code as part of a continuous integration (CI) process. This way, developers get feedback whenever they make a pull request, before the changes get merged into the main branch. Developers can quickly fix any issues with their branch and merge confidently, knowing that if the CI pipeline passes, the Admission Controller will accept the deployment.
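
A CI check of this kind can be sketched as a script that walks the manifests in a repository and returns a non-zero exit code when any of them breaks a policy. The version below is only an illustration: it reads JSON manifests so that it needs nothing beyond the standard library, it uses one stand-in policy (containers must declare CPU and memory limits), and the function names are assumptions. In practice you would run the same policy engine that backs your Admission Controller so the two can never disagree.

```python
import json
from pathlib import Path


def pod_spec_of(manifest: dict) -> dict:
    """Return the pod spec of a Pod, or of a controller that has a pod template."""
    spec = manifest.get("spec") or {}
    if manifest.get("kind") == "Pod":
        return spec
    return (spec.get("template") or {}).get("spec") or {}


def find_problems(manifest: dict) -> list[str]:
    """Stand-in policy: every container must declare CPU and memory limits."""
    problems = []
    for container in pod_spec_of(manifest).get("containers", []):
        name = container.get("name", "<unnamed>")
        limits = (container.get("resources") or {}).get("limits") or {}
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"container {name} has no {resource} limit")
    return problems


def scan(directory: str) -> int:
    """Print every problem found under `directory` and return a CI exit code."""
    exit_code = 0
    for path in sorted(Path(directory).rglob("*.json")):
        for problem in find_problems(json.loads(path.read_text())):
            print(f"{path}: {problem}")
            exit_code = 1
    return exit_code
```

A pipeline step could pass the result of scan on the manifest directory to sys.exit, so that a pull request with a violation fails its check before it can be merged.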

Fortunately, many open source and commercial Kubernetes policy tools do this well. In particular, the Fairwinds open source project Polaris can run its policies (both built-in and custom) against Infrastructure-as-Code files as well as in an Admission Controller. Some flavors of Open Policy Agent (OPA) are able to do this as well.

Moving Safely and Quickly with Kubernetes Guardrails

Proper guardrails are imperative for any organization running Kubernetes. With the right RBAC, Admission Policy, and IaC scanning in place, development teams can ship faster, and Ops teams can be sure their clusters stay secure, efficient, and reliable.

If you're looking to implement Kubernetes guardrails at your organization, check out Fairwinds Insights. It comes with over 100 built-in policies (plus OPA and Polaris for custom policies) which can be enforced throughout the development lifecycle—they can run on IaC, in Admission Control, or scan your live environment on a schedule. Policies can even be customized and scoped to particular clusters, namespaces, labels, and more.


Whether you use open source solutions or go with a commercial partner, setting up guardrails for your Kubernetes environment will help your developers ship quickly and confidently.