Security Basics for your first Kubernetes Cluster

So you built a cluster. Maybe you have some apps deployed. You’re sold on this whole Kubernetes thing. Now what do you do?

There are a lot of things you can do next. You could install a bunch of tools to make your life easier (shameless plug for my previous article). You could just start deploying apps and running them in production. Or, you might be thinking to yourself “How on earth am I going to secure this massive complicated thing that I just put out there in the cloud!?!?” That’s a valid question, and I am going to make an attempt to answer it in a set of easy-to-follow practical steps.

Disclaimer: I’m not writing a thorough guide on securing your clusters, and I’m not making any claims that this information is perfect. It’s a starting point, a way to get the brain juices flowing and open the door to a full security plan.

GKE

I started my Kubernetes journey entirely in AWS thinking it was the Bees Knees™. Then I found out about GKE (Google Kubernetes Engine) and promptly changed my mind. Both platforms are awesome and have their drawbacks, but if you’re just starting out, GKE is a great place to do it.

The latest defaults in GKE are pretty good and they’re always being improved, but there are some settings you should choose when setting up your cluster that will give you the ability to do nice security bits later on.

VPC-Native Cluster

The first thing to do is turn on a VPC-Native network configuration. This is relatively new in GKE, and will be the default soon. It puts your cluster inside a VPC, much like you would do in AWS. It’s only one checkbox away, and totally worth doing:

Figure: GKE VPC-Native network configuration option

Cluster Access

After you’ve done that, there is a set of checkboxes related to cluster networking that I will go through. The first is here:

As you look at this, you might be thinking “Didn’t this guy just tell us a while back that IP Whitelisting isn’t a form of security, and we should just stop doing it?” (article) You would be mostly correct about that, and I tend to think that this step is largely unnecessary. However, as part of a larger security posture, this setting can be a great idea. It restricts access to your master nodes (and the Kubernetes API) to a specific set of IP addresses. This doesn’t mean that anyone on those IP addresses can magically just access your API, because they still have to authenticate with Kubernetes to do anything. What it gives you is peace of mind that if a critical vulnerability in Kubernetes (see CVE-2018-1002105) comes up, you have a little breathing room to fix it. This naturally has usability implications around public CI/CD tools and remote workers, but if all of your expected traffic to your cluster comes from inside a single network, then by all means turn on “Enable master authorized networks” and set it to that IP CIDR block.

Network Policy

The third checkbox we want to consider is Enable network policy:

This box sounds awesome, and it largely is. It enables a network policy implementation called “Calico” that allows you to create Kubernetes network policy. In turn, this provides the ability to heavily restrict what things can talk to other things in your cluster. It doesn’t give a lot of benefit out of the box, but it opens the door to a world of security. If you plan to do advanced network policy, then checking this box in the beginning will save you some hassle.
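To give a feel for what that looks like, here is a minimal sketch of a "default deny" policy. The namespace and policy names are placeholders I made up for illustration; the idea is that once this is in place, you add further policies that allow only the traffic you actually expect.

```yaml
# Hypothetical example: deny all inbound traffic to every pod in one namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # placeholder name
  namespace: my-app            # placeholder namespace
spec:
  podSelector: {}              # an empty selector matches all pods in the namespace
  policyTypes:
  - Ingress                    # no ingress rules are listed, so all inbound traffic is denied
```

A policy like this only has an effect when the cluster runs a network plugin that enforces network policy, which is what the checkbox turns on.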

Disable Legacy Authentication

The last set of boxes controls authentication with the Kubernetes API:

Figure: GKE settings, security section (auth)

Pretty soon, the default options are going to be sufficient, but until then you need to uncheck Enable Legacy Authentication, Issue Client Certificate, and Enable Legacy Authorization. This disables basic authentication and forces all users to use GCP authentication in order to gain access to the cluster. In addition, Google enables RBAC by default so that we can control who has access to what in the cluster. I will cover this in more detail later.

AWS (kops)

By default, kops creates a cluster that is internal to the VPC with nodes in a private subnet, masters in a public one, an ELB for API access, and ssh only available internally. In general, this is a fantastic network configuration and doesn’t need many changes. Hats off to the kops developers on that one. However, there are a couple of settings that need to be added when creating your cluster to maximize your security.

Cluster Access

First, you can enable a whitelist of IPs that have access to the masters (see the GKE section for my thoughts on this). You can do this in the kops cluster yaml by adding this tidbit and substituting your own IP CIDR blocks:

spec:
  kubernetesApiAccess:
  - 203.0.113.0/24

Disable Anonymous Authentication

Another easy change you can make is to turn off anonymous authentication for the kubelet. From Kops Security:

spec:
  kubelet:
    anonymousAuth: false

Network Policy

If you want to enable network policy, you can do so by changing your network overlay. Calico or Canal are both good options that support network policy. I won’t cover it in detail here, but kops has a great bit of documentation on this subject.
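As a rough sketch, assuming you pick Calico, the relevant part of the kops cluster spec looks something like this. Check the kops networking documentation for the options that apply to your kops version before using it.

```yaml
spec:
  networking:
    calico: {}   # select Calico as the cluster's networking provider
```

This is easiest to set when the cluster is first created; switching the networking provider on a running cluster is a much bigger change.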

Enable RBAC

Next, you want to make sure that RBAC is enabled in your clusters. This is the default in new kops clusters, but it’s worth checking the cluster spec for this:

spec:
  authorization:
    rbac: {}

Enable Audit Logging

And the last setting you want is to turn on audit logging. GKE gives you this by default, but in kops we have to add it ourselves. At Fairwinds, we use the following configuration, which comes from the Kubernetes Docs:
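As a rough sketch of the shape this takes in a kops cluster spec (not the exact Fairwinds configuration), the example below assumes the kubeAPIServer audit fields and the fileAssets mechanism described in the kops documentation. The file paths, retention numbers, and the single-rule policy are placeholders chosen for illustration; verify them against the kops and Kubernetes docs for your versions.

```yaml
spec:
  kubeAPIServer:
    auditLogPath: /var/log/kube-apiserver-audit.log   # placeholder path
    auditLogMaxAge: 10        # days to keep old audit log files
    auditLogMaxBackups: 1     # number of rotated files to keep
    auditLogMaxSize: 100      # megabytes before the log is rotated
    auditPolicyFile: /srv/kubernetes/audit.yaml       # placeholder path
  fileAssets:
  - name: audit-policy-config
    path: /srv/kubernetes/audit.yaml                  # must match auditPolicyFile
    roles: [Master]
    content: |
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
      # Illustrative catch-all: record request metadata for everything.
      - level: Metadata
```

A real policy usually has more rules than this, for example to drop noisy read-only requests and to avoid logging request bodies for secrets.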

Cluster Security

Once you have the cluster built with all of these options, there’s even more you can do to enhance your security posture.

Use RBAC

It’s enabled now, so use it. The built-in policies can be useful, but you should think about your own as well. In addition, whenever you install a Helm chart, make sure that the very common rbac.enabled: true setting is on. This will create RBAC objects specific to that application. Go look at them and see what they do.
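If you haven’t written your own policy before, here is a minimal sketch of a namespaced Role and the RoleBinding that grants it. The namespace, role name, and user are all hypothetical; swap in your own.

```yaml
# Hypothetical example: read-only access to pods and their logs in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader             # placeholder name
  namespace: my-app            # placeholder namespace
rules:
- apiGroups: [""]              # "" is the core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: my-app
subjects:
- kind: User
  name: jane@example.com       # placeholder user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Starting from narrow, read-only roles like this and adding verbs as needed tends to be easier to reason about than starting broad and trimming back.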

Use RBAC Manager to simplify the creation of RoleBindings and ClusterRoleBindings. This will enable you to create a simple CRD that binds Users/Groups/ServiceAccounts to Roles/ClusterRoles.

And finally, when you get deep in the weeds of RBAC and can’t for the life of you figure out why someone somehow does or doesn’t have access to something, use RBAC Lookup to look it up.

Demo: a brief demo of rbac-lookup (https://asciinema.org/a/222327)

Use Namespaces Liberally

To quote a co-worker, “Namespaces are cheap.” Use them to separate things like infrastructure tooling and applications. This allows you to restrict access easily using RBAC and to limit the scope of applications. It will also make your network policy creation easier when you decide to do it.
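For instance, a simple split might look like the sketch below. The names and labels are placeholders; the labels are there so that RBAC tooling and network policy have something to select on later.

```yaml
# Hypothetical example: one namespace for tooling, one for an application.
apiVersion: v1
kind: Namespace
metadata:
  name: infra-tooling
  labels:
    purpose: infrastructure
---
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    purpose: application
```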

Container Security

Lastly, we need to secure the workloads that we run in the cluster, since these are what will be exposed to our (potentially hostile) users. There are a lot of things that are default or common when running containers in Kubernetes that provide less than ideal security. Here’s a non-exhaustive list of things you can do to get started.

Don’t run multiple processes per container

Docker general best practices suggest a single process per container simply for usability and size reasons, and this is actually a great practice to follow for security as well. It keeps your attack surface small and limits the number of potential vulnerabilities.

Don’t run containers as privileged

Running a container or pod as privileged gives it the ability to make modifications to the host. This is a large security issue that is super easy to mitigate by just not doing it.
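If you want to be explicit about it, a sketch like the following states the intent in the container’s security context. The container name is a placeholder, and allowPrivilegeEscalation is an extra setting I’m including because it is closely related.

```yaml
spec:
  containers:
  - name: foo                          # placeholder container name
    securityContext:
      privileged: false                # do not grant host-level privileges
      allowPrivilegeEscalation: false  # block gaining more privileges than the parent process
```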

Don’t run containers as the root user

By default, containers run as the root user inside the container. This is easy to avoid using the pod specification to set a high-numbered UID.

spec:
  securityContext:
    runAsUser: 10324

Don’t use the default list of capabilities

Docker runs containers with a significant set of Linux capabilities by default, many of which your app might not require. You can see them in the source code. The following config will drop all Linux capabilities by default, allowing you to add only the specific capabilities your app actually needs:

spec:
  containers:
  - name: foo
    securityContext:
      capabilities:
        drop:
        - ALL
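If your app then turns out to need something, add it back explicitly. As a sketch, assuming a hypothetical app that has to bind to a port below 1024:

```yaml
spec:
  containers:
  - name: foo
    securityContext:
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE   # example only: allows binding to ports below 1024
```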

Scan your containers for vulnerabilities

Known vulnerabilities account for a large portion of breaches. Use a tool to scan your containers for them and then mitigate them. Anchore, Clair, and Quay are among my favorite tools, but there are others out there.

Look at kubesec.io

This tool will analyze your deployment yaml and tell you what you should change in order to improve the security of those pods. It even gives you a score that you can use to create a minimum standard. The score incorporates all of the best practices that I have outlined above, as well as several more.

Here’s an example of a deployment with a score of 9:

apiVersion: apps/v1
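As a rough sketch of what a deployment following the practices above might look like, here is a manifest that pulls them together. The names, image, and UID are placeholders, runAsNonRoot is an extra I’m adding alongside runAsUser, and I have not checked what score this particular manifest gets, so treat it as a starting point rather than a reference.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                  # placeholder name
  namespace: my-app                  # placeholder namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      securityContext:
        runAsUser: 10324             # high-numbered, non-root UID
        runAsNonRoot: true           # refuse to start if the container would run as root
      containers:
      - name: example-app
        image: registry.example.com/example-app:1.0.0   # placeholder image
        securityContext:
          privileged: false
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL                    # add back only what the app needs
```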

Don’t Stop Here

There’s new stuff coming out all the time about Kubernetes and security. Go read it, and try it out. Just recently I read an article on the CNCF blog that had a good overview of security best practices and reminded me of some things that we should be doing better. Security is a complex web of multiple layers that can’t be solved with one tool, and it is never “done,” so please, don’t stop caring about it after you read this article.