Kubernetes Clinic: Learn How to Check Your Kubernetes Cluster Health

At the beginning of a new year, we all want to start doing things to make ourselves healthier, more productive, or happier. Your Kubernetes clusters deserve the same kind of love! This is a great time to determine whether your clusters are the kind of clusters that you want going into the new year. So, how can you check that and make changes that will improve your Kubernetes cluster health?

Open source tooling

There’s some great open source tooling you can use to check the health of your Kubernetes clusters. The Fairwinds Insights platform aggregates a number of really cool open source tools that you can use to see all of this information in a single place. And now Insights has a free tier available for environments up to 20 nodes, two clusters, and one repo, which allows you to run all of these tools across multiple clusters and get results in one centralized dashboard.

To bring a cluster into Insights so you can check and manage it, you need to install the Insights agent. The free tier of Insights is fully featured, so all the reports are available. When you sign up for the free tier, you’ll receive a confirmation email that includes a link to the setup page. There you can create an organization and then add your first cluster by providing a cluster name. You will then receive instructions for installing the Insights agent. The installation consists of a Helm command that installs the agent and a values.yaml file that controls which components are installed. You can adjust these values to alter the behavior or the settings of the application.

Insights agent & scanning

By default, the free tier automatically installs:

Polaris, an open source policy engine for Kubernetes
Open Policy Agent (OPA), a general-purpose policy engine
Nova, a command-line interface to find outdated or deprecated Helm charts
Pluto, a utility that finds deprecated Kubernetes apiVersions in code repositories and Helm releases

This setup is easy to work with and will give you plenty of information. The values.yaml file that Insights provides includes the token that identifies your account and the list of enabled open source tools (these are called Reports). Copy the contents of the values.yaml file into a file in your own local directory. Then edit the -f flag of the provided Helm command so that it points to that local values file, and run the command. The command declares the version of Insights that you're going to use, and creates an insights-agent namespace in which to install the various components.
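To make the pieces of that command easier to see, here is a small Python sketch that assembles a Helm invocation as an argument list. The chart reference fairwinds-stable/insights-agent, the release name, and the build_install_command helper are assumptions made for illustration, and the chart version is left as a parameter because it comes from your own setup page. The command Insights gives you is the one to trust.

```python
def build_install_command(values_file: str, chart_version: str) -> list[str]:
    """Assemble a Helm argv list for installing the agent (illustrative only)."""
    return [
        "helm", "upgrade", "--install",
        "insights-agent",                    # release name (assumed)
        "fairwinds-stable/insights-agent",   # chart reference (assumed)
        "-f", values_file,                   # point -f at your local values file
        "--version", chart_version,          # use the version from the setup page
        "--create-namespace",                # create the namespace if it is missing
        "--namespace", "insights-agent",
    ]


if __name__ == "__main__":
    # "<version>" is a placeholder, not a real chart version.
    print(" ".join(build_install_command("./values.yaml", "<version>")))
```

Building the command as a list rather than a single string keeps each flag next to its value, which makes it obvious which argument to change when your values file lives somewhere else.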

Once that’s all set up, you’ll see that the Insights Agent will have already run Pluto, Polaris, OPA, and Nova Reports. You can go back to Insights and you will start seeing data in Action Items.

Insights showing data in Action Items

Right away, you can see that the report gives you a description of what it found and some remediation steps you can take. It provides data you can use to change your deployment manifest. Once you’ve made these changes, the report will clear these action items.

Scan containers and container images for security vulnerabilities

From a security standpoint, it’s important to scan your containers and container images for security vulnerabilities. It’s easy to do that with the Insights free tier too: you just need to go to the Install Hub and add Trivy, which is an open source vulnerability scanner. Just click on the Available button and configure it.

Trivy available in the Install Hub

Click the “Add Report” button and you will be presented with a new version of the values.yaml file, which you can copy and paste over your local file.

Trivy takes a bit longer to run because it scans every image that it finds in the cluster. Once it’s done, take a look at Vulnerabilities, where you can easily see the top impacted images and a breakdown by severity. You can sort by severity and click in to get more information about the vulnerabilities found.

Top impacted images shown in Vulnerabilities tab in Insights
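If you are wondering what sits behind a view like that, the two summaries are simple aggregations over the scan findings. The Python sketch below is only an illustration: the findings list, the image names, and both helper functions are hypothetical, and a real scan reports many more fields per vulnerability.

```python
from collections import Counter

# Hypothetical findings, reduced to the two fields the summaries need.
findings = [
    {"image": "nginx:1.19", "severity": "CRITICAL"},
    {"image": "nginx:1.19", "severity": "HIGH"},
    {"image": "redis:6.0", "severity": "HIGH"},
    {"image": "nginx:1.19", "severity": "LOW"},
]


def severity_breakdown(items):
    """Count findings per severity level."""
    return Counter(item["severity"] for item in items)


def top_impacted_images(items, limit=3):
    """Return (image, finding_count) pairs, most affected image first."""
    return Counter(item["image"] for item in items).most_common(limit)


if __name__ == "__main__":
    print(severity_breakdown(findings))
    print(top_impacted_images(findings))
```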

Check the health score of your Kubernetes clusters

Go to Fairwinds Insights and select the Cluster tab. The first thing you see is a health score; this is a ratio of passing to failing action items, weighted more heavily towards critical action items.

Health score of "A" in Fairwinds Insights
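To get a feel for how a severity-weighted score behaves, here is a minimal Python sketch. It is not the formula Insights uses: the weights, the 0 to 100 scale, and the health_score function are all assumptions chosen to show the idea that one failing critical item hurts far more than one failing low-severity item.

```python
# Assumed severity weights; the real weighting used by Insights may differ.
WEIGHTS = {"low": 1, "medium": 2, "high": 4, "critical": 8}


def health_score(checks):
    """Return the weighted share of passing checks as a percentage (0-100).

    `checks` is a list of (severity, passed) pairs.
    """
    total = sum(WEIGHTS[severity] for severity, _ in checks)
    if total == 0:
        return 100.0  # nothing was checked, so nothing failed
    passing = sum(WEIGHTS[severity] for severity, passed in checks if passed)
    return 100.0 * passing / total


if __name__ == "__main__":
    # One failing critical item outweighs two passing low-severity items.
    print(health_score([("critical", False), ("low", True), ("low", True)]))
    # One failing low-severity item barely moves the score.
    print(health_score([("critical", True), ("low", False), ("low", True)]))
```

With these assumed weights, the first call gives 20.0 and the second gives 90.0, even though both clusters have exactly one failing item.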

Action Items is the single place where you can view all of the information coming into Insights from Polaris, OPA, Nova, Pluto, Trivy, and any other reports you have installed. The Action Items table also gives you a quick summary of what was found.

Polaris

Polaris has about 30 built-in policies for best practices. A couple of security-related policies include:

Do not run your containers as root
Do not allow privilege escalation

Built into the platform, we have information about what it means to run as root. There are links to references for each action item and the platform also addresses remediation. There's quite a lot of detailed information about changing your container build to not run as root and then setting your security context in your pod definition. Insights provides as much context as possible so that you don’t look at the Action Items, see that something is running as root, and then wonder what you can do about it. Insights even provides some examples of the code needed to resolve the issue.
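To make those two checks concrete, here is a rough Python sketch of the kind of test a policy engine applies to a pod spec. It is not how Polaris is implemented; the security_findings function and the sample specs are hypothetical. The field names runAsNonRoot and allowPrivilegeEscalation are the standard Kubernetes securityContext settings.

```python
def security_findings(pod_spec):
    """Return human-readable findings for each container in a pod spec dict."""
    findings = []
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext", {})
        # A container-level runAsNonRoot overrides the pod-level setting.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if not run_as_non_root:
            findings.append(f"{container['name']}: should not be allowed to run as root")
        # allowPrivilegeEscalation is set per container; unset is treated as allowed.
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"{container['name']}: should not allow privilege escalation")
    return findings


if __name__ == "__main__":
    risky = {"containers": [{"name": "web"}]}
    hardened = {
        "securityContext": {"runAsNonRoot": True},
        "containers": [
            {"name": "web", "securityContext": {"allowPrivilegeEscalation": False}}
        ],
    }
    print(security_findings(risky))     # two findings for "web"
    print(security_findings(hardened))  # no findings
```

The hardened example shows the shape of the fix: set runAsNonRoot to true (at the pod or container level) and allowPrivilegeEscalation to false on each container. Your image also has to be built to run as a non-root user for the first setting to work.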

Jira integration

You can also connect Insights to Jira, create a ticket from an action item, and send it straight to the team that's responsible for fixing it. You can resolve an action item if you know it is working as designed (even though it got flagged), or if there's nothing you can do about it. Resolving won’t fix anything, but it will remove the action item from your list. You can also snooze an item so that you aren't reminded about it for another week if it’s something you are aware of but don’t want to tackle right now.

Review multiple reports

If you are curious about what report an action item came from, you can expose the report column and then filter down to that specific report. Or you can look at it by category.

Action Items filtered by reports in Fairwinds Insights

All action items belong to one of three categories: security, reliability, or efficiency. If you're curious about one specific aspect of your cluster health, you can dive into those specific areas. So you can see all of the reliability action items, what report identified them, and how severe they are. This can help you prioritize the action items that you want to focus on.
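The same kind of triage is easy to picture in code. The Python sketch below filters a list of action items down to one category and orders them from most to least severe. The action_items data, the titles, and the by_category helper are made up for the example and do not reflect the Insights data model.

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

# Hypothetical action items; titles and field names are illustrative only.
action_items = [
    {"title": "Probe not configured", "category": "reliability", "report": "polaris", "severity": "medium"},
    {"title": "Image has known CVEs", "category": "security", "report": "trivy", "severity": "critical"},
    {"title": "Deprecated apiVersion in use", "category": "reliability", "report": "pluto", "severity": "high"},
    {"title": "Resource requests missing", "category": "efficiency", "report": "polaris", "severity": "low"},
]


def by_category(items, category):
    """Return the items in one category, most severe first."""
    matching = [item for item in items if item["category"] == category]
    return sorted(matching, key=lambda item: SEVERITY_ORDER[item["severity"]])


if __name__ == "__main__":
    for item in by_category(action_items, "reliability"):
        print(item["severity"], item["report"], item["title"])
```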

Automation rules

Setting up and running automation rules is one of the more powerful things in Insights. You can create and edit rules, and there are also default rules you can get started with. Essentially, you name the rule (no spaces) and write a description. You can set the context, report, cluster, and action. Then you can save or update the rule. So for example, if you have a set of very high security namespaces where every vulnerability must get fixed right away, you can write a rule that says if Insights identifies a security vulnerability in one of those namespaces, you want to increase the severity of the action item, and then create tickets off of that. It's a JavaScript-based engine, so you can write a rule to do whatever you want.
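Here is the logic of that example rule sketched in Python. Real Insights rules are written in JavaScript against the objects the rule engine provides, so treat this purely as an illustration of the decision being made: the namespace names, the field names, and the apply_rule function are all hypothetical.

```python
# Hypothetical namespace names standing in for your high-security namespaces.
HIGH_SECURITY_NAMESPACES = {"payments", "auth"}


def apply_rule(item):
    """Escalate security findings that land in a high-security namespace.

    Returns a new dict; the input item is left unchanged.
    """
    updated = dict(item)
    if (item["category"] == "security"
            and item["namespace"] in HIGH_SECURITY_NAMESPACES):
        updated["severity"] = "critical"  # raise the severity
        updated["create_ticket"] = True   # flag it for ticket creation
    return updated


if __name__ == "__main__":
    finding = {"category": "security", "namespace": "payments", "severity": "medium"}
    print(apply_rule(finding))
```

Items that don't match both conditions pass through untouched, so a rule like this can run against every action item without side effects on the rest.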

Keeping your Kubernetes cluster healthy

Kubernetes is complex, and staying up to date on all the possible health issues can be difficult. Fairwinds Insights integrates a lot of open source tooling to make it easy to check your cluster health in terms of security, reliability, and efficiency. Using the new free tier of Insights, you can get started quickly so you can keep your Kube clusters healthy in the year ahead.