How to Kubernetes: Use Conftest to Audit Infrastructure-As-Code

At Fairwinds, we leverage OPA (Open Policy Agent) in our open source product Polaris, as well as in our SaaS product, Fairwinds Insights. As our dev team implemented these features, it got me thinking about other ways we could leverage OPA at Fairwinds to enhance the reliability and security of our Kubernetes environments.

At the same time, I was trying to come up with a way for us to better audit our many infrastructure-as-code repositories. For each client we have, there’s a repo, and everything we do for that client goes into that repo. As you can imagine, standards across customers gradually slip out of date, and we need to know when they do. Currently we have an internal Python-based solution for this, but it has become unwieldy and dated. The plan was to rewrite it, but EJ (our former CTO) had a better idea: “Why don’t we use OPA?” My response, of course, was to facepalm. Why didn’t I think of that? So I went looking around the OPA repositories, dusted off what little I knew of Rego, and put together a proof of concept. I needed a way to run and test policies, so I updated my version of conftest, which, it turns out, is designed to do exactly what I needed: audit configuration files.

Writing the Policy

The next bit, writing the policy, is the hard part. As someone who regularly develops in languages such as Python and Go, I found this a lot harder than I expected. Rego isn’t a programming language in the sense I’m used to; it’s a declarative query language, which means I have to think about the problem differently than I normally would. Once you get into the groove of it, it’s extremely powerful, but it took me quite a while. I decided to start by checking which version of a Terraform module we’re using and comparing it to a list of approved modules. There are good examples of this out in the wild, so I was able to draw on those to get moving.

The most basic policy I could write was something like this:

package terraform

import data

violation[{"msg": msg}] {
    m := input.module[name]
    source := m.source
    not source == "git::https://github.com/FairwindsOps/terraform-vpc.git?ref=v5.0.1"
    msg = sprintf("%s module version %s is out of date", [name, source])
}

The first line, package terraform, just allows me to divide up the policy into multiple parts. Eventually, I’m going to end up with a bunch of different packages, such as terraform, kops, and kubernetes. When using this policy with conftest, we will target the terraform package using the --namespace=terraform flag.

The next part of the policy reads in all modules in the terraform data passed to it, and compares the source of each one to a static string. If the source does not match that string, the rule returns msg as a violation.
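For readers who, like me, think more naturally in Python, the logic of this first policy can be sketched as an ordinary loop. This is only an illustration of what the rule checks, not a description of how OPA evaluates it; the function name violations and the sample input are made up for the example.

```python
# Hypothetical Python equivalent of the first Rego rule (illustration only).
APPROVED_SOURCE = "git::https://github.com/FairwindsOps/terraform-vpc.git?ref=v5.0.1"


def violations(input_doc):
    """Return one violation per module whose source differs from APPROVED_SOURCE."""
    results = []
    # `input.module[name]` in Rego iterates over every entry in the module map.
    for name, module in input_doc.get("module", {}).items():
        source = module.get("source")
        if source is None:
            # In Rego, an undefined m.source makes the rule body fail for this
            # module, so no violation is produced for it.
            continue
        if source != APPROVED_SOURCE:
            msg = "%s module version %s is out of date" % (name, source)
            results.append({"msg": msg})
    return results


# Made-up input: a vpc module pinned to an older ref.
sample = {
    "module": {
        "vpc": {"source": "git::https://github.com/FairwindsOps/terraform-vpc.git?ref=v4.0.0"}
    }
}
print(violations(sample))
```

The main difference is that Rego has no explicit loop: the variable name is simply bound to every key for which the rest of the rule body holds.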

This policy is functional, but it doesn’t quite achieve the results we want. We need to add the ability to check different module names against different strings. This requires a map that associates each module name with its approved source; we’ll call it module_allowlist. This gives us the following:

package terraform

import data

module_allowlist = {
    "vpc": "git::https://github.com/FairwindsOps/terraform-vpc.git?ref=v5.0.1",
    "bastion": "git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.6.0",
}

violation[{"msg": msg}] {
    m := input.module[name]
    source = m.source
    not source == module_allowlist[name]
    msg = sprintf("%s module version %s is out of date", [name, source])
}

Here we are comparing the actual module source to the corresponding source in the module_allowlist map. This is better, but we have another problem: What about modules with multiple allowed versions? The bastion module above is a great example; we expect different versions of bastion depending on which cloud provider you’re using. Let’s change the map format to module_name: [list of possible versions]. The new policy looks like this:

package terraform

import data

module_allowlist = {
    "vpc": ["git::https://github.com/FairwindsOps/terraform-vpc.git?ref=v5.0.1"],
    "bastion": [
        "git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.6.0",
        "git::https://github.com/fairwindsops/terraform-bastion.git//gcp?ref=gcp-v0.1.1",
    ],
}

violation[{
    "msg": msg,
}] {
    m := input.module[name]
    source = m.source
    not contains(module_allowlist[name], source)
    msg = sprintf("%s module version %s is out of date", [name, source])
}

contains(sources, elem) {
    source := sources[_]
    source == elem
}

Now that we have a list of possible versions, we introduce a function contains to check and see if our source module is in the list of allowed sources for that module. This means that by populating the map with more information, we can check for all sorts of module versions, not just one.
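The same idea can be sketched in Python as a dictionary of lists with a membership check. This is a hedged illustration rather than a translation OPA would endorse; the function name violations is hypothetical. One detail worth noting: as far as I can tell, a module whose name is missing from module_allowlist is also reported, because module_allowlist[name] is undefined and not succeeds on an undefined expression. The sketch mirrors that by treating an unknown module as having no approved sources.

```python
# Hypothetical Python equivalent of the allowlist policy (illustration only).
MODULE_ALLOWLIST = {
    "vpc": ["git::https://github.com/FairwindsOps/terraform-vpc.git?ref=v5.0.1"],
    "bastion": [
        "git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.6.0",
        "git::https://github.com/fairwindsops/terraform-bastion.git//gcp?ref=gcp-v0.1.1",
    ],
}


def violations(input_doc, allowlist=MODULE_ALLOWLIST):
    """Return a violation for every module whose source is not in its allowlist."""
    results = []
    for name, module in input_doc.get("module", {}).items():
        source = module.get("source")
        if source is None:
            continue  # no source to compare, so nothing to report
        # An unknown module name yields an empty list, so any source is flagged.
        if source not in allowlist.get(name, []):
            msg = "%s module version %s is out of date" % (name, source)
            results.append({"msg": msg})
    return results


# Made-up input: one approved bastion source and one outdated one.
ok = {"module": {"bastion": {"source": MODULE_ALLOWLIST["bastion"][1]}}}
old = {"module": {"bastion": {"source": "git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.5.0"}}}
print(violations(ok))
print(violations(old))
```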

The last thing we want out of this policy is to return more data when outputting the results as JSON. Our goal is to be able to store and present or analyze this data, so it helps to add more details. Luckily, the mechanism is already there; we just need to use it.

The final product is this file, called terraform.rego:

package terraform

import data

module_allowlist = {
    "vpc": ["git::https://github.com/FairwindsOps/terraform-vpc.git?ref=v5.0.1"],
    "bastion": [
        "git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.6.0",
        "git::https://github.com/fairwindsops/terraform-bastion.git//gcp?ref=gcp-v0.1.1",
    ],
}

violation[{
    "msg": msg,
    "family": "terraform",
    "check": "module version",
    "module": name,
}] {
    m := input.module[name]
    source = m.source
    not contains(module_allowlist[name], source)
    msg = sprintf("%s module version %s is out of date", [name, source])
}

contains(sources, elem) {
    source := sources[_]
    source == elem
}

In the definition of the violation, we have added several pieces of data:

"msg" - The message to return, taken from the variable msg. This was already in place.
"family" - A string indicating where the data came from, in this case terraform.
"check" - A label for what is being checked. Here, it’s "module version".
"module" - The name of the module that triggered the violation, taken from the variable name.

The end result is that we can now run:

conftest test <path-to-terraform-file(s)> --policy terraform.rego -o json --namespace terraform

and get output like this:

[
  {
    "filename": "bastion.tf",
    "namespace": "terraform",
    "successes": 0,
    "failures": [
      {
        "msg": "bastion module version git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.5.0 is out of date",
        "metadata": {
          "check": "module version",
          "family": "terraform",
          "module": "bastion"
        }
      }
    ]
  }
]

Or, if you prefer to view the results on the CLI, drop the -o json flag:

FAIL - bastion.tf - terraform - bastion module version git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.5.0 is out of date
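Because the JSON output is structured, it is easy to post-process. As a minimal sketch, assuming the output shape shown above (a list of per-file results, each with a failures list whose entries carry msg and metadata), the following Python counts failures per family, check, and module. The function name summarize_failures is my own invention for this example and is not part of conftest.

```python
import json
from collections import Counter

# Sample taken from the conftest output shown above.
SAMPLE_OUTPUT = """
[
  {
    "filename": "bastion.tf",
    "namespace": "terraform",
    "successes": 0,
    "failures": [
      {
        "msg": "bastion module version git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.5.0 is out of date",
        "metadata": {
          "check": "module version",
          "family": "terraform",
          "module": "bastion"
        }
      }
    ]
  }
]
"""


def summarize_failures(conftest_json):
    """Count failures keyed by (family, check, module) from conftest JSON output."""
    counts = Counter()
    for file_result in json.loads(conftest_json):
        # Assumption: "failures" may be absent or null when a file has none.
        for failure in file_result.get("failures") or []:
            meta = failure.get("metadata") or {}
            key = (meta.get("family"), meta.get("check"), meta.get("module"))
            counts[key] += 1
    return counts


for key, total in summarize_failures(SAMPLE_OUTPUT).items():
    print(key, total)
```

In practice the JSON would come from the conftest command rather than an embedded string, and the counts could be written to whatever store you use for reporting.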

Adding Tests

Rego has another really awesome capability that I wanted to leverage: unit testing. This is really powerful for writing policy, as well as for maintaining it in the long term. In order to try out this functionality, I wrote a test for the terraform.rego policy from the last section. Here’s my new terraform_test.rego file:

package terraform

empty(value) {
    count(value) == 0
}

no_violations {
    empty(violation)
}

test_module_source {
    violation[{"check": "module version", "family": "terraform", "module": "bastion", "msg": "bastion module version git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.0.0 is out of date"}] with input as {"module": {"bastion": {"source": "git::https://github.com/fairwindsops/terraform-bastion.git//aws?ref=aws-v0.0.0"}}}
}

Essentially, I expect there to be a violation if I pass in a module called bastion that has a version that is not in my list. This test can be run using the command conftest verify.

Conclusion

Now I have the ability to write an entire suite of audit policies, run it against all of our infrastructure-as-code, and generate structured JSON data that will allow me to analyze the results and report the findings. Even better, since Fairwinds Insights supports OPA, I can plug these policies in and have Insights collect and manage that data for me. This will likely become an integral part of our workflow, enabling our SREs to better manage many clients and gather helpful information along the way.