Two months ago at Fairwinds (formerly ReactiveOps), we launched Polaris, an open source dashboard for auditing Kubernetes workload configuration. We saw a great response from the community and have been hard at work keeping up with all the interesting ideas and use cases that have come in since launch.
For example, a common piece of feedback was that while Polaris was great at auditing workloads in a live cluster, many folks wanted to catch issues before they were checked into their infrastructure-as-code repositories. So we modified Polaris to audit local YAML files, and added a couple of new flags to make CI/CD integration easier. Now you can run something like this in your CI/CD pipeline:
and the pipeline will fail if Polaris detects any error-level issues, or if your Polaris score drops below 90%.
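The pass/fail rule described above is simple enough to sketch. The example below is a minimal illustration, not Polaris’s own code: it assumes a hypothetical list of check results, each tagged with a severity of "error" or "warning", plus an overall score expressed as a percentage. Polaris’s real output format differs.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One check outcome. Hypothetical shape, not Polaris's real output format."""
    name: str
    severity: str  # assumed to be "error" or "warning"
    passed: bool


def should_fail(results, score, min_score=90.0):
    """Return True if the CI step should fail.

    Fails when any error-level check did not pass, or when the overall
    score (a percentage) is below min_score.
    """
    has_error = any(r.severity == "error" and not r.passed for r in results)
    return has_error or score < min_score


if __name__ == "__main__":
    sample = [
        CheckResult("livenessProbeMissing", "warning", passed=False),  # illustrative names
        CheckResult("runAsRootAllowed", "error", passed=True),
    ]
    # In CI you would map True to a non-zero exit code so the pipeline fails.
    print("fail" if should_fail(sample, score=92.0) else "pass")  # prints "pass"
```

A failed warning-level check alone does not break the build here; only a failed error-level check or a score under the threshold does.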
However, there was one useful feature we found difficult to implement: an easier way to share Polaris reports. To help address this need, we’re excited to announce a new service: Polaris Snapshot. Polaris Snapshot will let you generate a report, hosted at polaris.fairwinds.com, that you can share with your team. We see this as a great way for teams to sync on how to make their Kubernetes workloads more stable, efficient, and secure.
At Fairwinds, we’ve worked with dozens of development teams shipping hundreds of apps into Kubernetes clusters. Typically, we take on work related to the cluster itself (e.g., handling cluster upgrades, maintaining tooling like ingress and cert management, responding to outages), while the development team takes responsibility for everything that requires application context (e.g., writing Deployment configurations, setting up CI/CD). But for developers who aren’t entirely familiar with Kubernetes, it’s easy to neglect some critical pieces of an application’s Kubernetes configuration.
For example, your deployment may seem to work just fine without readiness and liveness probes, or without resource requests and limits. But without this configuration, it’s hard for Kubernetes to do what it does best: keep your application healthy and scale it efficiently. Many of the outages and performance degradations we help teams respond to are caused by problems that could easily have been prevented with a small effort to build sturdier app configurations.
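To make these gaps concrete, here is a minimal sketch (not how Polaris itself is implemented) that walks a Deployment manifest, already parsed from YAML into a Python dict, and lists containers missing readiness probes, liveness probes, or CPU and memory requests and limits. The manifest fields (spec.template.spec.containers, readinessProbe, livenessProbe, resources.requests, resources.limits) are standard Kubernetes; the function names and message format are hypothetical.

```python
def _get(obj, *keys):
    """Walk nested dicts, treating missing or null (YAML `~`) values as empty."""
    for key in keys:
        obj = (obj or {}).get(key)
    return obj or {}


def find_missing_config(deployment):
    """Return one message per missing probe or resource setting, per container."""
    problems = []
    pod_spec = _get(deployment, "spec", "template", "spec")
    for container in pod_spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        for probe in ("readinessProbe", "livenessProbe"):
            if not container.get(probe):
                problems.append(f"{name}: {probe} missing")
        for kind in ("requests", "limits"):
            settings = _get(container, "resources", kind)
            for resource in ("cpu", "memory"):
                if resource not in settings:
                    problems.append(f"{name}: {resource} {kind} missing")
    return problems


if __name__ == "__main__":
    deployment = {
        "kind": "Deployment",
        "spec": {"template": {"spec": {"containers": [{
            "name": "web",
            "image": "nginx:1.17",
            "livenessProbe": {"httpGet": {"path": "/healthz", "port": 80}},
            "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
        }]}}},
    }
    for problem in find_missing_config(deployment):
        print(problem)
```

For the sample manifest this reports the missing readiness probe and the missing CPU and memory limits, exactly the kind of configuration that "seems to work" until the cluster is under pressure.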
Polaris is designed to ease this hand-off between platform engineering teams and app development teams. We specifically targeted issues with app configuration that could lead to problems with scaling, stability, efficiency, and security. With every point you add to your Polaris score, you reduce your chances of experiencing an outage, security breach, or performance degradation.
Why Polaris Snapshot?
Once we finished Polaris, it came time to actually run it in our customers’ clusters. It was easy to set up a temporary dashboard before a meeting, but we didn’t want to leave an unnecessary process running in each customer’s environment, no matter how small and cheap it might be. And because a live dashboard’s report changes in response to new deployments, it could lead to miscommunication.
What we needed was a way to generate a one-off report, a snapshot of the cluster’s state at a particular point in time. This report could then be saved, passed around, and discussed as we set up a plan to prioritize and address the issues Polaris found.
Unfortunately, safely maintaining persistent state can be complicated in Kubernetes, so a feature like this seemed out of scope for the open source project. We tried simply saving the dashboard as a PDF, with every alert expanded. The result was OK, but for some clients it produced reports that were dozens of pages long and quite difficult to navigate.
Finally, we settled on Polaris Snapshot as a way to host Polaris reports. This not only solves our immediate problem; it also furthers Fairwinds’ mission of making it easy for organizations to operate production-grade Kubernetes clusters.
How does it work?
Polaris Snapshot runs at polaris.fairwinds.com (note: you’ll need to enter your email address to generate a report; while we can’t offer you a free lunch, we do promise to respect your inbox). When you arrive, Snapshot will create a new session for you, identified by a long, unique ID. You’ll then see a kubectl command to run, something like:
This command will download Polaris, run it once, send the results back to polaris.fairwinds.com, and then uninstall itself. We recommend inspecting the YAML file before passing it to kubectl apply; you’ll see that Polaris requires only minimal RBAC permissions and runs in a temporary, unique namespace.
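The run-once lifecycle described above can be sketched as follows. This is a hypothetical illustration, not the actual Snapshot installer: the namespace prefix, the four step callables, and the decision to clean up in a finally block (so the uninstall step runs even if the audit or upload fails) are all assumptions.

```python
import secrets


def unique_namespace(prefix="polaris-snapshot"):
    """Hypothetical naming scheme: fixed prefix plus 8 random hex characters.

    Lowercase hex keeps the name a valid Kubernetes namespace (RFC 1123 label).
    """
    return f"{prefix}-{secrets.token_hex(4)}"


def run_once(install, audit, upload, uninstall):
    """Install, audit, and upload exactly once, then always uninstall.

    The four callables stand in for real cluster and network operations.
    """
    namespace = unique_namespace()
    try:
        install(namespace)
        results = audit(namespace)
        upload(results)
    finally:
        # Clean up even if install, audit, or upload raised, so nothing lingers.
        uninstall(namespace)
    return namespace


if __name__ == "__main__":
    steps = []
    ns = run_once(
        install=lambda ns: steps.append("install"),
        audit=lambda ns: {"score": 87},
        upload=lambda results: steps.append(f"upload score={results['score']}"),
        uninstall=lambda ns: steps.append("uninstall"),
    )
    print(ns, steps)
```

Passing the steps in as callables keeps the ordering logic testable without a cluster; a real installer would wire them to kubectl or a Kubernetes client.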
Once the audit is finished and Polaris has uninstalled itself from your cluster, the results of your report will automatically appear at https://polaris.fairwinds.com/session/YOUR_SESSION_ID. You can share this link with your teammates to help coordinate and prioritize remediation efforts. The URL will stay active for 90 days, after which we’ll completely delete your data. You can also manually delete the data at any time.
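The session and retention rules can also be illustrated in code. This sketch is an assumption, not how polaris.fairwinds.com is built: it generates a session ID with Python’s secrets module, builds the share link from the /session/ path shown above, and computes the 90-day expiry.

```python
import secrets
from datetime import datetime, timedelta, timezone

BASE_URL = "https://polaris.fairwinds.com/session/"  # path taken from the post
RETENTION = timedelta(days=90)                       # reports live for 90 days


def new_session_id():
    # Assumption: 32 random bytes, URL-safe base64 (43 characters). The real
    # service's ID format isn't specified; this just shows "long and unique".
    return secrets.token_urlsafe(32)


def share_url(session_id):
    return BASE_URL + session_id


def expires_at(created):
    """Moment after which the report and its data would be deleted."""
    return created + RETENTION


def is_active(created, now):
    return now < expires_at(created)


if __name__ == "__main__":
    created = datetime(2019, 7, 1, tzinfo=timezone.utc)
    print(share_url(new_session_id()))
    print(expires_at(created).date())  # 2019-09-29
```

An unguessable ID is what makes a shareable link workable without accounts: anyone with the URL can view the report, and nobody can find it by enumeration.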
We’ve got big plans for Polaris in the future. We’ve just published the Q3 Roadmap, which includes a few exciting new features:
We’re going to start checking more types of resources than just Deployments: we’ll support StatefulSets, Jobs, CronJobs, and DaemonSets.
We’re going to add support for exemptions, which will allow you to specify that, for example, one of your deployments really does need to run as root. Once you give it an exemption, that check will no longer affect your Polaris score.
We’re going to investigate ways to let you create your own custom Polaris checks.
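One way to picture how exemptions and the new resource kinds would feed into the score is sketched below. The scoring rule here (the percentage of non-exempt checks that pass) is a deliberately simple, hypothetical stand-in; Polaris’s actual scoring formula and exemption configuration may differ. Exemptions are keyed by hypothetical (workload name, check) pairs, and results for kinds outside the roadmap’s list are ignored.

```python
from dataclasses import dataclass

# Resource kinds named in the roadmap.
SUPPORTED_KINDS = {"Deployment", "StatefulSet", "Job", "CronJob", "DaemonSet"}


@dataclass(frozen=True)
class Result:
    kind: str     # e.g. "Deployment"
    name: str     # workload name
    check: str    # illustrative check identifier, e.g. "runAsRoot"
    passed: bool


def score(results, exemptions):
    """Percentage of non-exempt checks that passed (hypothetical rule).

    `exemptions` is a set of (name, check) pairs; exempted results are skipped
    entirely, so they neither raise nor lower the score.
    """
    counted = [
        r for r in results
        if r.kind in SUPPORTED_KINDS and (r.name, r.check) not in exemptions
    ]
    if not counted:
        return 100.0
    return 100.0 * sum(r.passed for r in counted) / len(counted)


if __name__ == "__main__":
    results = [
        Result("Deployment", "api", "runAsRoot", passed=False),
        Result("Deployment", "api", "livenessProbe", passed=True),
        Result("StatefulSet", "db", "livenessProbe", passed=True),
        Result("CronJob", "report", "cpuRequests", passed=False),
    ]
    print(score(results, exemptions=set()))                            # 50.0
    print(round(score(results, exemptions={("api", "runAsRoot")}), 1))  # 66.7
```

Under this simplified rule, exempting the api workload’s runAsRoot check removes it from both the numerator and the denominator. A production version would likely also key exemptions by namespace and kind to avoid name collisions.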
We’re also quite interested in expanding Polaris Snapshot to include features that are difficult to implement as part of an open source project, such as saving report history, multi-cluster management, and email/Slack alerts for degradations.
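Once report history exists, a degradation alert mostly reduces to comparing consecutive snapshots. This is a speculative sketch of a planned feature, not an existing Snapshot API: the 1-point threshold and message format are hypothetical, and delivery to email or Slack is left out.

```python
from datetime import date


def degradations(history, threshold=1.0):
    """Return alert messages for consecutive snapshots whose score dropped.

    `history` is a chronological list of (date, score) pairs; drops smaller
    than `threshold` percentage points are ignored as noise.
    """
    alerts = []
    for (_, prev), (day, cur) in zip(history, history[1:]):
        if prev - cur >= threshold:
            alerts.append(f"{day.isoformat()}: score fell from {prev:.0f}% to {cur:.0f}%")
    return alerts


if __name__ == "__main__":
    history = [
        (date(2019, 7, 1), 92.0),
        (date(2019, 7, 8), 93.0),
        (date(2019, 7, 15), 85.0),
    ]
    for alert in degradations(history):
        print(alert)  # 2019-07-15: score fell from 93% to 85%
```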
If you’re interested in Polaris and would like to help us plot the course forward, reach out to firstname.lastname@example.org or get in touch on GitHub!