It’s a common topic of conversation among leaders of technical organizations:
Should SRE be its own separate discipline?
Should I hire DevOps people?
Should the rest of my team manage all of the infrastructure?
These questions are understandable. DevOps has been a buzzword for many years now, so it’s natural that organizations wrestle with whether they're doing it right.
I firmly believe that great organizations should treat their platforms like any other service. In other words, a platform should be a well-documented API.
This will look different for companies at different stages, but in most ways the concept should stay the same throughout. If you’re a small organization just getting started, use a well-known platform as a service (PaaS) such as Heroku or one run by a cloud provider. Why? Because that’s exactly what a great PaaS is: a well-documented API.
As your organization grows, you may need people to build and maintain your own custom-built PaaS. I still agree with the words of Kelsey Hightower in a recent Twitter post:
At any notable size or scale, an organization is going to want to build its own platform. The old approach was to have an operations team, separate from the development team, responsible for keeping every service up and running, which is very difficult to scale. Instead, organizations are increasingly adopting service ownership: the engineers who develop a service are responsible for it all the way through to production, including responding when that particular service goes down.
There is no reason the platform team should be an exception to this model.
Let's consider a very simple e-commerce organization with two services, an online storefront and a cart service, each maintained by a different team. At a small organization, the teams might simply use something like Heroku to handle all the infrastructure pieces. But as the organization grows, it must decide whether it wants more granular control over scaling, internal networking, and so on. Suppose it decides to adopt something like Kubernetes. Asking two teams that aren’t infrastructure experts (let alone Kubernetes experts) to build and maintain that infrastructure would be as much a mistake as asking Kubernetes engineers to maintain a Haskell service.
Instead, it probably makes more sense to hire a few engineers to build the infrastructure out as a well-documented API: a customized PaaS. It will take time, and it will require ongoing maintenance. But the infrastructure team shouldn’t be any more responsible for the storefront service than the cart team is. Likewise, the cart team shouldn’t be any more responsible for the infrastructure than the storefront team is. Each team has clear expectations of what input should look like, what the output is, and what success looks like.
Stop spending your time worrying about whether you should build an SRE organization, a platform team, or a DevOps team, or whether you should ask all your application engineers to build and maintain everything themselves. Instead, think of your platform the same way you’d think of any other service in your organization. The platform should be a well-documented API that lets others pass things to it, returns good error messages when things go wrong, and has a clear process for fixing what's broken.
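To make the "platform as a well-documented API" idea a little more concrete, here is a minimal Python sketch of the kind of contract a platform team might offer: a typed input, explicit limits, and error messages that tell the caller how to fix the problem. Everything here is hypothetical and for illustration only: the names `DeploymentRequest`, `validate`, `submit`, and `PlatformError`, and the specific limits, are assumptions, not part of Fairwinds Insights, Kubernetes, or any real platform.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical limits a platform team might publish as part of its API contract.
MAX_CPU_MILLICORES = 4000
MAX_MEMORY_MIB = 8192


@dataclass(frozen=True)
class DeploymentRequest:
    """Input a service team hands to the (hypothetical) platform API."""
    service: str
    image: str
    cpu_millicores: int
    memory_mib: int
    run_as_root: bool = False


class PlatformError(ValueError):
    """Raised when a request breaks the platform contract; carries every problem found."""

    def __init__(self, problems: list[str]):
        self.problems = problems
        super().__init__("; ".join(problems))


def validate(req: DeploymentRequest) -> list[str]:
    """Return one actionable message per contract violation (empty list if valid)."""
    problems: list[str] = []
    # Look only at the last path component so a registry port (host:5000/...) isn't mistaken for a tag.
    name_and_tag = req.image.rsplit("/", 1)[-1]
    if ":" not in name_and_tag or name_and_tag.endswith(":latest"):
        problems.append(
            f"{req.service}: image '{req.image}' must be pinned to a specific tag, "
            "not left untagged or set to ':latest'"
        )
    if req.cpu_millicores > MAX_CPU_MILLICORES:
        problems.append(
            f"{req.service}: cpu_millicores={req.cpu_millicores} exceeds the limit of "
            f"{MAX_CPU_MILLICORES}; request less or ask the platform team for an exception"
        )
    if req.memory_mib > MAX_MEMORY_MIB:
        problems.append(
            f"{req.service}: memory_mib={req.memory_mib} exceeds the limit of {MAX_MEMORY_MIB}"
        )
    if req.run_as_root:
        problems.append(f"{req.service}: run_as_root=True is not allowed; run as a non-root user")
    return problems


def submit(req: DeploymentRequest) -> str:
    """Accept a valid request and return a deployment ID, or raise PlatformError."""
    problems = validate(req)
    if problems:
        raise PlatformError(problems)
    # A real platform would hand off to Kubernetes here; this sketch only acknowledges the request.
    return f"deploy-{req.service}"


if __name__ == "__main__":
    print(submit(DeploymentRequest("cart", "registry.example.com/cart:1.4.0", 500, 512)))  # deploy-cart
    try:
        submit(DeploymentRequest("storefront", "storefront", 8000, 512, run_as_root=True))
    except PlatformError as err:
        for problem in err.problems:
            print(problem)
```

The design choice worth noting is that `validate` collects every problem instead of stopping at the first one, so a service team gets a complete, readable list of what to change in a single round trip, which is the "good error messages" half of the contract.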
Great tooling makes all the difference in making this a reality. A tool like Fairwinds Insights can help your software engineers understand what they can deploy into Kubernetes and what appropriate configuration looks like. Insights can stop deployments of over-permissioned or over-provisioned workloads, and the documentation on how to do things right is built into the platform.
Build your platform teams the same way you build software teams. Well-documented APIs are the gold standard for a service, for an application big or small, and for the infrastructure. And if you’re using Kubernetes, leverage Fairwinds Insights to make it easy.